Introduction to Consumption Policies

A Consumption Policy is a policy expression, residing on a partitionable slot classad as advertised by the startd on an HTCondor execute node, that governs the amount of each resource consumed when a job matches against that slot.

Each kind of resource (or resource "asset") has a corresponding Consumption Policy. A typical partitionable slot (p-slot) always defines three resources: Cpus, Memory, and Disk, which might advertise Consumption Policies configured as in this simple example:

# enable use of consumption policies
CONSUMPTION_POLICY = True

# define a simple consumption policy
# (note, "target" refers to the scope of the candidate job classad being considered for a match)
CONSUMPTION_CPUS = target.RequestCpus
CONSUMPTION_MEMORY = target.RequestMemory
CONSUMPTION_DISK = target.RequestDisk

The HTCondor negotiator's matchmaking logic is aware of a Consumption Policy (CP) detected on a p-slot. When a job matches against a p-slot with a CP, the amount of each resource dictated by the policy is deducted from that p-slot, and the p-slot then remains available to match another job. In other words, Consumption Policies allow multiple jobs to be matched against a single partitionable slot during one negotiation cycle. When the HTCondor startd allocates a claim for a new match, the same Consumption Policy expressions are evaluated to determine the resources subtracted from the partitionable slot (and added to the corresponding new dynamic slot).
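For example, under the simple policy above, suppose a p-slot advertises Cpus = 8 and Memory = 16384. After matching a job with RequestCpus = 2 and RequestMemory = 2048, the p-slot remains in the negotiator's candidate list advertising Cpus = 6 and Memory = 14336, and can match another job later in the same cycle.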

Motivation

A partitionable slot presents resources that can service multiple jobs (frequently, all the resources available on a particular execute node). However, the negotiator historically matches only one job against a given slot per negotiation cycle, regardless of how many requests that p-slot could service.

Current Solutions

The two existing mechanisms for assigning p-slot resources to multiple jobs are:
  1. Assign resources over multiple negotiation cycles
  2. Allow a scheduler daemon to obtain multiple claims against a p-slot (by enabling CLAIM_PARTITIONABLE_LEFTOVERS). This is also referred to as "scheduler splitting."

The chief disadvantage of (1) is that it limits the rate at which a pool's partitionable resources can be allocated, particularly if the interval between negotiation cycles is long. For example, if each p-slot can support 10 jobs, it will take 10 negotiation cycles to fully allocate all of the slot's resources to those jobs. If the negotiation interval is 5 minutes, it might take nearly an hour (about 50 minutes) to fully utilize the pool's resources.

Using mechanism (2) -- scheduler splitting -- substantially increases the potential rate at which p-slot resources can be allocated, because the scheduler can obtain multiple claims from the slot in a faster loop that isn't rate limited by the negotiation interval.
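For reference, enabling scheduler splitting looks like the following minimal sketch (the knob is consulted by both the schedd and the startd involved):

# enable scheduler splitting: the schedd may claim the leftover resources
# of a matched p-slot and start additional jobs on them without waiting
# for another negotiation cycle
CLAIM_PARTITIONABLE_LEFTOVERS = True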

There are limitations to scheduler splitting. The first is that, unlike the negotiator, the scheduler does not have access to Concurrency Limit accounting, so this method cannot be used when Concurrency Limits are in play.

Additionally, jobs from other schedulers will not have access to the remaining resources until the next negotiation cycle.

Lastly, in a situation where accounting group quotas and/or submitter shares are smaller than a p-slot's slot weight, the negotiator will never match the p-slot to those requests, and so the scheduler never obtains a claim against the slot with which to perform scheduler splitting. This can starve those accounting groups entirely.

Consumption Policies

Consumption Policies address the limitations described above in the following ways.

Fast resource allocation: When the HTCondor negotiator matches a job against a partitionable slot configured with a Consumption Policy, it deducts the consumed resource assets (cpus, memory, etc.) from that p-slot and retains the reduced p-slot in its list of candidate slots. A p-slot can therefore be matched against multiple jobs in the same negotiation cycle, allowing p-slots to be fully loaded in a single cycle instead of receiving a single match per cycle. Because this splitting happens in the negotiator, it may also be referred to as "negotiator splitting."

Concurrency Limits: The negotiator has access to all Concurrency Limit accounting, and so negotiator splitting via Consumption Policies works properly with all Concurrency Limits.
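For illustration, a Concurrency Limit setup might look like the following sketch, using the standard <name>_LIMIT negotiator setting and concurrency_limits submit command; "license" is a hypothetical limit name:

# negotiator configuration: allow at most 10 simultaneous uses of "license"
LICENSE_LIMIT = 10

# job submit description: each running instance of the job consumes one unit
concurrency_limits = license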

Multiple schedulers: Because the negotiator has access to jobs from all schedulers, Consumption Policies allow a partitionable slot to service jobs from multiple schedulers in a single negotiation cycle.

Accounting Group Quotas: The cost of matching a job against a slot is traditionally the value of the SlotWeight expression. In a scenario where the slot weights of available p-slots are greater than an accounting group's quota, the jobs in that accounting group will be starved. This kind of scenario becomes increasingly likely in a fine-grained accounting group configuration involving many smaller quotas, or when machines with larger amounts of resources and correspondingly large slot weights are in play.

When a p-slot with a Consumption Policy is matched, its match cost is the change in the SlotWeight value from before the match to after. This means a match is charged only for the portion of the p-slot it actually used (as measured by the SlotWeight expression), and so p-slots with large SlotWeight values can generally be used by accounting groups with smaller quotas (and likewise by submitters with smaller fair-share values).

Trade-offs

One implication of negotiator splitting (Consumption Policies) is that the negotiator is responsible for matching all requests. In a large-scale pool, this may make the negotiation cycle the rate limiter for resource allocation. By contrast, because a pool can support multiple schedulers, scheduler splitting (CLAIM_PARTITIONABLE_LEFTOVERS) can scale out by increasing the number of schedulers running in the pool.

Consumption Policies and Accounting

Match Cost

The match cost for a job against a slot with a Consumption Policy is the change in the SlotWeight value from before the match to after. It is this match cost value that is added to the corresponding submitter's usage in the HTCondor accountant. A useful way to think about accounting with Consumption Policies is that the standard role of SlotWeight is replaced with the change in SlotWeight.

For example, consider this memory-driven consumption policy:

CONSUMPTION_MEMORY = quantize(target.RequestMemory, {1024})
SLOT_WEIGHT = floor(Memory / 1024)

Note that quantize() rounds a request up to the next multiple of 1024, so a job may acquire more than one of the 1024MB "chunks" implied by the above policy. Assume the slot has 4096MB of memory, for an initial slot weight of 4, and suppose a job acquires 2048MB of that memory (for example, by requesting anything between 1025MB and 2048MB). The match will reduce the partitionable slot's weight from 4 to 2, for a match cost of 4-2=2. The submitter's usage will increase by 2.

Concurrency Limits and Accounting Groups

Consumption Policies and negotiator p-slot splitting properly respect both accounting group quotas and concurrency limits.

Heterogeneous Resource Models

Consumption Policies are configurable on a per-slot basis, which makes it straightforward to support multiple resource consumption models on a single pool.
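Because these settings are ordinary startd configuration, a policy can also be scoped to a single slot type. The following is a sketch, assuming the SLOT_TYPE_<N>-prefixed variants of the consumption settings (as with other per-slot-type configuration); the resource split shown is arbitrary:

# slot type 1 is a partitionable slot with its own consumption policy
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = cpus=100%, memory=100%, disk=100%
SLOT_TYPE_1_PARTITIONABLE = True
SLOT_TYPE_1_CONSUMPTION_POLICY = True
SLOT_TYPE_1_CONSUMPTION_CPUS = target.RequestCpus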

For example, the following is a cpu-centric resource consumption policy:

# cpus will generally be exhausted first:
CONSUMPTION_CPUS = target.RequestCpus
CONSUMPTION_MEMORY = quantize(target.RequestMemory, {32})
CONSUMPTION_DISK = quantize(target.RequestDisk, {128})
SLOT_WEIGHT = Cpus

Another slot might be configured with a memory-centric policy:

# memory will generally be exhausted first:
CONSUMPTION_CPUS = 1
CONSUMPTION_MEMORY = quantize(target.RequestMemory, {1024})
CONSUMPTION_DISK = quantize(target.RequestDisk, {128})
SLOT_WEIGHT = floor(Memory / 1024)

Note that the slot weight expression is typically configured to correspond to the "most limiting" resource, and furthermore behaves as a measure of the number of potential matches remaining on the partitionable slot.
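For example, under the memory-centric policy above, a p-slot with 8192MB of memory advertises a slot weight of floor(8192/1024) = 8, which is exactly the number of 1024MB matches it can still accept.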

Mixed Pools

Consumption Policy logic is enabled by attributes defined on individual partitionable slots, and is detected on a per-slot basis. This allows HTCondor pools to be configured with slots that have Consumption Policies alongside slots without them.

Additionally, it is valid to configure slots having Consumption Policies in the same pool as slots using scheduler splitting (CLAIM_PARTITIONABLE_LEFTOVERS).

NOTE: Configuring the same execute node with both Consumption Policies and CLAIM_PARTITIONABLE_LEFTOVERS is not supported. If an execute node enables Consumption Policies, it cannot enable CLAIM_PARTITIONABLE_LEFTOVERS, and vice versa.

Consumption Policy Examples

In the preceding discussion, examples of a cpu-centric and a memory-centric Consumption Policy were provided. A few other examples are listed here.

Extensible Resources

All resource assets are required to have a Consumption Policy configured. This includes Extensible Resources, if any are also present:
# declare an extensible resource for a claim-based consumption policy
MACHINE_RESOURCE_tokens = 3
# always consume 1 token, and none of anything else
CONSUMPTION_TOKENS = 1
CONSUMPTION_CPUS = 0
CONSUMPTION_MEMORY = 0
CONSUMPTION_DISK = 0
# define cost in terms of available tokens for serving jobs
SLOT_WEIGHT = Tokens
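
Note that under this policy every match consumes exactly one token regardless of what the job requests, so the slot can service at most three concurrent matches; once all tokens are consumed, the consumption expression can no longer be satisfied, and the slot stops matching until a claim is released and its token returns to the p-slot.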

Handle Missing Request Attributes

RequestXxx attributes are not guaranteed to be present, so Consumption Policy expressions should take this into account. For example, a policy that involves Extensible Resources cannot assume jobs will request such a resource.
# declare an extensible resource
MACHINE_RESOURCE_actuators = 8
# policy for actuators that interprets a missing request as 'zero', which is a
# good idiom for extensible resources that many jobs will not use or care about
CONSUMPTION_ACTUATORS = ifThenElse(target.RequestActuators =!= undefined, target.RequestActuators, 0)
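
On the submit side, a job that does use the resource would request it in the usual way for custom machine resources (a sketch, assuming the request_<tag> submit command; the value shown is arbitrary):
# job submit description: request two actuators
request_actuators = 2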

Emulate a Static Slot

This example uses a Consumption Policy to emulate static slot behavior:
# consume all resources - emulate static slot
CONSUMPTION_CPUS = TotalSlotCpus
CONSUMPTION_MEMORY = TotalSlotMemory
# Disk is unreliable -- TotalSlotDisk != Disk even on a virgin slot
CONSUMPTION_DISK = floor(0.9 * TotalSlotDisk)
# define weight to be 1, since this slot supports exactly one match
SLOT_WEIGHT = 1

Multi-Centric Policy

Which resource asset is "most limiting" might be contingent on job requirements. It is possible to define slot weight to take this into account, as in this example:
# Either Cpus or Memory might be limiting, depending on job requirements
CONSUMPTION_CPUS = target.RequestCpus
CONSUMPTION_MEMORY = quantize(target.RequestMemory, {256})
CONSUMPTION_DISK = quantize(target.RequestDisk, {128})
# define slot weight as minimum of remaining-match estimate based on either cpus or memory:
SLOT_WEIGHT = ifThenElse(Cpus < floor(Memory/256), Cpus, floor(Memory/256))
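
For example, with Cpus = 16 and Memory = 2048 this weight evaluates to floor(2048/256) = 8 (memory is the limiting asset), while with Cpus = 4 and Memory = 8192 it evaluates to 4 (cpus are limiting).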