{section: Group Quota Design}
=================

{subsection: Motivating Scenarios}

??? What are some good use cases here? What didn't the old code do that the new code can?

Some questions we'd like customer use cases to address:
*: What is the semantic of an accounting group quota?
*:: That is: what does a group quota regulate/limit?
*:: What is the 'unit' associated with a quota?
*:: http://erikerlandson.github.com/blog/2012/11/15/rethinking-the-semantics-of-group-quotas-and-slot-weights-claim-capacity-model/
*: What does it mean for groups to be in a hierarchy?
*:: How does a parent's quota relate to child quotas?
*:: How do 'sibling' groups relate to each other, their parent, and their children (if any)?

{subsection: High Level Design and Definitions}

The HGQ design is intended to allow an administrator to restrict the aggregate number of slots running jobs submitted by groups of users. These sets of users are organized into hierarchical groups, with "none" being the name of the root group. The admin is expected to assign a quota to every leaf and interior node in the tree, except for the root. An assigned quota can be an absolute number or a floating point number from 0 to 1, which represents a fraction of the immediate parent's quota. If absolute, it represents a weighted number of slots, where each slot is multiplied by a configurable weight, which defaults to the number of cores. All groups named must be predeclared in the config file. Note that the quota is independent of user priority.

{subsection: Definitions}

Can we get crisp definitions of each of the fields in the GroupEntry structure? Here is some annotation from the meeting on fields that didn't already have in-code doc:

{code}
// these are set from configuration
string name;
double config_quota; // Could be static (>=1) or dynamic (0 ...
{endcode}

... config_quota). It may also be the value translated from configured quota to actual (possibly weighted) slot quantity (entry->quota).
The quantity finally assigned to a group, after quota computation, surplus sharing, and fractional-quota distribution, is referred to as 'allocated' (entry->allocated).

{subsection: Algorithm}

First, the code builds up a data structure that describes each group: its position in the tree, the administratively configured quota, whether that quota is static or dynamic, and whether the group accepts surplus or has autoregroup enabled. For each group, the current weighted usage is fetched from the accountant, as is the current userprio. The number of running and idle jobs is copied from each submitter's submitter ad and summed into the corresponding group structure. Note that the number of running jobs also includes jobs running in flocked-to pools. Each group also contains a list of all the related submitter ads. If autoregroup is on, the submitters are also appended to the root's list of submitter ads.

After (weighted) slot quotas are assigned to all the group entries, surplus sharing is computed for all groups in the hierarchy configured to accept surplus. Following surplus sharing, when slot weighting is not enabled, any fractional quota allocations are consolidated and distributed in round robin fashion.

{subsubsection: Surplus Sharing}

The primary purpose of surplus sharing is to allow group quotas to "float" locally based on demand. For example, if one configures groups A, A.B, and A.C, where group A does not share surplus but A.B and A.C do, then A.B and A.C can float against each other while maintaining the constraint that quota(A.B) + quota(A.C) <= quota(A). Surplus quota is always shared at the lowest possible level before being passed upwards.

The basic principle for surplus sharing is: surplus quota is distributed among sibling groups in proportion to assigned quota. For example, if group A has twice the quota of group B, group A will be awarded twice the surplus.
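
The proportional rule above can be illustrated with a minimal sketch. This is not the negotiator's actual code: the `SiblingGroup` struct, its field names, and the `share_surplus` function are hypothetical, and the sketch deliberately ignores demand, the parent's participation, and passing unused surplus up the hierarchy.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for a group entry (not HTCondor's GroupEntry).
struct SiblingGroup {
    double quota;           // assigned quota, in (weighted) slots
    bool   accept_surplus;  // whether this group participates in surplus sharing
    double surplus_award;   // output: surplus awarded to this group
};

// Split `surplus` among participating siblings in proportion to assigned quota.
// Returns the surplus left unawarded (all of it if no sibling participates).
double share_surplus(std::vector<SiblingGroup>& siblings, double surplus) {
    assert(surplus >= 0.0);

    // Only groups that accept surplus contribute to the proportionality base.
    double total_quota = 0.0;
    for (SiblingGroup& g : siblings) {
        g.surplus_award = 0.0;
        if (g.accept_surplus) total_quota += g.quota;
    }
    if (total_quota <= 0.0) return surplus;  // nobody to share with at this level

    for (SiblingGroup& g : siblings) {
        if (g.accept_surplus) {
            g.surplus_award = surplus * g.quota / total_quota;
        }
    }
    return 0.0;
}
```

Under these assumptions, two participating siblings with quotas 20 and 10 and a surplus of 9 would be awarded 6 and 3 respectively, while a third sibling that does not accept surplus would receive nothing.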
Some additional points:
*: available surplus consists of any surplus shared from the level above in the hierarchy, plus any surplus coming up from sibling sub-trees
*: any groups with surplus sharing not enabled do not participate in surplus distribution
*: if a group does not need all of its potential surplus, whatever it does not use is shared among the remaining participating groups
*: the parent group of siblings participates in sharing, effectively as another sibling
*: any surplus unused after sharing among siblings (and parent) is sent up the hierarchy to be shared at the level above

{subsubsection: Fractional Quota Consolidation}

When slot weighting is not enabled, fractional quota values for groups are consolidated and distributed in round robin fashion to ensure that all quotas are integer values.
*: the remainder available for consolidation consists of remainder coming from the upper level in the hierarchy, combined with any remainder coming up from sibling subtrees
*: remainders are not accepted by groups not accepting surplus
*: siblings that received a remainder least recently are favored in the round robin: siblings are ordered by time of last receipt of a remainder
*: remainder unused at a level is sent up to the parent

{subsubsection: Allocation Rounds}

Allocation rounds address the scenario where jobs submitted under an accounting group do not satisfy mutual job/slot requirements for enough slots to achieve their quota. When GROUP_QUOTA_MAX_ALLOCATION_ROUNDS > 1, each group that has not met its allocated quota has its 'requested' value reset to its current (weighted) usage; that is, it is assumed that no further jobs under that group will match slots until the next negotiation cycle. This frees up the unused quota for other groups that may be able to use it as surplus.
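
The reset of 'requested' values described above might look roughly like the following sketch. The `GroupState` struct, its fields, and `reset_unmet_requests` are hypothetical names chosen for illustration; the real negotiator tracks considerably more state per group.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified per-group record; field names are illustrative only.
struct GroupState {
    double allocated;  // quota allocated in the previous round
    double usage;      // current (weighted) usage
    double requested;  // demand used when (re)computing allocations
};

// Before a later allocation round: a group that did not reach its allocation
// is assumed unable to match more slots this cycle, so its request is
// lowered to what it currently uses. Returns the number of groups reset.
int reset_unmet_requests(std::vector<GroupState>& groups) {
    int reset_count = 0;
    for (GroupState& g : groups) {
        assert(g.usage >= 0.0);
        if (g.usage < g.allocated) {
            g.requested = g.usage;  // frees (allocated - usage) for other groups
            ++reset_count;
        }
    }
    return reset_count;
}
```

After this step, recomputing allocations with the lowered requests lets the freed quota flow to other groups through the normal surplus-sharing path.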
The following steps are iterated GROUP_QUOTA_MAX_ALLOCATION_ROUNDS times:
1: (starting after 1st round) reset 'requested' values to current usage
1: (re)compute quota allocations
1: allow all groups to renegotiate

{subsubsection: Round Robin Rate}

Round robin rate addresses the 'overlapping effective pool' problem: a scenario where the jobs in two or more accounting groups are in fact competing for a subset of the total available resources. For example, suppose a pool has 100 Linux machines and 100 Windows machines, and 200 jobs from 2 accounting groups are competing only for the Linux machines. Without intervention, the first group to negotiate can acquire all 100 Linux machines and starve the 2nd group. To address this problem, there is a loop around negotiation that operates like so:
1: (initialize all quota limits at zero)
1: increase each quota limit by the round robin rate (up to allocated quota)
1: run negotiation with those limits
1: repeat

Round robin rate is configured via GROUP_QUOTA_ROUND_ROBIN_RATE, which defaults to "infinity"; this default emulates legacy behavior.

(Note: there is some interest in developing alternative approaches to allocation rounds and round robin rate that require fewer nested loops on top of basic negotiation.)

{subsubsection: Accounting Group Negotiation Order}

We sort the submitters in "starvation order" by GROUP_SORT_EXPR, which defaults to the ratio of current group usage to configured group quota. Finally, we negotiate with each group in that order, with a quota limited as calculated above.

{subsection: Questions}

*: How common is it to have demand (submitters) in interior nodes?
*:: some downstream customers are known to be interested in jobs submitted against interior nodes
*: What about non-homogeneous pools?
*: Is there a way to do this without relying on the submitter ad's # of idle/running jobs?
*:: There may be alternative approaches to surplus-sharing to address this behavior, but it is an open question
*: How should this behave in the face of flocking?
*: Weighted slots?
*:: I have a few thoughts on how weighted slots should be thought about here:
*::: http://erikerlandson.github.com/blog/2012/11/15/rethinking-the-semantics-of-group-quotas-and-slot-weights-claim-capacity-model/
*:: and here: https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=3435