-***How to allow some jobs to claim the whole machine instead of one slot***
+{section: How to allow some jobs to claim the whole machine instead of one slot}

 Known to work with Condor version: 7.2

@@ -6,44 +6,57 @@

 First, you would have whole-machine jobs advertise themselves as such with something like the following in the submit file:

-  +RequiresWholeMachine = True
+{code}
++RequiresWholeMachine = True
+{endcode}

 Then put the following in your Condor configuration file. Make sure it either comes after the other attributes that this appends to (such as START) or that you merge the definitions together.

-  # require that whole-machine jobs only match to Slot1
-  START = ($(START)) && (TARGET.RequiresWholeMachine =!= TRUE || SlotID == 1)
+{code}

-  # have the machine advertise when it is running a whole-machine job
-  STARTD_JOB_EXPRS = $(STARTD_JOB_EXPRS) RequiresWholeMachine
+#require that whole-machine jobs only match to Slot1
+START = ($(START)) && (TARGET.RequiresWholeMachine =!= TRUE || SlotID == 1)

-  # Export the job expr to all other slots
-  STARTD_SLOT_EXPRS = RequiresWholeMachine
+# have the machine advertise when it is running a whole-machine job
+STARTD_JOB_EXPRS = $(STARTD_JOB_EXPRS) RequiresWholeMachine

-  # require that no single-cpu jobs may start when a whole-machine job is running
-  START = ($(START)) && (SlotID == 1 || Slot1_RequiresWholeMachine =!= True)
+# Export the job expr to all other slots
+STARTD_SLOT_EXPRS = RequiresWholeMachine

-  # suspend existing single-cpu jobs when there is a whole-machine job
-  SUSPEND = ($(SUSPEND)) || (SlotID != 1 && Slot1_RequiresWholeMachine =?= True)
+# require that no single-cpu jobs may start when a whole-machine job is running
+START = ($(START)) && (SlotID == 1 || Slot1_RequiresWholeMachine =!= True)
+
+# suspend existing single-cpu jobs when there is a whole-machine job
+SUSPEND = ($(SUSPEND)) || (SlotID != 1 && Slot1_RequiresWholeMachine =?= True)
+
+{endcode}

 Instead of suspending the single-cpu jobs while the whole-machine job runs, you could suspend the whole-machine job while the single-cpu jobs finish. Example:

-  # advertise the activity of each slot into the ads of the other slots,
-  # so the SUSPEND expression can see it
-  STARTD_SLOT_EXPRS = $(STARTD_SLOT_EXPRS) Activity
+{code}
+# advertise the activity of each slot into the ads of the other slots,
+# so the SUSPEND expression can see it
+STARTD_SLOT_EXPRS = $(STARTD_SLOT_EXPRS) Activity

-  # Suspend the whole-machine job until the other slots are empty
-  SUSPEND = ($(SUSPEND)) || (SlotID == 1 && Slot1_RequiresWholeMachine =?= True && \
+# Suspend the whole-machine job until the other slots are empty
+SUSPEND = ($(SUSPEND)) || (SlotID == 1 && Slot1_RequiresWholeMachine =?= True && \
     (Slot2_Activity =?= "Busy" || Slot3_Activity =?= "Busy" || ... ) )
+{endcode}
+
 You might want to steer whole-machine jobs towards machines that are completely vacant, especially on the slots only for single-cpu jobs. Here's a simple example that just avoids machines with a high load:

-  NEGOTIATOR_PRE_JOB_RANK = -TARGET.LoadAvg*(MY.RequiresWholeMachine =?= True)
+{code}
+NEGOTIATOR_PRE_JOB_RANK = -TARGET.LoadAvg*(MY.RequiresWholeMachine =?= True)
+{endcode}

 A more complicated expression would look at the attributes of the other slots when forming the rank:

-  STARTD_SLOT_EXPRS = $(STARTD_SLOT_EXPRS) Activity
+{code}
+STARTD_SLOT_EXPRS = $(STARTD_SLOT_EXPRS) Activity

-  NEGOTIATOR_PRE_JOB_RANK = (MY.RequiresWholeMachine =?= True) * \
+NEGOTIATOR_PRE_JOB_RANK = (MY.RequiresWholeMachine =?= True) * \
     (Slot2_Activity =!= "Busy" + Slot3_Activity =!= "Busy" + ... )
+{endcode}
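
The `...` in the SUSPEND expression for the whole-machine job stands for however many additional slots the machine has, so the clause has to be written out per machine. As a rough illustration only, the Python sketch below spells out that clause for an assumed slot count. The helper names `busy_clause` and `suspend_expr`, and the slot count of 4 used in the demo, are hypothetical and are not part of Condor or of the wiki page; the generated text simply repeats the `SlotN_Activity =?= "Busy"` pattern from the page.

```python
def busy_clause(num_slots: int) -> str:
    """Build the '(Slot2_Activity =?= "Busy" || ...)' clause for slots 2..num_slots.

    num_slots is an assumption supplied by the caller; the wiki page leaves
    the number of slots open.
    """
    if num_slots < 2:
        raise ValueError("need at least two slots: Slot1 plus one single-cpu slot")
    terms = [f'Slot{i}_Activity =?= "Busy"' for i in range(2, num_slots + 1)]
    return "(" + " || ".join(terms) + ")"


def suspend_expr(num_slots: int) -> str:
    """Return the SUSPEND line from the page with the busy clause written out."""
    return (
        "SUSPEND = ($(SUSPEND)) || (SlotID == 1 && "
        "Slot1_RequiresWholeMachine =?= True && "
        + busy_clause(num_slots)
        + ")"
    )


if __name__ == "__main__":
    # Hypothetical 4-slot machine; adjust to the real slot count.
    print(suspend_expr(4))
```

The output is meant to be pasted into the configuration file in place of the elided form; it still relies on `STARTD_SLOT_EXPRS` exporting `Activity` as described on the page.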