Known to work with Condor version: 7.4

The simplest way to achieve this is the following configuration:

{code}
SLOT_TYPE_1 = cpus=100%
NUM_SLOTS_TYPE_1 = 1
{endcode}

With that configuration, each machine just advertises a single slot. However, this prevents you from supporting a mix of single-cpu and whole-machine jobs. The following example supports both.

First, you would have whole-machine jobs advertise themselves as such with something like the following in the submit file:
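The exact names are not spelled out in this excerpt, so the ones below (=RequiresWholeMachine=, =CAN_RUN_WHOLE_MACHINE=, and the executable) are illustrative assumptions; a minimal sketch of such a submit file might look like:

{code}
universe   = vanilla
executable = whole_machine_job
# Hypothetical job attribute advertising that this job wants the whole machine
+RequiresWholeMachine = True
# CAN_RUN_WHOLE_MACHINE is assumed to be an expression advertised by the machines
requirements = CAN_RUN_WHOLE_MACHINE
queue
{endcode}

The machine-side configuration then defines the whole-machine slot and the START/SUSPEND policies that act on this attribute.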
The above policies rely on job suspension. Should the jobs be "charged" for the time they spend in a suspended state? This affects the user's fair-share priority and the accumulated number of hours reported by condor_userprio. As of Condor 7.4, the default behavior is to charge the jobs for time they spend in a suspended state. There is a configuration variable, NEGOTIATOR_DISCOUNT_SUSPENDED_RESOURCES, that can be used to get the opposite behavior.

Should the whole-machine slot charge more than the single-core slots? The policy for this is determined by =SlotWeight=. By default, this is equal to the number of cores associated with the slot, so usage reported in condor_userprio will count the whole-machine slot on an 8-core machine as 8 times the usage reported for a single-core slot.

There are a few things to keep in mind when interpreting the output of condor_status. The =TotalCpus= and =TotalMemory= attributes are double what the machine actually has, because we are double-counting these resources across the single-core and whole-machine slots.

When looking at condor_status, the extra slot representing the whole machine is visible. Notice that it appears in the Owner state, because the start expression we have configured is false unless presented with a whole-machine job. We could modify the IS_OWNER expression to make the slot appear as Unclaimed rather than Owner, but we find that Owner is a nice reminder that this slot is special and is not available to run "normal" jobs.

{verbatim}
Name               OpSys  Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@execute.node LINUX  X86_64 Unclaimed Idle     0.750  2006  0+00:00:04
slot2@execute.node LINUX  X86_64 Unclaimed Idle     0.000  2006  0+00:00:05
slot3@execute.node LINUX  X86_64 Unclaimed Idle     0.000  2006  0+00:00:06
slot4@execute.node LINUX  X86_64 Unclaimed Idle     0.000  2006  0+00:00:07
slot5@execute.node LINUX  X86_64 Unclaimed Idle     0.000  2006  0+00:00:08
slot6@execute.node LINUX  X86_64 Unclaimed Idle     0.000  2006  0+00:00:09
slot7@execute.node LINUX  X86_64 Unclaimed Idle     0.000  2006  0+00:00:10
slot8@execute.node LINUX  X86_64 Unclaimed Idle     0.000  2006  0+00:00:03
slot9@execute.node LINUX  X86_64 Owner     Idle     0.000  16054 0+00:00:08
{endverbatim}

To filter out the whole-machine slot, one could use a constraint such as the following:

{code}
condor_status -constraint 'IsWholeMachineSlot =!= True'
{endcode}

When the whole-machine slot is claimed, the other slots will appear in the Owner state, because they are configured to not start any new jobs while the whole-machine slot is claimed. Again, we could make them appear as Unclaimed by changing the IS_OWNER expression, but the Owner state serves as a useful reminder that these slots are reserved during this time.

{verbatim}
slot1@execute.node LINUX  X86_64 Owner     Idle     0.870  2006  0+00:00:04
slot2@execute.node LINUX  X86_64 Owner     Idle     0.000  2006  0+00:00:05
slot3@execute.node LINUX  X86_64 Owner     Idle     0.000  2006  0+00:00:06
slot4@execute.node LINUX  X86_64 Owner     Idle     0.000  2006  0+00:00:07
slot5@execute.node LINUX  X86_64 Owner     Idle     0.000  2006  0+00:00:08
slot6@execute.node LINUX  X86_64 Owner     Idle     0.000  2006  0+00:00:09
slot7@execute.node LINUX  X86_64 Owner     Idle     0.000  2006  0+00:00:10
slot8@execute.node LINUX  X86_64 Owner     Idle     0.000  2006  0+00:00:03
slot9@execute.node LINUX  X86_64 Claimed   Busy     0.000  16054 0+00:00:04
{endverbatim}
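The filter above can also be turned around. To see which whole-machine slots are currently claimed across the pool, a constraint like the following sketch could be used; it reuses the =IsWholeMachineSlot= attribute from the filter example, which is assumed to be set on the whole-machine slots by the machine configuration:

{code}
condor_status -constraint 'IsWholeMachineSlot =?= True && State == "Claimed"'
{endcode}

Using =?= rather than == keeps the expression well-defined on slots that do not advertise =IsWholeMachineSlot= at all.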
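For the accounting questions discussed earlier, the relevant knobs might be set as in the following sketch; the values shown are illustrative assumptions, not recommendations:

{code}
# Do not charge jobs for the time they spend suspended
# (reverses the Condor 7.4 default described above)
NEGOTIATOR_DISCOUNT_SUSPENDED_RESOURCES = True

# SlotWeight defaults to the number of cores in the slot; this
# illustrative override charges every slot as a single core instead
SlotWeight = 1
{endcode}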