How to suspend jobs in place of preemption




NOTE: Current versions of HTCondor include the command-line tool condor_suspend to easily suspend jobs. See the condor_suspend manual page.

The below is kept just for historical purposes





Known to work with HTCondor version: 7.0

HTCondor can suspend a process and then later resume right where it left off. This is similar to standard universe checkpointing, except it cannot move the suspended job from one machine to another. The mechanism works with all types of jobs, without any special preparation of the job.

There are two cases where you might prefer suspension in place of killing jobs. One is preemption caused by other activity on the machine (e.g. a user coming back to their desktop). The other is preemption caused by higher priority jobs getting matched to machines that are already busy. Both of these cases are covered below.

How to suspend jobs instead of killing them when higher priority jobs need to run

There is a section in the HTCondor Manual on this topic here. The example in the manual requires that you divide jobs into two classes: high priority and low priority. When the high priority jobs arrive on a machine, existing low priority jobs get suspended.

Below is an example in which the relative priority of the jobs to be suspended and the other class of jobs is left up to the usual mechanisms of startd RANK and user priority. You still have to divide the jobs into two classes: normal jobs and those that should be suspended when they are preempted by normal jobs. This is useful, for example, when you have a special class of jobs which are long running and not checkpointable, but you don't want them to unfairly hog the pool.

# You may want to advertise double the amount of system memory
# if you have enough virtual memory to allow the foreground job
# to consume all of memory while the suspended job gets pushed
# into swap memory.  There is currently no convenient way to
# tell HTCondor you want to oversubscribe your memory, so you
# have to hard-code the amount of memory you want to advertise
# by uncommenting and filling in the following:
# Memory = TWICE_YOUR_SYSTEM_MEMORY

NUM_CPUS = 2

# So that the suspension slot can see the state
# of the other slot, we need to have some things
# advertised about each slot in the ClassAds of
# all the other slots on the same machine:
STARTD_SLOT_EXPRS = $(STARTD_SLOT_EXPRS) State, CurrentRank

# For informational purposes, put IsSuspensionSlot
# in the startd ClassAd:
STARTD_ATTRS = $(STARTD_ATTRS) IsSuspensionSlot

# Slot 1 is the "normal" batch slot
SLOT1_IsSuspensionSlot = False
# Slot 2 suspends its job, rather than preempting it
SLOT2_IsSuspensionSlot = True


START    = SlotID == 1 && ($(SLOT1_START)) || \
           SlotID == 2 && ($(SLOT2_START))
CONTINUE = SlotID == 1 && ($(SLOT1_CONTINUE)) || \
           SlotID == 2 && ($(SLOT2_CONTINUE))
PREEMPT  = SlotID == 1 && ($(SLOT1_PREEMPT)) || \
           SlotID == 2 && ($(SLOT2_PREEMPT))
SUSPEND  = SlotID == 1 && ($(SLOT1_SUSPEND)) || \
           SlotID == 2 && ($(SLOT2_SUSPEND))


# The purpose of the following expression is to prevent a
# job from starting on slot 1 if it has less priority to run
# than the job already running on slot 2, because once we let
# a job run on slot 1, the slot 2 job will be suspended.
# This expression refers to attributes that are only defined
# when requirements are being evaluated by the Negotiator:
# SubmittorPrio [sic] and RemoteUserPrio

SLOT1_HAS_PRIO = SubmittorPrio =?= UNDEFINED || \
                vm2_RemoteUserPrio =?= UNDEFINED || \
                SubmittorPrio < 1.2 * vm2_RemoteUserPrio || \
                vm2_CurrentRank =?= UNDEFINED || \
                MY.Rank > vm2_CurrentRank

# Slot 1 is a normal execution slot
SLOT1_START    = TARGET.IsSuspensionJob =!= true && ($(SLOT1_HAS_PRIO))
SLOT1_CONTINUE = True
SLOT1_PREEMPT  = False
SLOT1_SUSPEND  = False

# Slot 2 is for jobs that get suspended while slot 1 is busy
SLOT2_START    = TARGET.IsSuspensionJob =?= true
SLOT2_CONTINUE = slot1_State =?= "Unclaimed" || slot1_State =?= "Owner"
SLOT2_PREEMPT  = FALSE
SLOT2_SUSPEND  = slot1_State =?= "Claimed"

To submit a suspension job, you could put something like the following in your submit file:

+IsSuspensionJob = True
requirements = TARGET.IsSuspensionSlot

The example policy above does not prevent preemption of suspension jobs by other suspension jobs. It only prevents preemption of suspension jobs by other normal jobs. If you want to prevent that, you could do something like this:

# Do not preempt suspension jobs (for up to 24 hours)
MaxJobRetirementTime = (MY.IsSuspensionSlot =?= True) * 3600 * 24

How to suspend jobs instead of killing them when the machine is busy

You just need to make sure WANT_SUSPEND and SUSPEND are true in the cases where you want suspension. The following example is similar to the default policy that ships in HTCondor's configuration file. It has been made more aggressive in what types of jobs it suspends (everything) and for how long it suspends them before giving up and killing them.

KeyboardBusy            = (KeyboardIdle < $(MINUTE))
# Suspend jobs for up to one day
MaxSuspendTime          = (3600 * 24)
ContinueIdleTime        =  60 * $(MINUTE)


# Suspend jobs if:
# 1) the keyboard has been touched, OR
# 2a) The cpu has been busy for more than 2 minutes, AND
# 2b) the job has been running for more than 90 seconds
SUSPEND = ( $(KeyboardBusy) || \
            ( (CpuBusyTime > 2 * $(MINUTE)) \
              && $(ActivationTimer) > 90 ) )

WANT_SUSPEND = SUSPEND


# Preempt jobs if:
# 1) The job is suspended and has been suspended longer than we want
# 2) OR, we don't want to suspend this job, but the conditions to
#    suspend jobs have been met (someone is using the machine)
PREEMPT = ( ((Activity == "Suspended") && \
            ($(ActivityTimer) > $(MaxSuspendTime))) \
           || (SUSPEND && (WANT_SUSPEND == False)) )

CONTINUE = ( $(CPUIdle) && ($(ActivityTimer) > 10) \
             && (KeyboardIdle > $(ContinueIdleTime)) )