How to configure backfill tasks such as BOINC using work fetch

Should work in HTCondor version: 7.2

NOTE: This recipe doesn't work with partitionable slots. Left alone, the BOINC client will happily run forever, so the recipe relies on machine RANK to preempt the BOINC jobs when more important jobs are available. Machine RANK doesn't work with partitionable slots as of HTCondor 8.0.

When HTCondor is not busy running jobs from users, you may want it to run some other backfill task. HTCondor provides special support for backfilling with BOINC. However, the built-in support assumes that BOINC can decide for itself how many tasks to run in order to fill the idle CPUs on a multi-core machine. At this time (BOINC 6.4), there is no such capability in BOINC. Therefore, the built-in support for BOINC is only really capable of backfilling a single slot. (If you set it up, HTCondor may show multiple slots in the backfill state, but in fact only a single instance of BOINC will be running, and it will be running a statically configured number of work units in parallel, typically just one.)

To get around that problem, there is a different way to configure HTCondor to run BOINC backfill (or any other type of backfill task). It uses the startd's fetch-work hook to run one instance of the backfill task per idle HTCondor slot. Here is an example configuration:

BOINC_HOME = /opt/boinc
# the following BOINC settings are used by the boinc fetch work hook
BOINC_Executable = $(BOINC_HOME)/BOINC/boinc
BOINC_InitialDir = $(BOINC_HOME)/var/slot$(BOINC_SLOT)
BOINC_Owner = backfill
BOINC_User = backfill@your.domain
BOINC_Arguments = -no_gui_rpc -allow_multiple_clients -attach_project http://einstein.phys.uwm.edu <insert_id>
BOINC_Output = $(BOINC_InitialDir)/boinc.out
BOINC_Error = $(BOINC_InitialDir)/boinc.err
BOINC_Requirements = \
   RemoteUser =?= "$(BOINC_Owner)" || \
   RemoteUser =?= "$(BOINC_User)" || \
   (State == "Unclaimed" && $(StateTimer) > 1200)

# configure the startd to run BOINC backfill jobs
STARTD_JOB_HOOK_KEYWORD = BOINC
BOINC_HOOK_FETCH_WORK = $(BOINC_HOME)/fetch_work_boinc
RANK = $(RANK) - (Owner =?= "backfill")

The above example starts up BOINC after the slot has been unclaimed for 20 minutes. (You will see an entry in the startd logs each time the startd calls out to the fetch-work script. Whenever the BOINC_Requirements expression above is not true, that entry notes that the job requirements were not met.)

The BOINC jobs will run as the backfill user. This configuration assumes that BOINC has been installed in /opt/boinc/BOINC and that a directory for each instance of BOINC has been created: /opt/boinc/var/slot1, slot2, and so on. The slotX directories should be owned by the backfill user (or whatever user you have configured the slot to run as).
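The per-slot directories could be created with a short script along the following lines. This is a minimal sketch and not part of the original recipe: the make_slot_dirs function name and its arguments are hypothetical, and the ownership change is only attempted when the script runs as root.

```sh
#!/bin/sh
# Hypothetical helper: create one BOINC working directory per slot.
# usage: make_slot_dirs BASE_DIR NUM_SLOTS [OWNER]
make_slot_dirs() {
  base=$1
  num_slots=$2
  owner=${3:-backfill}
  i=1
  while [ "$i" -le "$num_slots" ]; do
    mkdir -p "$base/slot$i" || return 1
    # chown needs root; skip it otherwise so the sketch can be tried safely
    if [ "$(id -u)" -eq 0 ]; then
      chown "$owner" "$base/slot$i" || return 1
    fi
    i=$((i + 1))
  done
}

# Example (assumption: 8 slots, paths as in the configuration above):
# make_slot_dirs /opt/boinc/var 8 backfill
```

The directory names must match what BOINC_InitialDir expands to, that is, slot followed by the SlotID of the slot.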

The fetch_work_boinc script can just be a shell script such as the following:

#!/bin/sh

# read the machine ClassAd and extract what we need
# (set _CONDOR_BOINC_SLOT so that BOINC config variables can
#  reference $(BOINC_SLOT))
eval `awk '/^SlotID/ {print "export _CONDOR_BOINC_SLOT="$3}'`

# load the following config variables from the HTCondor configuration
BOINC_ConfigVars="
BOINC_Executable
BOINC_InitialDir
BOINC_Owner
BOINC_User
BOINC_Arguments
BOINC_Output
BOINC_Error
BOINC_Requirements
"

for var in ${BOINC_ConfigVars}; do
  value=`condor_config_val $var`

  # if anything is not defined, bail out
  if [ "$?" != 0 ] || [ "$value" = "" ]; then
    # send the message to stderr; stdout is reserved for the job ClassAd
    echo "Failed to look up $var in HTCondor configuration." 1>&2
    exit 1
  fi

  eval $var=\'$value\'
done


# generate the BOINC job ClassAd
echo "Cmd = \"${BOINC_Executable}\""
echo "IWD = \"${BOINC_InitialDir}\""
echo "Owner = \"${BOINC_Owner}\""
echo "User = \"${BOINC_User}\""
echo "JobUniverse = 5"
echo "Arguments = \"${BOINC_Arguments}\""
echo "Out = \"${BOINC_Output}\""
echo "Err = \"${BOINC_Error}\""
echo "NiceUser = true"
echo "Requirements = ${BOINC_Requirements}"
echo "MaxJobRetirementTime = 0"
echo "JobLeaseDuration = 604800"
echo "RequestCpus = 1"
echo "RequestDisk = 512"
echo "RequestMemory = 512"

Inside each of the slotX directories, you should create a file named global_prefs_override.xml. The contents of that file depend on the version of BOINC you are using. Here is an example for an 8-core machine running BOINC 6.2.15. You will at least need to adjust max_ncpus_pct so that the correct number of CPUs per slot is consumed (typically you would have one CPU per slot).

<global_preferences>
    <max_ncpus_pct>0.125</max_ncpus_pct>
    <run_if_user_active>1</run_if_user_active>
    <disk_max_used_gb>10</disk_max_used_gb>
    <disk_max_used_pct>50</disk_max_used_pct>
    <disk_min_free_gb>0.1</disk_min_free_gb>
    <vm_max_used_pct>75</vm_max_used_pct>
    <ram_max_used_busy_pct>50</ram_max_used_busy_pct>
    <ram_max_used_idle_pct>90</ram_max_used_idle_pct>
    <cpu_usage_limit>100</cpu_usage_limit>
</global_preferences>

How to create static backfill slots that defer to a partitionable slot

Currently workfetch does not work with partitionable slots, but you can still run backfill jobs on machines that are configured to use partitionable slots. The way to do this is to create a set of static slots that share the same resources as the partitionable slot, but that run jobs (or workfetch) only when the partitionable slot is not using the resources. The configuration looks like this:

# each backfill slot will have the following resources
BackfillCpusPerSlot = 1
BackfillMemoryPerSlot = 1000

# First tell the startd we have twice as many cpus and twice as much memory as
# we actually have, and advertise some additional information into the slots
# (such as the amount of Cpus and Memory left over in the pslot).
#
num_cpus = 2 * $(DETECTED_CPUS)
memory = 2 * $(DETECTED_MEMORY)
startd_attrs = $(startd_attrs) BackfillSlot preempt want_vacate
startd_slot_attrs = $(startd_slot_attrs) Cpus Memory
# Decrease startd polling interval so backfill jobs are killed quickly when
# production jobs arrive.
polling_interval = 2

# Set up a pslot for the production jobs, adding a START requirement to
# prohibit accepting backfill jobs.  If you already have a p-slot configuration
# you only need to add the slot_type_1_start expression below. It prevents BOINC
# workfetch jobs from starting on the p-slot.
#
slot_type_1_partitionable = true
slot_type_1 = cpus=$(DETECTED_CPUS) memory=$(DETECTED_MEMORY) gpus=100% disk=50% swap=50%
num_slots_type_1 = 1
slot_type_1_BackfillSlot = False
slot_type_1_start = $(START) && (Owner != "$(BOINC_owner)")

# don't run workfetch on slot_type_1
slot_type_1_STARTD_JOB_HOOK_KEYWORD =

#
# Set up a set of static slots for backfill jobs.  The number of static slots
# is determined by how many CPUs and how much Memory each slot requires.  Set the START expression
# to only accept backfill jobs, and only if the pslot has enough free resources.
# Preempt backfill slots one at a time as unused resources in the pslot disappear.
# Give these slots a custom name of "backfillX@foo", and a custom attribute of
# BackfillSlot=True.  Disable vacate on these slots, so that jobs are immediately killed
# upon preemption (we want the resources freed up asap for the production jobs).
#
NumBackfillSlots = min( { $(DETECTED_CPUS) / $(BackfillCpusPerSlot), $(DETECTED_MEMORY) / $(BackfillMemoryPerSlot) } )
Backfill_Cpus_Exhausted = (slot1_Cpus / $(BackfillCpusPerSlot) < (SlotID - 1))
Backfill_Memory_Exhausted = (slot1_Memory / $(BackfillMemoryPerSlot) < (SlotID - 1))
Backfill_Resources_Exhausted = ( $(Backfill_Cpus_Exhausted) || $(Backfill_Memory_Exhausted) )
slot_type_2_partitionable = false
slot_type_2_name_prefix = backfill
slot_type_2 = cpus=$(BackfillCpusPerSlot) memory=$(BackfillMemoryPerSlot), gpus=0
num_slots_type_2 = $INT(NumBackfillSlots)
slot_type_2_BackfillSlot = True
slot_type_2_start = (Owner == "$(BOINC_owner)") && ($(Backfill_Resources_Exhausted) == False)
slot_type_2_preempt = $(Backfill_Resources_Exhausted)
slot_type_2_want_vacate = False

# check once a minute for "better" BOINC jobs or when idle
slot_type_2_FetchWorkDelay = 60
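The arithmetic behind Backfill_Resources_Exhausted can be checked by hand. The sketch below mirrors the two comparisons in shell, assuming integer operands so that the division truncates in the same way. The backfill_exhausted function name and the sample numbers are illustrative only; the slot numbering follows the configuration above, where slot1 is the pslot and the backfill slots start at SlotID 2.

```sh
#!/bin/sh
# Hypothetical helper mirroring Backfill_Resources_Exhausted.
# usage: backfill_exhausted FREE_CPUS FREE_MEMORY_MB SLOT_ID [CPUS_PER_SLOT] [MEMORY_PER_SLOT]
# FREE_CPUS and FREE_MEMORY_MB stand in for slot1_Cpus and slot1_Memory.
backfill_exhausted() {
  free_cpus=$1
  free_mem=$2
  slot_id=$3
  cpus_per_slot=${4:-1}
  mem_per_slot=${5:-1000}
  index=$((slot_id - 1))   # position of this slot among the backfill slots
  if [ $((free_cpus / cpus_per_slot)) -lt "$index" ] ||
     [ $((free_mem / mem_per_slot)) -lt "$index" ]; then
    echo true
  else
    echo false
  fi
}

# Assumed scenario: 8-core machine, production jobs hold 3 cores,
# leaving 5 Cpus and 12000 MB of Memory unused in the pslot.
for id in 2 3 4 5 6 7 8 9; do
  echo "SlotID $id exhausted: $(backfill_exhausted 5 12000 "$id")"
done
```

In this scenario SlotIDs 2 through 6 report false and SlotIDs 7 through 9 report true, so five backfill slots keep running, one per unused core, and the three highest-numbered backfill slots are preempted.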

How to submit backfill jobs as a user

The previous example showed how to configure an execute node to generate its own low-priority tasks to backfill the time that is unused by jobs from users. Another way to backfill the system is to simply submit jobs the usual way, but give them lower priority than all other jobs. One convenient way to do this is to use the nice_user submit-file command. This automatically reduces the priority of the job, but if you want the nice-user jobs to be preempted, be sure that your job management policy allows this.

Example submit file:

executable = my_backfill_task
nice_user = true
on_exit_remove = false
queue 1000

The above example creates 1000 low priority jobs. If the jobs ever exit, they remain in the queue and will run again when they next get matched to a machine.

Example preemption policy that allows preemption of nice-user jobs:

PREEMPTION_REQUIREMENTS = ($(PREEMPTION_REQUIREMENTS)) || TARGET.NiceUser =?= True

Check your RANK expression and make sure it never gives NiceUser jobs a higher rank than other jobs. If all jobs have the same rank, that is fine; you can just let ordinary user-priority preemption take place. If some jobs have a higher rank than others, make sure that NiceUser jobs are never among them.

Example RANK expression that gives some jobs a high rank but ensures that NiceUser jobs are excluded:

RANK = (TARGET.NiceUser =!= True) * (TARGET.IsHighPrioJob =?= True) * 10
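To see what this expression yields for each kind of job, the sketch below reproduces its outcome in shell. It is only an illustration: the rank_for function name is hypothetical, any value other than "true" (including an empty string for an undefined attribute) is treated as not true, and it assumes that a true comparison counts as 1 and a false one as 0 in the multiplication.

```sh
#!/bin/sh
# Hypothetical helper reproducing the outcome of the RANK expression.
# usage: rank_for NICE_USER HIGH_PRIO
rank_for() {
  nice_user=$1   # stands in for TARGET.NiceUser
  high_prio=$2   # stands in for TARGET.IsHighPrioJob
  # (NiceUser =!= True) and (IsHighPrioJob =?= True) must both hold
  if [ "$nice_user" != "true" ] && [ "$high_prio" = "true" ]; then
    echo 10
  else
    echo 0
  fi
}

for nice in true false ""; do
  for prio in true false ""; do
    echo "NiceUser='$nice' IsHighPrioJob='$prio' rank=$(rank_for "$nice" "$prio")"
  done
done
```

Only jobs that are not NiceUser jobs and have IsHighPrioJob set to true receive rank 10; every other combination, including every NiceUser job, receives rank 0.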