How to limit disk usage of jobs

Known to work with HTCondor version: 7.0

HTCondor monitors how much disk space each job consumes in the scratch directory that is created for it on the execute machine when the job runs. This scratch directory is typically used only by jobs that turn on HTCondor's file transfer mode (should_transfer_files = YES). For such jobs, the scratch directory is the current working directory, and they may write their output files into it while they are running.
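
For reference, a minimal submit description file that turns on file transfer might look like the following sketch. It uses the documented value YES for should_transfer_files; the executable name my_job.sh is a placeholder and not part of this recipe.

# Submit description file (sketch)
universe = vanilla
# Placeholder executable name
executable = my_job.sh
# Run the job in a per-job scratch directory on the execute machine
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
queue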

One problem that can arise is that a single job on a multi-CPU machine uses so much space that all other jobs fail for lack of space. If the partition containing HTCondor's EXECUTE directory is shared with other tasks (perhaps including HTCondor itself), a full partition can cause those to fail as well.

How to preempt (evict) a job that uses too much disk space

The following configuration settings should be put in the configuration file of the execute machines (or of the whole pool). A number must be inserted for MAX_DISK_USAGE_KB, rather than using the Disk attribute of the machine ClassAd, because Disk measures the amount of free space on the disk, not the amount of space promised to the job.

MAX_DISK_USAGE_KB = insert_number_here
DISK_EXCEEDED = DiskUsage > $(MAX_DISK_USAGE_KB)

PREEMPT = ($(PREEMPT)) || ($(DISK_EXCEEDED))

WANT_SUSPEND = ($(WANT_SUSPEND)) && ($(DISK_EXCEEDED)) =!= TRUE
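
As an illustration, the placeholder could be filled in as sketched below. The value 10000000 is an assumption chosen only for this example; pick a limit that fits the space available per slot on your machines.

# Hypothetical limit: about 10 GB per job.
# DiskUsage is reported in KB, so the limit is given in KB too.
MAX_DISK_USAGE_KB = 10000000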

How to configure a separate disk partition for each batch slot

The most effective way to control how much space jobs use is to put the execute directory for each slot on its own disk partition. Then you don't have to worry about a misbehaving job consuming massive amounts of disk space before PREEMPT has a chance to act, because the job can fill only its own partition. Assuming you have already created the necessary partitions, you can configure HTCondor to use them like this:

SLOT1_EXECUTE = /path/to/execute1
SLOT2_EXECUTE = /path/to/execute2
...

How to hold/remove a job that uses too much disk

Jobs can hold or remove themselves by specifying a periodic_hold or periodic_remove expression. The schedd can also hold or remove jobs as dictated by the configuration expressions SYSTEM_PERIODIC_HOLD or SYSTEM_PERIODIC_REMOVE. These are all submit-side controls, whereas the PREEMPT example above is an execute-side control. One problem with the PREEMPT example is that it does a poor job of telling the job owner why the job was evicted. Putting the job on hold may communicate this better: the user then knows to resubmit the job with appropriate disk requirements or to investigate why the job used more disk than it should have. The following example configuration shows how to put jobs on hold from the submit side when they use too much disk.

# When a job matches, insert the machine disk space into the
# job ClassAd so SYSTEM_PERIODIC_HOLD can refer to it.
MachineDiskString = "$$(Disk)"
SUBMIT_EXPRS = $(SUBMIT_EXPRS)  MachineDiskString

SYSTEM_PERIODIC_HOLD = MATCH_EXP_MachineDiskString =!= UNDEFINED && \
                       DiskUsage > int(MATCH_EXP_MachineDiskString)
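
A job owner can apply the same idea to a single job with periodic_hold in the submit description file, as in the sketch below. The limit of 10000000 KB is a hypothetical value chosen only for illustration.

# Submit description file fragment (job-side control).
# Hypothetical limit: hold this job once it has used more than about 10 GB.
periodic_hold = DiskUsage > 10000000

A held job stays in the queue until its owner releases it with condor_release or removes it with condor_rm.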

What if you care about free disk space, not disk space used?

The above sections are helpful when you are comparing against the disk space used, but sometimes it is better to worry about the free disk space. In large environments, especially when some of the execute nodes are desktop machines, hard drives aren't always the same size. In such cases, you might prefer to be concerned about the remaining space on the disk. The policy settings are similar, with a few small changes.

How to preempt (evict) a job when the disk is near full

As above, the following configuration settings should be put in the configuration file of the execute machines (or of the whole pool). The TotalDisk attribute of the machine ClassAd reports the amount of free disk space, not the total capacity of the disk.

MIN_FREE_DISK_KB = insert_number_here
DISK_EXCEEDED = TotalDisk < $(MIN_FREE_DISK_KB)

PREEMPT = ($(PREEMPT)) || ($(DISK_EXCEEDED))

WANT_SUSPEND = ($(WANT_SUSPEND)) && ($(DISK_EXCEEDED)) =!= TRUE

How to hold/remove a job running on a near-full disk

Jobs can hold or remove themselves by specifying a periodic_hold or periodic_remove expression. The schedd can also hold or remove jobs as dictated by the configuration expressions SYSTEM_PERIODIC_HOLD or SYSTEM_PERIODIC_REMOVE. These are all submit-side controls, whereas the PREEMPT example above is an execute-side control. One problem with the PREEMPT example is that it does a poor job of telling the job owner why the job was evicted. Putting the job on hold may communicate this better: the user then knows to resubmit the job with appropriate disk requirements or to investigate why the job used more disk than it should have. The following example configuration shows how to put jobs on hold from the submit side when the disk is nearly full.

# When a job matches, insert the machine disk space into the
# job ClassAd so SYSTEM_PERIODIC_HOLD can refer to it.
MachineTotalDiskString = "$$(TotalDisk)"
SUBMIT_EXPRS = $(SUBMIT_EXPRS)  MachineTotalDiskString

MIN_FREE_DISK_FOR_HOLD = ( int(MATCH_EXP_MachineTotalDiskString) - insert_a_value_in_KB_here )

SYSTEM_PERIODIC_HOLD = MATCH_EXP_MachineTotalDiskString =!= UNDEFINED && \
                      ( DiskUsage > $(MIN_FREE_DISK_FOR_HOLD) )
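
To make the arithmetic concrete, here is a worked example with made-up numbers. Both values are assumptions for illustration only: a machine that advertised 50000000 KB free when the job matched, and a reserve of 5000000 KB to keep free.

# Free space advertised at match time:   50000000 KB
# Space to keep free (reserve):           5000000 KB
# Hold threshold: 50000000 - 5000000  =  45000000 KB
# The job goes on hold once its DiskUsage exceeds 45000000 KB.

The threshold is based on the free space recorded when the job matched, so it does not track later changes in free space caused by other jobs on the same machine.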