
How to limit disk usage of jobs

Known to work with Condor version: 7.0

Condor monitors how much disk space a job consumes in the scratch directory created for it on the execute machine. This scratch directory is typically only used by jobs that enable Condor's file transfer mode (should_transfer_files=true). For such jobs, the scratch directory is the job's current working directory, and the job may write its output files there while it runs.
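For reference, a minimal submit file that enables file transfer mode might look like the following (the executable and file names are illustrative):

# Hypothetical submit file demonstrating file transfer mode.
executable = my_program
# Enable file transfer; the job then runs in the per-job
# scratch directory on the execute machine.
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
output = my_program.out
error  = my_program.err
log    = my_program.log
queue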

One problem that can happen is that a single job on a multi-cpu machine uses up so much space that all the other jobs fail for lack of space. If the partition containing Condor's EXECUTE directory is shared by other tasks (possibly including Condor itself), a full partition can cause those to fail as well.

How to preempt (evict) a job that uses too much disk space

The following configuration settings should be put in the config file of the execute machines (or the whole pool). The reason a number must be inserted for MAX_DISK_USAGE_KB instead of using the Disk attribute of the machine ClassAd is that the Disk attribute measures the amount of free space on the disk, not the amount of space promised to the job.

MAX_DISK_USAGE_KB = insert_number_here!
DISK_EXCEEDED = DiskUsage > $(MAX_DISK_USAGE_KB)

PREEMPT = ($(PREEMPT)) || ($(DISK_EXCEEDED))

WANT_SUSPEND = ($(WANT_SUSPEND)) && ($(DISK_EXCEEDED)) =!= TRUE
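One way to pick the number (an illustrative calculation, not part of the original recipe) is to divide the size of the EXECUTE partition by the number of slots, leaving some headroom. For example, on a machine with a 100 GB EXECUTE partition shared by 8 slots:

# Hypothetical sizing: 100 GB partition / 8 slots ~= 12 GB per job,
# expressed in KB (12 * 1024 * 1024 = 12582912).
MAX_DISK_USAGE_KB = 12582912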

How to configure a separate disk partition for each batch slot

The most effective way to control how much space jobs use is to put the execute directory for each slot on its own disk partition. Then you don't have to worry about a misbehaving job consuming massive amounts of disk space before PREEMPT has a chance to operate. Assuming you have already created the necessary partitions, you can configure Condor to use them like this:

SLOT1_EXECUTE = /path/to/execute1
SLOT2_EXECUTE = /path/to/execute2
...
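As a concrete sketch, a hypothetical two-slot machine with dedicated partitions mounted at /scratch1 and /scratch2 (mount points assumed for illustration) could be configured as:

# Two slots, each with its execute directory on its own partition.
NUM_CPUS = 2
SLOT1_EXECUTE = /scratch1/condor/execute
SLOT2_EXECUTE = /scratch2/condor/execute

With this layout, a runaway job can fill only its own partition, and each slot's Disk attribute reflects the free space on that slot's partition.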

How to hold/remove a job that uses too much disk

Jobs can hold or remove themselves by specifying a periodic_hold or periodic_remove expression. The schedd can also hold or remove jobs as dictated by the configuration expressions SYSTEM_PERIODIC_HOLD or SYSTEM_PERIODIC_REMOVE. These are all submit-side controls, whereas the PREEMPT example above is an execute-side control. One problem with the PREEMPT example is that it does not do a very good job of communicating to the job owner why the job was evicted. Putting the job on hold communicates the problem more clearly: the user then knows to resubmit the job with higher disk requirements, or to investigate why the job used more disk than it should have. The following example configuration shows how to put jobs on hold from the submit side when they use too much disk.

# When a job matches, insert the machine's disk space into the
# job ClassAd so SYSTEM_PERIODIC_HOLD can refer to it.
MachineDiskString = "$$(Disk)"
SUBMIT_EXPRS = $(SUBMIT_EXPRS) MachineDiskString

SYSTEM_PERIODIC_HOLD = MATCH_EXP_MachineDiskString =!= UNDEFINED && \
                       DiskUsage > int(MATCH_EXP_MachineDiskString)
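Individual users can apply the same idea without administrator help by adding a periodic_hold expression to their submit file. The 2 GB threshold below is illustrative:

# Hold this job if its scratch usage exceeds roughly 2 GB
# (DiskUsage is in KB; 2 * 1024 * 1024 = 2097152).
periodic_hold = DiskUsage > 2097152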