How to limit memory usage of jobs

With HTCondor 8.0+ running on operating systems that support cgroups, such as RHEL6, use the method below to limit the memory used by jobs. A less powerful but more portable method is described below in the section titled "Known to work with HTCondor version 7.8+".

Known to work with HTCondor 8.0+ on Linux distros that fully support cgroups, such as RHEL6

Proviso: This configuration only works when the HTCondor daemons have been started as root, usually by the init process. It will not work with a glidein HTCondor or a personal condor.

First, configure cgroups as described in section 3.12 of the manual, "Cgroup-Based Process Tracking".
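As a rough sketch of the kind of setup that section describes on RHEL6 (take the exact controller list and syntax from the manual; the cgroup name "htcondor" here is only illustrative), the execute node gets a cgroup for HTCondor in /etc/cgconfig.conf and a matching BASE_CGROUP setting in its HTCondor configuration:

# /etc/cgconfig.conf on the execute node: a cgroup for HTCondor to use
group htcondor {
    cpu = {}
    cpuacct = {}
    memory = {}
    freezer = {}
    blkio = {}
}

# condor_config on the execute node: the cgroup HTCondor should use
BASE_CGROUP = htcondor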

Second, set the configuration parameter CGROUP_MEMORY_LIMIT_POLICY to hard.
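For example, in the execute node's HTCondor configuration:

# enforce request_memory as a hard limit via cgroups
CGROUP_MEMORY_LIMIT_POLICY = hard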

Third, set request_memory in the job's submit description file to the expected maximum physical memory (in MB) needed by all the processes in the job.
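For example, in the submit description file (the executable name is just a placeholder):

# request 2048 MB of physical memory for all processes in the job
executable = my_program
request_memory = 2048
queue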

HTCondor will then allocate an appropriately sized slot for your job and configure the kernel to prevent the job from using more physical pages than requested. Note that this may cause swapping. If the job would otherwise exhaust the memory requested, all the processes in the job will be killed by the out-of-memory killer, and the job will be put on hold.

Known to work with HTCondor version 7.8+

Users can specify the amount of memory and disk space that their jobs need by setting request_memory and request_disk in their submit description files; these appear in the job ad as the RequestMemory and RequestDisk attributes.

HTCondor matches jobs with resources by referencing these requests in the job's Requirements expression, and by provisioning dynamic slots with resources greater than or equal to the requested amounts.
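For example, a submit description file might contain the following (the values are illustrative; request_memory is in MB and request_disk is in KB):

request_memory = 1024
request_disk = 500000

The job will then only match slots, or carve out dynamic slots, whose Memory and Disk attributes are at least these amounts.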

HTCondor monitors the total virtual memory usage of jobs, which includes both physical RAM and disk-backed virtual memory allocated by the job. Information about the memory and disk usage of the job is stored in the job ad in attributes such as MemoryUsage, ResidentSetSize, ImageSize, and DiskUsage. These attributes are also available for use in STARTD policy expressions.

By default, MemoryUsage is an expression that converts ResidentSetSize to MB.
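To see these values for queued or running jobs, you can query them with condor_q; a quick sketch, assuming an HTCondor 8.0-era condor_q that supports -autoformat:

condor_q -autoformat ClusterId ProcId MemoryUsage ResidentSetSize DiskUsage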

How to preempt (evict) jobs that use too much memory

Put the following in your configuration file on the execute machine. This assumes that expressions like PREEMPT have already been defined further up in the configuration, so put it after those definitions, or merge it into them.

# preempt jobs that are more than 10% over the memory assigned to the slot.
PREEMPT = ($(PREEMPT)) || ((MemoryUsage > Memory*1.1) =?= TRUE)
WANT_SUSPEND = ($(WANT_SUSPEND)) && ((MemoryUsage > Memory*1.1) =!= TRUE)

Note that preempted jobs will go back to idle in the job queue and will potentially try to run again if they can match to a machine. If you instead wish to put the jobs on hold when they are evicted, either use the submit-side policy described later or, in HTCondor 7.3+, use the WANT_HOLD expression. One advantage of using WANT_HOLD versus the submit-side policy example below is that the startd evaluates these expressions much more frequently than updates are sent to the schedd. An example using WANT_HOLD:

# hold jobs that are more than 10% over the memory assigned to the slot.
MEMORY_EXCEEDED = ((MemoryUsage > Memory*1.1) =?= TRUE)
PREEMPT = ($(PREEMPT)) || $(MEMORY_EXCEEDED)
WANT_SUSPEND = ($(WANT_SUSPEND)) && ($(MEMORY_EXCEEDED) =!= TRUE)
WANT_HOLD = $(MEMORY_EXCEEDED)
WANT_HOLD_REASON = ifThenElse( $(MEMORY_EXCEEDED), \
               "Your job used too much memory.", \
               undefined )

How to hold/remove a job that uses too much memory

Jobs can hold or remove themselves by specifying a periodic_hold or periodic_remove expression. The schedd can also hold or remove jobs as dictated by the configuration expressions SYSTEM_PERIODIC_HOLD or SYSTEM_PERIODIC_REMOVE. These are all submit-side controls, whereas the PREEMPT example above is an execute-side control. One problem with the PREEMPT example is that it doesn't do a very good job of communicating to the job owner why the job was evicted. Putting the job on hold may help communicate better. Then the user knows to resubmit the job with larger memory requirements or investigate why the job used more memory than it should have. The following example configuration shows how to put jobs on hold from the submit-side when they use too much memory. All of the same issues concerning accurate measurement of working set size apply here just as they did in the PREEMPT example above.


# When a job matches, insert the machine memory into the
# job ClassAd so SYSTEM_PERIODIC_HOLD can refer to it.
MachineMemoryString = "$$(Memory)"
SUBMIT_EXPRS = $(SUBMIT_EXPRS) MachineMemoryString

SYSTEM_PERIODIC_HOLD = MATCH_EXP_MachineMemoryString =!= UNDEFINED && \
                       MemoryUsage > 0.9*int(MATCH_EXP_MachineMemoryString)
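Individual users can get a similar effect without any configuration changes by adding a periodic_hold expression to their own submit description file. A minimal sketch (the condition is illustrative, and periodic_hold_reason may not be available in older versions):

# hold this job if it uses more memory than it requested (RequestMemory, in MB)
periodic_hold = MemoryUsage =!= undefined && MemoryUsage > RequestMemory
periodic_hold_reason = "Job exceeded its requested memory."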

How to automatically increase request_memory of a held job

Let's say I have a job with request_memory = 256 (megabytes), and if it goes over that, it will get held according to my SYSTEM_PERIODIC_HOLD policy. I would like to automatically triple the request_memory and then release the job. A setup to do precisely this was posted on the htcondor-users email list.
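That recipe is not reproduced here, but the general idea can be sketched in the submit description file itself: make request_memory an expression that grows once the job has actually run, and add a periodic_release so held jobs go back to idle with the larger request. A rough, illustrative sketch (the numbers and retry limit are arbitrary):

# start at 256 MB; once usage has been measured, request triple that amount
request_memory = ifThenElse(MemoryUsage =!= undefined, 3 * MemoryUsage, 256)

# release held jobs a few times so the larger request can take effect
periodic_release = (NumJobStarts < 5)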

How to place a hard limit on memory usage by a job on Unix

HTCondor doesn't currently provide a configuration setting for this, but you can write your own wrapper script that runs before the job and sets resource limits that are enforced by the operating system. Here is what you put in the configuration file of your execute machines:

USER_JOB_WRAPPER = /path/to/condor_job_wrapper

The file condor_job_wrapper above can be called whatever you want. You should create that file with the following contents:

#!/bin/sh

# change this to the maximum allowed data segment size (in kilobytes)
ulimit -d 1000000

# run the job
exec "$@"

Note that ulimit -m (maximum resident memory size) appears attractive, but it is not actually enforced on many operating systems.
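If you want to cap the total address space rather than just the data segment, an alternative sketch for the same wrapper on Linux is ulimit -v, which limits virtual memory (address space) and is enforced where the resident-size limit often is not:

# limit total virtual address space (in kilobytes)
ulimit -v 1000000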

Make sure the wrapper script is executable. Example:

chmod a+x /path/to/condor_job_wrapper