
How to limit memory usage of jobs

Known to work with HTCondor version: 7.0

HTCondor monitors the total virtual memory usage of jobs. This includes both physical RAM and disk-based virtual memory allocated by the job. Administrators often want to limit the amount of physical RAM used by a job so that it doesn't cause performance problems for other jobs or tasks on the computer. This is difficult to do in a general way, because HTCondor currently lacks a way to measure how much physical RAM the application actually needs versus how much of its memory could be swapped out to disk without impacting performance. (The amount the application actually needs is known as the working set size.)

Example: HTCondor may see that the virtual memory size of a job is 1.5GB on a 4-core system that allocates only 1GB per slot. This could be a problem if jobs in the other slots need their full 1GB of expected memory. However, it may be that there simply isn't demand for memory at the moment, so the operating system is letting this job keep more of its memory in physical RAM than it actually needs. If something else comes along and demands more memory, the memory usage of this job might painlessly shift so that only 1GB is in physical RAM and the other 0.5GB is on disk, leaving the expected amount of RAM for other jobs without causing poor performance due to thrashing (actively needed data jumping back and forth between disk and RAM).

This is an area we hope to improve in HTCondor. In the meantime, here are some recipes that have proven useful, even though they are not perfect.

How to preempt (evict) jobs that use too much memory

Put the following in your configuration file on the execute machine. This assumes that settings such as PREEMPT have already been defined earlier in the configuration, so place it after those definitions, or merge it with them.

# Let a job use up to 90% of the memory allocated to its batch slot
MEMORY_AVAILABLE_MB = (Memory*0.9)


# The working set size is the amount of memory the job actually needs
# in RAM (as opposed to disk-based memory) in order to run without
# thrashing (copying data back and forth between RAM and disk frequently).
# If the job has an attribute "MemoryRequirementsMB", then we use that
# for the working set size.  This is a custom attribute that would have
# to be manually set by the user, and which we trust in place of our
# default assumption.  The default in this example is to arbitrarily
# assume the working set size is 70% of the virtual memory size.
# That will certainly be wrong if the job calls mmap() on a large file,
# but doesn't need the full file in RAM, so in such cases, the user will
# have to set MemoryRequirementsMB.

WORKING_SET_SIZE_MB = ifThenElse( isUndefined(MemoryRequirementsMB), \
                                  ImageSize/1024*0.7, \
                                  MemoryRequirementsMB )

# Here we check if the working set size of the job is greater than
# the RAM allocated to this batch slot.  We also check if the virtual
# memory size of the job is greater than the total virtual memory allocated
# to this batch slot.  If either is true, then memory is exceeded.

VIRTUAL_MEMORY_AVAILABLE_MB = (VirtualMemory*0.9)

MEMORY_EXCEEDED = ($(WORKING_SET_SIZE_MB) > $(MEMORY_AVAILABLE_MB)) || \
                  (ImageSize/1024 > $(VIRTUAL_MEMORY_AVAILABLE_MB))

PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))

WANT_SUSPEND = ($(WANT_SUSPEND)) && ($(MEMORY_EXCEEDED)) =!= TRUE

Note that preempted jobs go back to idle in the job queue and will potentially try to run again if they can match to a machine. If you instead wish to put jobs on hold when they are evicted, either use the submit-side policy described later or, in HTCondor 7.3+, use the expression WANT_HOLD. One advantage of WANT_HOLD versus the submit-side policy example below is that the startd evaluates these expressions much more frequently than updates are sent to the schedd. An example using WANT_HOLD:

VIRTUAL_MEMORY_AVAILABLE_MB = (VirtualMemory*0.9)
MEMORY_EXCEEDED = ImageSize/1024 > $(VIRTUAL_MEMORY_AVAILABLE_MB)
PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))
WANT_SUSPEND = ($(WANT_SUSPEND)) && ($(MEMORY_EXCEEDED)) =!= TRUE
WANT_HOLD = ($(MEMORY_EXCEEDED))
WANT_HOLD_REASON = \
   ifThenElse( $(MEMORY_EXCEEDED), \
               "Your job used too much virtual memory.", \
               undefined )

How to hold/remove a job that uses too much memory

Jobs can hold or remove themselves by specifying a periodic_hold or periodic_remove expression. The schedd can also hold or remove jobs as dictated by the configuration expressions SYSTEM_PERIODIC_HOLD or SYSTEM_PERIODIC_REMOVE. These are all submit-side controls, whereas the PREEMPT example above is an execute-side control. One problem with the PREEMPT example is that it doesn't do a very good job of communicating to the job owner why the job was evicted. Putting the job on hold may help communicate better. Then the user knows to resubmit the job with larger memory requirements or investigate why the job used more memory than it should have. The following example configuration shows how to put jobs on hold from the submit-side when they use too much memory. All of the same issues concerning accurate measurement of working set size apply here just as they did in the PREEMPT example above.

# The working set size is the amount of memory the job actually needs
# in RAM (as opposed to disk-based memory) in order to run without
# thrashing (copying data back and forth between RAM and disk frequently).
# If the job has an attribute "MemoryRequirementsMB", then we use that
# for the working set size.  This is a custom attribute that would have
# to be manually set by the user, and which we trust in place of our
# default assumption.  The default in this example is to arbitrarily
# assume the working set size is 70% of the virtual memory size.
# That will certainly be wrong if the job calls mmap() on a large file,
# but doesn't need the full file in RAM, so in such cases, the user will
# have to set MemoryRequirementsMB.

WORKING_SET_SIZE_MB = ifThenElse( isUndefined(MemoryRequirementsMB), \
                                  ImageSize/1024*0.7, \
                                  MemoryRequirementsMB )

# When a job matches, insert the machine's memory into the
# job ClassAd so SYSTEM_PERIODIC_HOLD can refer to it.
MachineMemoryString = "$$(Memory)"
SUBMIT_EXPRS = $(SUBMIT_EXPRS) MachineMemoryString

SYSTEM_PERIODIC_HOLD = MATCH_EXP_MachineMemoryString =!= UNDEFINED && \
                       $(WORKING_SET_SIZE_MB) > 0.9*int(MATCH_EXP_MachineMemoryString)
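An individual user can apply the same idea per job, without any administrator configuration, via periodic_hold in the submit description file. The sketch below is illustrative only: the executable name and the 2000 MB threshold are placeholders, and ImageSize is reported in KB, hence the division by 1024.

```
# Sketch of a submit description file with a per-job memory hold policy.
executable = my_program
output     = my_program.out
error      = my_program.err
log        = my_program.log

# Put this job on hold if its virtual memory image grows beyond ~2 GB.
# ImageSize is in KB, so ImageSize/1024 is MB.
periodic_hold = ImageSize/1024 > 2000

queue
```

If you would rather remove such jobs outright than hold them, the same expression can be used with periodic_remove in the submit file, or with SYSTEM_PERIODIC_REMOVE on the administrator side.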

How to place a hard limit on memory usage by a job on Unix

HTCondor doesn't currently provide a configuration setting for this, but you can write your own wrapper script that runs before the job and sets resource limits that are enforced by the operating system. Here is what you put in the configuration file of your execute machines:

USER_JOB_WRAPPER = /path/to/condor_job_wrapper

The file condor_job_wrapper above can be called whatever you want. You should create that file with the following contents:

#!/bin/sh

# change this to the maximum allowed data segment size (in kilobytes)
ulimit -d 1000000

# run the job
exec "$@"

Note that ulimit -m (maximum resident memory size) appears attractive, but it is not actually enforced on many operating systems.
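As a quick sanity check of the ulimit approach (a generic shell sketch, not HTCondor-specific), you can lower the data-segment limit in a child shell and read it back; a job exec'd by the wrapper inherits its limit in the same way:

```shell
#!/bin/sh
# Lower the data segment limit to 10 MB (10240 KB) in a child shell,
# then print the limit as seen from inside that shell.
limit_kb=$(sh -c 'ulimit -d 10240; ulimit -d')
echo "data segment limit: ${limit_kb} KB"
# prints: data segment limit: 10240 KB
```

Lowering a soft limit is always permitted, so this works for any unprivileged user; raising it back above the hard limit is not.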

Make sure the wrapper script is executable. Example:

chmod a+x /path/to/condor_job_wrapper