Known to work with version 7.4.0

By default, Condor manages jobs under the assumption that the user wants them to be run as many times as necessary to finish successfully. If all goes well, the job runs only once. However, various failures can require that the job be restarted before it can succeed. Examples of such failures include:

In some cases, a job should not be restarted. The user wants Condor to try to run the job once and, if that attempt fails for any reason, not make a second attempt. To achieve this, put the following in the job's submit file:

requirements = NumJobStarts == 0
periodic_remove = JobStatus == 1 && NumJobStarts > 0
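
For context, here is a minimal sketch of a complete submit file containing these two lines. The universe, executable, and file names are hypothetical placeholders and are not part of the recipe. The comments describe the intended effect of each expression.

# Hypothetical job description; my_job.sh and the my_job.* files are placeholders
universe = vanilla
executable = my_job.sh
output = my_job.out
error = my_job.err
log = my_job.log
# Only allow the job to be matched while it has never been started
requirements = NumJobStarts == 0
# Remove the job if it returns to the idle state (JobStatus 1) after having started
periodic_remove = JobStatus == 1 && NumJobStarts > 0
queue

The requirements expression keeps a job that has already started from being matched again. The periodic_remove expression then removes such a job from the queue, so it does not sit idle indefinitely.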

Note that this does not guarantee that Condor will start the job only once. The NumJobStarts job attribute is updated shortly after the job starts running, and various types of failures (for example, a network failure between the submit and execute machines) can result in the job starting without this attribute being updated. Setting SHADOW_LAZY_QUEUE_UPDATE=false shortens the window of time between the job starting and the update of NumJobStarts, but it still does not guarantee that the job will never be started more than once.
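
As a sketch, the setting would go in the Condor configuration on the submit machine, where the condor_shadow runs. Which configuration file to use (for example, the local configuration file) is an assumption that depends on the installation.

# Assumed to be set in the submit machine's Condor configuration
SHADOW_LAZY_QUEUE_UPDATE = false

Running condor_reconfig on the submit machine afterwards should cause shadows that start later to pick up the new value. Jobs that are already running may still use the previous setting.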