A common problem is an execute machine that is misconfigured or broken in such a way that it still accepts Condor jobs but cannot run them correctly. If jobs exit quickly on such a machine, it can rapidly consume many of the jobs in your queue. We call this a "black hole" machine. To work around black hole machines, add the following to your job submit file:

match_list_length = 5

This tells Condor to save the names of the last five matched machines in your job ad, in the following attributes:

LastMatchName0 = "current-machine"
LastMatchName1 = "next-most-recent-Name"
LastMatchName2 = "next-next-most-recent-Name"
...

You can then tell Condor not to retry a requeued job on a recently matched machine. Note that the expression starts with LastMatchName1, not LastMatchName0, which is the current machine:

Requirements = target.name =!= LastMatchName1 && target.name =!= LastMatchName2 ...
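Writing the expression out by hand gets tedious as match_list_length grows, so it can help to generate it. The following is a minimal Python sketch, not part of Condor itself: the helper names black_hole_requirements and submit_lines are hypothetical, and it assumes that a list length of N produces the attributes LastMatchName0 through LastMatchName(N-1), as the example above suggests.

```python
def black_hole_requirements(match_list_length: int) -> str:
    """Build a Requirements expression that avoids recently matched machines.

    Assumes attributes LastMatchName0 .. LastMatchName(N-1) exist for a
    match_list_length of N. Index 0 is skipped, as in the text above.
    """
    if match_list_length < 2:
        # With fewer than two saved names there is nothing to exclude.
        raise ValueError("match_list_length must be at least 2")
    clauses = [
        f"target.name =!= LastMatchName{i}"
        for i in range(1, match_list_length)
    ]
    return " && ".join(clauses)


def submit_lines(match_list_length: int) -> str:
    """Return the two submit-file lines for the given list length."""
    return (
        f"match_list_length = {match_list_length}\n"
        f"Requirements = {black_hole_requirements(match_list_length)}"
    )


if __name__ == "__main__":
    # Print lines that can be pasted into a job submit file.
    print(submit_lines(5))
```

If your submit file already has a Requirements expression, combine the generated clauses with it using && rather than replacing it.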