How to ban a machine from executing jobs

Known to work with HTCondor version: 7.0

Suppose jobs are mysteriously failing on a particular machine, and a hardware problem such as memory corruption is suspected. It is probably a good idea to turn HTCondor off on that machine until the problem is solved. In addition, to guard against mistakes, it may also be a good idea to take the machine out of the HTCondor pool, in case HTCondor gets restarted prematurely.

Do this by adding the following to the HTCondor configuration visible to the condor_collector daemon:

# 2008-04-27: badmachine has a hardware problem, so temporarily blacklist it
HOSTDENY_WRITE = $(HOSTDENY_WRITE) badmachine.domain.name
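Once the entry is in place, it can be worth confirming that HTCondor actually reads it, since a typo in the host name would silently leave the machine in the pool. The following is a minimal sketch, not part of the original recipe: host_is_denied is a hypothetical helper name, and it assumes it is run on the machine whose configuration the condor_collector uses and that the entry is a literal host name rather than a wildcard pattern.

```shell
# Hypothetical helper (not part of HTCondor): succeeds if the given host
# appears as a literal entry in the configured HOSTDENY_WRITE list.
host_is_denied() {
    host="$1"
    # condor_config_val prints the value of a configuration macro and
    # exits non-zero if the macro is not defined.
    denied=$(condor_config_val HOSTDENY_WRITE) || return 1
    # Entries may be separated by commas and/or whitespace; normalize to
    # spaces and pad both ends so every entry is space-delimited.
    case " $(printf '%s' "$denied" | tr ',\t' '  ') " in
        *" $host "*) return 0 ;;
    esac
    return 1
}

# Example use:
#   host_is_denied badmachine.domain.name && echo "entry is configured"
```

This only checks what is in the configuration files; the running daemons do not pick up the change until they are reconfigured.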

Issue the command

condor_reconfig -full
The existing machine ClassAd will take time to expire (about 15 minutes); if that is too long to wait, restart the condor_collector daemon.
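To see whether the machine's ClassAd has left the collector yet, one option is to query for it with condor_status. The sketch below is an illustration under assumptions rather than part of the original recipe: machine_ad_gone is a hypothetical helper name, and it assumes the ads carry a Machine attribute equal to the fully qualified host name used in the configuration entry.

```shell
# Hypothetical helper (not part of HTCondor): succeeds once the collector
# no longer holds any machine ad for the given host.
machine_ad_gone() {
    host="$1"
    # Print the Name attribute of every ad whose Machine attribute matches;
    # the output is empty when no matching ad remains.
    ads=$(condor_status -constraint "Machine == \"$host\"" -format "%s\n" Name)
    [ -z "$ads" ]
}

# Example use: poll once a minute until the ad has expired.
#   until machine_ad_gone badmachine.domain.name; do sleep 60; done
```

An empty result is also what an unreachable collector or a misspelled host name would produce, so it is worth checking first that the same query returns ads for a machine known to be in the pool.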