How to ban a machine from executing jobs
Known to work with HTCondor version: 7.0
Suppose jobs are mysteriously failing on a particular machine, and a hardware problem such as memory corruption is suspected. It is probably a good idea to turn HTCondor off on that machine until the problem is solved. In addition, to guard against mistakes, it may also be a good idea to take the machine out of the HTCondor pool, in case HTCondor gets restarted prematurely.
Do this by adding to the HTCondor configuration visible to the condor_collector daemon:
# 2008-04-27: badmachine has a hardware problem, so temporarily blacklist it
HOSTDENY_WRITE = $(HOSTDENY_WRITE) badmachine.domain.name
Then issue the following command so that the daemons reread the configuration:
condor_reconfig -full
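To check that the ban has taken effect, one option is to ask the collector whether it still lists the machine. The following is a minimal sketch, not part of the original recipe: it assumes condor_status is in the PATH and reuses the placeholder host name badmachine.domain.name from the configuration above. Ads that the collector received before the ban may remain listed until they expire, so an immediate query can still show the machine.

```shell
#!/bin/sh
# Placeholder host name from the configuration example; replace with the real one.
BANNED_HOST="badmachine.domain.name"

# Query the collector for ads from the banned host only.
# No output means the collector has no ads for it.
if command -v condor_status >/dev/null 2>&1; then
    condor_status "$BANNED_HOST"
else
    echo "condor_status not found in PATH" >&2
fi
```

If the machine still appears after the old ads should have expired, it is worth checking that the edited configuration is the one the condor_collector actually reads.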