By default, HTCondor assumes that startds are relatively stable and long-lived. An HTCondor startd, representing a machine, appears in the collector, and thus in the output of condor_status, once it has sent its first update to the collector. After that, the startd sends an update every time its state changes (or the value of the START expression changes), and at regular intervals even when nothing has changed. This interval is controlled by the parameter UPDATE_INTERVAL, whose default value is 300 seconds (5 minutes).

If the collector hasn't received an update from a particular startd within a set amount of time, it discards the startd's ad as "stale", and the startd no longer shows up in the collector or in condor_status output. More importantly, it can no longer be matched to jobs. This allows HTCondor to gracefully disconnect from machines even when a network outage or other interruption gives the startd no opportunity to announce that it is going away. The stale interval is controlled by a collector parameter called CLASSAD_LIFETIME, which defaults to 900 seconds (15 minutes). Alternatively, if the startd advertises a machine attribute called ClassAdLifetime, that value is used for that startd instead. Note that the startd has no dedicated configuration parameter for setting this attribute. To set it, you must use the following config:

ClassAdLifetime = 180
STARTD_ATTRS = $(STARTD_ATTRS) ClassAdLifetime
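
The expiry rule described above can be illustrated with a small Python sketch. This is only a model of the behaviour as the text describes it, not HTCondor code: the function names are hypothetical, and whether the collector treats an ad as stale at exactly the lifetime or only after it is an assumption made here.

```python
from typing import Optional

# Default CLASSAD_LIFETIME of the collector, in seconds, as stated in the text.
COLLECTOR_CLASSAD_LIFETIME = 900


def effective_lifetime(ad_lifetime: Optional[int],
                       collector_default: int = COLLECTOR_CLASSAD_LIFETIME) -> int:
    """Lifetime applied to one ad: the ad's own ClassAdLifetime if it
    advertises one, otherwise the collector-wide default."""
    return ad_lifetime if ad_lifetime is not None else collector_default


def is_stale(last_update: float, now: float,
             ad_lifetime: Optional[int] = None,
             collector_default: int = COLLECTOR_CLASSAD_LIFETIME) -> bool:
    """True once more than the effective lifetime has passed since the
    last update (strict comparison is an assumption of this sketch)."""
    return now - last_update > effective_lifetime(ad_lifetime, collector_default)
```

Under this model, a startd that vanishes right after an update stays visible, and matchable, for up to the full effective lifetime: 900 seconds with the defaults, or 180 seconds with the config shown above.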

However, some sites run startds that come and go frequently. This is the case for startds launched in the cloud, inside virtual machines, or in glideins. In these cases, the 15-minute staleness timer can be too long: a startd can go away without the central manager knowing, and the central manager will still try to send jobs to it, lowering throughput. If the startd is shut down in a controlled way, with the condor_off command, it tries to send an invalidate request to the collector, and if that succeeds, its ad is removed immediately. However, in these environments, it is not always possible to know when the underlying system will pull the plug on the startd.

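If your pool is such a site, and you cannot count on having the opportunity to run condor_off, it is recommended to lower both UPDATE_INTERVAL and the ad lifetime from their defaults. Keep the lifetime comfortably longer than UPDATE_INTERVAL, so that the ad of a healthy startd does not expire between two periodic updates.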

For the advanced user, the condor_advertise command with an invalidate command such as INVALIDATE_STARTD_ADS can also be used to force the removal of the ad from the collector.