How to configure multi-tier collectors

Known to work in HTCondor version: 7.0.1

This technique increases the scalability of the HTCondor collector. It has helped glidein pools that use GSI authentication scale from about 5,000 slots to about 20,000 slots. Other strong authentication methods are similarly CPU intensive, so they should benefit as well. When authenticating across a wide area network, network latency is actually a bigger problem for the collector than the CPU cost of authentication. The multi-tier collector approach spreads both the latency and the CPU usage across an adjustable number of collectors. It is particularly relevant to glidein pools because their startds are typically shorter lived than those in dedicated pools, so new security sessions must be established more often. In the pool mentioned above, which needed multi-tier collectors to scale beyond 5,000 slots, the glideins were restarting (unsynchronized) with an average lifespan of about 3 hours.

The basic idea is to run multiple collectors that each serve a portion of the pool. The machine ClassAds sent to these sub collectors are forwarded to one main collector (the central manager), which is used for matchmaking. All of these collectors can run on the same machine, in which case make sure it has sufficient CPUs/cores.

If the collectors run on the same machine, you must either configure the condor_shared_port service so that they can share one port, or assign a different network port to each collector. To keep things simple, the main collector can use the standard port.

Running multiple collectors on one single TCP port

Known to work in HTCondor version: 8.6.0+

Here is how to create 3 "sub" collectors and have them forward ClassAds to the main collector. Each sub collector has a unique condor_shared_port sock name, so the central manager needs only a single listen port (by default, the well-known HTCondor port, 9618 TCP). This example relies on shared_port being enabled by default in HTCondor v8.6.0+ and on the CollectorNode configuration template included in v8.6.0+.

In the condor_config file on your central manager, include the following:

# Define sub collectors (in this case, we want 3).
# This will result in condor_collector daemons with sock names
# of collectorchild1, collectorchild2, collectorchild3.
use Experimental:CollectorNode(child1)
use Experimental:CollectorNode(child2)
use Experimental:CollectorNode(child3)

# forward ads to the main collector over localhost (127.0.0.1)
# for scalability purposes.
# (this is ignored by the main collector, since the address matches itself)
CONDOR_VIEW_HOST = 127.0.0.1
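
To run more sub collectors, repeat the template line with a new name. A sketch for a fourth one, assuming the name child4 (chosen here for illustration) and assuming its sock name follows the same collectorchildN pattern described in the comments above:

```
# Fourth sub collector; expected sock name: collectorchild4
use Experimental:CollectorNode(child4)
```

Execute nodes would then need $RANDOM_INTEGER(1,4) in their COLLECTOR_HOST setting so that the new collector receives a share of the pool.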

Then you would configure a fraction of your pool (execute machines) to use one of the sub collectors. The execute machines can be manually configured, or randomly configured by adding the following into the condor_config on your execute nodes (note: do not put this line into the config of your central manager, just your execute nodes):

# Pick a random collectorchildX, where X is a random number between 1 and 3.
# If you are using more child collectors, increase the second parameter
# of the $RANDOM_INTEGER() macro accordingly.  Note that CONDOR_HOST is assumed
# to be the full hostname (or IP address) of your pool's central manager.
COLLECTOR_HOST = $(CONDOR_HOST)?sock=collectorchild$RANDOM_INTEGER(1,3)

# If someone were to run a tool such as condor_status on an execute node, we want it to
# query the top-level collector so it sees the entire pool.
TOOL.COLLECTOR_HOST = $(CONDOR_HOST)
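
For the manual alternative, a minimal sketch that pins one execute machine to a specific sub collector instead of choosing at random. It assumes the collectorchild2 sock name produced by the central manager configuration above, and that CONDOR_HOST is set to the central manager as before:

```
# Always report to the second sub collector from this execute node.
COLLECTOR_HOST = $(CONDOR_HOST)?sock=collectorchild2

# Tools on this node still query the top-level collector.
TOOL.COLLECTOR_HOST = $(CONDOR_HOST)
```

Manual assignment only balances the load if the machines are spread roughly evenly across the sub collectors.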

Your schedds and negotiator should be configured with COLLECTOR_HOST equal to the main collector.
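
On a submit or negotiator machine, this amounts to pointing COLLECTOR_HOST at the central manager with no ?sock= suffix. A minimal sketch, assuming CONDOR_HOST is already set to the central manager's hostname as in the execute node example:

```
# Schedd and negotiator machines talk to the main collector directly.
# Assumes CONDOR_HOST names the central manager.
COLLECTOR_HOST = $(CONDOR_HOST)
```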

How many sub-collectors are required? One data point: we have successfully used 70 sub-collectors on a single machine to support a pool of 100k GSI-authenticated daemons (masters and startds) with an average lifespan of 3 hours and network round-trip times of 0.1s (cross-Atlantic). The daemons in the pool were also using these collectors as their CCB servers.

Running each sub collector on an individual port

Known to work in HTCondor version: 7.0.1+

Here is how to create 3 "sub" collectors on ports 10002-10004 (arbitrarily chosen) and have them forward ClassAds to the main collector.

# define sub collectors
COLLECTOR2 = $(COLLECTOR)
COLLECTOR3 = $(COLLECTOR)
COLLECTOR4 = $(COLLECTOR)

# specify the ports for the sub collectors
COLLECTOR2_ARGS = -f -p 10002
COLLECTOR3_ARGS = -f -p 10003
COLLECTOR4_ARGS = -f -p 10004

# specify the logs for the sub collectors
# for 8.4 and later use
if version >= 8.4
  # these can be omitted in 8.6, as the default is $(LOG)/$(LOCAL_NAME)Log
  COLLECTOR2.COLLECTOR_LOG = $(LOG)/Collector2Log
  COLLECTOR3.COLLECTOR_LOG = $(LOG)/Collector3Log
  COLLECTOR4.COLLECTOR_LOG = $(LOG)/Collector4Log
else
  # for very old versions of HTCondor that don't handle LOCALNAME.KNOB correctly
  COLLECTOR2_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector2Log"
  COLLECTOR3_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector3Log"
  COLLECTOR4_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector4Log"
endif

# add sub collectors to the list of things to start
DAEMON_LIST = $(DAEMON_LIST) COLLECTOR2 COLLECTOR3 COLLECTOR4

# forward ads to the main collector over localhost (127.0.0.1)
# for scalability purposes.
# (this is ignored by the main collector, since the address matches itself)
CONDOR_VIEW_HOST = 127.0.0.1
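
Adding another sub collector means repeating the same pattern with a new name and port. A sketch for a fourth sub collector, assuming HTCondor 8.4 or later and assuming port 10005 is free on the central manager (the COLLECTOR5 name and the port are illustrative choices, not requirements):

```
# define one more sub collector
COLLECTOR5 = $(COLLECTOR)

# give it its own port and log file
COLLECTOR5_ARGS = -f -p 10005
COLLECTOR5.COLLECTOR_LOG = $(LOG)/Collector5Log

# start it along with the other daemons
DAEMON_LIST = $(DAEMON_LIST) COLLECTOR5
```

The new address (here, port 10005) must also be added to the list that the execute machines choose from, or no machine will report to it.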

Then you would configure a fraction of your pool (execute machines) to use one of the sub collectors. The execute machines can be manually configured, or randomly configured by using:

COLLECTOR_HOST = $RANDOM_CHOICE(collector.hostname:10002,collector.hostname:10003,collector.hostname:10004)

Your schedds and negotiator should be configured with COLLECTOR_HOST equal to the main collector.
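
If you prefer to configure execute machines manually, a sketch for one machine might look like the following. It pins the machine to the sub collector on port 10003 and, mirroring the TOOL.COLLECTOR_HOST setting from the shared-port example, points tools such as condor_status at the main collector. Here collector.hostname is the same placeholder used above, and the main collector is assumed to listen on the standard port:

```
# Execute node: report to the sub collector on port 10003.
COLLECTOR_HOST = collector.hostname:10003

# Tools run on this node query the main collector (standard port assumed)
# so that they see the entire pool.
TOOL.COLLECTOR_HOST = collector.hostname
```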

How many sub-collectors are required? One data point: we have successfully used 70 sub-collectors on a single machine to support a pool of 100k GSI-authenticated daemons (masters and startds) with an average lifespan of 3 hours and network round-trip times of 0.1s (cross-Atlantic). The daemons in the pool were also using these collectors as their CCB servers.