-This document describes an HTCondor configuration known to scale to 5000 EC2 nodes.  (It may also assist in constructing other pools across a wide-area, or otherwise high-latency, network.)  We present the configuration as a series scalability problems and solutions, in order of severity.  This document generally refers to the 7.8 and 7.9 release series.
+This document describes an HTCondor configuration known to scale to 5000 execute nodes in EC2; the central manager and submitter remain hosted on-site.  It may also assist in constructing other pools across a wide-area, or otherwise high-latency, network.  We present the configuration as a series of scalability problems and solutions, in order of severity.  This document generally refers to the 7.8 and 7.9 release series.
 
 Before we begin, however, a few recommendations (especially for those readers constructing new pools):
 
@@ -6,17 +6,123 @@
 *: Expand slowly; this will allow you to tackle one problem at a time.
 *: If possible, use dummy jobs for testing – jobs which behave something like your intended workload, but which don't produce any results you care about.  This allows you to throw away jobs if doing so becomes convenient, as it might if you need to reconfigure something to improve its scalability.  Dummy jobs may simply be real jobs whose results you have already obtained.
 
+{subsection: Network Connectivity}
+
+Because Amazon provides public IP addresses to running instances, you don't need to use CCB.  Instead, with the help of the following script, you can use *TCP_FORWARDING_HOST*, which incurs no additional load on the collector.  (Arrange for this script to be run on execute-node start-up.)  You will, of course, need to adjust (or create) a security group to allow inbound connections; consider using the shared port service (see below) to simplify this.
+
+{code}
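+# Query the EC2 metadata service for this instance's public IP address and
+# instance ID, and arrange for the startd to advertise both.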
+cat > /etc/condor/config.d/ec2.local <<EOF
+EC2PublicIP = $(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/public-ipv4)
+EC2InstanceID = "$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/instance-id)"
+STARTD_EXPRS = \$(STARTD_EXPRS) EC2PublicIP EC2InstanceID
+EOF
+chmod a+r /etc/condor/config.d/ec2.local
+{endcode}
+
+This script requires that *LOCAL_CONFIG_DIR* be set to */etc/condor/config.d*, the default for many distributions.
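+
+For example, on EC2 you could run such a script from the instance's user data (a minimal sketch, assuming an image whose cloud-init or equivalent executes user data as a shell script at boot, with HTCondor pre-installed and the configuration script baked in at the hypothetical path /usr/local/sbin/condor-ec2-config):
+
+{code}
+#!/bin/sh
+# Hypothetical user-data script: generate the EC2-specific configuration,
+# then start HTCondor.
+/usr/local/sbin/condor-ec2-config
+/etc/init.d/condor start
+{endcode}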
+
+{section: Example Configuration}
+
+For the execute node(s):
+{code}
+# See the "Network Connectivity" subsection.
+LOCAL_CONFIG_DIR = /etc/condor/config.d
+TCP_FORWARDING_HOST = $(EC2PublicIP)
+
+# See the "Shared Port" section.
+# This reduces the number of inbound ports you need to add to your
+# security group to 1.
+USE_SHARED_PORT = TRUE
+DAEMON_LIST = MASTER, STARTD, SHARED_PORT
+
+# See the "CCB" section.
+# CCB_ADDRESS = $(COLLECTOR_HOST)
+
+# See the "TCP Updates" section.
+# TCP updates are more expensive but more reliable over a WAN.
+UPDATE_COLLECTOR_WITH_TCP = TRUE
+
+# See the "Claim ID Security" section.
+SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = TRUE
+ALLOW_DAEMON = condor_pool@*, submit-side@matchsession/<submitter-ip-address>
+{endcode}
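+
+Once an execute node is up, you can spot-check the generated configuration on the node itself; both values should print the instance's public IP address:
+
+{code}
+condor_config_val EC2PublicIP TCP_FORWARDING_HOST
+{endcode}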
+
+For the submit node(s):
+{code}
+# See the "Shared Port" section.
+# This reduces the number of inbound ports you need to add to your
+# firewall to 1.
+USE_SHARED_PORT = TRUE
+DAEMON_LIST = MASTER, SCHEDD, SHARED_PORT
+
+# See the "Claim ID Security" section.
+SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = TRUE
+
+# See the "Increase File Descriptor Limits" section.
+SCHEDD_MAX_FILE_DESCRIPTORS = 20000
+{endcode}
+
+For the central manager:
+{code}
+# See the "Tiered Collectors" section.
+COLLECTOR02 = $(COLLECTOR)
+COLLECTOR03 = $(COLLECTOR)
+COLLECTOR04 = $(COLLECTOR)
+
+# These port numbers are somewhat arbitrary; edit as required.
+COLLECTOR02_ARGS = -f -p 10002
+COLLECTOR03_ARGS = -f -p 10003
+COLLECTOR04_ARGS = -f -p 10004
+
+# The master complains if these aren't set, although they don't work.
+COLLECTOR02_LOG = $(LOG)/Collector02Log
+COLLECTOR03_LOG = $(LOG)/Collector03Log
+COLLECTOR04_LOG = $(LOG)/Collector04Log
+
+# Setting the log file in the environment, however, does.
+COLLECTOR02_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector02Log"
+COLLECTOR03_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector03Log"
+COLLECTOR04_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector04Log"
+
+DAEMON_LIST = MASTER, NEGOTIATOR, COLLECTOR, COLLECTOR02, COLLECTOR03, COLLECTOR04
+
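+# Forward ads from the secondary collectors to the primary collector.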
+CONDOR_VIEW_HOST = $(COLLECTOR_HOST)
+
+# See the "Shared Port" section.
+COLLECTOR.USE_SHARED_PORT = FALSE
+
+# See the "Increase File Descriptor Limits" section.
+COLLECTOR_MAX_FILE_DESCRIPTORS = 20000
+
+# See the "Turn Off Match Notification" section.
+NEGOTIATOR_INFORM_STARTD = FALSE
+{endcode}
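+
+After restarting HTCondor on the central manager, you should see four collector processes (the primary and three secondaries).  One quick check:
+
+{code}
+ps auxww | grep [c]ondor_collector
+{endcode}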
+
 {section: Tiered Collectors}
 
-Because the collector blocks on I/O, the number of nodes to which it can promptly respond is limited by the latencies of its connections to them.  Stronger security methods (pool password, Kerberos, SSL, and GSI) require more round-trips between the collector and the startd to establish trust, and may thus intensify this bottleneck.  Using CCB (to work around firewalls or private IP addresses) also increases collector load, especially when also using stronger security methods.
+Because the collector blocks on I/O, the number of nodes to which it can promptly respond is limited by the latencies of its connections to them.  Stronger security methods (pool password, Kerberos, SSL, and GSI) require more round-trips between the collector and the startd to establish trust, and may thus intensify this bottleneck.
 
 However, an HTCondor pool can use more than a single collector: a number of secondary collectors communicate with the startds and forward the results to the primary collector, which all the other HTCondor tools and daemons use as normal.  This allows HTCondor to perform multiple simultaneous I/O operations (and overlap them with computation) by using the operating system's whole-process scheduler.
 
-To implement this tiered collector set up, follow {wiki: HowToConfigCollectors this recipe}.  If you're already using the shared port service, you should disable it for the collector; it can cause multiple problems.  The easiest way to do this is to turn it off entirely on the central manager (which should not be the same machine as the submit node); it may also explicitly disabled for the collector by setting *COLLECTOR.USE_SHARED_PORT* to *FALSE*.
+The recipe for this tiered collector setup is included in the example configurations above; you may also want to read {wiki: HowToConfigCollectors HowToConfigCollectors}.  However, the example execute-node configuration is incomplete: it does not specify the secondary collector to which each node should connect.  You may use the solution from that page:
+
+{code}COLLECTOR_HOST=$RANDOM_CHOICE(collector.hostname:10002,collector.hostname:10003,collector.hostname:10004){endcode}
+
+but this may cause confusion, as different daemons on the same node may connect to different secondary collectors.  To avoid this problem, add a script like the following to the start-up sequence of the execute-node VMs:
 
-To reduce confusion, you may also want to configure your execute nodes so that all of its HTCondor daemons connect to the same secondary collector.  (This has the added benefit that reconfiguring an execute node won't change to which collector it reports.)  One of way of doing this is by randomly choosing the *COLLECTOR_HOST* during the boot process.  If you don't set it in your other configuration files, you can simply create a file in the config.d directory (specified by *LOCAL_CONFIG_DIR*, usually /etc/condor/config.d) which sets it.
+{code}
+# Pick one of the three secondary collectors (ports 10002 through 10004).
+COLLECTOR_PORT=$((($RANDOM % 3) + 10002))
+echo "COLLECTOR_HOST = collector.hostname:${COLLECTOR_PORT}" > /etc/condor/config.d/collector-host.local
+chmod a+r /etc/condor/config.d/collector-host.local
+{endcode}
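+
+To see how the execute nodes ended up distributed, you can query each secondary collector directly (hostname and ports from the example above):
+
+{code}
+condor_status -pool collector.hostname:10002 -total
+condor_status -pool collector.hostname:10003 -total
+condor_status -pool collector.hostname:10004 -total
+{endcode}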
 
-{section: Shared Port Service}
+{section: CCB}
+
+If you're using an EC2-compatible service which doesn't provide public IPs, you may need to use CCB.  (You may also need to use CCB if you don't control the firewall around your VM instances.)  Using CCB increases the load on the collector, so you may need to increase the number of secondary collectors in your tiered configuration.
+
+Using CCB in conjunction with the shared port service further reduces HTCondor's port usage; thus, in certain circumstances, CCB may be useful independent of its ability to work around private IPs and firewalls.  See below.
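+
+To enable CCB, uncomment the *CCB_ADDRESS* line from the example execute-node configuration above:
+
+{code}
+CCB_ADDRESS = $(COLLECTOR_HOST)
+{endcode}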
+
+{section: Shared Port}
 
 In its default configuration, HTCondor acquires two ports on the submit node for the lifetime of each running job: one inbound and one outbound.  Although this limits the maximum number of running jobs per submit node (to about 23000), in practice, the firewall tends to become a problem before then.  You can reduce the schedd's requirement for inbound ports to a single one by using the shared port service, a multiplexer.
 
@@ -24,23 +130,23 @@
 
 Additionally, using CCB in conjunction with the shared port service reduces the total number of sockets on the central manager, as CCB routes all communications with the execute node through its single shared port.
 
-Using the shared port service may mean that the scaling limit of this configuration is RAM on the submit node: each running job requires a shadow, which in the latest version of HTCondor, uses roughly 1200KB.  (Or about 950KB for a 32-bit installation of HTCondor.)
+The shared port service may also be of benefit on the execute nodes: it reduces the number of inbound ports their security group (or firewall) must allow to one.
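+
+For example, you might pin the shared port daemon to a fixed port with *SHARED_PORT_ARGS* and then allow inbound connections to only that port (a hypothetical sketch; the port number and the security-group name are assumptions):
+
+{code}
+# Execute-node configuration: make the shared port daemon listen on port 9618.
+#   SHARED_PORT_ARGS = -p 9618
+# Allow inbound TCP connections to that single port (hypothetical group name):
+aws ec2 authorize-security-group-ingress --group-name condor-execute \
+    --protocol tcp --port 9618 --cidr 0.0.0.0/0
+{endcode}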
 
-To use the shared port service, set *USE_SHARED_PORT* to *TRUE* on the submit and execute nodes.  You should not use the shared port service on the central manager; this can cause multiple problems.  You must also add the shared port service to the *DAEMON_LIST*, for instance by setting it to *$(DAEMON_LIST), SHARED_PORT*.
+Using the shared port service may mean that the scaling limit of this configuration is RAM on the submit node: each running job requires a shadow, which, in the latest version of HTCondor, uses roughly 1200KB (or about 950KB for a 32-bit installation of HTCondor).
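+
+At that rate, the 5000 running jobs this configuration targets would require roughly 5000 × 1200KB, or about 6GB, of RAM for shadows alone.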
 
-To enable CCB, set *CCB_ADDRESS* to *$(COLLECTOR_HOST)* on the execute nodes.  Do not enable CCB on the central manager (for the collector); this can cause multiple problems, especially in conjunction with the shared port service.
+The shared port service is enabled in the example configurations above.
 
 {section: TCP Updates}
 
 By default, HTCondor uses UDP to maintain its pool membership list.  On a LAN, doing so is cheaper and very nearly as reliable.  On a WAN, it becomes much more likely that HTCondor will mistakenly conclude that an execute node has gone away.  Such mistakes are expensive (and confusing), so we recommend using TCP updates.
 
-To enable TCP updates, set *UPDATE_COLLECTOR_WITH_TCP* to *TRUE* on the execute nodes.
+TCP updates are enabled in the example configurations above.
 
 {section: Claim ID Security}
 
 By default, HTCondor creates a secure session from scratch for each daemon-to-daemon connection.  However, it's possible to use an existing shared secret (the claim ID) to jump-start the security negotiation in some cases, eliminating a few round-trips over the WAN.
 
-To use claim IDs for security, set *SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION* to *TRUE* for the submit and execute nodes.  On the execute node, you must also set *ALLOW_DAEMON* to *submit-side@matchsession/<x.y.z.w>*, where *x.y.z.w* is the address of your submit node.
+Claim ID security is enabled in the example configurations above.  On the execute nodes, remember to replace *<submitter-ip-address>* in the *ALLOW_DAEMON* setting with the address of your submit node.
 
 {section: Increase File Descriptor Limits}
 
@@ -50,13 +156,13 @@
 
 To increase the total number of system-wide file descriptors, echo the new value to _/proc/sys/fs/file-max_.  (You may cat this file for the current value.)  For most distributions, appending *fs.file-max = <number>* to _/etc/sysctl.conf_ will make this change persist across reboots.  A million should be more than enough.
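+
+For example (as root; a million, per the above):
+
+{code}
+# Raise the limit immediately:
+echo 1000000 > /proc/sys/fs/file-max
+# Make the change persist across reboots:
+echo 'fs.file-max = 1000000' >> /etc/sysctl.conf
+{endcode}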
 
-If HTCondor is started as root, setting *[SCHEDD|COLLECTOR]_MAX_FILE_DESCRIPTORS* can increase how many file descriptors those HTCondor daemons can use.  Twenty thousand should suffice.
+If HTCondor is started as root, the example configurations above will increase the file descriptor limits for the schedd and the collector.
 
 {section: Turn Off Match Notification}
 
 In HTCondor's default configuration, the negotiator notifies a startd when a job matches it.  This can slow job dispatch for the entire pool if the startd has vanished but HTCondor hasn't noticed yet, because the notification blocks until it times out.  For pools using spot instances (which may frequently vanish without warning), we recommend turning off match notification.
 
-To turn off match notification, set *NEGOTIATOR_INFORM_STARTD* to *FALSE* for the central manager.
+Match notification is disabled in the example configurations above.
 
 {section: Manage Disk Space}