{subsection: Basic Guidelines for Large HTCondor Pools}

-1. Upgrade to HTCondor 7 if you are still using something older than that. It has many scalability improvements. It is also a good idea to update your configuration based on the defaults that ship with HTCondor 7, because they include updated settings that improve scalability.
+1. Upgrade to HTCondor 8 if you are still using something older than that. It has many scalability improvements. It is also a good idea to update your configuration based on the defaults that ship with HTCondor 8, because they include updated settings that improve scalability.

2. Put the central manager (collector + negotiator) on a machine with sufficient memory and 2 CPUs/cores primarily dedicated to this service.

@@ -20,7 +20,7 @@

5. Under Windows, you _might_ be able to increase the maximum number of jobs running in the schedd, but only if you also increase the desktop heap space adequately. The problem on Windows is that each running job has an instance of condor_shadow, which eats up desktop heap space. Typically, this heap space becomes exhausted with only on the order of 100 jobs running. See {link: http://www.cs.wisc.edu/condor/manual/v7.0/7_4Condor_on.html#SECTION008413000000000000000 My submit machine cannot have more than 120 jobs running concurrently. Why?} in the FAQ.

-6. Put a busy schedd's spool directory on a fast disk with little else using it.
+6. Put a busy schedd's spool directory on a fast disk with little else using it. If you have an SSD, use the JOB_QUEUE_LOG config knob to put the job_queue.log file (the schedd's database) on the SSD (see the configuration sketch after this list).

7. If running a lot of big standard universe jobs, set up multiple checkpoint servers, rather than doing all checkpointing onto the submit node.

@@ -53,6 +53,8 @@

*: Each running job has a condor_shadow process, which requires an additional ~500k of RAM. (Disclaimer: we have some reports that in different environments/configurations, this requirement can be inflated by a factor of 2.) 32-bit Linux may run out of kernel memory even if there is free "high" memory available. In our experience, with HTCondor 7.3.0, a 32-bit dedicated submit machine cannot run more than 10,000 jobs simultaneously because of kernel memory constraints.

+*: Each shadow process consumes on the order of 50 file descriptors, mainly because of shared libraries. You may need to dramatically increase the number of file descriptors available to the system or to a user in order to run a lot of jobs. It is not unusual to configure a large-memory system with a million file descriptors (one way to raise these limits is shown in the sketch below).
+
*: Each vanilla job requires two, occasionally three, network ports on the submit machine. Standard universe jobs require 5. In 2.6 Linux, the ephemeral port range is typically 32768 through 61000, so from a single submit machine this limits you to about 14,000 simultaneously running vanilla jobs. In Linux, you can increase the ephemeral port range via /proc/sys/net/ipv4/ip_local_port_range. Note that short-running jobs may require more ports, because a non-negligible number of ports will be consumed in the temporary TIME_WAIT state. For this reason, the HTCondor manual conservatively recommends budgeting 5 ports per running job. Fortunately, as of HTCondor 7.5.3, the TIME_WAIT issue with short-running jobs is largely gone, due to SHADOW_WORKLIFE. Also, as of HTCondor 7.5.0, condor_shared_port can be used to reduce port usage even further. Port usage per running job is negligible if CCB is used to access the execute nodes; otherwise it is 1 (outgoing) port per job.
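To make the knobs above concrete, here is a minimal sketch for a Linux submit machine. It is an illustration under assumed values, not a recommendation: the SSD mount point and the numeric limits are hypothetical examples, while JOB_QUEUE_LOG, SHADOW_WORKLIFE, and USE_SHARED_PORT are the HTCondor configuration settings referred to in this section, and the sysctl/ulimit lines show one common way to raise the OS limits on file descriptors and ephemeral ports.

    # HTCondor configuration on the submit machine (e.g. in condor_config.local).
    # The path below is a hypothetical SSD mount point.
    JOB_QUEUE_LOG = /ssd/condor/spool/job_queue.log
    # Reuse shadows across consecutive short jobs (7.5.3+), reducing TIME_WAIT churn.
    SHADOW_WORKLIFE = 3600
    # Collapse daemon ports onto condor_shared_port (7.5.0+); depending on the
    # version, SHARED_PORT may also need to be added to DAEMON_LIST.
    USE_SHARED_PORT = True

    # OS-level limits, run as root; the numbers are illustrative only.
    sysctl -w fs.file-max=1000000
    sysctl -w net.ipv4.ip_local_port_range="10000 61000"
    ulimit -n 100000   # or set a nofile limit in /etc/security/limits.conf for the submitting user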
Example calculations:
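For instance (an illustrative back-of-the-envelope figure based only on the per-job numbers above, not a measurement): a submit machine sustaining 10,000 running vanilla jobs needs roughly 10,000 * ~500k, or about 5 GB, of RAM for the shadows, on the order of 10,000 * 50 = 500,000 file descriptors, and 20,000 to 30,000 ephemeral ports at 2 to 3 ports per job, or roughly one outgoing port per job when shared_port and CCB are in use.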