{section: How to Rendezvous in the HTCondor Parallel Universe} {subsection: Introduction} When running parallel universe jobs, often one node of a job would like to send some small amount of data to the other nodes in the job. This can be an IP address and port, or a filename, or some other object to synchronize on. HTCondor chirp command makes this straightforward. The basic idea is that the sending process writes the information into the job ad with condor_chirp sett_job_attr, and the receivers poll this information from the job classad with condor_chirp get_job_attr. {subsection: Example submit file} Assuming that the dedicated scheduler has been configured, create a submit file like this: {code} universe = parallel executable = run.sh should_transfer_files = yes when_to_transfer_output = on_exit getenv = true output = out.$(Node) error = err.$(Node) log = log machine_count = 2 +WantIOProxy = true queue {endcode} {subsection: The run.sh script} The run.sh script first checks to see if it is the client or server. Arbitrarily, we pick node 0 for the server, and 1 for the client. Node 0 picks a string, inserts it into the job ad attribute called JobServerAddress, and sleeps. Node 1 polls for this value, prints it out when it is available, then exits. Obviously, this value could be the address of a port listening on, or anything else the two processes would like to synchronize on. {code} #!/bin/sh CONDOR_CHIRP=`condor_config_val LIBEXEC`/condor_chirp get_job_attr_blocking() { while /bin/true do JobServerAddress=`$CONDOR_CHIRP get_job_attr $1` if [ $? -ne 0 ]; then echo "Chirp is broken!" 1>&2 exit 1 fi if [ "$JobServerAddress" != "UNDEFINED" ]; then echo "$JobServerAddress" exit 0 fi sleep 2 done } if [[ $_CONDOR_PROCNO = "0" ]] then $CONDOR_CHIRP set_job_attr JobServerAddress '"This is my address"' # Do other server things here... sleep 60 exit 0 fi if [[ $_CONDOR_PROCNO = "1" ]] then JobServerAddress=`get_job_attr_blocking JobServerAddress` if [ $? -ne 0 ]; then echo "Chirp is broken" exit 1 fi # I've got the address, do client things here like connect to the server.. echo "JobServerAddress is $JobServerAddress" exit 0 fi exit 0 {endcode} {subsection: Copying a file from one node to another} Here is a more sophisticated example, using the nc program (netcat) to copy a file from one machine to another. Note that there is no security at all in this file copy, any process on the network can connect and send data to the listening process, so you'd only want to run this on a trusted, protected network. The submit file is the same as above, but the run.sh is a bit more involved: {code} #!/bin/sh CONDOR_CHIRP=`condor_config_val LIBEXEC`/condor_chirp get_job_attr_blocking() { while /bin/true do JobServerAddress=`$CONDOR_CHIRP get_job_attr $1` if [ $? -ne 0 ]; then echo "Chirp is broken!" 1>&2 exit 1 fi if [ "$JobServerAddress" != "UNDEFINED" ]; then echo "$JobServerAddress" exit 0 fi sleep 2 done } HOSTNAME=`hostname` echo "I am $HOSTNAME, node $_CONDOR_PROCNO" if [[ $_CONDOR_PROCNO = "0" ]] then echo "I'm the server" # Start netcat listening on an ephemeral port (0 means kernel picks port) # It will wait for a connection, then write that data to output_file # -d is needed or else it will give up too early. nc -d -l 0 > output_file & # pid of the nc running in the background NCPID=$! # Sleep a bit to ensure nc is running sleep 2 # parse the actual port selected from netstat output NCPORT=` netstat -t -a -p 2>/dev/null | grep " $NCPID/nc" | awk -F: '{print $2}' | awk '{print $1}' ` echo "Listening on $HOSTNAME $NCPORT" $CONDOR_CHIRP set_job_attr JobServerAddress \"${HOSTNAME}\ ${NCPORT}\" # Do other server things here... sleep 60 echo "Here's the output:" cat output_file exit 0 fi if [[ $_CONDOR_PROCNO = "1" ]] then echo "I'm the client" JobServerAddress=`get_job_attr_blocking JobServerAddress` if [ $? -ne 0 ]; then echo "Chirp is broken" exit 1 fi echo "JobServerAddress: $JobServerAddress"; host=`echo $JobServerAddress | tr -d '"' | awk '{print $1}'` port=`echo $JobServerAddress | tr -d '"' | awk '{print $2}'` echo "Sending to $host $port" nc $host $port < /etc/hosts echo "Sent $?" # Don't do any other client stuff, just exit exit 0 fi exit 0 {endcode}