How to get HTCondor and a NAT Firewall to Cooperate

Suppose you have an HTCondor machine, either a submit node or an execute node, and you would like to share your new computing resources with others outside your department or division. You do not, however, want crooks using your systems to wreak havoc, so there is a firewall between your HTCondor resource and the Internet. Assume that the firewall is a separate host from the HTCondor resource: a bastion host running Linux, with iptables as the firewall and Network Address Translation (NAT) enabled. This assumption allows the description to include explicit commands to run. You or your firewall administrator should translate these instructions to your particular firewall installation.

This example also assumes that you are not using CCB. CCB allows communication between HTCondor daemons in a private network (outgoing connections only) and daemons in a public network (bidirectional connections allowed). It is therefore not a complete solution when daemons in two separate private networks need to communicate: one network or the other must allow bidirectional connections for CCB to help. In the example described here, we want a submit node in a private network to communicate with execute nodes in other private networks, which is how Open Science Grid works. This is the private-to-private case that cannot be solved with CCB alone. The solution given here uses port forwarding to make the submit node effectively public, which allows the execute nodes in the remote private network to use CCB to obtain bidirectional connectivity with your submit node. Bidirectional connectivity could also be achieved without CCB by applying the port-forwarding solution to the execute nodes of the remote private network as well. That may not be possible, either because of your own security concerns or because you do not administer machines on the remote network.

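Assume that HTCondor is installed and running with the following setup: submit node S has the private address 192.168.0.1 and execute node E has the private address 192.168.0.2, both on the private network 192.168.0.0/24 through interface eth0. Firewall F has the external address 10.0.0.1, and its outbound interface is eth1.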

Make changes to file condor_config.local on machine S. To find this configuration file, see the very beginning of the output generated by the command condor_config_val -dump.

USE_SHARED_PORT = True
SHARED_PORT_ARGS = -p 9617
PRIVATE_NETWORK_NAME = mydomain.net
PRIVATE_NETWORK_INTERFACE = eth0
TCP_FORWARDING_HOST = 10.0.0.1
In these configuration settings, the choice of port 9617 is arbitrary; it may be any free port on the system. 9618 is often chosen, as it is the well-known port of the condor_collector daemon. In this example, no condor_collector daemon in the 192.168.0.0/24 network will be contacted from outside 192.168.0.0/24, so 9618 is also a valid port number; avoid port 9618 if you have an internal condor_collector daemon. Note that the configuration variable TCP_FORWARDING_HOST must be set to the external address of firewall F, here 10.0.0.1, because that is the address outside clients connect to.

On the execute node E, there are similar configuration changes, except for the shared port:

USE_SHARED_PORT = True
SHARED_PORT_ARGS = -p 9616
PRIVATE_NETWORK_NAME = mydomain.net
PRIVATE_NETWORK_INTERFACE = eth0
TCP_FORWARDING_HOST = 10.0.0.1
The use of configuration variable PRIVATE_NETWORK_NAME on S and E allows them to communicate directly, without going through firewall F. The port choice of 9616 is arbitrary, as long as it differs from the port chosen for S, since F distinguishes the two machines by port.
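
The two configuration fragments above can be sanity-checked before touching the firewall. The following is a minimal Python sketch, not an HTCondor tool: the helper names parse_config and public_endpoint are hypothetical, and the parser only understands plain KEY = VALUE lines (no macros, includes, or line continuations). It derives the public host and port each machine would advertise and checks that no two machines behind the same forwarding host share a port.

```python
import shlex

def parse_config(text):
    """Parse simple KEY = VALUE lines; macros and continuations are not handled."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings

def public_endpoint(settings):
    """Return the (host, port) pair that outside clients would connect to."""
    args = shlex.split(settings["SHARED_PORT_ARGS"])
    port = int(args[args.index("-p") + 1])  # the value following -p
    return settings["TCP_FORWARDING_HOST"], port

# The fragments from this example, as they would appear in condor_config.local.
S_CONFIG = """
USE_SHARED_PORT = True
SHARED_PORT_ARGS = -p 9617
PRIVATE_NETWORK_NAME = mydomain.net
PRIVATE_NETWORK_INTERFACE = eth0
TCP_FORWARDING_HOST = 10.0.0.1
"""
E_CONFIG = S_CONFIG.replace("-p 9617", "-p 9616")

endpoints = {
    "S": public_endpoint(parse_config(S_CONFIG)),
    "E": public_endpoint(parse_config(E_CONFIG)),
}
# Behind one firewall address, each machine needs its own port.
conflict = len(set(endpoints.values())) != len(endpoints)
print(endpoints, "conflict:", conflict)
```

If conflict is True, two machines would advertise the same public address and port, and the firewall could forward connections to only one of them.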

On firewall F, run the following commands to redirect connections arriving from the Internet on ports 9617 and 9616 of F to the corresponding ports on S and E:

iptables -t nat -A PREROUTING -p tcp -d 10.0.0.1 --dport 9617 -j DNAT --to-destination 192.168.0.1
iptables -t nat -A PREROUTING -p tcp -d 10.0.0.1 --dport 9616 -j DNAT --to-destination 192.168.0.2
iptables -t nat -A POSTROUTING -o eth1 -j SNAT --to-source 10.0.0.1
The first command causes inbound connections to 10.0.0.1 on port 9617 to be rewritten as connections to 192.168.0.1, port 9617. The second is a similar command, but for 192.168.0.2, port 9616. The third is probably superfluous, as an equivalent rule is likely already in the firewall rules. It causes all outbound connections from 192.168.0.0/24 to appear as if they emanate from 10.0.0.1.
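
With more than a couple of machines behind F, the DNAT rules follow a regular pattern: one rule per forwarded port. As an illustration only, the Python sketch below builds the command strings from a port-to-host table. The function name dnat_commands is hypothetical, and the script only prints the commands; it does not run iptables or generate the SNAT rule.

```python
def dnat_commands(external_ip, forwards):
    """Return one iptables DNAT command per forwarded port.

    forwards maps a TCP port on the firewall to the private IP address
    that should receive connections arriving on that port.
    """
    commands = []
    for port, private_ip in forwards.items():
        commands.append(
            f"iptables -t nat -A PREROUTING -p tcp -d {external_ip} "
            f"--dport {port} -j DNAT --to-destination {private_ip}"
        )
    return commands

# Values taken from this example: F is 10.0.0.1, with S and E behind it.
rules = dnat_commands("10.0.0.1", {9617: "192.168.0.1", 9616: "192.168.0.2"})
for rule in rules:
    print(rule)
```

Because the destination port is not rewritten, each printed rule forwards a port on F to the same port on the private host, which matches the SHARED_PORT_ARGS settings on S and E.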

After making the configuration changes, run condor_reconfig on S and E to incorporate the changes.
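
One way to confirm that the forwarding is in effect is to attempt a TCP connection to the forwarded ports from a machine outside the firewall. The following Python sketch assumes nothing about HTCondor itself; port_open is a hypothetical helper that only reports whether a TCP connection can be established.

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return False
```

Run from outside the firewall, port_open("10.0.0.1", 9617) and port_open("10.0.0.1", 9616) should both return True once the rules are in place and the daemons on S and E are listening. A False result shows only that the connection failed, not whether the firewall rules or the HTCondor configuration are at fault.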

This works within HTCondor by packing all of the address information into a single string. To observe the string, issue the command condor_status -schedd -l, which will output a scheduler ClassAd. This ClassAd will contain a line that looks something like:

MyAddress = "<10.0.0.1:9617?PrivAddr=%3c192.168.0.1:9617%3fsock%3d936_480b_8%3e&PrivNet=mydomain.net&noUDP&sock=936_480b_8>"
This string contains the needed information for another HTCondor client or daemon to contact the condor_schedd daemon and begin using high throughput computing. Examining this string may be helpful in debugging, if you are unable to connect to remote HTCondor services.
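
When debugging, it can help to split such a string into its parts. The Python sketch below treats the text between the angle brackets as a public host:port followed by URL-style query parameters. That reading is an assumption based on the example above, not on a format specification, and parse_address is a hypothetical helper rather than part of HTCondor.

```python
from urllib.parse import parse_qs

def parse_address(address):
    """Split an HTCondor address string into public host, port, and parameters."""
    body = address.strip()
    if body.startswith("<") and body.endswith(">"):
        body = body[1:-1]
    endpoint, _, query = body.partition("?")
    host, _, port = endpoint.rpartition(":")
    # keep_blank_values retains flag-style parameters such as noUDP.
    params = {k: v[0] for k, v in parse_qs(query, keep_blank_values=True).items()}
    return {"host": host, "port": int(port), "params": params}

my_address = (
    "<10.0.0.1:9617?PrivAddr=%3c192.168.0.1:9617%3fsock%3d936_480b_8%3e"
    "&PrivNet=mydomain.net&noUDP&sock=936_480b_8>"
)
info = parse_address(my_address)
print("public: ", info["host"], info["port"])
print("private:", info["params"]["PrivAddr"])
print("network:", info["params"]["PrivNet"])
```

In the parsed result, the public host and port should match TCP_FORWARDING_HOST and the shared port, while PrivAddr and PrivNet should match the private address and the PRIVATE_NETWORK_NAME setting of the machine. A mismatch in either place points at the configuration value to recheck.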