Network Related Policy for HTCondor
Introduction
HTCondor is a software system that provides the mechanisms and policies to support High Throughput Computing (HTC) on large collections of distributively owned computing resources. The current HTCondor system matches submitted jobs to available machines in the pool based on the machines' available computing resources. However, it does little to integrate or manage the network layer: scheduling decisions are made without considering the underlying network capacity and conditions. As a result, HTCondor may well match a job that has a large input file to a remote node with little bandwidth. To handle this problem, we introduce a network related policy for HTCondor. We start with some use case examples in which network layer knowledge is taken into account when users submit jobs.
Use Case Examples
In this section, we present three possible use cases in which the submitted jobs have specific network requirements. These examples illustrate why HTCondor needs to incorporate network layer knowledge.
- Example 1: The submitted job requires potentially intensive network and data interactions with other machines distributed around the world. The job running on HTCondor acts like a service provided to its subscribers. For instance, the job could be a face recognition system. Users submit their individual test samples to the system independently; the system processes all incoming queries, analyzes them, and returns a recognition result for each request. With a large user base, many queries may arrive at the same time. The throughput of the incoming data can be estimated, so the network bandwidth of the remote node running the job must exceed that estimate; otherwise, users will experience noticeable latency while waiting for results.
- Example 2: The submitted job requires an IPv6 network. It will also communicate heavily with the machines in a certain VLAN in the HTCondor pool, and it has requirements on both inbound and outbound connectivity (a bridge is preferred in the network setup stage). For instance, the submitted job is a scientific simulation that needs to read a large amount of input data from machines within the same VLAN. The experimental data is distributed among all the machines in the VLAN. Therefore, it is better for the new network space that hosts the running job to appear as if it were a physical host on the network.
- Example 3: The job is submitted in order to analyze the characteristics of the network traffic for a certain task. The user is interested in the network load, incoming traffic, and outgoing traffic over a certain period of time. The job also has a bandwidth requirement. For example, the job could be a social network application with many interactions between the server and its users. By observing the network load, we can gain insight into when users are most and least active and when the server is under the heaviest load.
Job ClassAds
This section describes the job ClassAd attributes that advertise the user job's network preferences and requirements. The attributes are listed below:
- IPProtocol: This attribute indicates which IP protocol the users want to use for their network related jobs. There are two options: "IPv4" and "IPv6". The attribute value is a string, that is, a sequence of characters in double quotes. For instance, IPProtocol = "IPv6".
- RequestBandwidth: This attribute indicates the network bandwidth the users require while their jobs run on the matched machines in the HTCondor pool. A job can be matched to a machine only if the machine can provide network bandwidth larger than the required value during job execution. The attribute value is a real number in Mbps; the unit itself is omitted. For example, RequestBandwidth = 5.5 means that the required bandwidth is 5.5 Mbps. In reality, RequestMaxBandwidth and RequestMinScheddBandwidth are used.
- NetworkAccounting: This attribute indicates whether the users want the HTCondor starter to invoke the network accounting functionality. The value can be TRUE or FALSE. For example: NetworkAccounting = TRUE. In real scenarios, the attributes NetworkLoad, NetworkIn, and NetworkOut are provided.
- NetworkSetup: This attribute determines how to set up the network for the purpose of network accounting when the job runs in the HTCondor pool. The value can be "Bridge" or "NAT". Users can use InboundConnectivity (True/False) and OutboundConnectivity (True/False) to advertise their preference on the network setup.
- PreferVLAN: This attribute determines which VLAN the users want their jobs to run in. The value of this attribute is a string. There should be some predefined VLAN names that are known to the users. For instance, PreferVLAN = "CMS" means the user wants the job to run on a machine in the CMS network. In this case, only machines in this specific VLAN can be matched to run the job.
- PreferDomain: This attribute indicates the preferred domain name corresponding to the IP address of the machine that runs the users' submitted jobs. For instance, PreferDomain = "hcc.unl.edu" indicates that the user prefers to use machines from the Holland Computing Center; PreferDomain = "cs.uw.edu" indicates that the user prefers to use machines from the CS department of UW.
- SelfDomain: This attribute advertises the domain name of the user machine from which jobs are submitted. Rank and requirement expressions can use this attribute to assign different priorities to jobs coming from different sites.
- Latency: This attribute indicates the network latency that the submitted job can tolerate. The user prefers to run jobs in a network with latency lower than this value. The attribute value is a real number in seconds. For instance, Latency = 0.05 means the preferred latency is less than 50 ms.
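To make the attribute list above concrete, the following is a minimal sketch that models a job ad as a plain Python dictionary and checks the value constraints stated in the list. It does not use any HTCondor API; the helper name validate_job_ad and the three example ads (including their numeric values) are hypothetical and only illustrate how the three use cases might be expressed.

```python
# Hypothetical helper, not part of HTCondor: a job ad is modeled as a dict.
ALLOWED_VALUES = {
    "IPProtocol": {"IPv4", "IPv6"},
    "NetworkSetup": {"Bridge", "NAT"},
}
NUMERIC_ATTRS = ("RequestBandwidth", "Latency")  # Mbps and seconds
BOOLEAN_ATTRS = ("NetworkAccounting", "InboundConnectivity", "OutboundConnectivity")
STRING_ATTRS = ("PreferVLAN", "PreferDomain", "SelfDomain")


def validate_job_ad(ad):
    """Return a list of problems found in the network attributes of `ad`."""
    problems = []
    for name, allowed in ALLOWED_VALUES.items():
        if name in ad and ad[name] not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    for name in NUMERIC_ATTRS:
        value = ad.get(name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if name in ad and not is_number:
            problems.append(f"{name} must be a real number")
    for name in BOOLEAN_ATTRS:
        if name in ad and not isinstance(ad[name], bool):
            problems.append(f"{name} must be TRUE or FALSE")
    for name in STRING_ATTRS:
        if name in ad and not isinstance(ad[name], str):
            problems.append(f"{name} must be a string")
    return problems


# Illustrative job ads for the three use cases (values are made up).
example1_job = {"RequestBandwidth": 100.0}
example2_job = {
    "IPProtocol": "IPv6",
    "PreferVLAN": "CMS",
    "NetworkSetup": "Bridge",
    "InboundConnectivity": True,
    "OutboundConnectivity": True,
}
example3_job = {"RequestBandwidth": 5.5, "NetworkAccounting": True}
```

Calling validate_job_ad on each of the three example ads returns an empty list, while an ad such as {"IPProtocol": "IPv5"} yields one reported problem.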
Machine ClassAds
To allow HTCondor to apply network related policy when scheduling submitted jobs, we also need machine ClassAd attributes that adequately advertise each machine's network related properties. Some attributes repeat or overlap between the machine ad and the job ad; for completeness, we list them in this section as well. The attributes are listed below.
In the actual machine ad, the attributes should be prefixed with "Lark".
- IPProtocol: This attribute indicates which IP protocol the machine is using. The job should have some level of outbound connectivity on the IP protocol specified.
- Possible values: "IPv4", "IPv6" or "IPv4, IPv6".
- Attribute Type: string list.
- AvailBandwidth: This attribute indicates the available bandwidth of the network the machine is in. The machine can be matched to a job only if it can provide network bandwidth larger than the job's required value during job execution. For example, AvailBandwidth = 5.5 means that the available bandwidth is 5.5 Mbps. In reality, AvailMaxBandwidth and AvailMinScheddBandwidth are used.
- Attribute Type: float, in Mbps.
- NetworkAccounting: This attribute indicates whether the machine can provide network accounting functionality. If enabled, the attributes NetworkLoad, NetworkIn, NetworkOut are provided as updates from the starter to the schedd.
- Attribute Type: boolean.
- VLAN: This attribute indicates which VLAN the machine is in. The value of this attribute is a string. There should be some predefined VLAN names that are known to the users. For instance, VLAN = "CMS" means that the machine can be matched to a user job only when the job's attribute PreferVLAN = "CMS".
- Latency: This attribute indicates the latency of the network the machine is in. The attribute value is a real number in seconds. For instance, Latency = 0.05 means the network latency is 50 ms. The machine can be matched to a job that tolerates a latency larger than this value.
- Attribute Type: float, in units of seconds.
- NetworkType: The type of routing for the IP protocol.
- Possible values: "bridge" or "NAT".
- Attribute Type: string
- AddressType: The method for determining the network address of the job-internal virtual network device. "dhcp" indicates that the address is obtained via dhclient (NetworkType must be set to "bridge" in this case). "local" indicates that the address is only valid locally and is determined via Unix lockfiles (NetworkType must be set to "local"). If set to "static", then "LarkInnerAddressIPv4" must also be set and "LarkInnerAddressIPv6" may be set. The "static" setting is only meaningful for bridge networks.
- Possible Values: "local", "dhcp", or "static"
- Attribute Type: string
- StartupScript: The path to a script which will be run prior to starting the job. Any local customizations of the network devices should be done here.
- Attribute Type: string; path to an on-disk executable.
- Notes: The script will receive the machine classad via stdin. No additional arguments will be provided. If the script returns non-zero, the job will not be executed.
- CleanupScript: The path to a script which will be run after the job finishes. Any local customizations of the network devices should be undone here.
- Attribute Type: string; path to an on-disk executable.
- Notes: The script will receive the machine classad via stdin. No additional arguments will be provided. If the script returns non-zero, it may be run again by the startd, but the job will continue. TODO: What are the correct failure semantics?
- BridgeInterface: The name of a local ethernet interface to add to the bridge.
- Attribute Type: string
- IptableName: The name of an iptables chain, which may be used by the StartupScript for customizing job policies. All packets going to and from the job will pass through this chain.
- Attribute Type: string
- ExternalInterface: The name of the system-level virtual interface corresponding to the starter.
- Attribute Type: string
- InternalInterface: The name of the internal starter virtual interface.
- Attribute Type: string
- NetworkAccounting: Whether network accounting is active for this starter.
- Attribute Type: bool
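As an illustration of the StartupScript contract described above (machine ad on stdin, no arguments, non-zero return prevents the job from running), here is a minimal sketch of such a script in Python. It rests on several assumptions that are not stated in this document: that the ad arrives as one "Name = value" pair per line, and that the attributes appear as LarkNetworkType and LarkAddressType (derived from the "Lark" prefix rule). Only LarkInnerAddressIPv4 is named in the text. The function names are hypothetical. The sketch checks just the "dhcp" and "static" rules from the AddressType entry.

```python
import sys


def parse_ad(text):
    """Parse 'Name = value' lines into a dict, stripping string quotes.

    Assumes one attribute per line; this is a simplification, not a
    full ClassAd parser.
    """
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # skip blank or malformed lines
        ad[name.strip()] = value.strip().strip('"')
    return ad


def check_ad(ad):
    """Return 0 if the address settings are consistent, 1 otherwise."""
    # Attribute names below are assumed from the "Lark" prefix rule.
    network_type = ad.get("LarkNetworkType", "").lower()
    address_type = ad.get("LarkAddressType", "").lower()
    if address_type == "dhcp" and network_type != "bridge":
        return 1  # dhcp requires a bridge network
    if address_type == "static" and "LarkInnerAddressIPv4" not in ad:
        return 1  # static requires an explicit IPv4 address
    return 0


def main(stream=sys.stdin):
    """Read the machine ad from `stream` and return the exit status."""
    return check_ad(parse_ad(stream.read()))
```

A deployed script would end with sys.exit(main()) so that the return value becomes the process exit status; that call is left out here so the functions can be imported and tested without reading stdin.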
Network Related Policy
In this section, we discuss the network related policy for HTCondor. More precisely, we describe in detail the policies designed for the use case examples above.
The following three policies correspond to the three use case examples, in order:
- Since the incoming job only has a bandwidth requirement, the policy is simple: as long as the machine offers more bandwidth than the user requests, it can be matched to the submitted job.
- The machine can be matched to the job if it supports IPv6 and is in the VLAN the user requests. If the machine is matched to the job, the values of InboundConnectivity and OutboundConnectivity are used to set up the network. (In this example, a bridge is preferred.)
- The machine can be matched to the job if it offers more bandwidth than the job requires. In addition, since the job asks for network accounting, the machine running the job invokes the network accounting module in the corresponding HTCondor daemons and reports the network load, incoming traffic, and outgoing traffic.
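The three policies above can be sketched as simple predicates over a job ad and a machine ad, each modeled as a plain Python dictionary. This is only an illustration of the matching logic, not how the HTCondor negotiator evaluates ClassAds; the function names and the sample ads are hypothetical, and the third predicate additionally assumes that a job asking for network accounting should only match a machine that advertises it.

```python
def matches_bandwidth(job, machine):
    """Policy 1: the machine must offer more bandwidth than requested."""
    return machine.get("AvailBandwidth", 0.0) > job.get("RequestBandwidth", 0.0)


def matches_protocol_and_vlan(job, machine):
    """Policy 2: the machine supports the job's IP protocol and VLAN."""
    # The machine's IPProtocol is a string list such as "IPv4, IPv6".
    protocols = [p.strip() for p in machine.get("IPProtocol", "").split(",")]
    return (
        job.get("IPProtocol") in protocols
        and machine.get("VLAN") == job.get("PreferVLAN")
    )


def matches_accounting(job, machine):
    """Policy 3: bandwidth as in policy 1, plus network accounting.

    Assumption: if the job asks for accounting, the machine must
    advertise NetworkAccounting as well.
    """
    wants_accounting = job.get("NetworkAccounting", False)
    can_account = machine.get("NetworkAccounting", False)
    return matches_bandwidth(job, machine) and (can_account or not wants_accounting)


# Illustrative ads (values are made up).
machine = {
    "IPProtocol": "IPv4, IPv6",
    "AvailBandwidth": 100.0,
    "VLAN": "CMS",
    "NetworkAccounting": True,
}
job1 = {"RequestBandwidth": 50.0}
job2 = {"IPProtocol": "IPv6", "PreferVLAN": "CMS"}
job3 = {"RequestBandwidth": 5.5, "NetworkAccounting": True}
```

With these sample ads, all three predicates accept the machine for their respective jobs; raising RequestBandwidth in job1 above 100.0, or changing PreferVLAN in job2 to another name, makes the corresponding predicate reject it.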