Document Moved to Google Drive

As part of an experiment with Google Drive, this document has been moved to https://docs.google.com/document/d/1BgKK9aHWPJGiePl4Kcv-cl1bbm8MPShG5d8gUf38N_s/edit?usp=sharing

If you think you need edit access, contact Alan (adesmet@cs)

See GoogleDriveWisdom for more on this experiment.


Design Doc: Mixed Mode IPv4/IPv6

#3538: Support mixed IPv4/IPv6 mode

In progress.

Author: Alan De Smet (adesmet@cs).

Goal

Be able to run an HTCondor pool in which some nodes speak only IPv4, some speak only IPv6, and some speak both. Matchmaking must avoid matching an IPv4-only computer to an IPv6-only computer.

Assumptions

Many of these are things we would hope to change in the future, but are not required for the initial implementation of mixed-mode.

Open design questions

Problem: Sinful versus condor_sockaddr

We currently have two objects that represent an address, Sinful and condor_sockaddr. Both can serialize/deserialize a Sinful string.

Sinful is a relatively simple set of key/value pairs (plus an unlabelled basic address). It doesn't know how to convert things to system API structures (sockaddr_in/sockaddr_in6); it's really just storing a pile of strings. On the up side, it knows how to store CCB addresses, private network addresses, and shared port socket information. It is also easily extensible. It knows nothing about IPv6 addresses, although IPv6 addresses can probably be passed through the object safely since they are just another string.

condor_sockaddr essentially wraps sockaddr_in/sockaddr_in6. It only knows how to hold a single address; CCB, shared port, private networks, and the like don't work.

The two were created in parallel around the same time. Code tends to use one or the other. I'm slightly mystified that the system is working as well as it does.

These need to be unified. Mixed mode is going to require advertising at least 2 different addresses simultaneously, and connection attempts may want to try both. We want to be future proof against multiple addresses per protocol. This points to using Sinful as our exclusive or nearly exclusive object for passing around contact information, relegating condor_sockaddr to low level, last minute work (to take advantage of the sockaddr_in/sockaddr_in6 support). Probably keep condor_sockaddr to holding a single address. I believe condor_sockaddr has permeated higher levels than it should have.
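As a rough illustration of that split, here is a minimal Python sketch (not HTCondor code). The class name ContactInfo and its methods are hypothetical, invented for this example. It assumes the high-level object keeps plain strings plus key/value extras, and only produces socket-level address structures at the last minute, one per address.

```python
import ipaddress
import socket
from dataclasses import dataclass, field


@dataclass
class ContactInfo:
    """Hypothetical stand-in for a unified, Sinful-style contact object."""
    # Ordered (ip_literal, port) pairs; the first entry is the primary address.
    addresses: list = field(default_factory=list)
    # Free-form key/value extras (CCB, shared port, private network, ...).
    params: dict = field(default_factory=dict)

    def add_address(self, ip, port):
        # Reject malformed literals early (raises ValueError), but keep
        # storing plain strings at this level.
        ipaddress.ip_address(ip)
        self.addresses.append((ip, port))

    def socket_targets(self):
        """Convert to (family, sockaddr) pairs only when about to connect."""
        targets = []
        for ip, port in self.addresses:
            if ipaddress.ip_address(ip).version == 6:
                # IPv6 sockaddr tuples carry flowinfo and scope_id as well.
                targets.append((socket.AF_INET6, (ip, port, 0, 0)))
            else:
                targets.append((socket.AF_INET, (ip, port)))
        return targets
```

A caller would pass the ContactInfo around untouched and call socket_targets() only at connect time, trying each entry in order. Each returned pair plays the role condor_sockaddr would keep: a single address in system API form.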

Plan

  1. Dual mode collector
    1. 1-2 weeks 2013-07-08 Teach collector to respond to commands on both IPv4 and IPv6. (Likely side effect: all daemons do.)
    2. 0-1 weeks 2013-07-15 Teach collector to hold both IPv4 and IPv6 ads. (Probably no work at all.)
    • Not required: advertising both addresses (see next step)

  2. ?? ????-??-?? Resolve Sinful/condor_sockaddr

  3. 1-3 weeks ????-??-?? Dual address sinful strings
    • Allow encoding both IPv4 and IPv6 addresses in a sinful string. Needs to simultaneously work with CCB and shared port. If we bother with backward compatibility, make the IPv4 address the backward compatible one.
    • May need to support different port numbers for different addresses; trying to synchronize port numbers across TCP and UDP for both IPv4 and IPv6 seems like a pain in the ass.
    • Design such that if in the future we wanted to have arbitrarily long lists of addresses, we could do so while maintaining backward compatibility (older versions would just use the head of a list). This seems likely to be functionality we'll want in the future, and while we're messing with the sinful strings is the time to ensure the future-proofing is done.
    • Rejected: MyAddress (implicitly IPv4) and new attribute MyAddressIPv6. It means multiple places need to look in multiple attributes to find the appropriate address. We want to be able to hand a sinful string around and assume it's everything you need to contact someone. This is the same decision we made for CCB and shared port.

  4. 1-3 weeks ????-??-?? Teach collector to advertise both IPv4 and IPv6 addresses. (Likely side effect: all daemons do.)
    • If not implemented in previous step, the code that identifies a daemon's own MyAddress should now identify both addresses if appropriate. This may have been done as a side effect of the "Dual address sinful strings" step.

  5. 1-3 weeks ????-??-?? Teach startd to listen on and advertise both IPv4 and IPv6 addresses. (Likely side effect: all daemons do.)
    • Likely just done at the DaemonCore level where the attributes are automatically added. Indeed, likely even lower, at the "Gimme a sinful string for myself" level.
    • There will be breakage at this point, as the negotiator begins trying to match IPv4-only hosts to IPv6-only hosts.

  6. 0-2 weeks ????-??-?? Teach schedd to listen on and advertise both IPv4 and IPv6 addresses. (Possibly done by "teach startd" step.)
    • There will be breakage at this point, as the negotiator begins trying to match IPv4-only hosts to IPv6-only hosts.

  7. Teach negotiator to make matchmaking decisions based on IPv4 and IPv6 capabilities of the involved parties.
    1. 1 week ????-??-?? New function: CanCommunicate, takes two sinful strings, returns true or false.
    2. 2 weeks ????-??-?? Negotiator implicitly adds CanCommunicate to Requirements for the Machine (or the Job; whichever).
      • Rejected: Explicitly add. Risks trashing autoclusters, as every machine and job has a unique sinful string.
    3. Rejected: add SpeaksIPv6=TRUE; SpeaksIPv4=TRUE, then add Requirements=(SpeaksIPv6 && TARGET.SpeaksIPv6)||(SpeaksIPv4 && TARGET.SpeaksIPv4) to Job/Machine ads. Not backward compatible. Could assume anyone lacking both only speaks IPv4, but that breaks existing IPv6 pools. Not future proof: no hope for subtle things like "are these hosts on the same subnet and thus able to use link local IPv6?"

  8. 0-2 weeks ????-??-?? Teach other daemons to listen on and advertise both IPv4 and IPv6 addresses. (Possibly done by "teach startd" step.)

  9. 1-3 weeks ????-??-?? Gracefully handle impossible connection requests.
    1. Daemons that only handle one protocol shouldn't ever have a reason to connect to the other. But just in case, ensure the failure is graceful: complain to the log and carry on. EXCEPTing is absolutely not acceptable.
    2. Command line tools could easily be asked to do the impossible. "condor_q -name ipv6-only-host.example.com" on an IPv4 host, for example. We need to give a reasonable message to the end user.
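
The CanCommunicate check from step 7 could be sketched roughly as follows. This is Python for illustration only, not HTCondor code. The plan calls for a function taking two sinful strings; since the dual-address sinful format is still undecided, this sketch assumes each side's addresses have already been extracted into a list of IP literals. The helper name protocols is hypothetical.

```python
import ipaddress


def protocols(addresses):
    """Return the set of IP versions (4 and/or 6) found in a list of IP literals."""
    return {ipaddress.ip_address(a).version for a in addresses}


def can_communicate(addrs_a, addrs_b):
    """True if both parties advertise at least one address of the same IP version."""
    return bool(protocols(addrs_a) & protocols(addrs_b))


# An IPv4-only host and a dual-stack host share IPv4.
print(can_communicate(["192.0.2.1"], ["192.0.2.2", "2001:db8::2"]))
# An IPv4-only host and an IPv6-only host share nothing.
print(can_communicate(["192.0.2.1"], ["2001:db8::2"]))
```

A command line tool could run the same check before connecting and print a clear message when it fails, rather than surfacing a low-level socket error. The sketch only compares protocol families; it makes no attempt at the subtler cases mentioned in step 7, such as whether two hosts share a subnet and can use link-local IPv6.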