
Migration of CHTC from EL6 to EL7

Every few years, when CHTC migrates its execute machines from one OS to another, we go to great effort to lessen the burden on users. This page documents the EL6 to EL7 migration to memorialize these issues for subsequent transitions in CHTC and for other sites.

Requirements and Assumptions

The transition must be gradual and staged, so that users with old jobs that cannot run on the new OS can run them to completion on a subset of the pool, while users with jobs that can only run on the new OS make progress at the same time.

If a user specifies nothing extraordinary, their jobs run on the "default" OS, which is initially the old one and is switched to the new one at some point.

During the transition, some jobs will run only on the old OS, some only on the new, and some (many?) on both; the user needs to be able to opt into any of these three cases.
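One way to think about the three opt-in cases is as the set of OpSysMajor values a job will accept. The Python sketch below is only a model, not HTCondor code: the names OPT_IN_CASES, requirements_clause, and can_run_on are hypothetical, and it assumes EL6 machines advertise OpSysMajor 6 and EL7 machines advertise 7. It emits clauses in the same form as the submit-file examples under "The problem with defaulting".

```python
# Hypothetical model of the three opt-in cases during the EL6 -> EL7 transition.
# Assumption: EL6 execute nodes advertise OpSysMajor = 6, EL7 nodes advertise 7.
OLD_MAJOR = 6  # the "old" OS
NEW_MAJOR = 7  # the "new" OS

# Each case is the set of OpSysMajor values the job is willing to run on.
OPT_IN_CASES = {
    "old_only": {OLD_MAJOR},
    "new_only": {NEW_MAJOR},
    "either": {OLD_MAJOR, NEW_MAJOR},
}


def requirements_clause(case: str) -> str:
    """Build an OpSysMajor clause in the style of the submit-file examples."""
    majors = sorted(OPT_IN_CASES[case], reverse=True)  # newest version first
    if len(majors) == 1:
        return f"OpSysMajor == {majors[0]}"
    terms = [f"(OpSysMajor == {m})" for m in majors]
    return "(" + " || ".join(terms) + ")"


def can_run_on(case: str, machine_major: int) -> bool:
    """True if a job that opted into `case` may match a machine with this OpSysMajor."""
    return machine_major in OPT_IN_CASES[case]


if __name__ == "__main__":
    for case in OPT_IN_CASES:
        print(case, "->", requirements_clause(case))
```

Under these assumptions, "either" produces `((OpSysMajor == 7) || (OpSysMajor == 6))`, which is the second submit-file example verbatim.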

This transition applies to

The HTCondor religion is that execute nodes advertise what they are, and it is the responsibility of the jobs to select what they need. If a job selects the wrong OS to run on, that's on the job; the startd should not care. Pragmatically, however, we may need to implement policy on the execute node when that is the only place where CHTC has control.

The problem with defaulting

Ideally, we'd just ask users to put a clause in their requirements expression that selects a particular platform, e.g.

requirements = OpSysMajor == 7

or

requirements = ((OpSysMajor == 7) || (OpSysMajor == 6))

However, there are a couple of problems with that. The first problem is defaulting. When a job has requirements that don't mention OpSysMajor, the submit side needs to add a clause that picks one. This is what condor_submit does today with OpSys and Arch. That behavior is hard-coded into the condor_submit executable and is difficult to replicate in configuration; it is possible, though ugly. We can simulate it today with configuration that looks for the string OpSysMajor in the requirements: if the string is present, we assume that the job is correctly selecting a version. One can construct all kinds of expressions for which this assumption doesn't hold, but it should work in practice, as it does for OpSys and Arch.
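The string-search heuristic can be modeled in a few lines. This is a Python sketch of the logic only, not the actual HTCondor configuration CHTC used: apply_default and DEFAULT_OPSYSMAJOR are hypothetical names, and it assumes the default is the old OS (EL6) until the switch.

```python
# Hypothetical model of submit-side defaulting via a substring check.
# Assumption: the default OS is EL6 (OpSysMajor 6) until the switch to EL7.
DEFAULT_OPSYSMAJOR = 6


def apply_default(requirements: str, default_major: int = DEFAULT_OPSYSMAJOR) -> str:
    """Append an OpSysMajor clause unless the job already mentions OpSysMajor.

    Mirrors the heuristic in the text: the mere presence of the string
    "OpSysMajor" is taken to mean the job selects a version itself.
    """
    if "OpSysMajor" in requirements:
        return requirements  # trust the job's own selection
    clause = f"(OpSysMajor == {default_major})"
    if not requirements.strip():
        return clause  # no user requirements at all
    # Parenthesize the user's expression so && binds as intended.
    return f"({requirements}) && {clause}"


if __name__ == "__main__":
    print(apply_default("Memory > 1024"))
    print(apply_default("OpSysMajor == 7"))
```

Switching the default then amounts to changing one value. The heuristic's weakness is also visible: a requirements expression such as `OpSysMajor > 0` mentions the attribute and is left untouched, even though it does not really pick a version.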