Full ports are unique explorations into the unknown--have a bottle of Absinthe
and a low moral standard.

This is a record of the work necessary to perform a full port of HTCondor to
RHEL 6 x86_64. I started with the full port from RHEL 5 x86_64.

---------------------------------
Section 0: Perform a clipped port
---------------------------------

Do whatever is necessary to make this happen.

----------------------------------------------
Section 1: Preparation of a new glibc external
----------------------------------------------

*******************************
Locate and see how glibc builds
*******************************

We must locate the glibc revision that comes standard with the platform. This
is different from locating the "official" glibc release, which is distro
independent. It turns out each Linux distro patches glibc according to its
wishes and we need to preserve those patches.

Historically, there has been enough difference between the x86 and x86_64
patches or builds that the same external CANNOT be used for both. It is very
time intensive to discover whether a vendor-patched glibc source tree can be
built/run properly on another architecture.

1. Find the glibc source rpm associated with RHEL 6 x86_64.

    yumdownloader --source glibc

   "Install" it as a regular user, not root:

    rpm -ivh glibc-2.12-1.25.el6.src.rpm

2. Prepare it so the patches are applied, saving the output:

    cd /home/psilord/rpmbuild/SPECS
    rpmbuild -bp glibc.spec

3. Save the resulting tarball:

    cd /home/psilord/rpmbuild/BUILD
    cp -Rp glibc-2.12-2-gc4ccff1 glibc-2.12-2-x86_64
    tar czf glibc-2.12-2-x86_64.tar.gz glibc-2.12-2-x86_64

4. Now, build the glibc source package in order to find out how it is being
   configured. We have to compile glibc the same way in our externals, or at
   least as close as we can get it, so we intend to discover the configure
   line which builds glibc itself.
    cd /home/psilord/rpmbuild/SPECS
    rpmbuild -bc glibc.spec |& tee build.out

*************************
Create the glibc external
*************************

0. Figure out how cmake detects the GLIBC_VERSION variable and make sure it is
   right.

1. Edit CMakeLists.txt to add a new case for the detected GLIBC_VERSION.
   Ensure it does not conflict with previous version numbers. Fill it in
   similarly to the most recent one, taking into consideration the new
   configure flags determined by inspection of build.out from the previous
   step. You may need to comment out the GLIBC_PATCH step for now.

2. Create a directory in the externals named after the GLIBC_VERSION. Your
   patches, if any, will go here.

3. Place the required tarball into /p/condor/repository/externals and ensure
   it has permissions of 644.

4. Test build it with 'make glibc' after the build has been configured.

****************************************************************************
Patch glibc so it honors --enable-static-nss & other features specific to us
****************************************************************************

In the RHEL 5 port of HTCondor, we had patched glibc to make
--enable-static-nss function and also turned off the nscd support. This is
because we needed a truly static binary, and the nscd socket was kept
permanently open, which deferred checkpointing permanently.

So, we need to determine how much of this is still true and still needs
patching in the new glibc.

1. It turns out the patchfile to enable static nss from glibc external 2.7-18
   works for the RHEL 6 glibc.

2. Turns out glibc 2.12 apparently fixed the nscd bug so I don't need the
   patch.

3. TODO: Add a patch which removes the warning about using nss functions in
   statically linked executables. It sucks and just causes rust. [I never did
   this after I finished the port, alas.]
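
Returning to build.out from the "Locate and see how glibc builds" step: the
log is long, so a filter can help find the configure line to copy into the new
CMakeLists.txt case. This is a minimal sketch, not the procedure actually used
for the port. It assumes the invocation shows up on a single line that
mentions "configure" and carries at least one "--option" argument;
find_configure_lines is a hypothetical helper name.

```shell
# Sketch: list candidate configure invocations recorded in a build log.
# Assumption: the invocation appears on one line that contains the word
# "configure" followed somewhere by a "--option" style argument.
find_configure_lines() {
    # $1: path to the captured build log (build.out in these notes)
    grep -E 'configure .*--[A-Za-z]' "$1"
}
```

Usage would be `find_configure_lines build.out`. Expect to read the matches by
eye, since other lines in the log may also match the pattern.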
**************************
Check glibc external build
**************************

Ok, now we have to inspect the produced libraries and ensure that we are
shipping everything we should be shipping. This includes libc.a and a pile of
resolver libraries like libnss_files.a, libnss_dns.a, and libresolv.a. Since
glibc is an evolving library, these names may change or there might be
additional libraries to ship along with it. So some of the above steps may
need to be redone as we learn more.

***************************************************************
Start a build of HTCondor and see what breaks. Fix incrementally.
***************************************************************

Ok, lots of stuff broke.

The gist of what broke (prototypes being different, etc.) looks to be because
wherever we had GLIBC27 as a preprocessor check we likely also need GLIBC212.
I need to check to see if the GLIBC27 behavior still applies to GLIBC212.

1. GLIBC212 addition steps:
   Every place I found a reference to GLIBC27, I checked to see if it was
   valid on GLIBC212 and added GLIBC212 if it was.

Oh, it franks. That's quicker than I thought. Meh, onward ho.

**********************************
Fixing the Test Suite so it builds
**********************************

Problem 1: condor_compile/ld let through a pile of new flags that need
dealing with, such as --as-needed, --no-as-needed, -ldl, and
-dynamic-linker <arg>.

Also, -lm and -lcrypt can't be found in a static linking context. This one
might be a distro problem. I can find them via other means although it is a
hack to do so...

NOTE: Hrm, it seems "yum install glibc-static" will provide them for me. This
package needs to be installed on any machine wishing to condor_compile and
produce static binaries.

After some hacking only in our ld, all of the tests compile. I was surprised
by this since at this time there were no multiply defined symbol errors or
missing symbol errors, which is often something that can go wrong at this
stage of the port.
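
One way to deal with the new flags is to filter them out of the argument list
before the real link runs. The sketch below is only an illustration of that
idea, assuming that discarding them is acceptable because the link is static;
filter_static_ld_args is a hypothetical helper and is not the code in our ld.

```shell
# Sketch: drop dynamic-linking flags from an ld argument list.
# Hypothetical helper; the actual condor_compile ld wrapper may differ.
filter_static_ld_args() {
    skip_next=0
    for arg in "$@"; do
        if [ "$skip_next" -eq 1 ]; then
            skip_next=0    # swallow the <arg> that follows -dynamic-linker
            continue
        fi
        case "$arg" in
            --as-needed|--no-as-needed|-ldl)
                ;;         # assumed irrelevant to a static link, so dropped
            -dynamic-linker)
                skip_next=1
                ;;
            *)
                printf '%s\n' "$arg"    # keep everything else, one per line
                ;;
        esac
    done
}
```

Printing one argument per line keeps the sketch simple, but it would mangle
arguments that themselves contain newlines.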
--as-needed and friends, -ldl, and -dynamic-linker <arg> were all gotten rid
of since they deal with dynamic linking, which we aren't doing.

*********************************************
Running the test suite and seeing what breaks
*********************************************

    ./batch_test -b -c -d .

Problem 1: All the standard universe jobs go on hold with a version mismatch.
That's curious. So, I've added debugging info to the shadow log line to make
it more explicit about the versions when they fail, so I can determine what's
wrong.

SOLUTION: It seems the test suite was brain dead and was mixing my newly
created binaries with binaries from the local install on the batlab machine,
which are decidedly not a native full port. I set my path correctly and the
problem went away. Oh well, I'll still leave the debugging in there and maybe
even make it more precise for the message that ends up in the jobad. Can't
hurt to have more correct information in there...

Problem 2: The test suite uses whatever HTCondor binaries it finds in its
path instead of the ones you just built. The solution is obviously to put the
ones you just built into the path. However, the hold reason from HTCondor
about the version mismatch between the shadow and the stduniv job was poor,
so I fixed it up to contain the version of the shadow, the version of the
job, and the full path to the shadow. Now, when a clueful user sees this,
they can get a good idea of what went wrong and why.

********************************
Test Suite Passes. Code complete
********************************

Commit and push. Done.
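
A postscript on the PATH mix-up from the test suite run: a small guard run
before batch_test could catch it early. This is a sketch only; resolves_under
is a hypothetical helper, and the directory you pass it is whatever holds your
freshly built binaries.

```shell
# Sketch: succeed only if a command found via PATH lives under a given
# directory. Hypothetical helper, not part of the test suite.
resolves_under() {
    # $1: directory expected to hold the freshly built binaries
    # $2: command name to look up
    found=$(command -v "$2") || return 1
    case "$found" in
        "$1"/*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

For example, `resolves_under /path/to/build/sbin condor_shadow || echo "wrong
binaries in PATH"`, where /path/to/build/sbin is a placeholder.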