Ticket #1135: "Real" dynamic binary releases

The "dynamic" binaries we release today aren't really dynamic: only libc is dynamically linked, and everything else is static. This ticket describes the motivations for and challenges of making Condor dynamically linked.

Motivations to Keep Condor Static (the status quo)

Motivation to Make Condor Dynamically Linked

All Externals needed by Condor 7.6.x and 7.7.x

blahp
  part of glite, with custom modifications
  need to continue building for now
boost 1.39.0
  available on rhel6 (1.41.0-11.el6)
  available on rhel5 (1.33.1-10.el5)
  not installed on nmi's rhel3 machines
  available on debian 5.0 (1.34.1-14)
  available on debian 6.0 (1.42.0-4)
  not available on mac os 10.5
  not available on mac os 10.6
coredumper 2011.05.24-r31
  not available on rhel6, rhel5
cream
  part of glite, not available on any official platforms
  maybe in future platforms
curl 7.19.6
  available on rhel6 (7.19.7-16.el6)
  available on rhel5 (7.15.5-9.el5_6.2)
  available on rhel3 (7.10.6)
  available on debian 5.0 (7.18.2)
  available on debian 6.0 (7.21.0)
  available on mac os 10.5 (7.16.3)
  available on mac os 10.6 (7.19.7)
drmaa 1.6
  probably need to continue building ourselves
expat 2.0.1
  available on rhel6 (2.0.1-9.1.el6)
  available on rhel5 (1.95.8-8.3.el5_5.3)
  available on rhel3 (1.95.???)
  available on debian 5.0 (2.0.1-4)
  available on debian 6.0 (2.0.1-7)
  available on mac os 10.5 (2.0.0)
  available on mac os 10.6 (2.0.1)
glibc
globus 5.0.1
  available on rhel6, via epel
  available on rhel5, via epel
  available on debian 6.0
  We make a couple of modifications to Globus, which we should push
  upstream. Until then, we will probably need to keep building it ourselves
gsoap 2.7.10
  available on rhel6 (2.7.16-2.el6), via epel
  available on rhel5 (2.7.13-3.el5), via epel
  available on debian 5.0 (2.7.9l-0.2)
  available on debian 6.0 (2.7.9l-0.2)
  not installed on nmi's rhel3
  not available on mac os 10.5
  not available on mac os 10.6
hadoop 0.21.0
krb5 1.4.3
  available on all linux/mac, library versioning limits binary portability
  (specifically, rhel3->rhel4)
  available on rhel6 (1.8.2-3.el6, 1.9-9.el6)
  available on rhel5 (1.6.1-55.el5_6.1)
libdeltacloud 0.8
  not available on any official platforms
libvirt 0.6.2
  available on rhel6
  available on rhel5 (0.2.3-9)
  available on debian 5.0 (0.4.6-10)
  available on debian 6.0 (0.8.3-5)
libxml2 2.7.3
  available on all platforms we care about, except rhel3 (where it's too
  old)
openssl 0.9.8h
  available on all linux/mac, library versioning limits binary portability
pcre 7.6
  available on rhel6 (7.8-3.1.el6)
  available on rhel5 (6.6-6.el5_6.1)
  not installed on nmi's rhel3 machine
  available on debian 5.0 (7.6-2.1)
  available on debian 6.0 (8.02-1.1)
  not available on mac os 10.5
  libraries available on mac os 10.6, but no headers (7.9 2009-04-11)
postgresql 8.2.3
  available on rhel6, rhel5, deb5, deb6
  now only used for a contrib module
qpid 0.8-RC3
  available on rhel6
  not available on rhel5
unicoregahp
  not available anywhere, it's our code
voms 1.9.10_4
  available on rhel6 (1.9.19.2-2), via epel
  available on rhel5 (1.9.19.2-1.el5), via epel
  available in debian squeeze (unstable)
  part of glite
wso2 2.1.0
  not available on rhel6, rhel5
  only used for a contrib module
zlib 1.2.3
  available on all linux/mac

Proposal

There are two viewpoints concerning shared libraries and Condor. The first is that Condor's external dependencies, such as kerberos and openssl, should be shared libraries against which Condor links. The second is that Condor's internal organization should itself consist of shared libraries, so there would be libdaemoncore.so, libcondor.so, and so on. The former will be known as External Shared Libraries, the latter as Internal Shared Libraries. Some challenges are common to both, and some are unique to each.

External Shared Libraries

  1. Default Proper Builds: For pervasive and important libraries such as kerberos, stop building them ourselves; build against the system-wide ones instead.

  2. RPATH: Set the RPATH in Condor to search $ORIGIN/../lib or possibly $ORIGIN/../lib/condor.

    • Open question: Should our custom search path come before or after the system-wide libraries? Do we even get a choice? If before, libraries we trust to work are used by default, but they are more likely to be out of date or lack useful local system changes. If after, we're likely to be more up to date and have useful local system changes, but we can't be as confident in their stability.
    • RPATH needs to be fully understood--especially the security and political ramifications of using it.

  3. LD_LIBRARY_PATH: Build libraries we use as shared libraries. Install them to lib/condor (relative to the root of a Condor install). We use lib/condor instead of just lib because some installs, say from RPM or deb, will make the root /, and our libraries may conflict with standard system-wide ones.

  4. Late Binding: We write a wrapper around each use of the library and, when we need a function, look it up with dlopen/dlsym/etc. The configuration file would specify the location of the shared library for each external.

  5. Hybrid Option: We could mix and match the above options per external, depending upon our requirements.
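
As a rough illustration of option 4 (Late Binding), the sketch below defers loading a library until one of its functions is first requested. It is written in Python rather than Condor's C++: ctypes.CDLL calls dlopen() on POSIX systems, and looking up a name on the resulting handle resolves the symbol much as dlsym() would. The class name LateBoundLibrary, the LIBC_PATH setting, and the use of libc's strlen as the wrapped function are assumptions made for this example and are not part of Condor.

```python
import ctypes
import ctypes.util


class LateBoundLibrary:
    """Load a shared library on first use instead of at process start."""

    def __init__(self, path):
        # In Condor, this path would come from the configuration file
        # (hypothetical setting, one per external).
        self._path = path
        self._handle = None  # nothing is loaded yet

    @property
    def loaded(self):
        return self._handle is not None

    def function(self, name, restype, argtypes):
        """Return a callable for `name`, loading the library if needed."""
        if self._handle is None:
            # Equivalent of dlopen(); raises OSError if the library is missing,
            # so the caller can disable the feature instead of failing to start.
            self._handle = ctypes.CDLL(self._path)
        # Equivalent of dlsym(); raises AttributeError if the symbol is absent.
        func = getattr(self._handle, name)
        func.restype = restype
        func.argtypes = argtypes
        return func


# Assumption for the example: use the system C library as the "external".
# If find_library returns None, CDLL(None) falls back to the symbols already
# loaded into the process on POSIX systems.
LIBC_PATH = ctypes.util.find_library("c")
libc = LateBoundLibrary(LIBC_PATH)
```

A caller would request a function only at the point of use, for example `strlen = libc.function("strlen", ctypes.c_size_t, [ctypes.c_char_p])`. Until that call, the library is not mapped into the process, which is what lets a missing external degrade a single feature rather than prevent the binary from starting.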

Internal Shared Libraries

There are two ways we can split internal libraries:

  1. Into tiny little pieces like libcondorio.so, libcondorconfig.so, libcedar.so, etc
  2. Into daemoncore and nondaemoncore libraries.

Proposal

The internal architecture of Condor will be split into three pieces:

  1. New Classads
  2. Daemoncore
  3. Everything that isn't daemoncore

The reason for this proposal is that it will be easier to maintain just three internal libraries. Disentangling the internal libraries from each other any further would take a substantial amount of work, which isn't worth it in practice at this time.

We propose that the discovery method for the internal libraries be RPATH or LD_LIBRARY_PATH, depending upon the use case.
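
To make the RPATH case concrete, the sketch below mimics how the ELF dynamic loader expands $ORIGIN: it substitutes the directory that contains the executable. It is only a model in Python, not the loader itself, and it collapses ".." textually, whereas the real lookup goes through the filesystem and can differ when symlinks are involved. The function name expand_origin and the /opt/condor install root are assumptions for the example; the RPATH entry is the one proposed in this ticket.

```python
import posixpath


def expand_origin(rpath_entry, executable_path):
    """Expand $ORIGIN (or ${ORIGIN}) to the directory holding the executable."""
    origin = posixpath.dirname(executable_path)
    expanded = rpath_entry.replace("${ORIGIN}", origin).replace("$ORIGIN", origin)
    # Collapse "sbin/.." so the resulting directory is easy to read.
    return posixpath.normpath(expanded)


RPATH_ENTRY = "$ORIGIN/../lib/condor"

# Tarball install under a hypothetical root of /opt/condor.
tarball = expand_origin(RPATH_ENTRY, "/opt/condor/sbin/condor_master")

# Package-style install where the root is /.
package = expand_origin(RPATH_ENTRY, "/usr/sbin/condor_master")

print(tarball)
print(package)
```

Under these assumptions the same RPATH entry resolves inside the install tree in both layouts, and in the package-style layout it points at /usr/lib/condor rather than /usr/lib, so carried-along libraries stay out of the directory the distribution's own libraries use.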

Down sides

Use Cases

Unexplored Ideas/Tasks

Upgrading Condor

The recommended practice for upgrading Condor will have to move away from the current model of "just copy over the binaries". If the daemons are running, we can't atomically copy over all of the binaries and internal shared libraries in such a way that a newly invoked Condor process sees a consistent view of its internal text segments.

It will also be more difficult to provide custom-built test binaries to people, since the discovery method for the shared libraries might differ from one context to another.

See also

User feedback

Con

Pro

Derived Tickets

These tickets already have a different parent in the tracking system, but also need this ticket as the parent.

Milestones


Remarks:

2010-Jan-25 14:15:11 by adesmet:
The bit about "$ORIGIN/../lib/condor" is a later addition. Alain Roy reminded us about our RPM installs which install into /usr/{s}bin. Searching /usr/lib would be bad; we shouldn't install our carry along libraries there since many (kerberos, Globus) could conflict with distribution provided ones. The easiest solution would seem to be to use lib/condor (relative to our install root). That works in both cases.


2010-Jan-25 18:03:21 by adesmet:
Current state: I'm sending out queries to condor-devel, condor-users, and OSG for feedback.


2010-Feb-05 11:46:38 by tstclair:
The aforementioned libboost example is a complete farce. More often than not you will hit library issues, in which case it's advised to stay current with what a distro releases and avoid using local libs where possible. It is rare that distros break binary compatibility.


2010-Feb-05 15:13:22 by adesmet:
I think the arguments in the Debian article aren't directly relevant to us. I believe their concern is that the RPATH will be set to various system locations (/usr/bin, etc) when it shouldn't. A reasonable concern, especially since libtool likes doing it, but we're not proposing that. It might be relevant in that there may be unusual libraries in $ORIGIN/../lib/condor. We could hit a problem if users start putting other libraries in that directory, to which I say, "If it hurts when you do that, stop doing it."

The other concern is that perhaps our library's dependencies will conflict with what the OS provides, and increased use of dynamic libraries while simultaneously carrying our own baggage could cause problems. libcrypto (from OpenSSL) depends on libz. Our goal is to use the system libcrypto, but to use our own copies of libraries in general. libz is a library we carry along and may link dynamically in the future, sticking our local copy into $ORIGIN/../lib/condor. We ship libz.so.1.2.3 and load it in, but we also load the system OpenSSL, which may be expecting libz.so.1.4.0. Two different libz's in memory is a recipe for madness.

This might be an argument for using the system libz, especially if libcrypto requires libz in all cases; thus it must be present if libcrypto is present. But what if the next release of OpenSSL adds a dependency on libpcre or libcurl for some currently unforeseen reason? We drag those libraries along as well, and they're currently not quite as reliably present.

Of course, this really has nothing to do with RPATH and instead is the price of dragging along copies of relatively common libraries. We can hit the same problem if we just dynamically link libcrypto but statically link everything else like we do today.


2010-Apr-27 10:38:04 by tstclair:
What is going on with this?


2010-Oct-20 16:03:30 by jfrey:
Bulk change of target version from v070504 to v070505 using ./ticket-target-mover.


2011-Jan-27 14:46:04 by danb:
Bulk change of target version from v070505 to v070506 using ./ticket-target-mover.


2011-Jan-28 08:28:29 by tstclair:
We only release dynamic builds now. Updating to assign to jfrey for comments and close out.


2011-Feb-01 14:49:30 by tannenba:
Bulk change of target version from v070506 to NULL using ./ticket-target-mover.


2011-Feb-01 17:37:04 by bbockelm:
Can this ticket be closed? It's implicitly been done in the PROPER build in cmake. If someone wants beautifully linked libraries, they can currently build their own.

Two comments about the contents of the ticket though:


2011-Feb-02 08:38:38 by tstclair:
Brian is correct, it should be easy to build normal PROPER packages now (for anyone). Does UW still want to be in the cross-platform binary packaging business? It is inherently dangerous, with questionable benefits at best.


2011-May-31 15:38:10 by tstclair:
For the record, regarding the discussion on 5/31/2011: the conversation around RPATH would be limited to UW tarballs only and would not extend to .rpm or .deb packages, which would still use the existing native mechanisms.

Distro maintainers will still use PROPER.


2011-May-31 15:55:01 by psilord:
Brian, the reason this ticket is not being resolved is because we're proposing making our tarball builds use real dynamic binaries. There are a lot of issues around doing that and this ticket will track all of them.


2011-Jun-07 10:42:15 by psilord:
Due to thrashing, I did not finish typing in my information. I believe I can finish it today.


2011-Jun-07 15:57:56 by psilord:
I've written the lion's share of what we spoke about in the meetings into this document.


2011-Aug-08 13:20:11 by tannenba:
Jaime and Cathrin need to talk so that this work and the work in #2132 are coordinated.

Properties:

Type:           enhance     Last Change:    2012-Oct-17 12:16
Status:         new         Created:        2010-Jan-25 13:45
Fixed Version:              Broken Version:
Priority:                   Subsystem:
Assigned To:    jfrey       Derived From:
Creator:        psilord     Rust:
Customer Group: other       Visibility:     public
Notify:         psilord@cs.wisc.edu, jfrey@cs.wisc.edu, cweiss@cs.wisc.edu, tstclair@redhat.com
Due Date:

Derived Tickets:

#374   dlopen Kerberos
#501   Turn the Condor static libraries on Windows into DLLs
#1021   libvirt version mismatch from distro (x,y,z)
#1136   Eliminate "static" releases of Condor
#1874   Use OS-provided security libraries
#2083   Dynamically link libvirt in UW builds
#2378   Standalone checkpointing test failing on 32-bit rhel5
#2389   Dynamically link Globus and VOMS libraries in UW builds
#2390   Dynamically link ClassAds library
#2482   Dynamically link externals on Mac OS X
#2539   Alter rpath to support multiple Condor installations
#2627   Rhel 3 failures with dynamic kerberos libraries

Related Check-ins:

2011-Sep-09 11:14   Check-in [27140]: Fix file permissions on system libraries in release tarballs. #1135 On opensuse, the system's libssl and libcrypto shared libraries don't have owner write permission enabled. All files in the Condor release need to have this permission enabled. (By Jaime Frey)
2011-Sep-08 14:01   Check-in [27127]: Include system libraries in release tarballs. #1135 To make it easier to run Condor on different flavors of linux, we now include the system security libraries from the build machine directly in the release tarballs, under lib/condor/. The rpath in the binaries now puts /lib[64] and /usr/lib[64] before [...] (By Jaime Frey)
2011-Aug-15 22:10   Check-in [26789]: Don't set RPATH when making native packages. #1135 We don't need it for native packages and we're setting it wrong for some platforms (should be $ORIGIN/../lib64/condor). (By Jaime Frey)
2011-Aug-03 15:38   Check-in [26610]: Fix extraction of systemlibs tarfile in test glue. #1135 (By Jaime Frey)
2011-Aug-02 15:55   Check-in [26608]: Save system libraries when building and use them when testing. #1135 Save the system's openssl and kerberos libraries as part of the build results. When testing, extract these libraries into <release>/lib/condor if they can't be found on the system. (By Jaime Frey)
2011-Aug-01 11:23   Check-in [26606]: Set rpath to $ORIGIN/../lib/condor in UW builds. #1135 As we move to dynamically linking more OS-provided libraries, we'll run into cases where the necessary libraries are missing, due to packages not being installed, not being available, or having incompatible versioning. This lets us supply libraries [...] (By Jaime Frey)
2011-Jul-26 15:32   Check-in [22595]: Revert "Build libvirt external only on rhel3. #1135" This reverts commit a89ca6485e51f5f75f02019af771364b0796defa. rhel 5 and debian 5 don't appear to have standard libvirt-devel packages, so we still need to build our own on these platforms. (By Jaime Frey)
2011-Jun-17 10:51   Check-in [22222]: Remove zlib external, phase 2. #1135 The zlib external is now gone. For standard universe, we copy the system's libz.a for use as libcondor_z.a. Everything else dynamically links with the system libz. (By Jaime Frey)
2011-Jun-13 13:50   Check-in [22221]: Remove zlib external, phase 1. #1135 Disable building of zlib on all platforms. (By Jaime Frey)
2011-Jun-13 13:50   Check-in [22220]: Build libvirt external only on rhel3. #1135 All of our other linux platforms have libvirt available as an OS-provided package. (By Jaime Frey)