Standard Universe

Notes from Peter Keller's 2011-11-30 brain-dump.

libsyscall

The standard universe requires 'condorized' binaries: "statically-linked" binaries (in the standard universe's sense of the term; see below) created by condor_compile, which, despite its name, is a linker rather than a compiler. It produces binaries with the API stack shown on the right.


normal                condorized
---------------       ------------------
| application |       |  application   |
|             |       |   |------------|
|             |       |   | libsyscall |
---------------       |------------|   |
|    libc     |       |    libc    |   |
---------------       ------------------
|   kernel    |       |   kernel       |
---------------       ------------------

The interposition layer, libsyscall, comes before libc on the link line; the application thus calls its functions in preference to libc's. (Note that libsyscall does not override every libc function, only those necessary for checkpointing and/or remote I/O.)
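The interposition idea can be illustrated in miniature. The sketch below is not libsyscall's code: it defines its own write() so that the application's calls resolve to it instead of libc's, counts the call, and then issues the raw system call. The counter name intercepted_writes is hypothetical, and where this wrapper goes straight to the kernel, the real layer would decide whether to forward the request to the shadow.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping; stands in for whatever state the real layer keeps. */
long intercepted_writes = 0;

/* Because this definition is seen before libc's at link time, the
 * application's calls to write() land here rather than in libc. */
ssize_t write(int fd, const void *buf, size_t count)
{
    assert(buf != NULL || count == 0);
    intercepted_writes++;
    /* A remote-I/O layer would ship the request elsewhere at this point;
     * this sketch simply performs the system call locally. */
    return (ssize_t)syscall(SYS_write, fd, buf, count);
}
```

Only write() is overridden here, mirroring the note above that the interposition layer replaces just the functions it needs.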

When starting a standard universe job, the standard-universe-specific shadow establishes a connection with the startd. This connection is inherited by the standard-universe-specific starter, which in turn bequeaths it to the application (specifically, libsyscall). The starter's connection is used for startup, shutdown, and suspend/resume. All other communication to the shadow comes from libsyscall.

There may be machinery in libsyscall to make it signal-atomic.

"Statically-Linked"

Because the OS-provided information on (in-memory) segment boundaries is frequently (but inconsistently) wrong, the checkpointer must guess them. Because the checkpointer runs in the signal handler, it is impossibly tricky for it to handle segfaults; its guesses must therefore be correct. To make this possible, a standard universe application must keep all its volatile data on the stack or in the .data segment, and it must be laid out in memory as follows.


---------------
| environment |
|-------------| <- guessed; may be an unmapped page here
|  the stack  |
|             |
|--\/\/\/\/\/-| <- guessed
|             |
|-/\/\/\/\/\--| <- sbrk
|    .data    |
|-------------|
| .bss&.bzero |
|-------------| <- _data_start (linker symbol)
|    .text    |
---------------
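As a rough illustration of the boundaries in the diagram, the following sketch records three addresses in a running process: one inside .data, the current program break (the sbrk line), and one on the stack. It is not the checkpointer's guessing logic, which these notes do not describe; the names layout_probe, probe_layout, and initialized_global are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* An initialized, writable global; the linker places it in .data. */
int initialized_global = 42;

struct layout_probe {
    uintptr_t data_addr;   /* an address inside .data */
    uintptr_t heap_top;    /* current program break, i.e. sbrk(0) */
    uintptr_t stack_addr;  /* an address inside the current stack frame */
};

struct layout_probe probe_layout(void)
{
    int local = 0;               /* lives on the stack */
    struct layout_probe p;
    void *brk_now = sbrk(0);     /* query the break without moving it */

    assert(brk_now != (void *)-1);
    p.data_addr = (uintptr_t)&initialized_global;
    p.heap_top = (uintptr_t)brk_now;
    p.stack_addr = (uintptr_t)&local;
    return p;
}
```

On a typical Linux process the three values come out in ascending order, matching the bottom-to-top order of the diagram; the exact addresses depend on the personality in effect.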

This layout is known to Linux as the "vm compat" personality. To force an application to run this way, you can use the setarch program. (To run a standard universe program in stand-alone mode, run it under setarch <arch> -L -B -R.) Otherwise, the standard universe starter takes care of this.
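The three setarch options correspond to personality(2) flags: -L turns on ADDR_COMPAT_LAYOUT, -B turns on ADDR_LIMIT_32BIT, and -R turns on ADDR_NO_RANDOMIZE. The sketch below shows one way a launcher could set them before exec. Whether the starter does it this way is an assumption; the notes say only that it takes care of it. The function names compat_persona and exec_with_compat_layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/personality.h>
#include <unistd.h>

/* Add the flags behind setarch -L -B -R to an existing persona value. */
unsigned long compat_persona(unsigned long current)
{
    return current | ADDR_COMPAT_LAYOUT | ADDR_LIMIT_32BIT | ADDR_NO_RANDOMIZE;
}

/* Switch personality, then exec argv[0]. Returns -1 on any failure;
 * does not return on success. */
int exec_with_compat_layout(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    /* Passing 0xffffffff queries the current persona without changing it. */
    int current = personality(0xffffffff);
    if (current == -1)
        return -1;
    if (personality(compat_persona((unsigned long)current)) == -1)
        return -1;

    /* The personality is preserved across exec, so the new image is
     * laid out under the flags set above. */
    execvp(argv[0], argv);
    return -1;
}
```

Note that some sandboxes restrict the personality() call, so the second call can fail even for an otherwise valid request.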

These general requirements have some specific consequences:

No calls to mmap().

Because the kernel lies about segment boundaries, we can't permit calls to mmap(). (With one exception: requests for anonymous segments can be satisfied by sbrk().) We can't just intercept calls to mmap() and record the results because the dynamic loader (ld.so) has its own mmap() implementation, which in turn rules out dynamic linking, as the next item explains.
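On the sbrk() exception noted above: a request for anonymous memory can be served by growing the program break, which keeps the memory inside the region marked by the sbrk line in the layout diagram. The following is a minimal sketch of that idea, not the libsyscall implementation; the name anon_alloc_via_sbrk is hypothetical, and it ignores protection flags, address hints, and unmapping.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Return a zero-filled, page-aligned region of at least `length` bytes
 * carved out by moving the program break, or NULL on failure. */
void *anon_alloc_via_sbrk(size_t length)
{
    assert(length > 0);

    long page_l = sysconf(_SC_PAGESIZE);
    if (page_l <= 0)
        return NULL;
    size_t page = (size_t)page_l;

    /* Round the request up to whole pages, as mmap() would. */
    size_t rounded = (length + page - 1) / page * page;

    /* Ask for one extra page so the result can be aligned no matter
     * where the break currently sits. */
    void *old = sbrk((intptr_t)(rounded + page));
    if (old == (void *)-1)
        return NULL;

    uintptr_t aligned = ((uintptr_t)old + page - 1) / page * page;
    void *region = (void *)aligned;

    /* Anonymous mappings are zero-filled; make that explicit here. */
    memset(region, 0, rounded);
    return region;
}
```

The extra page is a simplification that trades a little wasted space for not having to reason about the current alignment of the break.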

No dynamic linking.

Obvious, but this also rules out calls to dlopen(). That in turn requires patches to Ulrich Drepper's glibc, because it likes to dlopen() libnss (to get different resolvers). The patches allow glibc to be built entirely statically, but necessitate maintaining our own copy.

The gcc and g++ runtimes must also be static.

This is implied by the above, but worth calling out because it can require you to recompile the compiler. Many distributions now offer (optional) packages that include the static runtimes, which avoids the rebuild.

"VM compat" layout.

As mentioned above, the application must be laid out in memory in a specific way; in particular, the .bss (and/or .bzero) segments must (directly) abut the .data segment. This used to always be true; hence the 'compat' in the personality name.

No VA randomization or ExecShield.

Both virtual address randomization and ExecShield (which used to be, but apparently no longer are, the same thing) screw up the required in-memory layout and cannot be used.

Consistent .vdso section address.

The .vdso segment -- used by Linux to speed up system calls -- is not checkpointed (being kernel memory space) but must be mapped at the exact same address on restart.
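One way to observe where the vDSO is mapped is the auxiliary vector entry AT_SYSINFO_EHDR, which glibc exposes through getauxval(). The sketch below compares the current base against a value saved earlier. It is only an illustration of the "same address on restart" requirement, not the restart code, and the names current_vdso_base and vdso_matches_checkpoint are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/auxv.h>

/* Base address of the vDSO in this process, or 0 if none is mapped. */
uintptr_t current_vdso_base(void)
{
    /* getauxval() returns unsigned long; make sure nothing is truncated. */
    assert(sizeof(uintptr_t) >= sizeof(unsigned long));
    return (uintptr_t)getauxval(AT_SYSINFO_EHDR);
}

/* Nonzero if the vDSO is mapped where it was when `saved_base` was
 * recorded, e.g. at checkpoint time. */
int vdso_matches_checkpoint(uintptr_t saved_base)
{
    return current_vdso_base() == saved_base;
}
```

A restart path could record the base at checkpoint time and refuse to resume when the comparison fails, since the saved process image may hold pointers into the old mapping.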

RPCs