{section: Standard Universe} Notes from Peter Keller's 2011-11-30 brain-dump. See also CondorSyscallLibCommandLine

{subsection: libsyscall}

The standard universe requires 'condorized' binaries, which are "statically-linked" binaries (see below for the standard universe's definition of that term) created by _condor_compile_, which, despite its name, is a linker rather than a compiler. It creates binaries with the API stack shown on the right.
    normal             condorized
---------------     ---------------
| application |     | application |
|             |     |-------------|
|             |     | libsyscall  |
|-------------|     |-------------|
|    libc     |     |    libc     |
---------------     ---------------
|   kernel    |     |   kernel    |
---------------     ---------------

The interposition layer, _libsyscall_, comes before libc on the link line; the application therefore resolves its calls to libsyscall's functions in preference to libc's. (Note that libsyscall does not override every libc function; only those necessary for checkpointing and/or remote I/O.)

When starting a standard universe job, the standard-universe-specific shadow establishes a connection with the startd. This connection is inherited by the standard-universe-specific starter, which in turn bequeaths it to the application (specifically, to libsyscall). The starter's connection is used for startup, shutdown, and suspend/resume; all other communication with the shadow comes from libsyscall. There may be machinery in libsyscall to make it signal-atomic.

{subsection: "Statically-Linked"}

Because the OS-provided information on (in-memory) segment boundaries is frequently (but inconsistently) wrong, the checkpointer must guess them. Because it runs in the signal handler, handling segfaults there is impossibly tricky; therefore, it must guess them correctly. To make this possible, a standard universe application must keep all of its volatile data on the stack or in the .data segment, and it must be laid out in memory as follows.
---------------
| environment |
|-------------|  <- guessed; may be an unmapped page here
|  the stack  |
|             |
|--\/\/\/\/\/-|  <- guessed
|             |
|-/\/\/\/\/\--|  <- sbrk
|    .data    |
|-------------|
| .bss&.bzero |
|-------------|  <- _data_start (linker symbol)
|    .text    |
---------------

This layout is known to Linux as the "vm compat" personality. To force an application to run this way, you can use the _setarch_ program. (To run a standard universe program in stand-alone mode, _setarch
(16:29:56) Pete Keller: Compression of checkpoints is handled by using an alternate memory heap that is allocated by a raw call to syscall(SYS_mmap, ...). The alternate heap is not checkpointed, restored, or bookkept.
(16:31:41) Pete Keller: When restarting a checkpointed process, we move to an alternate stack defined in the bzero segment so we can restore the STACK segment properly and still be able to have a stack of our own to finish the work.
(16:32:58) Pete Keller: Also, when resuming, we resume, via the longjmp(), INTO the signal handler of the previous control flow, then return from the signal handler back to the regular code.
(16:34:36) Pete Keller: A caveat: Glibc encrypts the stack pointer in the jmp_buf structure. The macros PTR_ENCRYPT() and PTR_DECRYPT() in machdep.h deal with the nastiness.
(16:34:50) Pete Keller: I forgot to mention why the test programs are what they are. :(
(16:35:28) Pete Keller: Most of them are commented as to what they test. The job_rsc-all-syscalls_std.c test is a VERY IMPORTANT one that does a unit test of all remote I/O or other interposed libc calls. If anything fails in there, it is very bad.
(16:35:45) Pete Keller: The job_rsc_* and job_ckpt_* tests are for stduniv.
(16:37:00) Pete Keller: The "sanity" ckpt test seemingly doesn't test anything. However, that program alone is responsible for finding a huge number of bugs. It just happens to tickle a lot of subsystems in glibc, even though it is fucking simple. Don't ever remove that one from the test suite even if you don't know what it does. :)
(16:37:15) Pete Keller: All tests are there for a reason at one time or another. :)
(16:37:54) Pete Keller: Most of what the tests test are obvious and decently commented. A few are redundant, but that's ok.
(16:38:38) Pete Keller: A big thing to realize in the testing of stduniv is that it is only reliable to a statistical confidence value. Running a test 10 times and seeing it succeed doesn't mean squat.
You must see it run 10 million times and in changing environments.
(16:39:33) Pete Keller: stduniv is nearly undebuggable; the only way to truly know whether it works is to determine whether the test program did the correct thing (ALWAYS via verification external to the test program), and then to run it millions of times to make sure you don't shake out crazy little segfaulting bugs.
(16:40:17) Pete Keller: The current set of tests do a pretty good job of it, though.