HTCondorWiki: Standard Universe

 
 The .vdso segment -- used by Linux to speed system calls -- is not checkpointed (being kernel memory space) but must be mapped at the exact same address on restart.
 
-{subsection: RPCs}
+{subsection: Remote I/O}
+
+To perform remote I/O, libsyscall interposes itself between the application and libc.  The set of functions in libsyscall which duplicate (part of) the libc API are called switches, and generally behave identically.  (So much so that most of them can be automatically generated; see _stubgen_, below.)  They're called switches because they switch between four different behaviours based on the global state.  The global state can be either 'mapped' or 'unmapped' and either 'local' or 'remote'.  The latter should be obvious.  The former not only ensures that local and remote FDs don't collide, but simplifies checkpointing.  (Rather than try to change FDs in the application, libsyscall remembers enough about all the FDs it opened on the application's behalf to reopen them when the application restarts.  It might get different FDs, but it will ensure that the application never sees them.)
+
+When a standard universe job calls, say, open(), libsyscall will start in mapped-remote mode.  It will ask the virtual file table (see below) for the translated FD.  The virtual file table returns the translated FD (and, if the job's submit file specified that the file in question ought to be stored locally, switches the mode to local).  The open() switch then changes the mode to unmapped (having unmapped the FD) and calls itself again.  This time, it will take the unmapped-remote path and make an RPC.  Or, if the file was specified to be local, the unmapped-local path; the unmapped-local path makes the corresponding system call directly.  The mode changes will be undone as the call-stack unwinds.
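The mode changes described above can be sketched in C. This is a minimal illustration, not the real libsyscall code: every name here (sketch_switch, vft_set, the enums, the last_path bookkeeping) is hypothetical, and the "system call" is an imaginary one-argument call that only records which path it took.

```c
#include <assert.h>

/* All names here are hypothetical; this is not the real libsyscall code. */
enum mapping  { MAPPED, UNMAPPED };
enum location { LOCAL, REMOTE };
enum path     { PATH_NONE, PATH_LOCAL_SYSCALL, PATH_REMOTE_RPC };

#define MAX_FDS 16

struct vft_entry {
    int real_fd;   /* FD actually held on the job's behalf */
    int is_local;  /* nonzero if the submit file asked for local access */
};

struct vft_entry vft[MAX_FDS];           /* toy virtual file table */
enum mapping  g_mapping    = MAPPED;     /* standard universe starts mapped... */
enum location g_location   = REMOTE;     /* ...and remote */
enum path     last_path    = PATH_NONE;  /* recorded for the demo only */
int           last_real_fd = -1;

void vft_set(int virtual_fd, int real_fd, int is_local)
{
    assert(virtual_fd >= 0 && virtual_fd < MAX_FDS);
    vft[virtual_fd].real_fd  = real_fd;
    vft[virtual_fd].is_local = is_local;
}

/* A switch for an imaginary one-argument call that takes an FD. */
int sketch_switch(int fd)
{
    if (g_mapping == MAPPED) {
        enum mapping  saved_mapping  = g_mapping;
        enum location saved_location = g_location;
        int rv;

        assert(fd >= 0 && fd < MAX_FDS);
        if (vft[fd].is_local) {
            g_location = LOCAL;          /* file was requested to be local */
        }
        g_mapping = UNMAPPED;            /* the FD is now translated */
        rv = sketch_switch(vft[fd].real_fd);   /* call ourselves again */

        /* Undo the mode changes as the call stack unwinds. */
        g_mapping  = saved_mapping;
        g_location = saved_location;
        return rv;
    }

    /* Unmapped: fd is already a real FD. */
    last_real_fd = fd;
    last_path = (g_location == LOCAL) ? PATH_LOCAL_SYSCALL : PATH_REMOTE_RPC;
    return 0;
}
```

With vft_set(3, 7, 0) and vft_set(4, 9, 1), calling sketch_switch(3) records the remote path with real FD 7, and sketch_switch(4) records the local path with real FD 9. In both cases the global mode is back to mapped-remote afterwards.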
+
+In stand-alone mode, libsyscall defaults to mapped-local instead of mapped-remote, but otherwise functions identically.
+
+The remote system calls proper are implemented as 'senders' and 'receivers', which are almost entirely automatically-generated.  Senders exist in the starter and libsyscall; the receivers exist only in the shadow.
+
+Some switches cannot be automatically generated, and are known as "special" switches.  For Linux, there are quite a few of these: most handle the [x|l|f]stat family and other calls which have different kernel- and user-space data structures; others are functions normally implemented as macros; and some are special because they don't behave exactly like the normal system call.  (For example, gettimeofday() caches the offset between the local clock and the shadow's clock, because many applications use gettimeofday() to self-profile their inner loops.  Likewise, isatty() should always return false when running under Condor, but this can make some Fortran run-times behave oddly.)  There are very few "special" senders or receivers.  Normal switches, senders, and receivers are generated from a '.tmpl' file by /stubgen/.
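The gettimeofday() special case can be illustrated with a small sketch. It assumes the switch reports the shadow's time and asks the shadow only once; that direction is a guess, and the names (sketch_gettimeofday_usec, fake_shadow_time_usec) and the fixed five-second skew are made up for the example. The local clock is passed in as a parameter so the behaviour is deterministic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch; not the real gettimeofday() switch. */
int     rpc_count          = 0;  /* how many times we "asked the shadow" */
int     have_offset        = 0;
int64_t cached_offset_usec = 0;

/* Stand-in for the RPC: pretends the shadow's clock runs 5 s ahead. */
int64_t fake_shadow_time_usec(int64_t local_now_usec)
{
    rpc_count++;
    return local_now_usec + 5000000;
}

/* Returns the (estimated) shadow time for a given local time. */
int64_t sketch_gettimeofday_usec(int64_t local_now_usec)
{
    assert(local_now_usec >= 0);
    if (!have_offset) {
        /* First call: one round trip, then remember the difference. */
        cached_offset_usec =
            fake_shadow_time_usec(local_now_usec) - local_now_usec;
        have_offset = 1;
    }
    /* Later calls are purely local arithmetic. */
    return local_now_usec + cached_offset_usec;
}
```

An application calling this in a tight loop pays for one RPC rather than one per call, which is the point of caching the offset.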
+
+The new cmake dependencies are all screwed up, so when mucking about with this code, it's wise to run 'make clean' to ensure that all your changes are propagated everywhere.
+
+It's sometimes necessary to call glibc functions from the switches (rather than make a system call directly).  /stubgen/ supports this as a primitive, and will do horrible things with nm to extract (and rename) the libc function(s) as appropriate.
+
+A 'remap' defines a bunch of BS function names that glibc has a (weak) symbol for to make sure the proper switch is called.  (That is, all glibc entry points for the same function must call the same libsyscall function.)  By having a 'real function' of the appropriate name that just wraps the switch, we ensure that the linker does what we want.
+
+The remote I/O system also implements 'pseudo' system calls, which are handled entirely by the shadow.  This includes the suspend and resume operations.
+
+Remote I/O RPC #s are defined in syscall_numbers.h and have nothing to do with what anybody else uses for system call numbers.  do_remote_syscall() is a big switch that calls the appropriate receiver.
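The shape of such a dispatcher might look like the following sketch. The RPC numbers, receiver names, and return values are invented for illustration; the real numbers live in syscall_numbers.h and the real do_remote_syscall() has its own signature and argument handling.

```c
#include <assert.h>

/* Hypothetical RPC numbers; the real ones live in syscall_numbers.h. */
enum {
    SKETCH_RPC_OPEN    = 1,
    SKETCH_RPC_CLOSE   = 2,
    SKETCH_RPC_SUSPEND = 3   /* stands in for a 'pseudo' system call */
};

/* Toy receivers: each returns a distinct value so the demo can tell
 * which one ran. Real receivers unmarshal arguments and do the work. */
int sketch_receive_open(void)    { return 100; }
int sketch_receive_close(void)   { return 200; }
int sketch_receive_suspend(void) { return 300; }

/* One big switch from RPC number to receiver. */
int sketch_do_remote_syscall(int rpc_number)
{
    assert(rpc_number > 0);
    switch (rpc_number) {
    case SKETCH_RPC_OPEN:    return sketch_receive_open();
    case SKETCH_RPC_CLOSE:   return sketch_receive_close();
    case SKETCH_RPC_SUSPEND: return sketch_receive_suspend();
    default:                 return -1;   /* unknown RPC number */
    }
}
```

Because the numbers are private to this protocol, nothing requires them to line up with the kernel's system call numbers on any platform.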
+
+We turn off checkpointing when sockets or pipes are open, and sockets are opened by the application, not the shadow.
+
+Arguments are marshalled via the overloaded function sock->code(), implemented in condor_io/stream.cpp; this code should generally not need to be changed.
+
+If you start a standard universe job with -_condor-aggravate-bugs, the virtual file table will deliberately avoid identity mappings.  This can expose bugs in the interpositioning layer, as invalid FDs become more likely.
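To see why avoiding identity mappings helps, consider this toy virtual file table. It is only a sketch: the table layout, the names (vft_insert, vft_lookup), and the assumption that the normal mode prefers an identity mapping when one is free are all hypothetical.

```c
#include <assert.h>

#define VFT_SIZE 32
#define VFT_FREE (-1)

int vft[VFT_SIZE];   /* index = virtual FD, value = real FD */

void vft_init(void)
{
    int i;
    for (i = 0; i < VFT_SIZE; i++) {
        vft[i] = VFT_FREE;
    }
}

/* Returns the virtual FD handed to the application, or -1 if full. */
int vft_insert(int real_fd, int aggravate)
{
    int v;

    assert(real_fd >= 0);
    /* Assumed normal behaviour: use virtual == real when possible. */
    if (!aggravate && real_fd < VFT_SIZE && vft[real_fd] == VFT_FREE) {
        vft[real_fd] = real_fd;
        return real_fd;
    }
    for (v = 0; v < VFT_SIZE; v++) {
        if (vft[v] != VFT_FREE) continue;
        if (aggravate && v == real_fd) continue;  /* never the identity */
        vft[v] = real_fd;
        return v;
    }
    return -1;
}

/* Translates a virtual FD; returns -1 for unused or out-of-range FDs. */
int vft_lookup(int virtual_fd)
{
    if (virtual_fd < 0 || virtual_fd >= VFT_SIZE) return -1;
    return vft[virtual_fd];
}
```

With identity mappings, code that forgets to translate an FD still works by accident. Once virtual FD 1 maps to real FD 0 and vice versa, the same mistake touches the wrong file or fails outright, so the bug shows up.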
+
+The virtual file table also checks if a file is both read and written by an application, and will warn if one is, as this can cause inconsistencies across a restart.
+
+{subsection: Checkpointing}
+
+One of the things in libsyscall is a replacement for crt0.o, which has a special main() function.  This function configures the signal handlers for SIGTSTP and SIGUSR2 (which checkpoint & stop, or checkpoint & resume).  It also determines if it's a restart or not.  Because we're overriding main(), global constructors can be called multiple times per run, but that's probably not important.
+
+The basic checkpointing routine is to:
+
+  1. Call setjmp().
+  2a. If this is a return from longjmp(), clean up and exit the signal handler.
+  2b. Find the segment boundaries (using machdep).
+  3. Open FD to checkpoint server or file.
+  4. Write pages.
+  5. Either exit or call longjmp().
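The control flow of the steps above can be sketched with setjmp()/longjmp() alone. This is a single-process illustration with hypothetical names: the segment discovery, the FD, and the page writes are replaced by log entries, and the longjmp() happens inside the same call that did the setjmp(), which keeps the example well-defined C.

```c
#include <assert.h>
#include <setjmp.h>
#include <string.h>

/* Hypothetical sketch of the control flow only; nothing is written. */
jmp_buf ckpt_env;
char    ckpt_log[64];   /* records the order of the steps */

void ckpt_log_reset(void) { ckpt_log[0] = '\0'; }

void ckpt_log_append(const char *step)
{
    assert(strlen(ckpt_log) + strlen(step) < sizeof ckpt_log);
    strcat(ckpt_log, step);
}

/* Returns 1 if we came back via longjmp(), 0 if we "exited". */
int sketch_checkpoint(int exit_after_write)
{
    /* Step 1: call setjmp(). */
    if (setjmp(ckpt_env) != 0) {
        /* Step 2a: return from longjmp(); clean up and leave. */
        ckpt_log_append("resume;");
        return 1;
    }
    ckpt_log_append("segments;");  /* Step 2b: find segment boundaries */
    ckpt_log_append("open;");      /* Step 3: open checkpoint target   */
    ckpt_log_append("write;");     /* Step 4: write pages              */

    /* Step 5: either exit or longjmp() back to step 1. */
    if (exit_after_write) {
        ckpt_log_append("exit;");
        return 0;
    }
    longjmp(ckpt_env, 1);
    return -1;  /* not reached */
}
```

Calling sketch_checkpoint(0) leaves "segments;open;write;resume;" in ckpt_log, and sketch_checkpoint(1) leaves "segments;open;write;exit;". The first corresponds to checkpoint and resume, the second to checkpoint and stop.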
+
+To restart, read the pages back into memory (sbrk()ing as necessary) and then longjmp() to the stored jump buffer.  The OS will take care of restoring application state from the jump buffer and the return out of the signal handler.
+
+{subsubsection: Checkpoint Server}
+
+The checkpoint server is grad-student code, and the '2' files are the important ones.  It's not daemon core and has no security whatsoever.  It listens on a few canonical sockets.  The shadow gets a token (capability) from the checkpoint server and gives it to the starter/job, which can then use that token to fetch its checkpoint image.  The schedd can also talk to the checkpoint server, but only to remove a checkpoint image when the job is removed.  The shadow generally manages the checkpoint server for the standard universe.
+
+{subsection: Debugging}
+
+No valgrind/purify/memcheck will work, due to obvious abuses.
+
+gdb gets confused in the checkpoint routines.
+
+dprintf() in the checkpoint library is not the same dprintf() as the rest of condor.
+
+-_condor_D_[CKPT|ALWAYS|FULL_DEBUG] are your friends when executing a job.