Notes on checkpointing options, both those that exist for Condor and those in other systems:
*: HTCondor's standard universe: Links in a custom library that replaces system calls with wrapped versions so that state can be saved; checkpoints (and restores) inside a signal handler.
*: {link: http://dmtcp.sourceforge.net/ DMTCP (Distributed MultiThreaded CheckPointing)} (see DmtcpCondor): Uses ptrace(?)
*: {link: http://criu.org CRIU (Checkpoint/Restore In Userspace)} - See also {link: https://lwn.net/Articles/525675/ this LWN article}: Needs some kernel changes
*: {link: https://ftg.lbl.gov/projects/CheckpointRestart/ BLCR (Berkeley Lab Checkpoint/Restart)}: Requires kernel changes
*: {link: https://ckpt.wiki.kernel.org/index.php/Main_Page c/r (Linux Checkpoint/Restart)} - No news since 2010. {link: https://lwn.net/Articles/375855/ LWN article}
*: {link: http://cryopid.berlios.de/ CryoPID} - No news since 2005
*: {link: http://checkpointing.org/ Checkpointing} - List of others, circa 2011
*: {link: http://openvz.org/Main_Page OpenVZ} - Needs checking. May be built on CRIU
*: {link: http://systems.cs.columbia.edu/projects/zap/ ZAP} - Circa 2008
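To make the "checkpoint inside a signal handler" idea from the standard universe entry concrete, here is a toy sketch in Python. It is not how HTCondor does it: the standard universe saves the whole process image through its linked-in library, while this sketch only pickles a small piece of application state. The names (=CKPT_PATH=, =checkpoint_handler=, =restore=, =run=) and the choice of SIGUSR1 are made up for illustration, and a POSIX system is assumed.

```python
import os
import pickle
import signal
import tempfile

# Hypothetical checkpoint location; a real system would choose this differently.
CKPT_PATH = os.path.join(tempfile.gettempdir(), "toy_ckpt_%d.pkl" % os.getpid())

# All resumable state lives in one tuple (next_index, running_total).
# It is rebound in a single assignment, so the handler never sees a
# half-updated value no matter when the signal arrives.
state = (0, 0)


def checkpoint_handler(signum, frame):
    """Save the current state; runs when the process receives SIGUSR1."""
    tmp = CKPT_PATH + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    # Rename into place so a reader never finds a partially written file.
    os.replace(tmp, CKPT_PATH)


def restore():
    """Load the last checkpoint into the global state and return it."""
    global state
    with open(CKPT_PATH, "rb") as f:
        state = pickle.load(f)
    return state


def run(limit, signal_at=None):
    """Sum the integers below `limit`, resuming from the global state.

    If `signal_at` is given, the process signals itself at that index to
    stand in for an external checkpoint request.
    """
    global state
    while state[0] < limit:
        i, total = state
        state = (i + 1, total + i)
        if signal_at is not None and state[0] == signal_at:
            os.kill(os.getpid(), signal.SIGUSR1)
    return state[1]


signal.signal(signal.SIGUSR1, checkpoint_handler)
```

Calling =run(10, signal_at=4)= writes a checkpoint partway through; a later =restore()= followed by =run(10)= picks up from the saved tuple rather than from zero. Keeping the state in one atomically rebound value is what makes it safe to save from a handler that can fire at an arbitrary point.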