Things to mention:

* condor_hold actually kills the DAGMan process; condor_release starts a new process (but with the same Condor ID)
* lock file (including Joe's Unique PID thing)
* re-reading node job userlogs
* FD problems on wide DAGs
* a bunch of places where we do special stuff in recovery mode
* recovery mode is totally separate from a rescue DAG; in fact, you can be in recovery mode while running a rescue DAG