*: condor_hold actually kills the DAGMan process; condor_release starts a new process (but with the same Condor ID)
*: lock file (including Joe's Unique PID thing)
*: re-reading node job userlogs
*: FD (file descriptor) problems on wide DAGs (throttles don't help us in recovery mode)
*: a bunch of places where we do special stuff in recovery mode
*: recovery mode is totally separate from a rescue DAG; in fact, you can be in recovery mode while running a rescue DAG.
*: recovery mode really places a lot of constraints on the rest of the DAGMan code (e.g., need node names in submit events; inter-submit sleeps if using multiple logs; no macros in log file names for node jobs; probably a bunch more that I can't think of at the moment)
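
To make the lock file item concrete, here is a minimal sketch of how a PID lock file could be used to choose between a normal start and recovery mode. It is not DAGMan's actual code. The function names (`pid_is_alive`, `startup_mode`), the lock file name, and the rule that a stale lock means "recover" are all assumptions for illustration. It also does not attempt the unique-PID refinement mentioned above: a bare PID check like this one can be fooled if the operating system reuses the PID for an unrelated process.

```python
import os
import tempfile


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def startup_mode(lock_path: str) -> str:
    """Hypothetical startup decision based on a PID lock file.

    Returns "normal" (no lock file), "abort" (the lock holder is still
    running), or "recovery" (a stale or unreadable lock was left behind).
    """
    try:
        with open(lock_path) as f:
            old_pid = int(f.read().strip())
    except FileNotFoundError:
        mode = "normal"
    except ValueError:
        mode = "recovery"  # unreadable lock: assume the previous instance died
    else:
        if old_pid > 0 and pid_is_alive(old_pid):
            return "abort"  # leave the lock untouched for its live owner
        mode = "recovery"
    # Claim the lock for this process before doing any work.
    with open(lock_path, "w") as f:
        f.write(str(os.getpid()))
    return mode


if __name__ == "__main__":
    lock = os.path.join(tempfile.mkdtemp(), "example.dag.lock")
    print(startup_mode(lock))  # first start in an empty directory
    print(startup_mode(lock))  # lock now names this still-running process
```

In this sketch, a hold followed by a release would look like the stale-lock case: the old process is gone, so the new one finds a lock whose owner is dead and takes the recovery path.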