Note: not yet complete!

(Needs lots of changes because of workflow log!)

DAGMan monitors the state of node jobs that are in the queue by reading the "workflow log file". Events in this file are written by the shadow, among others; DAGMan itself also writes POST script terminated events and "fake" events for DAG-level NOOP jobs (Miron doesn't like this). Note: the workflow log is just a special case of a user log file, like the files specified with "log = ..." in a submit file. Its most important property is that it consolidates events for all relevant jobs into a single file; it also excludes some events that DAGMan doesn't care about.
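
To make the idea of "reading the workflow log" concrete, here is a minimal Python sketch that scans user-log text and records the last event seen for each job. It is not DAGMan's implementation (DAGMan is C++ and uses HTCondor's own log-reading classes). It assumes the classic text user-log layout, in which each event starts with a header line of the form "NNN (cluster.proc.subproc) date time text" and ends with a line containing only "...". The names parse_events, last_event_per_job, EVENT_NAMES and SAMPLE are invented for this example, and the sample log text is made up.

```python
import re

# Header line of an event in the classic text user-log layout, e.g.
# "000 (101.000.000) 07/01 12:00:00 Job submitted from host: <...>"
# Assumption: date and time appear as two whitespace-separated tokens.
HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) (\S+) (\S+) ?(.*)$")

# A few user-log event numbers; this table is deliberately not exhaustive.
EVENT_NAMES = {
    0: "SUBMIT",
    1: "EXECUTE",
    5: "JOB_TERMINATED",
    9: "JOB_ABORTED",
    16: "POST_SCRIPT_TERMINATED",
}


def parse_events(text):
    """Yield one dict per event; each event ends with a line of '...'."""
    current = None
    for line in text.splitlines():
        if line.strip() == "...":
            if current is not None:
                yield current
            current = None
        elif current is None:
            m = HEADER.match(line)
            if m:
                number = int(m.group(1))
                current = {
                    "number": number,
                    "name": EVENT_NAMES.get(number, "OTHER"),
                    "job": (int(m.group(2)), int(m.group(3))),
                    "summary": m.group(7),
                    "details": [],
                }
        else:
            # Indented lines after the header belong to the current event.
            current["details"].append(line.strip())


def last_event_per_job(text):
    """Map (cluster, proc) to the name of the most recent event seen."""
    state = {}
    for event in parse_events(text):
        state[event["job"]] = event["name"]
    return state


# Made-up sample log text for two node jobs.
SAMPLE = """\
000 (101.000.000) 07/01 12:00:00 Job submitted from host: <192.0.2.1:9618>
    DAG Node: A
...
001 (101.000.000) 07/01 12:00:05 Job executing on host: <192.0.2.2:9618>
...
005 (101.000.000) 07/01 12:01:00 Job terminated.
    (1) Normal termination (return value 0)
...
000 (102.000.000) 07/01 12:01:02 Job submitted from host: <192.0.2.1:9618>
    DAG Node: B
...
"""

print(last_event_per_job(SAMPLE))
```

Because all relevant jobs write to one file, a single pass like this sees every event in the order it was logged, which is the property that makes the workflow log convenient.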

This is an area in which the DAGMan implementation has changed quite a bit. There have been three main "phases" of DAGMan's interaction with log files:

  1. Initially, DAGMan read only a single log file; every node job was required to specify this log file with the "log = ..." command in its submit file.
  2. Then we allowed any log file to be specified for a given node job; DAGMan read multiple log files and consolidated all of them into a single event stream.
  3. Now DAGMan again reads only a single log file (the "workflow log"); however, the workflow log is independent of any log file specified in a node job's submit file.
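
The consolidation in phase 2 can be pictured as a merge of several time-ordered event lists into one stream. The Python sketch below is only an illustration under stated assumptions: it assumes each log is already in time order and that ordering across logs is decided purely by timestamp, which may not match what DAGMan actually did. The function merge_event_streams and the (timestamp, node, event name) tuple layout are invented for the example.

```python
import heapq
from datetime import datetime


def merge_event_streams(*streams):
    """Merge per-log event lists, each already in time order, into one list.

    Each event is a (timestamp, node, event_name) tuple; this layout is
    hypothetical and chosen only for the example.
    """
    return list(heapq.merge(*streams, key=lambda event: event[0]))


# Made-up events from two separate per-job log files.
log_a = [
    (datetime(2024, 7, 1, 12, 0, 0), "A", "SUBMIT"),
    (datetime(2024, 7, 1, 12, 1, 0), "A", "JOB_TERMINATED"),
]
log_b = [
    (datetime(2024, 7, 1, 12, 0, 30), "B", "SUBMIT"),
]

merged = merge_event_streams(log_a, log_b)
for timestamp, node, name in merged:
    print(timestamp.isoformat(), node, name)
```

With a single workflow log (phase 3), no such merge is needed, because the events already arrive in one file.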

A consequence of this history is that the DAGMan code for reading and dealing with events is probably somewhat more complex than it really needs to be.

(More updates needed below.)

DAGMan monitors the state of submitted jobs solely by reading the user logs for the node jobs.

Things to mention: