Note: not yet complete!

The way this is handled in the code is probably not as clean as it could be, because PRE and POST scripts were added to DAGMan after the initial code was written. Presumably for that reason, PRE scripts are handled outside of the ready queue data structure, which is kind of awkward.

There are a number of important data structures relating to job ordering:

When DAGMan starts up, the ready queue is empty. Dag::Bootstrap() calls Dag::StartNode() on all jobs with empty waiting queues. Dag::StartNode() either runs the node's PRE script (if there is one) or puts the node into the ready queue. If a job does have a PRE script, and the PRE script succeeds, Dag::PreScriptReaper() puts the job into the ready queue. (If the PRE script fails, Dag::PreScriptReaper() marks the job as failed, except in special cases.)
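The startup flow described above can be sketched roughly as follows. This is a simplified illustration, not DAGMan's actual code: the Node and MiniDag types, their fields, and the boolean stand-in for launching a PRE script are all hypothetical, and the special cases for PRE script failure are left out.

```cpp
#include <deque>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified node; not DAGMan's real Job class.
struct Node {
    std::string name;
    bool hasPreScript = false;
    bool preScriptRunning = false;
    bool failed = false;
    std::set<std::string> waiting;  // parents that have not finished yet
};

// Hypothetical stand-in for the Dag class.
struct MiniDag {
    std::vector<Node> nodes;        // must not be resized after Bootstrap()
    std::deque<Node*> readyQueue;   // empty when the DAG starts up

    // Run the node's PRE script if it has one; otherwise queue the node.
    void StartNode(Node& n) {
        if (n.hasPreScript) {
            n.preScriptRunning = true;  // stand-in for launching the script
        } else {
            readyQueue.push_back(&n);
        }
    }

    // Start every node whose waiting queue is empty.
    void Bootstrap() {
        for (Node& n : nodes) {
            if (n.waiting.empty()) {
                StartNode(n);
            }
        }
    }

    // Called when a PRE script exits.
    void PreScriptReaper(Node& n, bool succeeded) {
        n.preScriptRunning = false;
        if (succeeded) {
            readyQueue.push_back(&n);
        } else {
            n.failed = true;  // the special cases are omitted here
        }
    }
};
```

Note that in this sketch a node with a PRE script never touches the ready queue until the reaper runs, which is the "outside of the ready queue" handling mentioned earlier.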

Once a job is in the ready queue, it will eventually get submitted by Dag::SubmitReadyJobs(), which is called from condor_event_timer() in dagman_main.cpp; condor_event_timer() is called by DaemonCore (every 5 seconds by default). Note that Dag::SubmitReadyJobs() will only submit a certain number of jobs each time it is called (that number is configurable). If the attempt to submit the job fails, Dag::SubmitReadyJobs() calls Dag::ProcessFailedSubmit(), which puts the job back into the ready queue.
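A minimal sketch of the per-call submit limit and the failed-submit path might look like this. The SubmitQueue type, the maxSubmitsPerCall field and its value, and the trySubmit callback are hypothetical. Putting a failed job back at the front of the queue and ending the cycle there are assumptions of this sketch, not something these notes confirm about Dag::ProcessFailedSubmit().

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>

// Hypothetical stand-in for the submit side of the Dag class.
struct SubmitQueue {
    std::deque<std::string> readyQueue;
    std::size_t maxSubmitsPerCall = 5;  // assumed value; configurable in DAGMan

    // trySubmit is a hypothetical callback standing in for the real
    // submission; it returns true if the submit succeeded.
    std::size_t SubmitReadyJobs(
            const std::function<bool(const std::string&)>& trySubmit) {
        std::size_t submitted = 0;
        while (submitted < maxSubmitsPerCall && !readyQueue.empty()) {
            std::string node = readyQueue.front();
            readyQueue.pop_front();
            if (trySubmit(node)) {
                ++submitted;
            } else {
                // Failed submit: put the job back into the ready queue.
                // Re-queuing at the front and stopping for this cycle
                // are assumptions of this sketch.
                readyQueue.push_front(node);
                break;
            }
        }
        return submitted;
    }
};
```

Jobs left in the queue, whether because of the limit or because of a failed submit, are simply picked up again on a later timer call.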

When a job finishes, DAGMan sees the job's terminated event in the appropriate log file, and calls Dag::ProcessTerminatedEvent(). If the job failed, Dag::ProcessTerminatedEvent() calls Job::TerminateFailure(), which marks the job as failed. Dag::ProcessTerminatedEvent() then calls Dag::ProcessJobProcEnd(), whether the job succeeded or failed. Dag::ProcessJobProcEnd() takes a number of possible actions, such as initiating a retry for the node, starting the node's POST script, waiting for other job procs to finish if the cluster contains more than one proc, or marking the node as successful.
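The choice among those actions can be pictured as a small decision function. This is only an illustration: the NodeState fields, the NextAction names, and especially the order in which the conditions are checked are assumptions of this sketch, not a description of what Dag::ProcessJobProcEnd() actually does.

```cpp
// Hypothetical outcomes, loosely matching the actions listed above.
enum class NextAction {
    WaitForOtherProcs,
    RunPostScript,
    Retry,
    MarkFailed,
    MarkSuccessful
};

// Hypothetical snapshot of a node at the time one of its job procs ends.
struct NodeState {
    int procsStillRunning;  // other procs of the same cluster still running
    bool jobFailed;         // true if the job was marked as failed
    bool hasPostScript;
    int retriesLeft;
};

// The ordering of these checks is an assumption of this sketch.
NextAction DecideAfterProcEnd(const NodeState& s) {
    if (s.procsStillRunning > 0) {
        return NextAction::WaitForOtherProcs;
    }
    if (s.hasPostScript) {
        return NextAction::RunPostScript;
    }
    if (s.jobFailed) {
        return s.retriesLeft > 0 ? NextAction::Retry : NextAction::MarkFailed;
    }
    return NextAction::MarkSuccessful;
}
```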

TEMPTEMP -- talk about post script

When a node finishes, we call Dag::TerminateJob() on it; that method goes through the list of this node's children and removes the just-finished node from the children's waiting queues. For each child whose waiting queue becomes empty, it calls Dag::StartNode(), and the cycle continues.
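The waiting-queue bookkeeping can be sketched as below. The DepGraph type and its string-keyed maps are hypothetical simplifications (nodes are identified by name only), and StartNode() here just records the call instead of running a PRE script or queuing the node.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical dependency bookkeeping; not DAGMan's real data structures.
struct DepGraph {
    std::map<std::string, std::vector<std::string>> children;
    std::map<std::string, std::set<std::string>> waiting;  // unfinished parents
    std::vector<std::string> started;  // records StartNode() calls, in order

    void StartNode(const std::string& node) {
        started.push_back(node);
    }

    // Remove the finished node from each child's waiting queue and
    // start every child whose waiting queue becomes empty.
    void TerminateJob(const std::string& finished) {
        for (const std::string& child : children[finished]) {
            std::set<std::string>& w = waiting[child];
            w.erase(finished);
            if (w.empty()) {
                StartNode(child);
            }
        }
    }
};
```

For a diamond-shaped DAG (A is the parent of B and C, which are both parents of D), finishing A starts B and C, but D only starts once both B and C have finished.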

condor_event_timer() in dagman_main.cpp gets called every five seconds (by default). In that function, we call Dag::SubmitReadyJobs() to submit any jobs that are ready; read any new node job events (see ???); output the status of the DAG; and check whether the DAG is finished.