Once a job is in the ready queue, it will eventually be submitted by =Dag::SubmitReadyJobs()=, which is called from =condor_event_timer()= in _dagman_main.cpp_; =condor_event_timer()= is in turn called by DaemonCore (every 5 seconds by default). Note that =Dag::SubmitReadyJobs()= submits only a limited number of jobs each time it is called (that number is configurable). If the attempt to submit a job fails, =Dag::SubmitReadyJobs()= calls =Dag::ProcessFailedSubmit()=, which puts the job back into the ready queue.

When a job finishes, DAGMan sees the job's terminated event in the appropriate log file and calls =Dag::ProcessTerminatedEvent()=. If the job failed, =Dag::ProcessTerminatedEvent()= calls =Job::TerminateFailure()=, which marks the job as failed. Whether the job succeeded or failed, =Dag::ProcessTerminatedEvent()= then calls =Dag::ProcessJobProcEnd()=, which takes one of several possible actions: initiating a retry for the node, starting the node's POST script, waiting for other job procs to finish if the cluster contains more than one proc, or marking the node as successful.

TEMPTEMP -- talk about post script

When a node finishes, we call =Dag::TerminateJob()= on it; that method goes through the list of the node's children and removes the just-finished node from each child's waiting queue. For each child whose waiting queue becomes empty, it calls =Dag::StartNode()=, and the cycle continues.

=condor_event_timer()= in _dagman_main.cpp_ is called every five seconds by default. In that function, we call =Dag::SubmitReadyJobs()= to submit any jobs that are ready; read any new node job events (see ???); output the status of the DAG; and check whether the DAG is finished.
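
The submit-and-terminate cycle described above can be sketched roughly as follows. This is a simplified illustration, not the DAGMan source: =MiniDag=, =Node=, =AddNode()=, =AddEdge()=, =ReadyCount()=, and the =try_submit= callback are hypothetical names, and the real =Dag= methods have different signatures and do considerably more work. The sketch also assumes that the per-call limit counts submit attempts and that a failed job goes to the back of the ready queue; the actual code may differ on both points.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a DAG node (not DAGMan's Job class).
struct Node {
    std::set<std::string> waiting;      // parents that have not finished yet
    std::vector<std::string> children;  // nodes that depend on this one
    bool done = false;
};

// Hypothetical, simplified stand-in for the Dag class.
class MiniDag {
public:
    // max_per_cycle mimics the configurable per-call submit limit.
    explicit MiniDag(std::size_t max_per_cycle) : max_per_cycle_(max_per_cycle) {}

    void AddNode(const std::string& name) { nodes_[name]; }

    void AddEdge(const std::string& parent, const std::string& child) {
        assert(nodes_.count(parent) == 1 && nodes_.count(child) == 1);
        nodes_[parent].children.push_back(child);
        nodes_[child].waiting.insert(parent);
    }

    // Put a node whose parents are all finished into the ready queue.
    void StartNode(const std::string& name) { ready_.push_back(name); }

    // Called once per timer tick. Tries at most max_per_cycle_ jobs;
    // a job whose submit fails goes back into the ready queue.
    std::size_t SubmitReadyJobs(
            const std::function<bool(const std::string&)>& try_submit) {
        std::size_t submitted = 0;
        const std::size_t attempts = std::min(max_per_cycle_, ready_.size());
        for (std::size_t i = 0; i < attempts; ++i) {
            const std::string name = ready_.front();
            ready_.pop_front();
            if (try_submit(name)) {
                ++submitted;
            } else {
                ready_.push_back(name);  // analogous to ProcessFailedSubmit()
            }
        }
        return submitted;
    }

    // Mark a node finished and start every child that no longer waits on anything.
    void TerminateJob(const std::string& name) {
        Node& node = nodes_.at(name);
        node.done = true;
        for (const std::string& child_name : node.children) {
            Node& child = nodes_.at(child_name);
            child.waiting.erase(name);
            if (child.waiting.empty()) {
                StartNode(child_name);
            }
        }
    }

    std::size_t ReadyCount() const { return ready_.size(); }

private:
    std::size_t max_per_cycle_;
    std::deque<std::string> ready_;
    std::map<std::string, Node> nodes_;
};
```

In this sketch, a timer callback would call =SubmitReadyJobs()= once per tick, and the log-reading code would call =TerminateJob()= when it sees a node complete. Retries, POST scripts, and multi-proc clusters are deliberately left out.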