Check-in Number: 28896
Date: 2011-Dec-22 11:00:44 (local)
2011-Dec-22 17:00:44 (UTC)
User: Kent Wenger
Branch:
Comment:
Merged [28526], [28527], [28529], [28530], [28531], [28534], [28535], [28538], [28542], [28543], [28550], [28551], [28552], [28554], [28555], [28556], [28560], [28563], [28564], [28565], [28567], [28568], [28569], [28570], [28571], [28576], [28577], [28584], [28585], [28586], [28588], [28589], [28590], [28591], [28592], [28593], [28595], [28596], [28604], [28605], [28606], [28607], [28609], [28610], [28611], [28612], [28613], [28614], [28615], [28616], [28620], [28621], [28624], [28625], [28629], [28636], [28637], [28638], [28707], [28708], [28755], [28822], [28825], [28826], [28855], [28861], [28863], [28895], Gittrac #1482: merged V7_7-allow_cleanup_final_dag_node-merge-branch to master.

Merge branch 'V7_7-allow_cleanup_final_dag_node-merge-branch'

Conflicts:
	doc/version-history/7-7.history.tex
	src/condor_dagman/dag.cpp

Tickets:
#1482 Allow a cleanup/final DAG node that is always run
Inspections:
Check-ins:
[28526] Disable standard universe warnings about dlopening gethostbyname and friends. We always link nss statically, so there's nothing to be done.
[28527] Use the new system-wide GCC_DIAG macros in the negotiator #2584
[28529] Apply disable-static-link patch to rhel5 also
[28530] Merged [28528], Merge branch 'master' of /p/condor/repository/CONDOR_SRC
[28531] Remove warnings in the building of tests #2584
[28534] in condor_softkill, don't treat a failure to get the exe name as a fatal error, go ahead and PostMessage ===VersionHistory:Pending===
[28535] Add -Werror to fedora_14 builds
[28538] Fix a bunch of std uni stub-generated warnings #2584
[28542] Merged [28541], Merge branch 'V7_7_4-branch'
[28543] remove WINDOWS_SOFTKILL_DEBUG_LOG from condor_config.generic ===VersionHistory:None===
[28550] set WINDOWS_SOFTKILL_LOG in batch_test ===VersionHistory:None===
[28551] Merged [28526], [28527], [28529], [28530], [28531], [28547], [28548], Merge branch 'master' of ssh://chopin.cs.wisc.edu/p/condor/repository/CONDOR_SRC
[28552] Fix a bunch of compiler warnings. #2584
[28554] Enhanced condor_status -sort to accept generalized expressions ===GT:Fixed=== #2661
[28555] Documentation of generalized expression support for condor_status -sort ===GT=== #2661
[28556] ===VersionHistory:Complete=== ===GT=== #2661
[28560] Merged [28559], Merge branch 'V7_7_4-branch'
[28563] Fix cream gahp warnings #2584
[28564] Fix warnings #2584
[28565] Remove blue gene simulations support
[28567] Remove warnings from rsc test code #2584
[28568] Insure ec2_access_key_id and ec2_secret_access_key are not directories at submit time, #2619
[28569] 7.7 version history: ec2_secret_access_key & ec2_access_key_id can no longer be directories
[28570] Initial commit of #2015, filesystem namespaces
[28571] Add better error message for unparseable condor_q constraints #2688 condor_q -const foo=bar now errors: -- Parse error in constraint expression "foo=bar"
[28576] Fix warning in last commit #2688
[28577] Merged [28575], Merge branch 'master' of /p/condor/repository/CONDOR_SRC
[28584] Add -expand option to condor_config_val -dump #2687
[28585] Merged [28583], Merge branch 'master' of /p/condor/repository/CONDOR_SRC
[28586] Document -expand option to condor_config_val #2687
[28588] Merged [28587], Merge branch 'V7_6-branch'
[28589] Remove renice_self files and code, inline into std starter #2584 This was only used in one place, and deprecated by DaemonCore.
[28590] Check return code of fgets #2584
[28591] Capture return code from write(2) #2584
[28592] Commented out unused argc/argv in main
[28593] Commented out unused user_arg/listener in globus_l_gass_server_ez_close_callback
[28595] Capture return value of write #2584
[28596] Check return code for chdir #2584
[28604] More warning fixes #2584
[28605] Merged [28597], [28598], [28599], [28600], [28601], [28602], [28603], Merge branch 'master' of /p/condor/repository/CONDOR_SRC
[28606] Check fgets return code in idle_time.cpp
[28607] Check fgets return code in view_server.cpp
[28609] Check fchmod & fchown return code in credd.cpp
[28610] Check pipe return code in proxymanager.cpp
[28611] Fix warning in my_popen #2584
[28612] Check chown return code in dprintf.cpp
[28613] Add missing return to non-void function #2584
[28614] Check return codes from write to satisfy warnings
[28615] Remove warning #2584
[28616] Remove yet more warnings #2584
[28620] More warnings #2584
[28621] Merged [28526], [28527], [28529], [28530], [28531], [28534], [28535], [28538], [28542], [28543], [28550], [28551], [28617], [28618], [28619], Merge branch 'master' of /p/condor/repository/CONDOR_SRC
[28624] Save return code of write(2)
[28625] Merged [28526], [28527], [28529], [28530], [28531], [28534], [28535], [28538], [28542], [28543], [28550], [28551], [28552], [28554], [28555], [28556], [28560], [28563], [28564], [28565], [28567], [28568], [28569], [28570], [28571], [28575], [28622], [28623], Merge branch 'master' of /p/condor/repository/CONDOR_SRC [...]
[28629] Merged [28341], [28355], [28374], [28384], [28398], [28409], [28421], [28432], [28438], [28445], [28487], [28491], [28497], [28502], [28517], [28541], [28559], [28574], [28582], [28628], Merge branch 'V7_7_4-branch'
[28636] Identify those attributes which are in the Scheduler ClassAd when the verbosity level of STATISTICS_TO_PUBLISH is high enough. ===GT=== #2197
[28637] Merged [28526], [28527], [28529], [28530], [28531], [28534], [28535], [28538], [28542], [28543], [28550], [28551], [28552], [28554], [28555], [28556], [28560], [28563], [28564], [28565], [28567], [28568], [28569], [28570], [28571], [28576], [28577], [28584], [28585], [28586], [28588], [28589], [28590], [...]
[28638] Remove 3 statistics attributes that are no longer. ===GT=== #2197
[28707] Merged [28608], [28639], Merge branch 'V7_7-allow_cleanup_final_dag_node-branch' into V7_7-allow_cleanup_final_dag_node-merge-branch Gittrac #1482: in the process of combining the final node implementation with the latest version of the master -- lots of bad merge conflicts, but I have it compiling and [...]
[28708] This change was purged from the git repository.
[28755] Gittrac #1482: cleaned up dagman_main.cpp after merge.
[28822] Gittrac #1482: added 'always run post' and 'pre skip' to the final node tests.
[28825] Gittrac #1482: fixed DoneCycle() method in Dag class; added a warning if a final node is marked as done; various other minor cleanups.
[28826] Gittrac #1482: finished manual merge of rescue-DAG-writing code -- found one small error.
[28855] Gittrac #1482: TEMPTEMP -- cleaned up the FinishedRunning methods; added tests for DAG that's empty other than final node, cycle + final node (currently fails) and DAG halt + final node; found an error in the DAG abort test, and fixed a problem in the abort/final node implementation found by the fixed [...]
[28861] Gittrac #1482: all tests now pass.
[28863] Gittrac #1482: hopefully final pre-review cleanup.
[28895] Gittrac #1482: added version history item about this so at least something is there for the merge -- still needs full docs.
Files:
doc/version-history/7-7.history.tex      21908b6b -> be81a7cd    
src/condor_dagman/dag.cpp      21908b6b -> be81a7cd    
src/condor_dagman/dag.h      21908b6b -> be81a7cd    
src/condor_dagman/dagman_commands.cpp      21908b6b -> be81a7cd    
src/condor_dagman/dagman_commands.h      21908b6b -> be81a7cd    
src/condor_dagman/dagman_main.cpp      21908b6b -> be81a7cd    
src/condor_dagman/dagman_main.h      21908b6b -> be81a7cd    
src/condor_dagman/dagman_submit.cpp      21908b6b -> be81a7cd    
src/condor_dagman/debug.cpp      21908b6b -> be81a7cd    
src/condor_dagman/job.cpp      21908b6b -> be81a7cd    
src/condor_dagman/job.h      21908b6b -> be81a7cd    
src/condor_dagman/parse.cpp      21908b6b -> be81a7cd    
src/condor_dagman/script.cpp      21908b6b -> be81a7cd    
src/condor_dagman/script.h      21908b6b -> be81a7cd    
src/condor_dagman/scriptQ.cpp      21908b6b -> be81a7cd    
src/condor_tests/CMakeLists.txt      21908b6b -> be81a7cd    
src/condor_tests/job_dagman_final-A-node.pl      added-> be81a7cd
src/condor_tests/job_dagman_final-A-nodeA.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-A-nodeB.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-A-nodeC.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-A-script.pl      added-> be81a7cd
src/condor_tests/job_dagman_final-A.dag      added-> be81a7cd
src/condor_tests/job_dagman_final-A.run      added-> be81a7cd
src/condor_tests/job_dagman_final-B-node.pl      added-> be81a7cd
src/condor_tests/job_dagman_final-B-nodeA.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-B-nodeB.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-B-nodeC.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-B-script.pl      added-> be81a7cd
src/condor_tests/job_dagman_final-B.dag      added-> be81a7cd
src/condor_tests/job_dagman_final-B.run      added-> be81a7cd
src/condor_tests/job_dagman_final-C-node.pl      added-> be81a7cd
src/condor_tests/job_dagman_final-C-nodeA.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-C-nodeA.pl      added-> be81a7cd
src/condor_tests/job_dagman_final-C-nodeB.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-C-nodeC.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-C-script.pl      added-> be81a7cd
src/condor_tests/job_dagman_final-C.dag      added-> be81a7cd
src/condor_tests/job_dagman_final-C.run      added-> be81a7cd
src/condor_tests/job_dagman_final-D-node.pl      added-> be81a7cd
src/condor_tests/job_dagman_final-D-nodeA.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-D-nodeB.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-D-nodeC.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-D-script.pl      added-> be81a7cd
src/condor_tests/job_dagman_final-D.dag      added-> be81a7cd
src/condor_tests/job_dagman_final-D.run      added-> be81a7cd
src/condor_tests/job_dagman_final-E-node.pl      added-> be81a7cd
src/condor_tests/job_dagman_final-E-nodeA.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-E-nodeB1.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-E-nodeB1.pl      added-> be81a7cd
src/condor_tests/job_dagman_final-E-nodeB2.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-E-nodeB2.pl      added-> be81a7cd
src/condor_tests/job_dagman_final-E-nodeC.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-E-script.pl      added-> be81a7cd
src/condor_tests/job_dagman_final-E.config      added-> be81a7cd
src/condor_tests/job_dagman_final-E.dag      added-> be81a7cd
src/condor_tests/job_dagman_final-E.run      added-> be81a7cd
src/condor_tests/job_dagman_final-F-node.pl      added-> be81a7cd
src/condor_tests/job_dagman_final-F-nodeA.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-F-nodeB1.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-F-nodeB2.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-F-nodeB3.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-F-nodeC.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-F-script.pl      added-> be81a7cd
src/condor_tests/job_dagman_final-F.config      added-> be81a7cd
src/condor_tests/job_dagman_final-F.dag      added-> be81a7cd
src/condor_tests/job_dagman_final-F.run      added-> be81a7cd
src/condor_tests/job_dagman_final-G-node.pl      added-> be81a7cd
src/condor_tests/job_dagman_final-G-nodeA.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-G.dag      added-> be81a7cd
src/condor_tests/job_dagman_final-G.run      added-> be81a7cd
src/condor_tests/job_dagman_final-H-node.pl      added-> be81a7cd
src/condor_tests/job_dagman_final-H-nodeA.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-H-nodeB1.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-H-nodeB2.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-H-nodeC.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-H.dag      added-> be81a7cd
src/condor_tests/job_dagman_final-H.run      added-> be81a7cd
src/condor_tests/job_dagman_final-I-node.pl      added-> be81a7cd
src/condor_tests/job_dagman_final-I-nodeA.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-I-nodeB.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-I-nodeC.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-I-nodeD.cmd      added-> be81a7cd
src/condor_tests/job_dagman_final-I-script.pl      added-> be81a7cd
src/condor_tests/job_dagman_final-I.dag      added-> be81a7cd
src/condor_tests/job_dagman_final-I.run      added-> be81a7cd
src/condor_tests/job_dagman_final.dag      added-> be81a7cd
src/condor_tests/job_dagman_halt-A.run      21908b6b -> be81a7cd    
src/condor_tests/job_dagman_submit_fails_post-nodeA-post.pl      21908b6b -> be81a7cd    
src/condor_tests/job_dagman_submit_fails_post.dag      21908b6b -> be81a7cd    
diff --git a/doc/version-history/7-7.history.tex b/doc/version-history/7-7.history.tex
index f7b6ececa497d29b9a2c294fa9bb6a7df60b08a8..354d578813debcf280bd7f09b930214a3faab068 100644
--- a/doc/version-history/7-7.history.tex
+++ b/doc/version-history/7-7.history.tex
@@ -123,6 +123,14 @@
 its children are still running.
 \Ticket{2463}
 
+% Note: this is just kind of a stub so there's something in the version
+% history when I merge this branch to the master -- I need to fully
+% document this new feature ASAP.  wenger 2011-12-22
+\item Added the new FINAL node feature to \Condor{dagman} -- this allows
+the user to declare a FINAL node in the DAG, which is always run at the
+end of the workflow, whether it ended successfully or not.
+\Ticket{1482}
+
 \end{itemize}
 
 \noindent Configuration Variable and ClassAd Attribute Additions and Changes:
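A minimal sketch of the behavior described in the version-history entry above, assuming a strictly linear workflow that stops at the first failed node. `Node`, `DagStatus`, `RunResult`, and `runWithFinalNode` are hypothetical names made up for illustration; they are not part of condor_dagman, and the real implementation in dag.cpp is event-driven and considerably more involved. The sketch only shows two ideas: the final node runs whether or not the regular nodes succeeded, and the overall status records the first failure and is not overwritten afterwards.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for illustration only.
enum class DagStatus { Ok, NodeFailed };

struct Node {
    std::string name;
    std::function<bool()> run;  // returns true when the node succeeds
};

struct RunResult {
    DagStatus status;                   // outcome of the regular nodes
    std::vector<std::string> executed;  // node names in execution order
    bool finalSucceeded;                // outcome of the final node itself
};

// Runs the regular nodes in order, stopping at the first failure, and then
// runs the final node regardless of how the regular nodes ended.
RunResult runWithFinalNode(const std::vector<Node>& regular,
                           const Node& finalNode) {
    assert(static_cast<bool>(finalNode.run));  // final node must be callable

    RunResult result{DagStatus::Ok, {}, false};
    for (const Node& node : regular) {
        result.executed.push_back(node.name);
        if (!node.run()) {
            // Only the first failure changes the status; later code must
            // not overwrite a status that is already not Ok.
            if (result.status == DagStatus::Ok) {
                result.status = DagStatus::NodeFailed;
            }
            break;  // assumption: remaining regular nodes are skipped
        }
    }

    // The final node always runs, on success and on failure alike.
    result.executed.push_back(finalNode.name);
    result.finalSucceeded = finalNode.run();
    return result;
}
```

With regular nodes A (succeeds), B (fails), and C, the sketch would execute A, B, and then the final node, skip C, and report `DagStatus::NodeFailed`. Keeping the final node's own outcome in a separate field is a choice made for this sketch; how DAGMan folds it into the overall DAG status is not shown in this part of the check-in.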
diff --git a/src/condor_dagman/dag.cpp b/src/condor_dagman/dag.cpp
index 772e9db26a366c061706f011f3a5a1c4d46f1f02..ded0e16a9218070f63292ec87c591d2bfc03b493 100644
--- a/src/condor_dagman/dag.cpp
+++ b/src/condor_dagman/dag.cpp
@@ -88,6 +88,7 @@
 	_splices              (200, hashFuncMyString, rejectDuplicateKeys),
 	_dagFiles             (dagFiles),
 	_useDagDir            (useDagDir),
+	_final_job (0),
 	_nodeNameHash		  (NODE_HASH_SIZE, MyStringHash, rejectDuplicateKeys),
 	_nodeIDHash			  (NODE_HASH_SIZE, hashFuncInt, rejectDuplicateKeys),
 	_condorIDHash		  (NODE_HASH_SIZE, hashFuncInt, rejectDuplicateKeys),
@@ -184,6 +185,7 @@
 	_recovery = false;
 	_abortOnScarySubmit = true;
 	_configFile = NULL;
+	_runningFinalNode = false;
 		
 		// Don't print any waiting node reports until we're done with
 		// recovery mode.
@@ -194,6 +196,7 @@
 	_dagIsHalted = false;
 	_dagFiles.rewind();
 	_haltFile = HaltFileName( _dagFiles.next() );
+	_dagStatus = DAG_STATUS_OK;
 
 	_nfsLogIsError = param_boolean( "DAGMAN_LOG_ON_NFS_IS_ERROR", true );
 
@@ -245,7 +248,7 @@
 		}
     }
     debug_printf( DEBUG_VERBOSE, "Number of pre-completed nodes: %d\n",
-				  NumNodesDone() );
+				  NumNodesDone( true ) );
     
     if (recovery) {
 		_recovery = true;
@@ -672,8 +675,8 @@
 			// Only change the node status, error info,
 			// etc., if we haven't already gotten an error
 			// from another job proc in this job cluster
-		if ( job->_Status != Job::STATUS_ERROR ) {
-			job->_Status = Job::STATUS_ERROR;
+		if ( job->GetStatus() != Job::STATUS_ERROR ) {
+			job->TerminateFailure();
 			snprintf( job->error_text, JOB_ERROR_TEXT_MAXLEN,
 				  "Condor reported %s event for job proc (%d.%d.%d)",
 				  ULogEventNumberNames[event->eventNumber],
@@ -722,7 +725,7 @@
 
 				// Only change the node status, error info, etc.,
 				// if we haven't already gotten an error on this node.
-			if ( job->_Status != Job::STATUS_ERROR ) {
+			if ( job->GetStatus() != Job::STATUS_ERROR ) {
 				if( termEvent->normal ) {
 					snprintf( job->error_text, JOB_ERROR_TEXT_MAXLEN,
 							"Job proc (%d.%d.%d) failed with status %d",
@@ -745,7 +748,7 @@
 					}
 				}
 
-				job->_Status = Job::STATUS_ERROR;
+				job->TerminateFailure();
 				if ( job->_queuedNodeJobProcs > 0 ) {
 				  // once one job proc fails, remove
 				  // the whole cluster
@@ -759,7 +762,7 @@
 
 				// Only change the node status if we haven't already
 				// gotten an error on this node.
-			if ( job->_Status != Job::STATUS_ERROR ) {
+			if ( job->GetStatus() != Job::STATUS_ERROR ) {
 				job->retval = 0;
 				if ( job->_scriptPost != NULL) {
 						// let the script know the job's exit status
@@ -868,6 +871,9 @@
 			}
 			if ( job->_queuedNodeJobProcs == 0 ) {
 				_numNodesFailed++;
+				if ( _dagStatus == DAG_STATUS_OK ) {
+					_dagStatus = DAG_STATUS_NODE_FAILED;
+				}
 			}
 		}
 		return;
@@ -881,14 +887,14 @@
 			// if a POST script is specified for the job, run it
 		if(job->_scriptPost != NULL) {
 			if ( recovery ) {
-				job->_Status = Job::STATUS_POSTRUN;
+				job->SetStatus( Job::STATUS_POSTRUN );
 				_postRunNodeCount++;
 				(void)job->MonitorLogFile( _condorLogRdr, _storkLogRdr,
 						_nfsLogIsError, _recovery, _defaultNodeLog );
 			} else {
 				(void)RunPostScript( job, _alwaysRunPost, 0 );
 			}
-		} else if( job->_Status != Job::STATUS_ERROR ) {
+		} else if( job->GetStatus() != Job::STATUS_ERROR ) {
 			// no POST script was specified, so update DAG with
 			// node's successful completion if the node succeeded.
 			TerminateJob( job, recovery );
@@ -928,7 +934,7 @@
 				job->GetJobName() );
 		if( !(termEvent->normal && termEvent->returnValue == 0) ) {
 				// POST script failed or was killed by a signal
-			job->_Status = Job::STATUS_ERROR;
+			job->TerminateFailure();
 
 			int mainJobRetval = job->retval;
 
@@ -973,6 +979,9 @@
 			} else {
 					// no more retries -- node failed
 				_numNodesFailed++;
+				if ( _dagStatus == DAG_STATUS_OK ) {
+					_dagStatus = DAG_STATUS_NODE_FAILED;
+				}
 
 				MyString	errMsg;
 
@@ -1086,7 +1095,7 @@
 			}
 		}
 
-		job->_Status = Job::STATUS_SUBMITTED;
+		job->SetStatus( Job::STATUS_SUBMITTED );
 		return;
 	}
 
@@ -1340,7 +1349,7 @@
 	// actual job to Condor if/when the script exits successfully
 
     if( node->_scriptPre && node->_scriptPre->_done == FALSE ) {
-		node->_Status = Job::STATUS_PRERUN;
+		node->SetStatus( Job::STATUS_PRERUN );
 		_preRunNodeCount++;
 		_preScriptQ->Run( node->_scriptPre );
 		return true;
@@ -1364,6 +1373,38 @@
 }
 
 //-------------------------------------------------------------------------
+bool
+Dag::StartFinalNode()
+{
+	if ( _final_job && _final_job->GetStatus() == Job::STATUS_NOT_READY ) {
+		debug_printf( DEBUG_QUIET, "Starting final node...\n" );
+
+			// Clear out the empty queue so "regular" jobs don't run
+			// when we run the final node (this is really just needed
+			// for the DAG abort and DAG halt cases).
+		Job* job;
+		_readyQ->Rewind();
+		while ( _readyQ->Next( job ) ) {
+			if ( !job->GetFinal() ) {
+				debug_printf( DEBUG_DEBUG_1,
+							"Removing node %s from ready queue\n",
+							job->GetJobName() );
+				_readyQ->DeleteCurrent();
+			}
+		}
+
+			// Now start up the final node.
+		_final_job->SetStatus( Job::STATUS_READY );
+		if ( StartNode( _final_job, false ) ) {
+			_runningFinalNode = true;
+			return true;
+		}
+	}
+
+	return false;
+}
+
+//-------------------------------------------------------------------------
 // returns number of jobs submitted
 int
 Dag::SubmitReadyJobs(const Dagman &dm)
@@ -1386,15 +1427,23 @@
 		// any jobs or run scripts.
 	bool prevDagIsHalted = _dagIsHalted;
 	_dagIsHalted = ( access( _haltFile.Value() , F_OK ) == 0 );
+
 	if ( _dagIsHalted ) {
 		debug_printf( DEBUG_QUIET,
 					"DAG is halted because halt file %s exists\n",
 					_haltFile.Value() );
-        return numSubmitsThisCycle;
+		if ( _runningFinalNode ) {
+			debug_printf( DEBUG_QUIET,
+						"Continuing to allow final node to run\n" );
+		} else {
+        	return numSubmitsThisCycle;
+		}
 	}
 	if ( prevDagIsHalted ) {
-		debug_printf( DEBUG_QUIET,
-					"DAG going from halted to running state\n" );
+		if ( !_dagIsHalted ) {
+			debug_printf( DEBUG_QUIET,
+						"DAG going from halted to running state\n" );
+		}
 			// If going from the halted to the not halted state, we need
 			// to fire up any PRE scripts that were deferred while we were
 			// halted.
@@ -1564,7 +1613,7 @@
 	_jobstateLog.WriteScriptSuccessOrFailure( job, false );
 
 	if ( preScriptFailed ) {
-		job->_Status = Job::STATUS_ERROR;
+		job->TerminateFailure();
 
 			// Check for PRE_SKIP.
 		if ( job->HasPreSkip() && job->GetPreSkip() == job->retval ) {
@@ -1593,13 +1642,16 @@
 
 			// Check for retries.
 		else if( job->GetRetries() < job->GetRetryMax() ) {
-			job->_Status = Job::STATUS_ERROR;
+			job->TerminateFailure();
 			RestartNode( job, false );
 		}
 
 			// None of the above apply -- the node has failed.
 		else {
 			_numNodesFailed++;
+			if ( _dagStatus == DAG_STATUS_OK ) {
+				_dagStatus = DAG_STATUS_NODE_FAILED;
+			}
 			if( job->GetRetryMax() > 0 ) {
 				// add # of retries to error_text
 				char *tmp = strnewp( job->error_text );
@@ -1614,7 +1666,7 @@
 		debug_printf( DEBUG_NORMAL, "PRE Script of Node %s completed "
 				"successfully.\n", job->GetJobName() );
 		job->retval = 0; // for safety on retries
-		job->_Status = Job::STATUS_READY;
+		job->SetStatus( Job::STATUS_READY );
 		if ( _submitDepthFirst ) {
 			_readyQ->Prepend( job, -job->_nodePriority );
 		} else {
1635 {1687 {
1636 »   ​/​/​·​A·​little·​defensive·​programming.​·​Just·​check·​that·​we·​are1688 »   ​/​/​·​A·​little·​defensive·​programming.​·​Just·​check·​that·​we·​are
1637 »   ​/​/​·​allowed·​to·​run·​this·​script.​··​Because·​callers·​can·​be·​a·​little1689 »   ​/​/​·​allowed·​to·​run·​this·​script.​··​Because·​callers·​can·​be·​a·​little
1638 ········/​/​·​sloppy.​.​.​1690 »   ​/​/​·​sloppy.​.​.​
1639 »   ​if(·​(·​!ignore_status·​&&·​status·​!=·​0·​)​·​||·​!job->_scriptPost·​)​·​{1691 »   ​if(·​(·​!ignore_status·​&&·​status·​!=·​0·​)​·​||·​!job->_scriptPost·​)​·​{
1640 »   ​»   ​return·​false;​1692 »   ​»   ​return·​false;​
1641 »   ​}1693 »   ​}
1642 »   ​/​/​·​a·​POST·​script·​is·​specified·​for·​the·​job,​·​so·​run·​it1694 »   ​/​/​·​a·​POST·​script·​is·​specified·​for·​the·​job,​·​so·​run·​it
1643 »   ​/​/​·​We·​are·​told·​to·​ignore·​the·​result·​of·​the·​PRE·​script1695 »   ​/​/​·​We·​are·​told·​to·​ignore·​the·​result·​of·​the·​PRE·​script
-1644 	job->_Status = Job::STATUS_POSTRUN;
+1696 	job->SetStatus( Job::STATUS_POSTRUN );
1645 »   ​if·​(·​!job->MonitorLogFile(​·​_condorLogRdr,​·​_storkLogRdr,​1697 »   ​if·​(·​!job->MonitorLogFile(​·​_condorLogRdr,​·​_storkLogRdr,​
1646 »   ​»   ​»   ​_nfsLogIsError,​·​_recovery,​·​_defaultNodeLog·​)​·​)​·​{1698 »   ​»   ​»   ​_nfsLogIsError,​·​_recovery,​·​_defaultNodeLog·​)​·​)​·​{
1647 »   ​»   ​debug_printf(DEBUG_QU​IET,​·​"Unable·​to·​monitor·​user·​logfile·​for·​node·​%s\n",​1699 »   ​»   ​debug_printf(DEBUG_QU​IET,​·​"Unable·​to·​monitor·​user·​logfile·​for·​node·​%s\n",​
1725 »   ​»   ​»   ​··​/​/​·​Exit·​here,​·​because·​otherwise·​we'll·​wait·​forever·​to·​see1777 »   ​»   ​»   ​··​/​/​·​Exit·​here,​·​because·​otherwise·​we'll·​wait·​forever·​to·​see
1726 »   ​»   ​»   ​··​/​/​·​the·​event·​that·​we·​just·​failed·​to·​write·​(see·​gittrac·​#934)​.​1778 »   ​»   ​»   ​··​/​/​·​the·​event·​that·​we·​just·​failed·​to·​write·​(see·​gittrac·​#934)​.​
1727 »   ​»   ​»   ​··​/​/​·​wenger·​2009-11-12.​1779 »   ​»   ​»   ​··​/​/​·​wenger·​2009-11-12.​
-1728 			main_shutdown_rescue( EXIT_ERROR );
+1780 			main_shutdown_rescue( EXIT_ERROR, DAG_STATUS_ERROR );
1729 »   ​»   ​}1781 »   ​»   ​}
1730 »   ​}1782 »   ​}
1731 »   ​return·​true;​1783 »   ​return·​true;​
1778 }1830 }
1779 1831
1780 /​/​---------------------​---------------------​---------------------​------------1832 /​/​---------------------​---------------------​---------------------​------------
1833 bool
1834 Dag:​:​FinishedRunning(·​bool·​includeFinalNode·​)​·​const
1835 {
1836 »   ​if·​(·​includeFinalNode·​&&·​_final_job·​&&·​!_runningFinalNode·​)​·​{
1837 »   ​»   ​»   ​/​/​·​Make·​sure·​we·​don't·​incorrectly·​return·​true·​if·​we·​get·​called
1838 »   ​»   ​»   ​/​/​·​just·​before·​a·​final·​node·​is·​started.​.​.​
1839 »   ​»   ​return·​false;​
1840 »   ​}
1841
1842 »   ​return·​NumJobsSubmitted()​·​==·​0·​&&·​NumNodesReady()​·​==·​0·​&&
1843 »   ​»   ​»   ​»   ​ScriptRunNodeCount()​·​==·​0;​
1844 }
1845
1846 /​/​---------------------​---------------------​---------------------​------------
1847 bool
1848 Dag:​:​DoneSuccess(·​bool·​includeFinalNode·​)​·​const
1849 {
1850 »   ​if·​(·​NumNodesDone(·​includeFinalNode·​)​·​==·​NumNodes(·​includeFinalNode·​)​·​)​·​{
1851 »   ​»   ​return·​true;​
1852 »   ​}·​else·​if·​(·​includeFinalNode·​&&·​_final_job·​&&
1853 »   ​»   ​»   ​»   ​_final_job->GetStatus​()​·​==·​Job:​:​STATUS_DONE·​)​·​{
1854 »   ​»   ​»   ​/​/​·​Final·​node·​can·​override·​the·​overall·​DAG·​status.​
1855 »   ​»   ​return·​true;​
1856 »   ​}
1857
1858 »   ​return·​false;​
1859 }
1860
1861 /​/​---------------------​---------------------​---------------------​------------
1862 bool
1863 Dag:​:​DoneFailed(·​bool·​includeFinalNode·​)​·​const
1864 {
1865 »   ​if·​(·​!FinishedRunning(·​includeFinalNode·​)​·​)​·​{
1866 »   ​»   ​»   ​/​/​·​Note:​·​if·​final·​node·​is·​running·​we·​should·​get·​to·​here.​.​.​
1867 »   ​»   ​return·​false;​
1868 »   ​}·​else·​if·​(·​includeFinalNode·​&&·​_final_job·​&&
1869 »   ​»   ​»   ​»   ​_final_job->GetStatus​()​·​==·​Job:​:​STATUS_DONE·​)​·​{
1870 »   ​»   ​»   ​/​/​·​Success·​of·​final·​node·​overrides·​failure·​of·​any·​other·​node.​
1871 »   ​»   ​return·​false;​
1872 »   ​}
1873
1874 »   ​return·​NumNodesFailed()​·​>·​0;​
1875 }
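
The three predicates above interact through the final node, which is easier to see in isolation. The following is a minimal sketch only, assuming a simplified `DagState` struct: its fields are hypothetical stand-ins for `_final_job`, `_runningFinalNode`, and the Dag counters, not the real members. It mirrors the logic shown in this hunk and nothing beyond it.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the Dag members the predicates read.
struct DagState {
    bool hasFinalNode = false;      // stands in for _final_job != NULL
    bool runningFinalNode = false;  // stands in for _runningFinalNode
    bool finalNodeDone = false;     // final node status == STATUS_DONE
    int jobsSubmitted = 0;
    int nodesReady = 0;
    int scriptsRunning = 0;
    int nodesFailed = 0;
    int nodesDone = 0;   // counts the final node once it is done
    int nodesTotal = 0;  // counts the final node if there is one
};

bool finishedRunning(const DagState &d, bool includeFinalNode) {
    // Not finished if a final node exists but is not flagged as running.
    if (includeFinalNode && d.hasFinalNode && !d.runningFinalNode) {
        return false;
    }
    return d.jobsSubmitted == 0 && d.nodesReady == 0 && d.scriptsRunning == 0;
}

bool doneSuccess(const DagState &d, bool includeFinalNode) {
    assert(d.nodesDone <= d.nodesTotal);
    int done = d.nodesDone;
    int total = d.nodesTotal;
    if (!includeFinalNode && d.hasFinalNode) {
        total--;                      // mirrors NumNodes( false )
        if (d.finalNodeDone) done--;  // mirrors NumNodesDone( false )
    }
    if (done == total) return true;
    // A successful final node overrides the overall DAG status.
    return includeFinalNode && d.hasFinalNode && d.finalNodeDone;
}

bool doneFailed(const DagState &d, bool includeFinalNode) {
    if (!finishedRunning(d, includeFinalNode)) return false;
    // Success of the final node overrides failure of any other node.
    if (includeFinalNode && d.hasFinalNode && d.finalNodeDone) return false;
    return d.nodesFailed > 0;
}
```

With one failed node and a final node that has not been started, `doneFailed(d, false)` is true while `doneFailed(d, true)` is false. That difference is presumably why the callers in this diff now pass the flag explicitly.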
1876
1877 /​/​---------------------​---------------------​---------------------​------------
1878 int
1879 Dag:​:​NumNodes(·​bool·​includeFinal·​)​·​const
1880 {
1881 »   ​int·​result·​=·​_jobs.​Number()​;​
1882 »   ​if·​(·​!includeFinal·​&&·​HasFinalNode()​·​)​·​{
1883 »   ​»   ​result--;​
1884 »   ​}
1885
1886 »   ​return·​result;​
1887 }
1888
1889 /​/​---------------------​---------------------​---------------------​------------
1890 int
1891 Dag:​:​NumNodesDone(·​bool·​includeFinal·​)​·​const
1892 {
1893 »   ​int·​result·​=·​_numNodesDone;​
1894 »   ​if·​(·​!includeFinal·​&&·​HasFinalNode()​·​&&
1895 »   ​»   ​»   ​»   ​_final_job->GetStatus​()​·​==·​Job:​:​STATUS_DONE·​)​·​{
1896 »   ​»   ​result--;​
1897 »   ​}
1898
1899 »   ​return·​result;​
1900 }
1901
1902 /​/​---------------------​---------------------​---------------------​------------
1781 /​/​·​Note:​·​the·​Condor·​part·​of·​this·​method·​essentially·​duplicates·​functionality1903 /​/​·​Note:​·​the·​Condor·​part·​of·​this·​method·​essentially·​duplicates·​functionality
1782 /​/​·​that·​is·​now·​in·​schedd.​cpp.​··​We·​are·​keeping·​this·​here·​for·​now·​in·​case1904 /​/​·​that·​is·​now·​in·​schedd.​cpp.​··​We·​are·​keeping·​this·​here·​for·​now·​in·​case
1783 /​/​·​someone·​needs·​to·​run·​a·​7.​5.​6·​DAGMan·​with·​an·​older·​schedd.​1905 /​/​·​someone·​needs·​to·​run·​a·​7.​5.​6·​DAGMan·​with·​an·​older·​schedd.​
1885 2007
1886 /​/​---------------------​---------------------​---------------------​--------------2008 /​/​---------------------​---------------------​---------------------​--------------
1887 void·​Dag:​:​Rescue·​(·​const·​char·​*·​dagFile,​·​bool·​multiDags,​2009 void·​Dag:​:​Rescue·​(·​const·​char·​*·​dagFile,​·​bool·​multiDags,​
-1888 			int maxRescueDagNum, bool parseFailed, bool isPartial ) /* const */
+2010 			int maxRescueDagNum, bool overwrite, bool parseFailed,
+2011 			bool isPartial ) /* const */
1889 {2012 {
1890 »   ​MyString·​rescueDagFile;​2013 »   ​MyString·​rescueDagFile;​
1891 »   ​if·​(·​parseFailed·​)​·​{2014 »   ​if·​(·​parseFailed·​)​·​{
1894 »   ​}·​else·​{2017 »   ​}·​else·​{
1895 »   ​»   ​int·​nextRescue·​=·​FindLastRescueDagNum(​·​dagFile,​·​multiDags,​2018 »   ​»   ​int·​nextRescue·​=·​FindLastRescueDagNum(​·​dagFile,​·​multiDags,​
1896 »   ​»   ​»   ​»   ​»   ​maxRescueDagNum·​)​·​+·​1;​2019 »   ​»   ​»   ​»   ​»   ​maxRescueDagNum·​)​·​+·​1;​
-1897 		if ( nextRescue > maxRescueDagNum ) nextRescue = maxRescueDagNum;
+2020 		if ( overwrite && nextRescue > 1 ) {
+2021 			nextRescue--;
+2022 		}
+2023 		if ( nextRescue > maxRescueDagNum ) {
+2024 			nextRescue = maxRescueDagNum;
+2025 		}
1898 »   ​»   ​rescueDagFile·​=·​RescueDagName(·​dagFile,​·​multiDags,​·​nextRescue·​)​;​2026 »   ​»   ​rescueDagFile·​=·​RescueDagName(·​dagFile,​·​multiDags,​·​nextRescue·​)​;​
1899 »   ​}2027 »   ​}
1900 2028
1906 »   ​WriteRescue(·​rescueDagFile.​Value()​,​·​dagFile,​·​parseFailed,​·​isPartial·​)​;​2034 »   ​WriteRescue(·​rescueDagFile.​Value()​,​·​dagFile,​·​parseFailed,​·​isPartial·​)​;​
1907 }2035 }
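
The effect of the new `overwrite` flag on rescue DAG numbering can be checked with a small standalone function. This is a sketch under stated assumptions: `nextRescueDagNum` and its `lastRescueNum` parameter are hypothetical names, with `lastRescueNum` playing the role of the value returned by `FindLastRescueDagNum()`. The precondition assert is also an assumption, not part of the commit.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper mirroring the numbering logic in Dag::Rescue().
// lastRescueNum: highest existing rescue DAG number (0 if none exist).
int nextRescueDagNum(int lastRescueNum, int maxRescueDagNum, bool overwrite) {
    assert(maxRescueDagNum >= 1);  // assumed precondition for this sketch
    int next = lastRescueNum + 1;
    // When overwriting, reuse the highest existing number instead of
    // allocating a new one (but never go below 1).
    if (overwrite && next > 1) {
        next--;
    }
    // Clamp to the maximum legal rescue DAG number.
    return std::min(next, maxRescueDagNum);
}
```

For example, with three existing rescue DAGs the function yields 4 without `overwrite` and 3 with it. With none existing, both cases yield 1.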
1908 2036
-1909 static const char *RESCUE_DAG_VERSION = "2.0.0";
+2037 static const char *RESCUE_DAG_VERSION = "2.0.1";
1910 2038
1911 /​/​---------------------​---------------------​---------------------​--------------2039 /​/​---------------------​---------------------​---------------------​--------------
1912 void·​Dag:​:​WriteRescue·​(const·​char·​*·​rescue_file,​·​const·​char·​*·​dagFile,​2040 void·​Dag:​:​WriteRescue·​(const·​char·​*·​rescue_file,​·​const·​char·​*·​dagFile,​
1945 »   ​»   ​»   ​»   ​isPartial·​?·​"partial"·​:​·​"full"·​)​;​2073 »   ​»   ​»   ​»   ​isPartial·​?·​"partial"·​:​·​"full"·​)​;​
1946 2074
1947 ····​fprintf(fp,​·​"#\n")​;​2075 ····​fprintf(fp,​·​"#\n")​;​
-1948     fprintf(fp, "# Total number of Nodes: %d\n", NumNodes());
+2076     fprintf(fp, "# Total number of Nodes: %d\n", NumNodes( true ));
1949 ····​fprintf(fp,​·​"#·​Nodes·​premarked·​DONE:​·​%d\n",​·​_numNodesDone)​;​2077 ····​fprintf(fp,​·​"#·​Nodes·​premarked·​DONE:​·​%d\n",​·​_numNodesDone)​;​
1950 ····​fprintf(fp,​·​"#·​Nodes·​that·​failed:​·​%d\n",​·​_numNodesFailed)​;​2078 ····​fprintf(fp,​·​"#·​Nodes·​that·​failed:​·​%d\n",​·​_numNodesFailed)​;​
1951 2079
2040 {2168 {
2041 »   ​»   ​/​/​·​Print·​the·​JOB/​DATA·​line.​2169 »   ​»   ​/​/​·​Print·​the·​JOB/​DATA·​line.​
2042 »   ​const·​char·​*keyword·​=·​"";​2170 »   ​const·​char·​*keyword·​=·​"";​
-2043 	if ( node->JobType() == Job::TYPE_CONDOR ) {
+2171 	if ( node->GetFinal() ) {
+2172 		keyword = "FINAL";
+2173 	} else if ( node->JobType() == Job::TYPE_CONDOR ) {
2044 »   ​»   ​keyword·​=·​node->GetDagFile()​·​?·​"SUBDAG·​EXTERNAL"·​:​·​"JOB";​2174 »   ​»   ​keyword·​=·​node->GetDagFile()​·​?·​"SUBDAG·​EXTERNAL"·​:​·​"JOB";​
2045 »   ​}·​else·​if(·​node->JobType()​·​==·​Job:​:​TYPE_STORK·​)​·​{2175 »   ​}·​else·​if(·​node->JobType()​·​==·​Job:​:​TYPE_STORK·​)​·​{
2046 »   ​»   ​keyword·​=·​"DATA";​2176 »   ​»   ​keyword·​=·​"DATA";​
2130 »   ​»   ​}2260 »   ​»   ​}
2131 »   ​}2261 »   ​}
2132 2262
-2133 	if ( node->_Status == Job::STATUS_DONE ) {
+2263 		// Never mark a FINAL node as done.
+2264 	if ( node->GetStatus() == Job::STATUS_DONE && !node->GetFinal() ) {
2134 »   ​»   ​fprintf(fp,​·​"DONE·​%s\n",​·​node->GetJobName()​·​)​;​2265 »   ​»   ​fprintf(fp,​·​"DONE·​%s\n",​·​node->GetJobName()​·​)​;​
2135 »   ​}2266 »   ​}
2136 2267
2138 »   ​if(·​node->retry_max·​>·​0·​)​·​{2269 »   ​if(·​node->retry_max·​>·​0·​)​·​{
2139 »   ​»   ​int·​retriesLeft·​=·​(node->retry_max·​-·​node->retries)​;​2270 »   ​»   ​int·​retriesLeft·​=·​(node->retry_max·​-·​node->retries)​;​
2140 2271
-2141 		if ( node->_Status == Job::STATUS_ERROR
+2272 		if ( node->GetStatus() == Job::STATUS_ERROR
2142 »   ​»   ​»   ​»   ​»   ​&&·​node->retries·​<·​node->retry_max·2273 »   ​»   ​»   ​»   ​»   ​&&·​node->retries·​<·​node->retry_max·
2143 »   ​»   ​»   ​»   ​»   ​&&·​node->have_retry_abor​t_val2274 »   ​»   ​»   ​»   ​»   ​&&·​node->have_retry_abor​t_val
2144 »   ​»   ​»   ​»   ​»   ​&&·​node->retval·​==·​node->retry_abort_val​·​)​·​{2275 »   ​»   ​»   ​»   ​»   ​&&·​node->retval·​==·​node->retry_abort_val​·​)​·​{
2177 »   ​ASSERT(·​!(recovery·​&&·​bootstrap)​·​)​;​2308 »   ​ASSERT(·​!(recovery·​&&·​bootstrap)​·​)​;​
2178 ····​ASSERT(·​job·​!=·​NULL·​)​;​2309 ····​ASSERT(·​job·​!=·​NULL·​)​;​
2179 2310
2311 »   ​if·​(·​job·​==·​_final_job·​)​·​{
2312 »   ​»   ​_runningFinalNode·​=·​false;​
2313 »   ​}
2314
2180 »   ​job->TerminateSuccess​()​;​·​/​/​·​marks·​job·​as·​STATUS_DONE2315 »   ​job->TerminateSuccess​()​;​·​/​/​·​marks·​job·​as·​STATUS_DONE
2181 »   ​if·​(·​job->GetStatus()​·​!=·​Job:​:​STATUS_DONE·​)​·​{2316 »   ​if·​(·​job->GetStatus()​·​!=·​Job:​:​STATUS_DONE·​)​·​{
2182 »   ​»   ​EXCEPT(·​"Node·​%s·​is·​not·​in·​DONE·​state",​·​job->GetJobName()​·​)​;​2317 »   ​»   ​EXCEPT(·​"Node·​%s·​is·​not·​in·​DONE·​state",​·​job->GetJobName()​·​)​;​
2250 Dag:​:​RestartNode(·​Job·​*node,​·​bool·​recovery·​)​2385 Dag:​:​RestartNode(·​Job·​*node,​·​bool·​recovery·​)​
2251 {2386 {
2252 ····​ASSERT(·​node·​!=·​NULL·​)​;​2387 ····​ASSERT(·​node·​!=·​NULL·​)​;​
-2253 	if ( node->_Status != Job::STATUS_ERROR ) {
+2388 	if ( node->GetStatus() != Job::STATUS_ERROR ) {
2254 »   ​»   ​EXCEPT(·​"Node·​%s·​is·​not·​in·​ERROR·​state",​·​node->GetJobName()​·​)​;​2389 »   ​»   ​EXCEPT(·​"Node·​%s·​is·​not·​in·​ERROR·​state",​·​node->GetJobName()​·​)​;​
2255 »   ​}2390 »   ​}
2256 ····​if(·​node->have_retry_abor​t_val·​&&·​node->retval·​==·​node->retry_abort_val​·​)​·​{2391 ····​if(·​node->have_retry_abor​t_val·​&&·​node->retval·​==·​node->retry_abort_val​·​)​·​{
2258 ······················​"(last·​attempt·​returned·​%d)​\n",​2393 ······················​"(last·​attempt·​returned·​%d)​\n",​
2259 ······················​node->GetJobName()​,​·​node->retval)​;​2394 ······················​node->GetJobName()​,​·​node->retval)​;​
2260 ········​_numNodesFailed++;​2395 ········​_numNodesFailed++;​
2396 »   ​»   ​if·​(·​_dagStatus·​==·​DAG_STATUS_OK·​)​·​{
2397 »   ​»   ​»   ​_dagStatus·​=·​DAG_STATUS_NODE_FAILE​D;​
2398 »   ​»   ​}
2261 ········​return;​2399 ········​return;​
2262 ····​}2400 ····​}
-2263 	node->_Status = Job::STATUS_READY;
+2401 	node->SetStatus( Job::STATUS_READY );
2264 »   ​node->retries++;​2402 »   ​node->retries++;​
2265 »   ​ASSERT(·​node->GetRetries()​·​<=·​node->GetRetryMax()​·​)​;​2403 »   ​ASSERT(·​node->GetRetries()​·​<=·​node->GetRetryMax()​·​)​;​
2266 »   ​if(·​node->_scriptPre·​)​·​{2404 »   ​if(·​node->_scriptPre·​)​·​{
2385 »   ​»   ​debug_printf(·​DEBUG_QUIET,​·​"Aborting·​DAG·​because·​we·​got·​"2523 »   ​»   ​debug_printf(·​DEBUG_QUIET,​·​"Aborting·​DAG·​because·​we·​got·​"
2386 »   ​»   ​»   ​»   ​"the·​ABORT·​exit·​value·​from·​a·​%s\n",​·​type)​;​2524 »   ​»   ​»   ​»   ​"the·​ABORT·​exit·​value·​from·​a·​%s\n",​·​type)​;​
2387 »   ​»   ​if·​(·​job->have_abort_dag_r​eturn_val·​)​·​{2525 »   ​»   ​if·​(·​job->have_abort_dag_r​eturn_val·​)​·​{
-2388 			main_shutdown_rescue( job->abort_dag_return_val );
+2526 			main_shutdown_rescue( job->abort_dag_return_val,
+2527 						job->abort_dag_return_val != 0 ? DAG_STATUS_ABORT :
+2528 						DAG_STATUS_OK );
2389 »   ​»   ​}·​else·​{2529 »   ​»   ​}·​else·​{
-2390 			main_shutdown_rescue( job->retval );
+2530 			main_shutdown_rescue( job->retval, DAG_STATUS_ABORT );
2391 »   ​»   ​}2531 »   ​»   ​}
2392 »   ​»   ​return·​true;​2532 »   ​»   ​return·​true;​
2393 »   ​}2533 »   ​}
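
The hunk above now chooses a DAG status alongside the exit value when an abort is triggered. The mapping can be sketched as follows. The enum, the `ShutdownArgs` struct, and the function name are all hypothetical and model only the two status values that appear here. The real `main_shutdown_rescue()` is not reproduced.

```cpp
#include <cassert>

// Hypothetical status enum; only the two values used in this hunk are modeled.
enum class DagStatus { Ok, Abort };

// Hypothetical pair of arguments that would be handed to the shutdown call.
struct ShutdownArgs {
    int exitVal;
    DagStatus status;
};

ShutdownArgs abortShutdownArgs(bool haveAbortDagReturnVal,
                               int abortDagReturnVal, int nodeRetval) {
    ShutdownArgs args;
    if (haveAbortDagReturnVal) {
        // An explicit return value of 0 means the abort counts as success.
        args.exitVal = abortDagReturnVal;
        args.status = abortDagReturnVal != 0 ? DagStatus::Abort : DagStatus::Ok;
    } else {
        // No explicit value: exit with the node's own return value.
        args.exitVal = nodeRetval;
        args.status = DagStatus::Abort;
    }
    // Invariant of this sketch: an Ok status only occurs with exit value 0.
    assert(args.status == DagStatus::Abort || args.exitVal == 0);
    return args;
}
```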
2570 »   ​bool·​tooSoon·​=·​(_minStatusUpdateTime​·​>·​0)​·​&&2710 »   ​bool·​tooSoon·​=·​(_minStatusUpdateTime​·​>·​0)​·​&&
2571 »   ​»   ​»   ​»   ​((startTime·​-·​_lastStatusUpdateTime​stamp)​·​<2711 »   ​»   ​»   ​»   ​((startTime·​-·​_lastStatusUpdateTime​stamp)​·​<
2572 »   ​»   ​»   ​»   ​_minStatusUpdateTime)​;​2712 »   ​»   ​»   ​»   ​_minStatusUpdateTime)​;​
-2573 	if ( tooSoon && !held && !removed && !FinishedRunning() ) {
+2713 	if ( tooSoon && !held && !removed && !FinishedRunning( true ) ) {
2574 »   ​»   ​debug_printf(·​DEBUG_DEBUG_1,​·​"Node·​status·​file·​not·​updated·​"2714 »   ​»   ​debug_printf(·​DEBUG_DEBUG_1,​·​"Node·​status·​file·​not·​updated·​"
2575 »   ​»   ​»   ​»   ​»   ​"because·​min.​·​status·​update·​time·​has·​not·​yet·​passed\n"·​)​;​2715 »   ​»   ​»   ​»   ​»   ​"because·​min.​·​status·​update·​time·​has·​not·​yet·​passed\n"·​)​;​
2576 »   ​»   ​return;​2716 »   ​»   ​return;​
2646 »   ​»   ​/​/​2786 »   ​»   ​/​/​
2647 »   ​Job:​:​status_t·​dagStatus·​=·​Job:​:​STATUS_SUBMITTED;​2787 »   ​Job:​:​status_t·​dagStatus·​=·​Job:​:​STATUS_SUBMITTED;​
2648 »   ​const·​char·​*statusNote·​=·​"";​2788 »   ​const·​char·​*statusNote·​=·​"";​
-2649 	if ( DoneSuccess() ) {
+2789 	if ( DoneSuccess( true ) ) {
2650 »   ​»   ​dagStatus·​=·​Job:​:​STATUS_DONE;​2790 »   ​»   ​dagStatus·​=·​Job:​:​STATUS_DONE;​
2651 »   ​»   ​statusNote·​=·​"success";​2791 »   ​»   ​statusNote·​=·​"success";​
-2652 	} else if ( DoneFailed() ) {
+2792 	} else if ( DoneFailed( true ) ) {
2653 »   ​»   ​dagStatus·​=·​Job:​:​STATUS_ERROR;​2793 »   ​»   ​dagStatus·​=·​Job:​:​STATUS_ERROR;​
2654 »   ​»   ​statusNote·​=·​"failed";​2794 »   ​»   ​statusNote·​=·​"failed";​
-2655 	} else if ( DoneCycle() ) {
+2795 	} else if ( DoneCycle( true ) ) {
2656 »   ​»   ​dagStatus·​=·​Job:​:​STATUS_ERROR;​2796 »   ​»   ​dagStatus·​=·​Job:​:​STATUS_ERROR;​
2657 »   ​»   ​statusNote·​=·​"cycle";​2797 »   ​»   ​statusNote·​=·​"cycle";​
2658 »   ​}·​else·​if·​(·​held·​)​·​{2798 »   ​}·​else·​if·​(·​held·​)​·​{
2670 »   ​time_t·​endTime·​=·​time(·​NULL·​)​;​2810 »   ​time_t·​endTime·​=·​time(·​NULL·​)​;​
2671 2811
2672 »   ​fprintf(·​outfile,​·​"Next·​scheduled·​update:​·​"·​)​;​2812 »   ​fprintf(·​outfile,​·​"Next·​scheduled·​update:​·​"·​)​;​
-2673 	if ( FinishedRunning() || removed ) {
+2813 	if ( FinishedRunning( true ) || removed ) {
2674 »   ​»   ​fprintf(·​outfile,​·​"none\n"·​)​;​2814 »   ​»   ​fprintf(·​outfile,​·​"none\n"·​)​;​
2675 »   ​}·​else·​{2815 »   ​}·​else·​{
2676 »   ​»   ​time_t·​nextTime·​=·​endTime·​+·​_minStatusUpdateTime;​2816 »   ​»   ​time_t·​nextTime·​=·​endTime·​+·​_minStatusUpdateTime;​
3077 »   ​insertResult·​=·​_nodeIDHash.​insert(·​job.​GetJobID()​,​·​&job·​)​;​3217 »   ​insertResult·​=·​_nodeIDHash.​insert(·​job.​GetJobID()​,​·​&job·​)​;​
3078 »   ​ASSERT(·​insertResult·​==·​0·​)​;​3218 »   ​ASSERT(·​insertResult·​==·​0·​)​;​
3079 3219
3220 »   ​»   ​/​/​·​Final·​node·​status·​is·​set·​to·​STATUS_NOT_READY·​here,​·​so·​it
3221 »   ​»   ​/​/​·​won't·​get·​run·​even·​though·​it·​has·​no·​parents;​·​its·​status
3222 »   ​»   ​/​/​·​will·​get·​changed·​when·​it·​should·​be·​run.​
3223 »   ​if·​(·​job.​GetFinal()​·​)​·​{
3224 »   ​»   ​if·​(·​_final_job·​)​·​{
3225 ········»   ​debug_printf(·​DEBUG_QUIET,​·​"Error:​·​DAG·​already·​has·​a·​final·​"
3226 »   ​»   ​»   ​»   ​»   ​»   ​"node·​%s;​·​attempting·​to·​add·​final·​node·​%s\n",​
3227 »   ​»   ​»   ​»   ​»   ​»   ​_final_job->GetJobNam​e()​,​·​job.​GetJobName()​·​)​;​
3228 »   ​»   ​»   ​return·​false;​
3229 »   ​»   ​}
3230 »   ​»   ​job.​SetStatus(·​Job:​:​STATUS_NOT_READY·​)​;​
3231 »   ​»   ​_final_job·​=·​&job;​
3232 »   ​}
3080 »   ​return·​_jobs.​Append(&job)​;​3233 »   ​return·​_jobs.​Append(&job)​;​
3081 }3234 }
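
The new branch in `Dag::Add()` enforces "at most one final node" and parks that node in a not-ready state. A minimal sketch of the same rule follows. `MiniDag`, `Node`, and `NodeStatus` are hypothetical illustration types, not the real `Dag`/`Job` classes, and the non-empty-name precondition is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class NodeStatus { Ready, NotReady };

// Hypothetical stand-in for Job.
struct Node {
    std::string name;
    bool isFinal = false;
    NodeStatus status = NodeStatus::Ready;
};

// Hypothetical stand-in for Dag, modeling only the final-node bookkeeping.
class MiniDag {
public:
    bool add(Node &node) {
        assert(!node.name.empty());  // assumed precondition for this sketch
        if (node.isFinal) {
            if (finalNode_ != nullptr) {
                return false;  // a final node is already registered
            }
            // Keep the final node from running just because it has no
            // parents; its status is changed when it should be run.
            node.status = NodeStatus::NotReady;
            finalNode_ = &node;
        }
        nodes_.push_back(&node);
        return true;
    }

    const Node *finalNode() const { return finalNode_; }
    std::size_t size() const { return nodes_.size(); }

private:
    std::vector<Node *> nodes_;
    Node *finalNode_ = nullptr;
};
```

A second final node is rejected and is not appended, so the node count is unchanged after the failed call.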
3082 3235
3083
3084 /​/​---------------------​---------------------​---------------------​------------3236 /​/​---------------------​---------------------​---------------------​------------
3085 bool3237 bool
3086 Dag:​:​RemoveNode(·​const·​char·​*name,​·​MyString·​&whynot·​)​3238 Dag:​:​RemoveNode(·​const·​char·​*name,​·​MyString·​&whynot·​)​
3116 »   ​else·​if(·​node->GetStatus()​·​==·​Job:​:​STATUS_ERROR·​)​·​{3268 »   ​else·​if(·​node->GetStatus()​·​==·​Job:​:​STATUS_ERROR·​)​·​{
3117 »   ​»   ​_numNodesFailed--;​3269 »   ​»   ​_numNodesFailed--;​
3118 »   ​»   ​ASSERT(·​_numNodesFailed·​>=·​0·​)​;​3270 »   ​»   ​ASSERT(·​_numNodesFailed·​>=·​0·​)​;​
3271 »   ​»   ​if·​(·​_numNodesFailed·​==·​0·​&&·​_dagStatus·​==·​DAG_STATUS_NODE_FAILE​D·​)​·​{
3272 »   ​»   ​»   ​_dagStatus·​=·​DAG_STATUS_OK;​
3273 »   ​»   ​}
3119 »   ​}3274 »   ​}
3120 »   ​else·​if(·​node->GetStatus()​·​==·​Job:​:​STATUS_READY·​)​·​{3275 »   ​else·​if(·​node->GetStatus()​·​==·​Job:​:​STATUS_READY·​)​·​{
3121 »   ​»   ​ASSERT(·​_readyQ·​)​;​3276 »   ​»   ​ASSERT(·​_readyQ·​)​;​
3424 »   ​»   ​»   ​»   ​»   ​"DAGMAN_ABORT_ON_SCAR​Y_SUBMIT·​to·​false·​if·​you·​are·​"3579 »   ​»   ​»   ​»   ​»   ​"DAGMAN_ABORT_ON_SCAR​Y_SUBMIT·​to·​false·​if·​you·​are·​"
3425 »   ​»   ​»   ​»   ​»   ​"*sure*·​this·​shouldn't·​cause·​an·​abort.​\n",​3580 »   ​»   ​»   ​»   ​»   ​"*sure*·​this·​shouldn't·​cause·​an·​abort.​\n",​
3426 »   ​»   ​»   ​»   ​»   ​message.​Value()​·​)​;​3581 »   ​»   ​»   ​»   ​»   ​message.​Value()​·​)​;​
-3427 		main_shutdown_rescue( EXIT_ERROR );
+3582 		main_shutdown_rescue( EXIT_ERROR, DAG_STATUS_ERROR );
3428 »   ​»   ​return·​true;​3583 »   ​»   ​return·​true;​
3429 »   ​}·​else·​{3584 »   ​}·​else·​{
3430 »   ​»   ​debug_printf(·​DEBUG_QUIET,​3585 »   ​»   ​debug_printf(·​DEBUG_QUIET,​
3566 »   ​»   ​debug_printf(·​DEBUG_NORMAL,​·​"ERROR:​·​No·​'log·​='·​value·​found·​in·​"3721 »   ​»   ​debug_printf(·​DEBUG_NORMAL,​·​"ERROR:​·​No·​'log·​='·​value·​found·​in·​"
3567 »   ​»   ​»   ​»   ​»   ​"submit·​file·​%s·​for·​node·​%s\n",​·​node->GetCmdFile()​,​3722 »   ​»   ​»   ​»   ​»   ​"submit·​file·​%s·​for·​node·​%s\n",​·​node->GetCmdFile()​,​
3568 »   ​»   ​»   ​»   ​»   ​node->GetJobName()​·​)​;​3723 »   ​»   ​»   ​»   ​»   ​node->GetJobName()​·​)​;​
-3569 		node->_Status = Job::STATUS_ERROR;
+3724 		node->TerminateFailure();
3570 »   ​»   ​snprintf(·​node->error_text,​·​JOB_ERROR_TEXT_MAXLEN​,​3725 »   ​»   ​snprintf(·​node->error_text,​·​JOB_ERROR_TEXT_MAXLEN​,​
3571 »   ​»   ​»   ​»   ​»   ​"No·​'log·​='·​value·​found·​in·​submit·​file·​%s",​3726 »   ​»   ​»   ​»   ​»   ​"No·​'log·​='·​value·​found·​in·​submit·​file·​%s",​
3572 »   ​»   ​»   ​»   ​»   ​node->GetCmdFile()​·​)​;​3727 »   ​»   ​»   ​»   ​»   ​node->GetCmdFile()​·​)​;​
3573 »   ​··»   ​_numNodesFailed++;​3728 »   ​··»   ​_numNodesFailed++;​
3729 »   ​»   ​if·​(·​_dagStatus·​==·​DAG_STATUS_OK·​)​·​{
3730 »   ​»   ​»   ​_dagStatus·​=·​DAG_STATUS_NODE_FAILE​D;​
3731 »   ​»   ​}
3574 »   ​»   ​result·​=·​SUBMIT_RESULT_NO_SUBM​IT;​3732 »   ​»   ​result·​=·​SUBMIT_RESULT_NO_SUBM​IT;​
3575 3733
3576 »   ​}·​else·​{3734 »   ​}·​else·​{
3640 »   ​»   ​debug_printf(·​DEBUG_QUIET,​·​"ERROR:​·​_submitQ->enqueue()​·​failed!\n"·​)​;​3798 »   ​»   ​debug_printf(·​DEBUG_QUIET,​·​"ERROR:​·​_submitQ->enqueue()​·​failed!\n"·​)​;​
3641 »   ​}3799 »   ​}
3642 3800
-3643     node->_Status = Job::STATUS_SUBMITTED;
+3801     node->SetStatus( Job::STATUS_SUBMITTED );
3644 3802
3645 »   ​»   ​/​/​·​Note:​·​I·​assume·​we're·​incrementing·​this·​here·​instead·​of·​when3803 »   ​»   ​/​/​·​Note:​·​I·​assume·​we're·​incrementing·​this·​here·​instead·​of·​when
3646 »   ​»   ​/​/​·​we·​see·​the·​submit·​events·​so·​that·​we·​don't·​accidentally·​exceed3804 »   ​»   ​/​/​·​we·​see·​the·​submit·​events·​so·​that·​we·​don't·​accidentally·​exceed
3707 »   ​»   ​»   ​node->_scriptPost->_r​etValJob·​=·​DAG_ERROR_CONDOR_SUBM​IT_FAILED;​3865 »   ​»   ​»   ​node->_scriptPost->_r​etValJob·​=·​DAG_ERROR_CONDOR_SUBM​IT_FAILED;​
3708 »   ​»   ​»   ​(void)​RunPostScript(·​node,​·​_alwaysRunPost,​·​0·​)​;​3866 »   ​»   ​»   ​(void)​RunPostScript(·​node,​·​_alwaysRunPost,​·​0·​)​;​
3709 »   ​»   ​}·​else·​{3867 »   ​»   ​}·​else·​{
-3710 			node->_Status = Job::STATUS_ERROR;
+3868 			node->TerminateFailure();
3711 »   ​»   ​»   ​_numNodesFailed++;​3869 »   ​»   ​»   ​_numNodesFailed++;​
3870 »   ​»   ​»   ​if·​(·​_dagStatus·​==·​DAG_STATUS_OK·​)​·​{
3871 »   ​»   ​»   ​»   ​_dagStatus·​=·​DAG_STATUS_NODE_FAILE​D;​
3872 »   ​»   ​»   ​}
3712 »   ​»   ​}3873 »   ​»   ​}
3713 »   ​}·​else·​{3874 »   ​}·​else·​{
3714 »   ​»   ​/​/​·​We·​have·​more·​submit·​attempts·​left,​·​put·​this·​node·​back·​into·​the3875 »   ​»   ​/​/​·​We·​have·​more·​submit·​attempts·​left,​·​put·​this·​node·​back·​into·​the
3742 void3903 void
3743 Dag:​:​UpdateJobCounts(·​Job·​*node,​·​int·​change·​)​3904 Dag:​:​UpdateJobCounts(·​Job·​*node,​·​int·​change·​)​
3744 {3905 {
3745 ····_numJobsSubmitted·​+=·​change;​3906 »   ​_numJobsSubmitted·​+=·​change;​
3746 »   ​ASSERT(·​_numJobsSubmitted·​>=·​0·​)​;​3907 »   ​ASSERT(·​_numJobsSubmitted·​>=·​0·​)​;​
3747 3908
3748 »   ​if·​(·​node->GetThrottleInfo​()​·​)​·​{3909 »   ​if·​(·​node->GetThrottleInfo​()​·​)​·​{
3770 void3931 void
3771 Dag:​:​PropogateDirectoryToA​llNodes(void)​3932 Dag:​:​PropogateDirectoryToA​llNodes(void)​
3772 {3933 {
3773 ····Job·​*job·​=·​NULL;​3934 »   ​Job·​*job·​=·​NULL;​
3774 »   ​MyString·​key;​3935 »   ​MyString·​key;​
3775 3936
3776 »   ​if·​(m_directory·​==·​".​")​·​{3937 »   ​if·​(m_directory·​==·​".​")​·​{
3795 void3956 void
3796 Dag:​:​PrefixAllNodeNames(co​nst·​MyString·​&prefix)​3957 Dag:​:​PrefixAllNodeNames(co​nst·​MyString·​&prefix)​
3797 {3958 {
3798 ····Job·​*job·​=·​NULL;​3959 »   ​Job·​*job·​=·​NULL;​
3799 »   ​MyString·​key;​3960 »   ​MyString·​key;​
3800 3961
3801 »   ​debug_printf(DEBUG_DE​BUG_1,​·​"Entering:​·​Dag:​:​PrefixAllNodeNames()​"3962 »   ​debug_printf(DEBUG_DE​BUG_1,​·​"Entering:​·​Dag:​:​PrefixAllNodeNames()​"
3802 »   ​»   ​"·​with·​prefix·​%s\n",​prefix.​Value()​)​;​3963 »   ​»   ​"·​with·​prefix·​%s\n",​prefix.​Value()​)​;​
3803 3964
-3804     _jobs.Rewind();
-3805     while( (job = _jobs.Next()) ) {
-3806 	  job->PrefixName(prefix);
-3807     }
+3965 	_jobs.Rewind();
+3966 	while( (job = _jobs.Next()) ) {
+3967 		ASSERT( job != NULL );
+3968 		job->PrefixName(prefix);
+3969 	}
3808 3970
3809 »   ​/​/​·​Here·​we·​must·​reindex·​the·​hash·​view·​with·​the·​prefixed·​name.​3971 »   ​/​/​·​Here·​we·​must·​reindex·​the·​hash·​view·​with·​the·​prefixed·​name.​
3810 »   ​/​/​·​also·​fix·​my·​node·​name·​hash·​view·​of·​the·​jobs3972 »   ​/​/​·​also·​fix·​my·​node·​name·​hash·​view·​of·​the·​jobs
3816 »   ​}3978 »   ​}
3817 3979
3818 »   ​/​/​·​Then,​·​reindex·​all·​the·​jobs·​keyed·​by·​their·​new·​name3980 »   ​/​/​·​Then,​·​reindex·​all·​the·​jobs·​keyed·​by·​their·​new·​name
-3819     _jobs.Rewind();
-3820     while( (job = _jobs.Next()) ) {
-3821 	  key = job->GetJobName();
+3981 	_jobs.Rewind();
+3982 	while( (job = _jobs.Next()) ) {
+3983 		ASSERT( job != NULL );
+3984 		key = job->GetJobName();
3822 »   ​»   ​if·​(_nodeNameHash.​insert(key,​·​job)​·​!=·​0)​·​{3985 »   ​»   ​if·​(_nodeNameHash.​insert(key,​·​job)​·​!=·​0)​·​{
3823 »   ​»   ​»   ​/​/​·​I'm·​reinserting·​everything·​newly,​·​so·​this·​should·​never·​happen3986 »   ​»   ​»   ​/​/​·​I'm·​reinserting·​everything·​newly,​·​so·​this·​should·​never·​happen
3824 »   ​»   ​»   ​/​/​·​unless·​two·​jobs·​have·​an·​identical·​name,​·​which·​means·​another3987 »   ​»   ​»   ​/​/​·​unless·​two·​jobs·​have·​an·​identical·​name,​·​which·​means·​another
3827 »   ​»   ​»   ​debug_error(1,​·​DEBUG_QUIET,​·3990 »   ​»   ​»   ​debug_error(1,​·​DEBUG_QUIET,​·
3828 »   ​»   ​»   ​»   ​"Dag:​:​PrefixAllNodeNames()​:​·​This·​is·​an·​impossible·​error\n")​;​3991 »   ​»   ​»   ​»   ​"Dag:​:​PrefixAllNodeNames()​:​·​This·​is·​an·​impossible·​error\n")​;​
3829 »   ​»   ​}3992 »   ​»   ​}
3830 ····}3993 »   ​}
3831 3994
3832 »   ​debug_printf(DEBUG_DE​BUG_1,​·​"Leaving:​·​Dag:​:​PrefixAllNodeNames()​\n")​;​3995 »   ​debug_printf(DEBUG_DE​BUG_1,​·​"Leaving:​·​Dag:​:​PrefixAllNodeNames()​\n")​;​
3833 }3996 }
diff --git a/src/condor_dagman/dag.h b/src/condor_dagman/dag.h
index ebfe2ed4013f05001d529741adfb1ce2135cd1f2..1d6b645a08ce6b012dd9154424f04d1d2db9a796 100644
--- a/src/condor_dagman/dag.h
+++ b/src/condor_dagman/dag.h
160 160
161 ····​/​/​/​·​Add·​a·​job·​to·​the·​collection·​of·​jobs·​managed·​by·​this·​Dag.​161 ····​/​/​/​·​Add·​a·​job·​to·​the·​collection·​of·​jobs·​managed·​by·​this·​Dag.​
162 ····​bool·​Add(·​Job&·​job·​)​;​162 ····​bool·​Add(·​Job&·​job·​)​;​
163 ··
164 ····​/​**·​Specify·​a·​dependency·​between·​two·​jobs.​·​The·​child·​job·​will·​only163 ····​/​**·​Specify·​a·​dependency·​between·​two·​jobs.​·​The·​child·​job·​will·​only
165 ········​run·​after·​the·​parent·​job·​has·​finished.​164 ········​run·​after·​the·​parent·​job·​has·​finished.​
166 ········​@param·​parent·​The·​parent·​job165 ········​@param·​parent·​The·​parent·​job
310 ····​void·​PrintJobList()​·​const;​309 ····​void·​PrintJobList()​·​const;​
311 ····​void·​PrintJobList(·​Job:​:​status_t·​status·​)​·​const;​310 ····​void·​PrintJobList(·​Job:​:​status_t·​status·​)​·​const;​
312 311
-313     /** @return the total number of nodes in the DAG
+312     /** @param whether to include final node, if any, in the count
+313 		@return the total number of nodes in the DAG
314 ·····​*/​314 ·····​*/​
-315     inline int NumNodes() const { return _jobs.Number(); }
+315     int NumNodes( bool includeFinal ) const;
316 316
-317     /** @return the number of nodes completed
+317     /** @param whether to include final node, if any, in the count
+318     	@return the number of nodes completed
318 ·····​*/​319 ·····​*/​
-319     inline int NumNodesDone() const { return _numNodesDone; }
+320     int NumNodesDone( bool includeFinal ) const;
320 321
321 ····​/​**·​@return·​the·​number·​of·​nodes·​that·​failed·​in·​the·​DAG322 ····​/​**·​@return·​the·​number·​of·​nodes·​that·​failed·​in·​the·​DAG
322 ·····​*/​323 ·····​*/​
361 »   ​}362 »   ​}
362 363
363 »   ​/​**·​@return·​the·​number·​of·​nodes·​currently·​in·​the·​status364 »   ​/​**·​@return·​the·​number·​of·​nodes·​currently·​in·​the·​status
-364 	 *          Job::STATUS_PRERUN (whether or not the script is actually
-365 	 *			running).
+365 	 *          Job::STATUS_PRERUN (whether or not their PRE
+366 	 *			script is actually running).
366 »   ​·​*/​367 »   ​·​*/​
367 »   ​inline·​int·​PreRunNodeCount()​·​const368 »   ​inline·​int·​PreRunNodeCount()​·​const
368 »   ​»   ​{·​return·​_preRunNodeCount;​·​}369 »   ​»   ​{·​return·​_preRunNodeCount;​·​}
369 370
370 »   ​/​**·​@return·​the·​number·​of·​nodes·​currently·​in·​the·​status371 »   ​/​**·​@return·​the·​number·​of·​nodes·​currently·​in·​the·​status
-371 	 *          Job::STATUS_POSTRUN (whether or not the script is actually
-372 	 *			running).
+372 	 *          Job::STATUS_POSTRUN (whether or not their POST
+373 	 *			script is actually running).
373 »   ​·​*/​374 »   ​·​*/​
374 »   ​inline·​int·​PostRunNodeCount()​·​const375 »   ​inline·​int·​PostRunNodeCount()​·​const
375 »   ​»   ​{·​return·​_postRunNodeCount;​·​}376 »   ​»   ​{·​return·​_postRunNodeCount;​·​}
386 »   ​····»   ​(If·​no·​jobs·​are·​submitted·​and·​no·​scripts·​are·​running,​·​but·​the387 »   ​····»   ​(If·​no·​jobs·​are·​submitted·​and·​no·​scripts·​are·​running,​·​but·​the
387 »   ​»   ​····​dag·​is·​not·​complete,​·​then·​at·​least·​one·​job·​failed,​·​or·​a·​cycle388 »   ​»   ​····​dag·​is·​not·​complete,​·​then·​at·​least·​one·​job·​failed,​·​or·​a·​cycle
388 »   ​»   ​»   ​exists.​)​389 »   ​»   ​»   ​exists.​)​
390 ····»   ​»   ​@param·​whether·​to·​consider·​the·​final·​node,​·​if·​any
389 »   ​»   ​»   ​@return·​true·​iff·​the·​DAG·​is·​finished391 »   ​»   ​»   ​@return·​true·​iff·​the·​DAG·​is·​finished
390 »   ​»   ​*/​392 »   ​»   ​*/​
-391 	inline bool FinishedRunning() const { return NumJobsSubmitted() == 0 &&
-392 				NumNodesReady() == 0 && ScriptRunNodeCount() == 0; }
+393 	bool FinishedRunning( bool includeFinalNode ) const;
393 394
394 »   ​»   ​/​**·​Determine·​whether·​the·​DAG·​is·​successfully·​completed.​395 »   ​»   ​/​**·​Determine·​whether·​the·​DAG·​is·​successfully·​completed.​
396 ····»   ​»   ​@param·​whether·​to·​consider·​the·​final·​node,​·​if·​any
395 »   ​»   ​»   ​@return·​true·​iff·​the·​DAG·​is·​successfully·​completed397 »   ​»   ​»   ​@return·​true·​iff·​the·​DAG·​is·​successfully·​completed
396 »   ​»   ​*/​398 »   ​»   ​*/​
-397 	inline bool DoneSuccess() const { return NumNodesDone() == NumNodes(); }
+399 	bool DoneSuccess( bool includeFinalNode ) const;
398 400
399 »   ​»   ​/​**·​Determine·​whether·​the·​DAG·​is·​finished,​·​but·​failed·​(because401 »   ​»   ​/​**·​Determine·​whether·​the·​DAG·​is·​finished,​·​but·​failed·​(because
400 »   ​»   ​»   ​of·​a·​node·​job·​failure,​·​etc.​)​.​402 »   ​»   ​»   ​of·​a·​node·​job·​failure,​·​etc.​)​.​
403 ····»   ​»   ​@param·​whether·​to·​consider·​the·​final·​node,​·​if·​any
401 »   ​»   ​»   ​@return·​true·​iff·​the·​DAG·​is·​finished·​but·​failed404 »   ​»   ​»   ​@return·​true·​iff·​the·​DAG·​is·​finished·​but·​failed
402 »   ​»   ​*/​405 »   ​»   ​*/​
-403 	inline bool DoneFailed() const { return FinishedRunning() &&
-404 				NumNodesFailed() > 0; }
+406 	bool DoneFailed( bool includeFinalNode ) const;
405 407
406 »   ​»   ​/​**·​Determine·​whether·​the·​DAG·​is·​finished·​because·​of·​a·​cycle·​in408 »   ​»   ​/​**·​Determine·​whether·​the·​DAG·​is·​finished·​because·​of·​a·​cycle·​in
-407 			the DAG.  (Note that this method sometimes incorrectly returns
-408 			true for errors other than cycles in the DAG.  wenger 2010-07-30.)
+409 			the DAG.
+410     		@param whether to consider the final node, if any
409 »   ​»   ​»   ​@return·​true·​iff·​the·​DAG·​is·​finished·​but·​there·​is·​a·​cycle411 »   ​»   ​»   ​@return·​true·​iff·​the·​DAG·​is·​finished·​but·​there·​is·​a·​cycle
410 »   ​»   ​*/​412 »   ​»   ​*/​
-411 	inline bool DoneCycle() { return FinishedRunning() &&
+413 	inline bool DoneCycle( bool includeFinalNode) {
+414 				return FinishedRunning( includeFinalNode ) &&
+415 				!DoneSuccess( includeFinalNode ) &&
412 »   ​»   ​»   ​»   ​NumNodesFailed()​·​==·​0;​·​}416 »   ​»   ​»   ​»   ​NumNodesFailed()​·​==·​0;​·​}
413 417
414 »   ​»   ​/​**·​Submit·​all·​ready·​jobs,​·​provided·​they·​are·​not·​waiting·​on·​a418 »   ​»   ​/​**·​Submit·​all·​ready·​jobs,​·​provided·​they·​are·​not·​waiting·​on·​a
418 »   ​»   ​*/​422 »   ​»   ​*/​
419 ····​int·​SubmitReadyJobs(const​·​Dagman·​&dm)​;​423 ····​int·​SubmitReadyJobs(const​·​Dagman·​&dm)​;​
420 424
425 »   ​»   ​/​**·​Start·​the·​DAG's·​final·​node·​if·​there·​is·​one.​··​Note·​that·​this
426 »   ​»   ​»   ​method·​will·​not·​re-start·​the·​final·​node·​if·​it·​has·​already
427 »   ​»   ​»   ​been·​started.​
428 »   ​»   ​»   ​@return·​true·​iff·​the·​final·​node·​was·​actually·​started.​
429 »   ​»   ​*/​
430 »   ​bool·​StartFinalNode()​;​
431
421 ····​/​**·​Remove·​all·​jobs·​(using·​condor_rm)​·​that·​are·​currently·​running.​432 ····​/​**·​Remove·​all·​jobs·​(using·​condor_rm)​·​that·​are·​currently·​running.​
422 ········​All·​jobs·​currently·​marked·​Job:​:​STATUS_SUBMITTED·​will·​be·​fed433 ········​All·​jobs·​currently·​marked·​Job:​:​STATUS_SUBMITTED·​will·​be·​fed
423 ········​as·​arguments·​to·​condor_rm·​via·​popen.​··​This·​function·​is·​called434 ········​as·​arguments·​to·​condor_rm·​via·​popen.​··​This·​function·​is·​called
441 ········​@param·​datafile·​The·​original·​DAG·​file452 ········​@param·​datafile·​The·​original·​DAG·​file
442 »   ​»   ​@param·​multiDags·​Whether·​we·​have·​multiple·​DAGs453 »   ​»   ​@param·​multiDags·​Whether·​we·​have·​multiple·​DAGs
443 »   ​»   ​@param·​maxRescueDagNum·​the·​maximum·​legal·​rescue·​DAG·​number454 »   ​»   ​@param·​maxRescueDagNum·​the·​maximum·​legal·​rescue·​DAG·​number
455 »   ​»   ​@param·​overwrite·​Whether·​to·​overwrite·​the·​highest-numbered
456 »   ​»   ​»   ​rescue·​DAG·​(because·​with·​a·​final·​node·​you·​can·​write·​the
457 »   ​»   ​»   ​rescue·​DAG·​twice)​
444 »   ​»   ​@param·​parseFailed·​whether·​parsing·​the·​DAG(s)​·​failed458 »   ​»   ​@param·​parseFailed·​whether·​parsing·​the·​DAG(s)​·​failed
445 »   ​»   ​@param·​isPartial·​whether·​the·​rescue·​DAG·​is·​only·​a·​partial459 »   ​»   ​@param·​isPartial·​whether·​the·​rescue·​DAG·​is·​only·​a·​partial
446 »   ​»   ​»   ​DAG·​file·​(needs·​to·​be·​parsed·​in·​combination·​with·​the·​original460 »   ​»   ​»   ​DAG·​file·​(needs·​to·​be·​parsed·​in·​combination·​with·​the·​original
447 »   ​»   ​»   ​DAG·​file)​461 »   ​»   ​»   ​DAG·​file)​
448 ····​*/​462 ····​*/​
449 ····​void·​Rescue·​(const·​char·​*·​dagFile,​·​bool·​multiDags,​463 ····​void·​Rescue·​(const·​char·​*·​dagFile,​·​bool·​multiDags,​
450 »   ​»   ​»   ​»   ​int·​maxRescueDagNum,​·​bool·parseFailed·=·false,​464 »   ​»   ​»   ​»   ​int·​maxRescueDagNum,​·​bool·overwrite,​
451 »   ​»   ​»   ​»   ​bool·​isPartial·​=·​false)​·​/​*·​const·​*/​;​465 »   ​»   ​»   ​»   ​bool·parseFailed·=·false,​·bool·​isPartial·​=·​false)​·​/​*·​const·​*/​;​
452 466
453 ····​/​**·​Creates·​a·​DAG·​file·​based·​on·​the·​DAG·​in·​memory,​·​except·​all467 ····​/​**·​Creates·​a·​DAG·​file·​based·​on·​the·​DAG·​in·​memory,​·​except·​all
454 ········​completed·​jobs·​are·​premarked·​as·​DONE.​468 ········​completed·​jobs·​are·​premarked·​as·​DONE.​
689 »   ​*/​703 »   ​*/​
690 »   ​bool·​IsHalted()​·​{·​return·​_dagIsHalted;​·​}704 »   ​bool·​IsHalted()​·​{·​return·​_dagIsHalted;​·​}
691 705
706 »   ​enum·​dag_status·​{
707 »   ​»   ​DAG_STATUS_OK·​=·​0,​
708 »   ​»   ​DAG_STATUS_ERROR·​=·​1,​·​/​/​·​Error·​not·​enumerated·​below
709 »   »   DAG_STATUS_NODE_FAILED·=·2,·//·Node(s)·failed
710 »   ​»   ​DAG_STATUS_ABORT·​=·​3,​·​/​/​·​Hit·​special·​DAG·​abort·​value
711 »   ​»   ​DAG_STATUS_RM·​=·​4,​·​/​/​·​DAGMan·​job·​condor·​rm'ed
712 »   ​»   ​DAG_STATUS_CYCLE·​=·​5,​·​/​/​·​A·​cycle·​in·​the·​DAG
713 »   ​»   ​DAG_STATUS_HALTED·​=·​6,​·​/​/​·​DAG·​was·​halted·​and·​submitted·​jobs·​finished
714 »   ​};​
715
716 »   ​dag_status·​_dagStatus;​
717
718 »   ​/​**·​Determine·​whether·​this·​DAG·​has·​a·​final·​node.​
719 »   ​»   ​@return·​true·​iff·​the·​DAG·​has·​a·​final·​node.​
720 »   ​*/​
721 »   ​inline·​bool·​HasFinalNode()​·​const·​{·​return·​_final_job·​!=·​NULL;​·​}
722
723 »   ​/​**·​Determine·​whether·​the·​final·​node·​(if·​any)​·​of·​this·​DAG·​is
724 »   ​»   ​running·​(or·​has·​been·​run)​.​
725 »   ​»   ​@return·​true·​iff·​the·​final·​node·​is·​running·​or·​has·​been·​run
726 »   ​*/​
727 »   ​inline·​bool·​RunningFinalNode()​·​{·​return·​_runningFinalNode;​·​}
728
692 ··​private:​729 ··​private:​
693 730
694 »   ​/​/​·​If·​this·​DAG·​is·​a·​splice,​·​then·​this·​is·​what·​the·​DIR·​was·​set·​to,​·​it·731 »   ​/​/​·​If·​this·​DAG·​is·​a·​splice,​·​then·​this·​is·​what·​the·​DIR·​was·​set·​to,​·​it·
879 »   ​void·​WriteNodeToRescue(·​FILE·​*fp,​·​Job·​*node,​916 »   ​void·​WriteNodeToRescue(·​FILE·​*fp,​·​Job·​*node,​
880 »   ​»   ​»   ​»   ​bool·​reset_retries_upon_re​scue,​·​bool·​isPartial·​)​;​917 »   ​»   ​»   ​»   ​bool·​reset_retries_upon_re​scue,​·​bool·​isPartial·​)​;​
881 918
919 »   ​»   ​/​/​·​True·​iff·​the·​final·​node·​is·​ready·​to·​be·​run,​·​or·​is·​running
920 »   ​»   ​/​/​·​(including·​PRE·​and·​POST·​scripts,​·​if·​any.​
921 »   ​bool·​_runningFinalNode;​
922
882 ····​/​/​/​·​List·​of·​Job·​objects923 ····​/​/​/​·​List·​of·​Job·​objects
883 ····​List<Job>·····​_jobs;​924 ····​List<Job>·····​_jobs;​
884 925
926 »   ​»   ​/​/​·​Note:​·​the·​final·​node·​is·​in·​the·​_jobs·​list;​·​this·​pointer·​is·​just
927 »   ​»   ​/​/​·​for·​convenience.​
928 »   ​Job*·​_final_job;​
929
885 »   ​HashTable<MyString,​·​Job·​*>» ​»   ​_nodeNameHash;​930 »   ​HashTable<MyString,​·​Job·​*>» ​»   ​_nodeNameHash;​
886 931
887 »   ​HashTable<JobID_t,​·​Job·​*>» ​»   ​_nodeIDHash;​932 »   ​HashTable<JobID_t,​·​Job·​*>» ​»   ​_nodeIDHash;​
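The header changes above turn `DoneSuccess()` and `DoneFailed()` into out-of-line methods that take `includeFinalNode`, but their bodies are not shown in this part of the diff. The following is a minimal sketch of how such a flag could behave, assuming the final node is simply counted or not counted; `ToyDag` and its fields are hypothetical stand-ins, not DAGMan code.

```cpp
// Hypothetical stand-in for Dag: only the counters needed for the sketch.
struct ToyDag {
    int  regularNodes;      // nodes other than the final node
    int  regularNodesDone;  // how many of those finished successfully
    bool hasFinalNode;      // whether a final node was declared
    bool finalNodeDone;     // whether the final node finished successfully

    // Total node count, optionally counting the final node.
    int NumNodes( bool includeFinalNode ) const {
        return regularNodes + ( ( includeFinalNode && hasFinalNode ) ? 1 : 0 );
    }

    // Successfully completed node count, optionally counting the final node.
    int NumNodesDone( bool includeFinalNode ) const {
        bool countFinal = includeFinalNode && hasFinalNode && finalNodeDone;
        return regularNodesDone + ( countFinal ? 1 : 0 );
    }

    // Same comparison as the old inline DoneSuccess(), with the flag threaded through.
    bool DoneSuccess( bool includeFinalNode ) const {
        return NumNodesDone( includeFinalNode ) == NumNodes( includeFinalNode );
    }
};
```

Under these assumptions, a DAG whose regular nodes have all succeeded reports `DoneSuccess( false )` as true and `DoneSuccess( true )` as false until the final node completes, which is the window in which the final node would be started.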
diff --git a/src/condor_dagman/dagman_commands.cpp b/src/condor_dagman/dagman_commands.cpp
index 77daf7d4a2592068054b85d5449861da424147e2..f8d227d369463b1d1567c7581cae33088e07131d 100644
--- a/src/condor_dagman/dagman_commands.cpp
+++ b/src/condor_dagman/dagman_commands.cpp
64 »   ​»   ​·​const·​char*·​directory,​64 »   ​»   ​·​const·​char*·​directory,​
65 »   ​»   ​·​const·​char*·​submitFile,​65 »   ​»   ​·​const·​char*·​submitFile,​
66 »   ​»   ​·​const·​char·​*precmd,​·​const·​char·​*postcmd,​·​bool·​noop,​66 »   ​»   ​·​const·​char·​*precmd,​·​const·​char·​*postcmd,​·​bool·​noop,​
67 »   ​»   ​·​bool·​done,​67 »   ​»   ​·​bool·​done,​·bool·isFinal,​
68 »   ​»   ​·​MyString·​&failReason·​)​68 »   ​»   ​·​MyString·​&failReason·​)​
69 {69 {
70 »   ​MyString·​why;​70 »   ​MyString·​why;​
76 »   ​»   ​failReason·​=·​why;​76 »   ​»   ​failReason·​=·​why;​
77 »   ​»   ​return·​false;​77 »   ​»   ​return·​false;​
78 »   ​}78 »   ​}
79 »   ​if(·​done·​&&·​isFinal)​·​{
80 »   ​»   ​failReason.​sprintf(·​"Warning:​·​FINAL·​Job·​%s·​cannot·​be·​set·​to·​DONE\n",​
81 »   ​»   ​»   ​»   ​»   ​name·​)​;​
82 ········​debug_printf(·​DEBUG_QUIET,​·​failReason.​Value()​·​)​;​
83 »   »   (void)check_warning_strictness(·DAG_STRICT_1,·false·);
84 »   ​»   ​done·​=·​false;​
85 »   ​}
79 »   ​Job*·​node·​=·​new·​Job(·​type,​·​name,​·​directory,​·​submitFile·​)​;​86 »   ​Job*·​node·​=·​new·​Job(·​type,​·​name,​·​directory,​·​submitFile·​)​;​
80 »   ​if(·​!node·​)​·​{87 »   ​if(·​!node·​)​·​{
81 »   ​»   ​dprintf(·​D_ALWAYS,​·​"ERROR:​·​out·​of·​memory!\n"·​)​;​88 »   ​»   ​dprintf(·​D_ALWAYS,​·​"ERROR:​·​out·​of·​memory!\n"·​)​;​
102 »   ​if(·​done·​)​·​{109 »   ​if(·​done·​)​·​{
103 »   ​»   ​node->SetStatus(·​Job:​:​STATUS_DONE·​)​;​110 »   ​»   ​node->SetStatus(·​Job:​:​STATUS_DONE·​)​;​
104 »   ​}111 »   ​}
112 »   ​node->SetFinal(·​isFinal·​)​;​
105 »   ​ASSERT(·​dag·​!=·​NULL·​)​;​113 »   ​ASSERT(·​dag·​!=·​NULL·​)​;​
106 »   ​if(·​!dag->Add(·​*node·​)​·​)​·​{114 »   ​if(·​!dag->Add(·​*node·​)​·​)​·​{
107 »   ​»   ​failReason·​=·​"unknown·​failure·​adding·node·to·DAG";​115 »   ​»   ​failReason·​=·​"unknown·​failure·​adding·​";​
116 »   ​»   ​failReason·​+=·​isFinal?·​"Final·​"·​:​·​"";​
117 »   ​»   ​failReason·​+=·​"node·​to·​DAG";​
108 »   ​»   ​delete·​node;​118 »   ​»   ​delete·​node;​
109 »   ​»   ​return·​false;​119 »   ​»   ​return·​false;​
110 »   ​}120 »   ​}
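The new check in this file refuses to mark a FINAL node as DONE: it emits a warning and clears the flag before the node is created. A rough, self-contained sketch of that rule follows. `EffectiveDone` is a hypothetical helper name, `std::string` stands in for `MyString`, and the strictness check made by the real code is left out.

```cpp
#include <string>

// Hypothetical helper, not part of DAGMan: returns the DONE flag that should
// actually be applied to the node. If the requested flag has to be overridden,
// `warning` receives the message; otherwise it is left empty.
bool EffectiveDone( const std::string &name, bool done, bool isFinal,
                    std::string &warning )
{
    warning.clear();
    if ( done && isFinal ) {
        // Same wording as the warning added in the diff.
        warning = "Warning: FINAL Job " + name + " cannot be set to DONE\n";
        return false;
    }
    return done;
}
```

Keeping the override in one place means the later `if( done )` branch that sets `Job::STATUS_DONE` needs no knowledge of final nodes.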
diff --git a/src/condor_dagman/dagman_commands.h b/src/condor_dagman/dagman_commands.h
index f40d9bad7681d7aea2455e8513291a005fb28081..bce5bd4a80036536fd0796df09853730af6278ba 100644
--- a/src/condor_dagman/dagman_commands.h
+++ b/src/condor_dagman/dagman_commands.h
35 »   ​»   ​»   ​··​const·​char*·​directory,​35 »   ​»   ​»   ​··​const·​char*·​directory,​
36 »   ​»   ​»   ​··​const·​char*·​submitFile,​36 »   ​»   ​»   ​··​const·​char*·​submitFile,​
37 »   ​»   ​»   ​··​const·​char·​*precmd,​·​const·​char·​*postcmd,​·​bool·​noop,​37 »   ​»   ​»   ​··​const·​char·​*precmd,​·​const·​char·​*postcmd,​·​bool·​noop,​
38 »   ​»   ​»   ​··​bool·​done,​·​MyString·​&failReason·​)​;​38 »   ​»   ​»   ​··​bool·​done,​·bool·isFinal,​·​MyString·​&failReason·​)​;​
39 39
40 /​**·​Set·​the·​DAG·​file·​(if·​any)​·​for·​a·​node.​40 /​**·​Set·​the·​DAG·​file·​(if·​any)​·​for·​a·​node.​
41 »   ​@param·​dag:​·​the·​DAG·​this·​node·​is·​part·​of41 »   ​@param·​dag:​·​the·​DAG·​this·​node·​is·​part·​of
diff --git a/src/condor_dagman/dagman_main.cpp b/src/condor_dagman/dagman_main.cpp
index 7b938c02b420b20c19f9bdd7a8e178cae2583084..0d6e58c7f8693693bbf7b17c0b3b00f11bb58d9d 100644
--- a/src/condor_dagman/dagman_main.cpp
+++ b/src/condor_dagman/dagman_main.cpp
454 void·​main_shutdown_gracefu​l()​·​{454 void·​main_shutdown_gracefu​l()​·​{
455 »   ​dagman.​dag->DumpNodeStatus(·​true,​·​false·​)​;​455 »   ​dagman.​dag->DumpNodeStatus(·​true,​·​false·​)​;​
456 »   ​dagman.​dag->GetJobstateLog()​.​WriteDagmanFinished(·​EXIT_RESTART·​)​;​456 »   ​dagman.​dag->GetJobstateLog()​.​WriteDagmanFinished(·​EXIT_RESTART·​)​;​
457 ····dagman.​CleanUp()​;​457 »   ​dagman.​CleanUp()​;​
458 »   ​DC_Exit(·​EXIT_RESTART·​)​;​458 »   ​DC_Exit(·​EXIT_RESTART·​)​;​
459 }459 }
460 460
461 void·​main_shutdown_rescue(​·​int·​exitVal·​)​·​{461 void·​main_shutdown_rescue(​·​int·​exitVal,​·Dag:​:​dag_status·dagStatus·​)​·​{
462 »   ​»   ​/​/​·​Avoid·​possible·​infinite·​recursion·​if·​you·​hit·​a·​fatal·​error462 »   ​»   ​/​/​·​Avoid·​possible·​infinite·​recursion·​if·​you·​hit·​a·​fatal·​error
463 »   ​»   ​/​/​·​while·​writing·​a·​rescue·​DAG.​463 »   ​»   ​/​/​·​while·​writing·​a·​rescue·​DAG.​
464 »   ​static·​bool·​inShutdownRescue·​=·​false;​464 »   ​static·​bool·​inShutdownRescue·​=·​false;​
465 »   ​if·​(·​inShutdownRescue)​·return;​465 »   ​if·​(·​inShutdownRescue·)​·{
466 »   ​»   ​return;​
467 »   ​}
466 »   ​inShutdownRescue·​=·​true;​468 »   ​inShutdownRescue·​=·​true;​
467 469
470 »   ​dagman.​dag->_dagStatus·​=·​dagStatus;​
468 »   ​debug_printf(·​DEBUG_QUIET,​·​"Aborting·​DAG.​.​.​\n"·​)​;​471 »   ​debug_printf(·​DEBUG_QUIET,​·​"Aborting·​DAG.​.​.​\n"·​)​;​
472 »   ​»   ​/​/​·​Avoid·​writing·​two·​different·​rescue·​DAGs·​if·​the·​"main"·​DAG·​and
473 »   ​»   ​/​/​·​the·​final·​node·​(if·​any)​·​both·​fail.​
474 »   ​static·​bool·​wroteRescue·​=·​false;​
469 »   ​if(·​dagman.​dag·​)​·​{475 »   ​if(·​dagman.​dag·​)​·​{
470 »   ​»   ​»   ​/​/​·we·​write·​the·​rescue·​DAG·​*before*·​removing·​jobs·​because476 »   ​»   ​»   ​/​/​·We·​write·​the·​rescue·​DAG·​*before*·​removing·​jobs·​because
471 »   ​»   ​»   ​/​/​·​otherwise·​if·​we·​crashed,​·​failed,​·​or·​were·​killed·​while477 »   ​»   ​»   ​/​/​·​otherwise·​if·​we·​crashed,​·​failed,​·​or·​were·​killed·​while
472 »   ​»   ​»   ​/​/​·​removing·​them,​·​we·​would·​leave·​the·​DAG·​in·​an478 »   ​»   ​»   ​/​/​·​removing·​them,​·​we·​would·​leave·​the·​DAG·​in·​an
473 »   ​»   ​»   ​/​/​·​unrecoverable·​state.​.​.​479 »   ​»   ​»   ​/​/​·​unrecoverable·​state.​.​.​
475 »   ​»   ​»   ​if·​(·​dagman.​maxRescueDagNum·​>·​0·​)​·​{481 »   ​»   ​»   ​if·​(·​dagman.​maxRescueDagNum·​>·​0·​)​·​{
476 »   ​»   ​»   ​»   ​dagman.​dag->Rescue(·​dagman.​primaryDagFile.​Value()​,​482 »   ​»   ​»   ​»   ​dagman.​dag->Rescue(·​dagman.​primaryDagFile.​Value()​,​
477 »   ​»   ​»   ​»   ​»   ​»   ​»   ​dagman.​multiDags,​·​dagman.​maxRescueDagNum,​483 »   ​»   ​»   ​»   ​»   ​»   ​»   ​dagman.​multiDags,​·​dagman.​maxRescueDagNum,​
478 »   ​»   ​»   ​»   ​»   ​»   ​»   ​false,​·dagman.​_writePartialRescueDa​g·)​;​484 »   ​»   ​»   ​»   ​»   ​»   ​»   ​wroteRescue,​·false,​
485 »   ​»   ​»   ​»   ​»   ​»   ​»   ​dagman.​_writePartialRescueDa​g·​)​;​
486 »   ​»   ​»   ​»   ​wroteRescue·​=·​true;​
479 »   ​»   ​»   ​}·​else·​{487 »   ​»   ​»   ​}·​else·​{
480 »   ​»   ​»   ​»   ​debug_printf(·​DEBUG_QUIET,​·​"No·​rescue·​DAG·​written·​because·​"488 »   ​»   ​»   ​»   ​debug_printf(·​DEBUG_QUIET,​·​"No·​rescue·​DAG·​written·​because·​"
481 »   ​»   ​»   ​»   ​»   ​»   ​»   ​"DAGMAN_MAX_RESCUE_NU​M·​is·​0\n"·​)​;​489 »   ​»   ​»   ​»   ​»   ​»   ​»   ​"DAGMAN_MAX_RESCUE_NU​M·​is·​0\n"·​)​;​
493 »   ​»   ​»   ​dagman.​dag->RemoveRunningScr​ipts()​;​501 »   ​»   ​»   ​dagman.​dag->RemoveRunningScr​ipts()​;​
494 »   ​»   ​}502 »   ​»   ​}
495 »   ​»   ​dagman.​dag->PrintDeferrals(·​DEBUG_NORMAL,​·​true·​)​;​503 »   ​»   ​dagman.​dag->PrintDeferrals(·​DEBUG_NORMAL,​·​true·​)​;​
504
505 »   ​»   ​»   ​/​/​·​Start·​the·​final·​node·​if·​we·​have·​one.​
506 »   ​»   ​if·​(·​dagman.​dag->StartFinalNode()​·​)​·​{
507 »   ​»   ​»   ​»   ​/​/​·​We·​started·​a·​final·​node;​·​return·​here·​so·​we·​wait·​for·​the
508 »   ​»   ​»   ​»   ​/​/​·​final·​node·​to·​finish,​·​instead·​of·​exiting·​immediately.​
509 »   ​»   ​»   ​inShutdownRescue·​=·​false;​
510 »   ​»   ​»   ​return;​
511 »   ​»   ​}
496 »   ​»   ​dagman.​dag->DumpNodeStatus(·​false,​·​true·​)​;​512 »   ​»   ​dagman.​dag->DumpNodeStatus(·​false,​·​true·​)​;​
497 »   ​»   ​dagman.​dag->GetJobstateLog()​.​WriteDagmanFinished(·​exitVal·​)​;​513 »   ​»   ​dagman.​dag->GetJobstateLog()​.​WriteDagmanFinished(·​exitVal·​)​;​
498 »   ​}514 »   ​}
499 »   ​unlink(·​lockFileName·​)​;​·515 »   ​unlink(·​lockFileName·​)​;​·
500 ····dagman.​CleanUp()​;​516 »   ​dagman.​CleanUp()​;​
501 »   ​inShutdownRescue·​=·​false;​517 »   ​inShutdownRescue·​=·​false;​
502 »   ​DC_Exit(·​exitVal·​)​;​518 »   ​DC_Exit(·​exitVal·​)​;​
503 }519 }
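With a final node, `main_shutdown_rescue()` can run twice: once when the main DAG fails (it writes a rescue DAG, starts the final node, and returns), and again when the final node finishes. The sketch below models only that control flow, assuming a rescue DAG is written on every call. `ShutdownModel` and its members are hypothetical; job removal, logging, and the `maxRescueDagNum` check are omitted.

```cpp
#include <vector>

// Hypothetical model of the two-pass shutdown; not DAGMan code.
class ShutdownModel {
public:
    explicit ShutdownModel( bool hasFinalNode ) : _hasFinalNode( hasFinalNode ) {}

    // Mirrors the order of operations in the patched main_shutdown_rescue().
    void ShutdownRescue() {
        // The overwrite argument is false on the first write and true afterwards,
        // so the second pass replaces the rescue DAG instead of adding another.
        overwriteFlags.push_back( _wroteRescue );
        _wroteRescue = true;

        if ( StartFinalNode() ) {
            return;  // wait for the final node instead of exiting
        }
        exited = true;
    }

    std::vector<bool> overwriteFlags;  // overwrite value passed on each rescue write
    bool exited = false;               // set where the real code would call DC_Exit()

private:
    // True only the first time, and only if a final node exists.
    bool StartFinalNode() {
        if ( !_hasFinalNode || _finalStarted ) {
            return false;
        }
        _finalStarted = true;
        return true;
    }

    bool _hasFinalNode;
    bool _finalStarted = false;
    bool _wroteRescue = false;
};
```

In this model, a DAG without a final node exits on the first call, while a DAG with one exits on the second call after overwriting the rescue DAG written by the first.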
507 /​/​·​the·​schedd·​will·​send·​if·​the·​DAGMan·​job·​is·​removed·​from·​the·​queue523 /​/​·​the·​schedd·​will·​send·​if·​the·​DAGMan·​job·​is·​removed·​from·​the·​queue
508 int·​main_shutdown_remove(​Service·​*,​·​int)​·​{524 int·​main_shutdown_remove(​Service·​*,​·​int)​·​{
509 ····​debug_printf(·​DEBUG_QUIET,​·​"Received·​SIGUSR1\n"·​)​;​525 ····​debug_printf(·​DEBUG_QUIET,​·​"Received·​SIGUSR1\n"·​)​;​
510 »   ​main_shutdown_rescue(​·​EXIT_ABORT·​)​;​526 »   ​main_shutdown_rescue(​·​EXIT_ABORT,​·Dag:​:​DAG_STATUS_RM·​)​;​
511 »   ​return·​FALSE;​527 »   ​return·​FALSE;​
512 }528 }
513 529
515 »   ​dagman.​dag->DumpNodeStatus(·​false,​·​false·​)​;​531 »   ​dagman.​dag->DumpNodeStatus(·​false,​·​false·​)​;​
516 »   ​dagman.​dag->GetJobstateLog()​.​WriteDagmanFinished(·​EXIT_OKAY·​)​;​532 »   ​dagman.​dag->GetJobstateLog()​.​WriteDagmanFinished(·​EXIT_OKAY·​)​;​
517 »   ​unlink(·​lockFileName·​)​;​·533 »   ​unlink(·​lockFileName·​)​;​·
518 ····dagman.​CleanUp()​;​534 »   ​dagman.​CleanUp()​;​
519 »   ​DC_Exit(·​EXIT_OKAY·​)​;​535 »   ​DC_Exit(·​EXIT_OKAY·​)​;​
520 }536 }
521 537
1008 »   ​»   ​»   ​»   ​»   ​»   ​»   ​"because·​of·​-DumpRescue·​flag\n"·​)​;​1024 »   ​»   ​»   ​»   ​»   ​»   ​»   ​"because·​of·​-DumpRescue·​flag\n"·​)​;​
1009 »   ​»   ​»   ​»   ​dagman.​dag->Rescue(·​dagman.​primaryDagFile.​Value()​,​1025 »   ​»   ​»   ​»   ​dagman.​dag->Rescue(·​dagman.​primaryDagFile.​Value()​,​
1010 »   ​»   ​»   ​»   ​»   ​»   ​»   ​dagman.​multiDags,​·​dagman.​maxRescueDagNum,​1026 »   ​»   ​»   ​»   ​»   ​»   ​»   ​dagman.​multiDags,​·​dagman.​maxRescueDagNum,​
1011 »   ​»   ​»   ​»   ​»   ​»   ​»   ​true,​·​false·​)​;​1027 »   ​»   ​»   ​»   ​»   ​»   ​»   ​false,​·true,​·​false·​)​;​
1012 »   ​»   ​»   ​}1028 »   ​»   ​»   ​}
1013 »   ​»   ​»   ​1029 »   ​»   ​»   ​
1014 »   ​»   ​»   ​dagman.​dag->RemoveRunningJob​s(dagman,​·​true)​;​1030 »   ​»   ​»   ​dagman.​dag->RemoveRunningJob​s(dagman,​·​true)​;​
1088 #ifndef·​NOT_DETECT_CYCLE1104 #ifndef·​NOT_DETECT_CYCLE
1089 »   ​if(·​dagman.​startup_cycle_detect·​&&·​dagman.​dag->isCycle()​·​)​1105 »   ​if(·​dagman.​startup_cycle_detect·​&&·​dagman.​dag->isCycle()​·​)​
1090 »   ​{1106 »   ​{
1091 »   ​»   ​debug_error·(1,​·DEBUG_QUIET,​·"ERROR:​·a·cycle·exists·in·the·dag,​·plese·check·​input\n")​;​1107 »   ​»   ​/​/​·Note:​·maybe·we·should·run·the·final·​node·here,​·if·there·​is·one.​
1108 »   ​»   ​/​/​·​wenger·​2011-12-19.​
1109 »   ​»   ​debug_error·​(1,​·​DEBUG_QUIET,​·​"ERROR:​·​a·​cycle·​exists·​in·​the·​dag,​·​please·​check·​input\n")​;​
1092 »   ​}1110 »   ​}
1093 #endif1111 #endif
1094 ····​debug_printf(·​DEBUG_VERBOSE,​·​"Dag·​contains·​%d·​total·​jobs\n",​1112 ····​debug_printf(·​DEBUG_VERBOSE,​·​"Dag·​contains·​%d·​total·​jobs\n",​
1095 »   ​»   ​»   ​»   ​··​dagman.​dag->NumNodes()​·​)​;​1113 »   ​»   ​»   ​»   ​··​dagman.​dag->NumNodes(·true·)​·​)​;​
1096 1114
1097 »   ​MyString·​firstLocation;​1115 »   ​MyString·​firstLocation;​
1098 »   ​if·​(·​dagman.​dag->GetReject(·​firstLocation·​)​·​)​·​{1116 »   ​if·​(·​dagman.​dag->GetReject(·​firstLocation·​)​·​)​·​{
1111 ····»   ​debug_printf(·​DEBUG_QUIET,​·​"Dumping·​rescue·​DAG·​and·​exiting·​"1129 ····»   ​debug_printf(·​DEBUG_QUIET,​·​"Dumping·​rescue·​DAG·​and·​exiting·​"
1112 »   ​»   ​»   ​»   ​»   ​"because·​of·​-DumpRescue·​flag\n"·​)​;​1130 »   ​»   ​»   ​»   ​»   ​"because·​of·​-DumpRescue·​flag\n"·​)​;​
1113 »   ​»   ​dagman.​dag->Rescue(·​dagman.​primaryDagFile.​Value()​,​1131 »   ​»   ​dagman.​dag->Rescue(·​dagman.​primaryDagFile.​Value()​,​
1114 »   ​»   ​»   ​»   ​»   ​dagman.​multiDags,​·​dagman.​maxRescueDagNum,​·​false,​·false·)​;​1132 »   ​»   ​»   ​»   ​»   ​dagman.​multiDags,​·​dagman.​maxRescueDagNum,​·​false,​
1133 »   ​»   ​»   ​»   ​»   ​false,​·​false·​)​;​
1115 »   ​»   ​ExitSuccess()​;​1134 »   ​»   ​ExitSuccess()​;​
1116 »   ​»   ​return;​1135 »   ​»   ​return;​
1117 »   ​}1136 »   ​}
1171 1190
1172 void1191 void
1173 print_status()​·​{1192 print_status()​·​{
1174 »   ​int·​total·​=·​dagman.​dag->NumNodes()​;​1193 »   ​int·​total·​=·​dagman.​dag->NumNodes(·true·)​;​
1175 »   ​int·​done·​=·​dagman.​dag->NumNodesDone()​;​1194 »   ​int·​done·​=·​dagman.​dag->NumNodesDone(·true·)​;​
1176 »   ​int·​pre·​=·​dagman.​dag->PreRunNodeCount(​)​;​1195 »   ​int·​pre·​=·​dagman.​dag->PreRunNodeCount(​)​;​
1177 »   ​int·​submitted·​=·​dagman.​dag->NumJobsSubmitted​()​;​1196 »   ​int·​submitted·​=·​dagman.​dag->NumJobsSubmitted​()​;​
1178 »   ​int·​post·​=·​dagman.​dag->PostRunNodeCount​()​;​1197 »   ​int·​post·​=·​dagman.​dag->PostRunNodeCount​()​;​
1233 »   ​/​/​·​If·​the·​log·​has·​grown1252 »   ​/​/​·​If·​the·​log·​has·​grown
1234 »   ​if(·​dagman.​dag->DetectCondorLogG​rowth()​·​)​·​{1253 »   ​if(·​dagman.​dag->DetectCondorLogG​rowth()​·​)​·​{
1235 »   ​»   ​if(·​dagman.​dag->ProcessLogEvents​(·​CONDORLOG·​)​·​==·​false·​)​·​{1254 »   ​»   ​if(·​dagman.​dag->ProcessLogEvents​(·​CONDORLOG·​)​·​==·​false·​)​·​{
1255 »   ​»   ​»   ​debug_printf(·​DEBUG_NORMAL,​
1256 »   ​»   ​»   ​»   ​»   ​»   ​"ProcessLogEvents(CON​DORLOG)​·​returned·​false\n"·​)​;​
1236 »   ​»   ​»   ​dagman.​dag->PrintReadyQ(·​DEBUG_DEBUG_1·​)​;​1257 »   ​»   ​»   ​dagman.​dag->PrintReadyQ(·​DEBUG_DEBUG_1·​)​;​
1237 »   ​»   ​»   ​main_shutdown_rescue(​·​EXIT_ERROR·​)​;​1258 »   ​»   ​»   ​main_shutdown_rescue(​·​EXIT_ERROR,​·Dag:​:​DAG_STATUS_ERROR·​)​;​
1238 »   ​»   ​»   ​return;​1259 »   ​»   ​»   ​return;​
1239 »   ​»   ​}1260 »   ​»   ​}
1240 »   ​}1261 »   ​}
1242 »   ​if(·​dagman.​dag->DetectDaPLogGrow​th()​·​)​·​{1263 »   ​if(·​dagman.​dag->DetectDaPLogGrow​th()​·​)​·​{
1243 »   ​»   ​if(·​dagman.​dag->ProcessLogEvents​(·​DAPLOG·​)​·​==·​false·​)​·​{1264 »   ​»   ​if(·​dagman.​dag->ProcessLogEvents​(·​DAPLOG·​)​·​==·​false·​)​·​{
1244 »   ​»   ​»   ​debug_printf(·​DEBUG_NORMAL,​1265 »   ​»   ​»   ​debug_printf(·​DEBUG_NORMAL,​
1245 »   ​»   ​»   ​»   ​»   ​»   ​"ProcessLogEvents(DAP​LOG)​·​returned·​false\n")​;​1266 »   ​»   ​»   ​»   ​»   ​»   ​"ProcessLogEvents(DAP​LOG)​·​returned·​false\n"·)​;​
1246 »   ​»   ​»   ​dagman.​dag->PrintReadyQ(·​DEBUG_DEBUG_1·​)​;​1267 »   ​»   ​»   ​dagman.​dag->PrintReadyQ(·​DEBUG_DEBUG_1·​)​;​
1247 »   ​»   ​»   ​main_shutdown_rescue(​·​EXIT_ERROR·​)​;​1268 »   ​»   ​»   ​main_shutdown_rescue(​·​EXIT_ERROR,​·Dag:​:​DAG_STATUS_ERROR·​)​;​
1248 »   ​»   ​»   ​return;​1269 »   ​»   ​»   ​return;​
1249 »   ​»   ​}1270 »   ​»   ​}
1250 »   ​}1271 »   ​}
1251 1272
1252 ····​/​/​·​print·​status·​if·​anything's·​changed·​(or·​we're·​in·​a·​high·​debug·​level)​1273 ····​/​/​·​print·​status·​if·​anything's·​changed·​(or·​we're·​in·​a·​high·​debug·​level)​
1253 ····​if(·​prevJobsDone·​!=·​dagman.​dag->NumNodesDone()​1274 ····​if(·​prevJobsDone·​!=·​dagman.​dag->NumNodesDone(·true·)​
1254 ········​||·​prevJobs·​!=·​dagman.​dag->NumNodes()​1275 ········​||·​prevJobs·​!=·​dagman.​dag->NumNodes(·true·)​
1255 ········​||·​prevJobsFailed·​!=·​dagman.​dag->NumNodesFailed()​1276 ········​||·​prevJobsFailed·​!=·​dagman.​dag->NumNodesFailed()​
1256 ········​||·​prevJobsSubmitted·​!=·​dagman.​dag->NumJobsSubmitted​()​1277 ········​||·​prevJobsSubmitted·​!=·​dagman.​dag->NumJobsSubmitted​()​
1257 ········​||·​prevJobsReady·​!=·​dagman.​dag->NumNodesReady()​1278 ········​||·​prevJobsReady·​!=·​dagman.​dag->NumNodesReady()​
1260 »   ​»   ​||·​DEBUG_LEVEL(·​DEBUG_DEBUG_4·​)​·​)​·​{1281 »   ​»   ​||·​DEBUG_LEVEL(·​DEBUG_DEBUG_4·​)​·​)​·​{
1261 »   ​»   ​print_status()​;​1282 »   ​»   ​print_status()​;​
1262 1283
1263 ········​prevJobsDone·​=·​dagman.​dag->NumNodesDone()​;​1284 ········​prevJobsDone·​=·​dagman.​dag->NumNodesDone(·true·)​;​
1264 ········​prevJobs·​=·​dagman.​dag->NumNodes()​;​1285 ········​prevJobs·​=·​dagman.​dag->NumNodes(·true·)​;​
1265 ········​prevJobsFailed·​=·​dagman.​dag->NumNodesFailed()​;​1286 ········​prevJobsFailed·​=·​dagman.​dag->NumNodesFailed()​;​
1266 ········​prevJobsSubmitted·​=·​dagman.​dag->NumJobsSubmitted​()​;​1287 ········​prevJobsSubmitted·​=·​dagman.​dag->NumJobsSubmitted​()​;​
1267 ········​prevJobsReady·​=·​dagman.​dag->NumNodesReady()​;​1288 ········​prevJobsReady·​=·​dagman.​dag->NumNodesReady()​;​
1275 1296
1276 »   ​dagman.​dag->DumpNodeStatus(·​false,​·​false·​)​;​1297 »   ​dagman.​dag->DumpNodeStatus(·​false,​·​false·​)​;​
1277 1298
1278 ····​ASSERT(·​dagman.​dag->NumNodesDone()​·​+·​dagman.​dag->NumNodesFailed()​1299 ····​ASSERT(·​dagman.​dag->NumNodesDone(·true·)​·​+·​dagman.​dag->NumNodesFailed()​
1279 »   ​»   ​»   ​<=·​dagman.​dag->NumNodes()​·​)​;​1300 »   ​»   ​»   ​<=·​dagman.​dag->NumNodes(·true·)​·​)​;​
1280 1301
1281 ····​/​/​1302 ····​/​/​
1282 ····​/​/​·​If·​DAG·​is·​complete,​·​hurray,​·​and·​exit.​1303 ····​/​/​·​If·​DAG·​is·​complete,​·​hurray,​·​and·​exit.​
1283 ····​/​/​1304 ····​/​/​
1284 ····​if(·​dagman.​dag->DoneSuccess()​·​)​·​{1305 ····​if(·​dagman.​dag->DoneSuccess(·true·)​·​)​·​{
1285 ········​ASSERT(·​dagman.​dag->NumJobsSubmitted​()​·​==·​0·​)​;​1306 ········​ASSERT(·​dagman.​dag->NumJobsSubmitted​()​·​==·​0·​)​;​
1286 »   ​»   ​dagman.​dag->CheckAllJobs()​;​1307 »   ​»   ​dagman.​dag->CheckAllJobs()​;​
1287 ········​debug_printf(·​DEBUG_NORMAL,​·​"All·​jobs·​Completed!\n"·​)​;​1308 ········​debug_printf(·​DEBUG_NORMAL,​·​"All·​jobs·​Completed!\n"·​)​;​
1296 »   ​»   ​return;​1317 »   ​»   ​return;​
1297 ····​}1318 ····​}
1298 1319
1320 »   ​/​/​
1321 »   ​/​/​·​DAG·​has·​failed·​--·​dump·​rescue·​DAG.​
1322 »   ​/​/​
1323 ····​if(·​dagman.​dag->DoneFailed(·​true·​)​·​)​·​{
1324 »   ​»   ​main_shutdown_rescue(​·​EXIT_ERROR,​·​dagman.​dag->_dagStatus·​)​;​
1325 »   ​»   ​return;​
1326 »   ​}
1327
1328 »   ​/​/​
1329 »   ​/​/​·​DAG·​has·​succeeded·​but·​we·​haven't·​run·​final·​node·​yet,​·​so·​do·​that.​
1330 »   ​/​/​
1331 ····​if(·​dagman.​dag->DoneSuccess(·​false·​)​·​)​·​{
1332 »   ​»   ​dagman.​dag->StartFinalNode()​;​
1333 »   ​»   ​return;​
1334 »   ​}
1335
1299 »   ​»   ​/​/​·​If·​the·​DAG·​is·​halted,​·​we·​don't·​want·​to·​actually·​exit·​yet·​if1336 »   ​»   ​/​/​·​If·​the·​DAG·​is·​halted,​·​we·​don't·​want·​to·​actually·​exit·​yet·​if
1300 »   ​»   ​/​/​·​jobs·​are·​still·​in·​the·​queue,​·​or·​any·​POST·​scripts·​need·​to·​be1337 »   ​»   ​/​/​·​jobs·​are·​still·​in·​the·​queue,​·​or·​any·​POST·​scripts·​need·​to·​be
1301 »   ​»   ​/​/​·​run·​(we·​need·​to·​run·​POST·​scripts·​so·​we·​don't·​"waste"·​jobs1338 »   ​»   ​/​/​·​run·​(we·​need·​to·​run·​POST·​scripts·​so·​we·​don't·​"waste"·​jobs
1303 »   ​»   ​/​/​·​for·​PRE·​scripts·​because·​they'll·​be·​re-run·​when·​the·​rescue1340 »   ​»   ​/​/​·​for·​PRE·​scripts·​because·​they'll·​be·​re-run·​when·​the·​rescue
1304 »   ​»   ​/​/​·​DAG·​is·​run·​anyhow)​.​1341 »   ​»   ​/​/​·​DAG·​is·​run·​anyhow)​.​
1305 »   ​if·​(·​dagman.​dag->IsHalted()​·​&&·​dagman.​dag->NumJobsSubmitted​()​·​==·​0·​&&1342 »   ​if·​(·​dagman.​dag->IsHalted()​·​&&·​dagman.​dag->NumJobsSubmitted​()​·​==·​0·​&&
1306 »   ​»   ​»   ​»   ​dagman.​dag->PostRunNodeCount​()​·​==·​0·)​·{1343 »   ​»   ​»   ​»   ​dagman.​dag->PostRunNodeCount​()​·​==·​0·&&
1344 »   ​»   ​»   ​»   ​!dagman.​dag->RunningFinalNode​()​·​)​·​{
1307 »   ​»   ​debug_printf·​(·​DEBUG_QUIET,​·​"Exiting·​because·​DAG·​is·​halted·​"1345 »   ​»   ​debug_printf·​(·​DEBUG_QUIET,​·​"Exiting·​because·​DAG·​is·​halted·​"
1308 »   ​»   ​»   ​»   ​»   ​"and·​no·​jobs·​or·​scripts·​are·​running\n"·​)​;​1346 »   ​»   ​»   ​»   ​»   ​"and·​no·​jobs·​or·​scripts·​are·​running\n"·​)​;​
1309 »   ​»   ​main_shutdown_rescue(​·​EXIT_ERROR·​)​;​1347 »   ​»   ​main_shutdown_rescue(​·​EXIT_ERROR,​·Dag:​:​DAG_STATUS_HALTED·​)​;​
1348 »   ​»   ​return;​
1310 »   ​}1349 »   ​}
1311 1350
1312 ····​/​/​1351 ····​/​/​
1313 ····​/​/​·​If·​no·​jobs·​are·​submitted·​and·​no·​scripts·​are·​running,​·​but·​the1352 ····​/​/​·​If·​no·​jobs·​are·​submitted·​and·​no·​scripts·​are·​running,​·​but·​the
1314 ····​/​/​·​dag·​is·​not·​complete,​·​then·​at·​least·​one·​job·​failed,​·​or·​a·​cycle1353 ····​/​/​·​dag·​is·​not·​complete,​·​then·​at·​least·​one·​job·​failed,​·​or·​a·​cycle
1315 ····​/​/​·​exists.​1354 ····​/​/​·​exists.​··(Note·that·if·the·DAG·completed·successfully,​·we·already
1355 »   ​/​/​·​returned·​from·​this·​function·​above.​)​
1316 ····​/​/​·1356 ····​/​/​·
1317 ····​if(·​dagman.​dag->FinishedRunning(​)​·​)​·​{1357 ····​if(·​dagman.​dag->FinishedRunning(​·false·)​·​)​·​{
1318 »   ​»   ​if(·dagman.​dag->DoneFailed()​·)​·{1358 »   ​»   ​Dag:​:​dag_status·dagStatus·=·Dag:​:​DAG_STATUS_OK;​
1359 »   ​»   ​if(·​dagman.​dag->DoneFailed(·​false·​)​·​)​·​{
1319 »   ​»   ​»   ​if(·​DEBUG_LEVEL(·​DEBUG_QUIET·​)​·​)​·​{1360 »   ​»   ​»   ​if(·​DEBUG_LEVEL(·​DEBUG_QUIET·​)​·​)​·​{
1320 »   ​»   ​»   ​»   ​debug_printf(·​DEBUG_QUIET,​1361 »   ​»   ​»   ​»   ​debug_printf(·​DEBUG_QUIET,​
1321 »   ​»   ​»   ​»   ​»   ​»   ​»   ​··​"ERROR:​·​the·​following·​job(s)​·​failed:​\n"·​)​;​1362 »   ​»   ​»   ​»   ​»   ​»   ​»   ​··​"ERROR:​·​the·​following·​job(s)​·​failed:​\n"·​)​;​
1322 »   ​»   ​»   ​»   ​dagman.​dag->PrintJobList(·​Job:​:​STATUS_ERROR·​)​;​1363 »   ​»   ​»   ​»   ​dagman.​dag->PrintJobList(·​Job:​:​STATUS_ERROR·​)​;​
1323 »   ​»   ​»   ​}1364 »   ​»   ​»   ​}
1365 »   »   »   dagStatus·=·Dag::DAG_STATUS_NODE_FAILED;
1324 »   ​»   ​}·​else·​{1366 »   ​»   ​}·​else·​{
1325 »   ​»   ​»   ​/​/​·​no·​jobs·​failed,​·​so·​a·​cycle·​must·​exist1367 »   ​»   ​»   ​/​/​·​no·​jobs·​failed,​·​so·​a·​cycle·​must·​exist
1326 »   ​»   ​»   ​debug_printf(·​DEBUG_QUIET,​·​"ERROR:​·​DAG·​finished·​but·​not·​all·​"1368 »   ​»   ​»   ​debug_printf(·​DEBUG_QUIET,​·​"ERROR:​·​DAG·​finished·​but·​not·​all·​"
1327 »   ​»   ​»   ​»   ​»   ​»   ​"nodes·​are·​complete·​--·​checking·​for·​a·​cycle.​.​.​\n"·​)​;​1369 »   ​»   ​»   ​»   ​»   ​»   ​"nodes·​are·​complete·​--·​checking·​for·​a·​cycle.​.​.​\n"·​)​;​
1328 »   ​»   ​»   ​if(·​dagman.​dag->isCycle()​·​)​·​{1370 »   ​»   ​»   ​if(·​dagman.​dag->isCycle()​·​)​·​{
1329 »   ​»   ​»   ​»   ​debug_printf·​(DEBUG_QUIET,​·​".​.​.​·​ERROR:​·​a·​cycle·​exists·​"1371 »   ​»   ​»   ​»   ​debug_printf·​(DEBUG_QUIET,​·​".​.​.​·​ERROR:​·​a·​cycle·​exists·​"
1330 »   ​»   ​»   ​»   ​»   ​»   ​»   ​"in·​the·​dag,​·​plese·​check·​input\n")​;​1372 »   ​»   ​»   ​»   ​»   ​»   ​»   ​"in·​the·​dag,​·​please·​check·​input\n")​;​
1373 »   ​»   ​»   ​»   ​dagStatus·​=·​Dag:​:​DAG_STATUS_CYCLE;​
1331 »   ​»   ​»   ​}·​else·​{1374 »   ​»   ​»   ​}·​else·​{
1332 »   ​»   ​»   ​»   ​debug_printf·​(DEBUG_QUIET,​·​".​.​.​·​ERROR:​·​no·​cycle·​found;​·​"1375 »   ​»   ​»   ​»   ​debug_printf·​(DEBUG_QUIET,​·​".​.​.​·​ERROR:​·​no·​cycle·​found;​·​"
1333 »   ​»   ​»   ​»   ​»   ​»   ​»   ​"unknown·​error·​condition\n")​;​1376 »   ​»   ​»   ​»   ​»   ​»   ​»   ​"unknown·​error·​condition\n")​;​
1377 »   ​»   ​»   ​»   ​dagStatus·​=·​Dag:​:​DAG_STATUS_ERROR;​
1334 »   ​»   ​»   ​}1378 »   ​»   ​»   ​}
1335 »   ​»   ​»   ​if·​(·​debug_level·​>=·​DEBUG_NORMAL·​)​·​{1379 »   ​»   ​»   ​if·​(·​debug_level·​>=·​DEBUG_NORMAL·​)​·​{
1336 »   ​»   ​»   ​»   ​dagman.​dag->PrintJobList()​;​1380 »   ​»   ​»   ​»   ​dagman.​dag->PrintJobList()​;​
1337 »   ​»   ​»   ​}1381 »   ​»   ​»   ​}
1338 »   ​»   ​}1382 »   ​»   ​}
1339 1383
1340 »   ​»   ​main_shutdown_rescue(​·​EXIT_ERROR·​)​;​1384 »   ​»   ​main_shutdown_rescue(​·​EXIT_ERROR,​·dagStatus·​)​;​
1341 »   ​»   ​return;​1385 »   ​»   ​return;​
1342 ····​}1386 ····​}
1343 }1387 }
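The order of the checks near the end of this file matters: overall success and overall failure (both counting the final node) are tested before the "main DAG succeeded, so start the final node" case, and the halted and finished-but-incomplete cases come last. A condensed sketch of that ordering follows. The snapshot struct, the `Action` enum, and `NextAction` are hypothetical names introduced for illustration, and each boolean stands for the predicate named in its comment.

```cpp
// Hypothetical snapshot of the predicates consulted by the monitoring loop.
struct DagSnapshot {
    bool doneSuccessAll;   // DoneSuccess( true )
    bool doneFailedAll;    // DoneFailed( true )
    bool doneSuccessMain;  // DoneSuccess( false )
    bool haltedAndIdle;    // halted, nothing queued, no POST scripts, final node not running
    bool finishedMain;     // FinishedRunning( false )
};

enum class Action { ExitSuccess, ShutdownRescue, StartFinalNode, KeepWaiting };

// Evaluates the checks in the same order as the patched code.
Action NextAction( const DagSnapshot &s ) {
    if ( s.doneSuccessAll )  return Action::ExitSuccess;
    if ( s.doneFailedAll )   return Action::ShutdownRescue;
    if ( s.doneSuccessMain ) return Action::StartFinalNode;
    if ( s.haltedAndIdle )   return Action::ShutdownRescue;
    if ( s.finishedMain )    return Action::ShutdownRescue;  // node failure, cycle, or unknown error
    return Action::KeepWaiting;
}
```

Because the success-without-final-node test returns early, the final fallback is reached only when the main DAG has stopped without completing, which matches the comment added in the diff.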
diff --git a/src/condor_dagman/dagman_main.h b/src/condor_dagman/dagman_main.h
index ed04f8925fa1d512e38ddb2cae97ea9c863f93aa..51c667c81291d97ec7b0b9634d27f2e3a8c9be48 100644
--- a/src/condor_dagman/dagman_main.h
+++ b/src/condor_dagman/dagman_main.h
31 »   ​EXIT_RESTART·​=·​3,​»   ​/​/​·​exit·​but·​indicate·​that·​we·​should·​be·​restarted31 »   ​EXIT_RESTART·​=·​3,​»   ​/​/​·​exit·​but·​indicate·​that·​we·​should·​be·​restarted
32 };​32 };​
33 33
34 void·​main_shutdown_rescue(​·​int·​exitVal·​)​;​34 void·​main_shutdown_rescue(​·​int·​exitVal,​·Dag:​:​dag_status·dagStatus·​)​;​
35 void·​main_shutdown_gracefu​l(·​void·​)​;​35 void·​main_shutdown_gracefu​l(·​void·​)​;​
36 void·​print_status()​;​36 void·​print_status()​;​
37 37
diff --git a/src/condor_dagman/dagman_submit.cpp b/src/condor_dagman/dagman_submit.cpp
index a86ed7a6d26528dd341bf7544b56f2b016b25e43..787bfff1d80f8cffc8f80b25f640fcd7a8e5a614 100644
--- a/src/condor_dagman/dagman_submit.cpp
+++ b/src/condor_dagman/dagman_submit.cpp
196 »   ​debug_printf(·​DEBUG_NORMAL,​·​"Submit·​generated·​%d·​job·​procs;​·​"196 »   ​debug_printf(·​DEBUG_NORMAL,​·​"Submit·​generated·​%d·​job·​procs;​·​"
197 »   ​»   ​»   ​»   ​"disallowed·​by·​DAGMAN_PROHIBIT_MULTI​_JOBS·​setting\n",​197 »   ​»   ​»   ​»   ​"disallowed·​by·​DAGMAN_PROHIBIT_MULTI​_JOBS·​setting\n",​
198 »   ​»   ​»   ​»   ​jobProcCount·​)​;​198 »   ​»   ​»   ​»   ​jobProcCount·​)​;​
199 »   ​main_shutdown_rescue(​·​EXIT_ERROR·​)​;​199 »   ​main_shutdown_rescue(​·​EXIT_ERROR,​·Dag:​:​DAG_STATUS_ERROR·​)​;​
200 ··​}200 ··​}
201 ··201 ··
202 ··​return·​true;​202 ··​return·​true;​
254 »   ​»   ​»   ​»   ​/​/​·​We·​don't·​just·​return·​false·​here·​because·​that·​would254 »   ​»   ​»   ​»   ​/​/​·​We·​don't·​just·​return·​false·​here·​because·​that·​would
255 »   ​»   ​»   ​»   ​/​/​·​cause·​us·​to·​re-try·​the·​submit,​·​which·​would·​obviously255 »   ​»   ​»   ​»   ​/​/​·​cause·​us·​to·​re-try·​the·​submit,​·​which·​would·​obviously
256 »   ​»   ​»   ​»   ​/​/​·​just·​fail·​again.​256 »   ​»   ​»   ​»   ​/​/​·​just·​fail·​again.​
257 »   ​»   ​»   ​main_shutdown_rescue(​·​EXIT_ERROR·​)​;​257 »   ​»   ​»   ​main_shutdown_rescue(​·​EXIT_ERROR,​·Dag:​:​DAG_STATUS_ERROR·​)​;​
258 »   ​»   ​»   ​return·​false;​·​/​/​·​We·​won't·​actually·​get·​to·​here.​258 »   ​»   ​»   ​return·​false;​·​/​/​·​We·​won't·​actually·​get·​to·​here.​
259 »   ​»   ​}·​else·​if·​(·​queueCount·​!=·​1·​)​·​{259 »   ​»   ​}·​else·​if·​(·​queueCount·​!=·​1·​)​·​{
260 »   ​»   ​»   ​debug_printf(·​DEBUG_QUIET,​·​"ERROR:​·​node·​%s·​job·​queues·​%d·​"260 »   ​»   ​»   ​debug_printf(·​DEBUG_QUIET,​·​"ERROR:​·​node·​%s·​job·​queues·​%d·​"
263 »   ​»   ​»   ​»   ​/​/​·​We·​don't·​just·​return·​false·​here·​because·​that·​would263 »   ​»   ​»   ​»   ​/​/​·​We·​don't·​just·​return·​false·​here·​because·​that·​would
264 »   ​»   ​»   ​»   ​/​/​·​cause·​us·​to·​re-try·​the·​submit,​·​which·​would·​obviously264 »   ​»   ​»   ​»   ​/​/​·​cause·​us·​to·​re-try·​the·​submit,​·​which·​would·​obviously
265 »   ​»   ​»   ​»   ​/​/​·​just·​fail·​again.​265 »   ​»   ​»   ​»   ​/​/​·​just·​fail·​again.​
266 »   ​»   ​»   ​main_shutdown_rescue(​·​EXIT_ERROR·​)​;​266 »   ​»   ​»   ​main_shutdown_rescue(​·​EXIT_ERROR,​·Dag:​:​DAG_STATUS_ERROR·​)​;​
267 »   ​»   ​»   ​return·​false;​·​/​/​·​We·​won't·​actually·​get·​to·​here.​267 »   ​»   ​»   ​return·​false;​·​/​/​·​We·​won't·​actually·​get·​to·​here.​
268 »   ​»   ​}268 »   ​»   ​}
269 »   ​}269 »   ​}
331 »   ​»   ​args.​AppendArg(·​var.​Value()​·​)​;​331 »   ​»   ​args.​AppendArg(·​var.​Value()​·​)​;​
332 »   ​}332 »   ​}
333 333
334 »   ​»   ​/​/​·​Set·​the·​special·​DAG_STATUS·​variable·​(mainly·​for·​use·​by
335 »   ​»   ​/​/​·​"final"·​nodes)​.​
336 »   ​args.​AppendArg(·​"-a"·​)​;​
337 »   ​MyString·​var·​=·​"DAG_STATUS·​=·​";​
338 »   ​var·​+=·​dm.​dag->_dagStatus;​
339 »   ​args.​AppendArg(·​var.​Value()​·​)​;​
340
341 »   ​»   ​/​/​·​Set·​the·​special·​FAILED_COUNT·​variable·​(mainly·​for·​use·​by
342 »   ​»   ​/​/​·​"final"·​nodes)​.​
343 »   ​args.​AppendArg(·​"-a"·​)​;​
344 »   ​var·​=·​"FAILED_COUNT·​=·​";​
345 »   ​var·​+=·​dm.​dag->NumNodesFailed()​;​
346 »   ​args.​AppendArg(·​var.​Value()​·​)​;​
347
334 »   ​»   ​/​/​·​how·​big·​is·​the·​command·​line·​so·​far348 »   ​»   ​/​/​·​how·​big·​is·​the·​command·​line·​so·​far
335 »   ​MyString·​display;​349 »   ​MyString·​display;​
336 »   ​args.​GetArgsStringForDispl​ay(·​&display·​)​;​350 »   ​args.​GetArgsStringForDispl​ay(·​&display·​)​;​
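
The hunk above passes two extra `-a` arguments, `DAG_STATUS` and `FAILED_COUNT`, on every node submit so that a "final" node can see how the DAG ended. A minimal sketch of that pattern follows. It assumes a plain `std::vector<std::string>` in place of the real `ArgList`, and assumes both values are formatted as integers. The names `ArgVector`, `appendSubmitVar` and `appendFinalNodeVars` are hypothetical and are not part of the DAGMan source.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the argument list built in dagman_submit.cpp.
using ArgVector = std::vector<std::string>;

// Appends one "-a" flag followed by a "NAME = value" assignment,
// the same two-argument shape used for each variable in the hunk above.
void appendSubmitVar(ArgVector& args, const std::string& name, int value) {
    assert(!name.empty());
    args.push_back("-a");
    args.push_back(name + " = " + std::to_string(value));
}

// Adds the two special variables intended mainly for "final" nodes.
// dagStatus and failedCount are assumed to be plain integers here.
void appendFinalNodeVars(ArgVector& args, int dagStatus, int failedCount) {
    appendSubmitVar(args, "DAG_STATUS", dagStatus);
    appendSubmitVar(args, "FAILED_COUNT", failedCount);
}
```

Calling `appendFinalNodeVars(args, status, failed)` on an empty list leaves four entries: the flag and assignment for `DAG_STATUS`, then the flag and assignment for `FAILED_COUNT`.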
diff --git a/src/condor_dagman/debug.cpp b/src/condor_dagman/debug.cpp
index 9ff531b0ce2c723013502ce2599f1c35f26578df..1c7a59d0ee248443b767fdcbec6410b23b70bca8 100644
--- a/src/condor_dagman/debug.cpp
+++ b/src/condor_dagman/debug.cpp
266 »   ​»   ​debug_printf(·​DEBUG_QUIET,​·​"ERROR:​·​Warning·​is·​fatal·​"266 »   ​»   ​debug_printf(·​DEBUG_QUIET,​·​"ERROR:​·​Warning·​is·​fatal·​"
267 »   ​»   ​»   ​»   ​»   ​"error·​because·​of·​DAGMAN_USE_STRICT·​setting\n"·​)​;​267 »   ​»   ​»   ​»   ​»   ​"error·​because·​of·​DAGMAN_USE_STRICT·​setting\n"·​)​;​
268 »   ​»   ​if·​(·​quit_if_error·​)​·​{268 »   ​»   ​if·​(·​quit_if_error·​)​·​{
269 »   ​»   ​»   ​main_shutdown_rescue(​·​EXIT_ERROR·​)​;​269 »   ​»   ​»   ​main_shutdown_rescue(​·​EXIT_ERROR,​·Dag:​:​DAG_STATUS_ERROR·​)​;​
270 »   ​»   ​}270 »   ​»   ​}
271 271
272 »   ​»   ​return·​true;​272 »   ​»   ​return·​true;​
diff --git a/src/condor_dagman/job.cpp b/src/condor_dagman/job.cpp
index 305bb6aa0851997ca9b2fa24b14bb06899cf7d72..857e9a327d826e4a846201c2992960dd59f31c3f 100644
--- a/src/condor_dagman/job.cpp
+++ b/src/condor_dagman/job.cpp
47 /​/​---------------------​---------------------​---------------------​------------47 /​/​---------------------​---------------------​---------------------​------------
48 /​/​·​NOTE:​·​this·​must·​be·​kept·​in·​sync·​with·​the·​status_t·​enum48 /​/​·​NOTE:​·​this·​must·​be·​kept·​in·​sync·​with·​the·​status_t·​enum
49 const·​char·​*·​Job:​:​status_t_names[]·​=·​{49 const·​char·​*·​Job:​:​status_t_names[]·​=·​{
50 ····​"STATUS_NOT_READY",​
50 ····​"STATUS_READY····​",​51 ····​"STATUS_READY····​",​
51 ····​"STATUS_PRERUN···​",​52 ····​"STATUS_PRERUN···​",​
52 ····​"STATUS_SUBMITTED",​53 ····​"STATUS_SUBMITTED",​
99 /​/​---------------------​---------------------​---------------------​------------100 /​/​---------------------​---------------------​---------------------​------------
100 Job:​:​Job(·​const·​job_type_t·​jobType,​·​const·​char*·​jobName,​101 Job:​:​Job(·​const·​job_type_t·​jobType,​·​const·​char*·​jobName,​
101 »   ​»   ​»   ​const·​char·​*directory,​·​const·​char*·​cmdFile·​)​·​:​102 »   ​»   ​»   ​const·​char·​*directory,​·​const·​char*·​cmdFile·​)​·​:​
102 »   ​_jobType(·​jobType·​)​,​·​_preskip(·​PRE_SKIP_INVALID·​)​,​·_pre_status(·NO_PRE_VALUE·)​103 »   ​_jobType(·​jobType·​)​,​·​_preskip(·​PRE_SKIP_INVALID·​)​,​
103 {104 »   ​»   ​»   ​_pre_status(·NO_PRE_VALUE·)​,​·_final(·false·)​
104 »   ​Init(·​jobName,​·​directory,​·​cmdFile·​)​;​
105 }
106
107 /​/​---------------------​---------------------​---------------------​------------
108 void·​Job:​:​Init(·​const·​char*·​jobName,​·​const·​char*·​directory,​
109 »   ​»   ​»   ​const·​char*·​cmdFile·​)​
110 {105 {
111 »   ​ASSERT(·​jobName·​!=·​NULL·​)​;​106 »   ​ASSERT(·​jobName·​!=·​NULL·​)​;​
112 »   ​ASSERT(·​cmdFile·​!=·​NULL·​)​;​107 »   ​ASSERT(·​cmdFile·​!=·​NULL·​)​;​
113 108
114 »   ​debug_printf(·​DEBUG_DEBUG_1,​·​"Job:​:​Init(%s,​·​%s,​·​%s)​\n",​·​jobName,​109 »   ​debug_printf(·​DEBUG_DEBUG_1,​·​"Job:​:​Job(%s,​·​%s,​·​%s)​\n",​·​jobName,​
115 »   ​»   ​»   ​»   ​directory,​·​cmdFile·​)​;​110 »   ​»   ​»   ​directory,​·​cmdFile·​)​;​
116 111
117 ····_scriptPre·​=·​NULL;​112 »   ​_scriptPre·​=·​NULL;​
118 ····_scriptPost·​=·​NULL;​113 »   ​_scriptPost·​=·​NULL;​
119 ····_Status·​=·​STATUS_READY;​114 »   ​_Status·​=·​STATUS_READY;​
120 »   ​_isIdle·​=·​false;​115 »   ​_isIdle·​=·​false;​
121 »   ​countedAsDone·​=·​false;​116 »   ​countedAsDone·​=·​false;​
122 117
123 ····_jobName·​=·​strnewp·​(jobName)​;​118 »   ​_jobName·​=·​strnewp·​(jobName)​;​
124 »   ​_directory·​=·​strnewp·​(directory)​;​119 »   ​_directory·​=·​strnewp·​(directory)​;​
125 ····_cmdFile·​=·​strnewp·​(cmdFile)​;​120 »   ​_cmdFile·​=·​strnewp·​(cmdFile)​;​
126 »   ​_dagFile·​=·​NULL;​121 »   ​_dagFile·​=·​NULL;​
127 »   ​_throttleInfo·​=·​NULL;​122 »   ​_throttleInfo·​=·​NULL;​
128 »   ​_logIsMonitored·​=·​false;​123 »   ​_logIsMonitored·​=·​false;​
130 125
131 ····​/​/​·​_condorID·​struct·​initializes·​itself126 ····​/​/​·​_condorID·​struct·​initializes·​itself
132 127
133 ····/​/​·​jobID·​is·​a·​primary·​key·​(a·​database·​term)​.​··​All·​should·​be·​unique128 »   ​»   ​/​/​·​jobID·​is·​a·​primary·​key·​(a·​database·​term)​.​··​All·​should·​be·​unique
134 ····_jobID·​=·​_jobID_counter++;​129 »   ​_jobID·​=·​_jobID_counter++;​
135 130
136 ····retry_max·​=·​0;​131 »   ​retry_max·​=·​0;​
137 ····retries·​=·​0;​132 »   ​retries·​=·​0;​
138 ····_submitTries·​=·​0;​133 »   ​_submitTries·​=·​0;​
139 »   ​retval·​=·​-1;​·​/​/​·​so·​Coverity·​is·​happy134 »   ​retval·​=·​-1;​·​/​/​·​so·​Coverity·​is·​happy
140 ····have_retry_abort_val·​=·​false;​135 »   ​have_retry_abort_val·​=·​false;​
141 ····retry_abort_val·​=·​0xdeadbeef;​136 »   ​retry_abort_val·​=·​0xdeadbeef;​
142 ····have_abort_dag_val·​=·​false;​137 »   ​have_abort_dag_val·​=·​false;​
143 »   ​abort_dag_val·​=·​-1;​·​/​/​·​so·​Coverity·​is·​happy138 »   ​abort_dag_val·​=·​-1;​·​/​/​·​so·​Coverity·​is·​happy
144 ····have_abort_dag_return​_val·​=·​false;​139 »   ​have_abort_dag_return​_val·​=·​false;​
145 »   ​abort_dag_return_val·​=·​-1;​·​/​/​·​so·​Coverity·​is·​happy140 »   ​abort_dag_return_val·​=·​-1;​·​/​/​·​so·​Coverity·​is·​happy
146 »   ​_visited·​=·​false;​141 »   ​_visited·​=·​false;​
147 »   ​_dfsOrder·​=·​-1;​·​/​/​·​so·​Coverity·​is·​happy142 »   ​_dfsOrder·​=·​-1;​·​/​/​·​so·​Coverity·​is·​happy
151 »   ​_hasNodePriority·​=·​false;​146 »   ​_hasNodePriority·​=·​false;​
152 »   ​_nodePriority·​=·​0;​147 »   ​_nodePriority·​=·​0;​
153 148
154 ····_logFile·​=·​NULL;​149 »   ​_logFile·​=·​NULL;​
155 »   ​_logFileIsXml·​=·​false;​150 »   ​_logFileIsXml·​=·​false;​
156 151
157 »   ​_noop·​=·​false;​152 »   ​_noop·​=·​false;​
329 bool324 bool
330 Job:​:​SetStatus(·​status_t·​newStatus·​)​325 Job:​:​SetStatus(·​status_t·​newStatus·​)​
331 {326 {
327 »   ​·​debug_printf(·​DEBUG_DEBUG_1,​·​"Job(%s)​:​:​SetStatus(%s)​\n",​
328 »   ​·»   ​»   ​»   ​GetJobName()​,​·​status_t_names[newSta​tus]·​)​;​
329
332 »   ​»   ​/​/​·​TODO:​·​add·​some·​state·​transition·​sanity-checking·​here?330 »   ​»   ​/​/​·​TODO:​·​add·​some·​state·​transition·​sanity-checking·​here?
333 »   ​_Status·​=·​newStatus;​331 »   ​_Status·​=·​newStatus;​
334 »   ​return·​true;​332 »   ​return·​true;​
391 »   ​»   ​whynot·​=·​"parent·​==·​NULL";​389 »   ​»   ​whynot·​=·​"parent·​==·​NULL";​
392 »   ​»   ​return·​false;​390 »   ​»   ​return·​false;​
393 »   ​}391 »   ​}
392 »   ​if(GetFinal()​)​·​{
393 »   ​»   ​whynot·​=·​"Tried·​to·​add·​a·​parent·​to·​a·​Final·​node";​
394 »   ​»   ​return·​false;​
395 »   ​}
394 396
395 »   ​»   ​/​/​·​we·​don't·​currently·​allow·​a·​new·​parent·​to·​be·​added·​to·​a397 »   ​»   ​/​/​·​we·​don't·​currently·​allow·​a·​new·​parent·​to·​be·​added·​to·​a
396 »   ​»   ​/​/​·​child·​that·​has·​already·​been·​started·​(unless·​the·​parent·​is398 »   ​»   ​/​/​·​child·​that·​has·​already·​been·​started·​(unless·​the·​parent·​is
455 »   ​»   ​whynot·​=·​"child·​==·​NULL";​457 »   ​»   ​whynot·​=·​"child·​==·​NULL";​
456 »   ​»   ​return·​false;​458 »   ​»   ​return·​false;​
457 »   ​}459 »   ​}
458 460 »   ​if(GetFinal()​)​·{
461 »   ​»   ​whynot·​=·​"Tried·​to·​add·​a·​child·​to·​a·​final·​node";​
462 »   ​»   ​return·​false;​
463 »   ​}
459 »   ​whynot·​=·​"n/​a";​464 »   ​whynot·​=·​"n/​a";​
460 »   ​return·​true;​465 »   ​return·​true;​
461 }466 }
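
The last two hunks of `job.cpp` make a final node refuse both parents and children, reporting the reason through `whynot`. The sketch below isolates that rule. It is a simplified model, not the real `Job` class: `Node`, `canAddParent` and `canAddChild` are hypothetical names, `std::string` replaces `MyString`, and the other checks the real `CanAddParent` performs (for example on children that have already started) are omitted.

```cpp
#include <cassert>
#include <string>

// Simplified, hypothetical model of a DAG node with a "final" flag.
class Node {
public:
    Node(const std::string& name, bool isFinal)
        : name_(name), final_(isFinal) {
        assert(!name_.empty());  // mirrors ASSERT( jobName != NULL ) in the diff
    }

    const std::string& name() const { return name_; }
    bool isFinal() const { return final_; }

    // A final node may not be given a parent.
    bool canAddParent(const Node* parent, std::string& whynot) const {
        if (parent == nullptr) {
            whynot = "parent == NULL";
            return false;
        }
        if (final_) {
            whynot = "Tried to add a parent to a Final node";
            return false;
        }
        whynot = "n/a";
        return true;
    }

    // A final node may not be given a child.
    bool canAddChild(const Node* child, std::string& whynot) const {
        if (child == nullptr) {
            whynot = "child == NULL";
            return false;
        }
        if (final_) {
            whynot = "Tried to add a child to a final node";
            return false;
        }
        whynot = "n/a";
        return true;
    }

private:
    std::string name_;
    bool final_;
};
```

Checking the null pointer before the final flag keeps the same order as the diff, so a caller always sees the most basic failure first.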
diff --git a/src/condor_dagman/job.h b/src/condor_dagman/job.h
index 32b5c735690a5933bc5f5c608155a67bdcdaf990..c5e74898bc7aab77163b45abe84030bd7d89992d 100644
--- a/src/condor_dagman/job.h
+++ b/src/condor_dagman/job.h
131 ········​If·​you·​update·​this·​enum,​·​you·​*must*·​also·​update·​status_t_names131 ········​If·​you·​update·​this·​enum,​·​you·​*must*·​also·​update·​status_t_names
132 »   ​»   ​and·​the·​IsActive()​·​method,​·​etc.​132 »   ​»   ​and·​the·​IsActive()​·​method,​·​etc.​
133 ····​*/​133 ····​*/​
134 »   ​/​/​·​WARNING!··​status_t·​and·​status_t_names·​must·​be·​kept·​in·​sync!!
134 ····​enum·​status_t·​{135 ····​enum·​status_t·​{
136 »   ​»   ​/​**·​Job·​is·​not·​ready·​(for·​final)​·​*/​·​STATUS_NOT_READY,​
135 ········​/​**·​Job·​is·​ready·​for·​submission·​*/​·​STATUS_READY,​137 ········​/​**·​Job·​is·​ready·​for·​submission·​*/​·​STATUS_READY,​
136 ········​/​**·​Job·​waiting·​for·​PRE·​script·​*/​··​STATUS_PRERUN,​138 ········​/​**·​Job·​waiting·​for·​PRE·​script·​*/​··​STATUS_PRERUN,​
137 ········​/​**·​Job·​has·​been·​submitted·​*/​······​STATUS_SUBMITTED,​139 ········​/​**·​Job·​has·​been·​submitted·​*/​······​STATUS_SUBMITTED,​
143 ····​/​**·​The·​string·​names·​for·​the·​status_t·​enumeration.​··​Use·​this·​the·​same145 ····​/​**·​The·​string·​names·​for·​the·​status_t·​enumeration.​··​Use·​this·​the·​same
144 ········​way·​you·​would·​use·​the·​queue_t_names·​array.​146 ········​way·​you·​would·​use·​the·​queue_t_names·​array.​
145 ····​*/​147 ····​*/​
148 »   ​/​/​·​WARNING!··​status_t·​and·​status_t_names·​must·​be·​kept·​in·​sync!!
146 ····​static·​const·​char·​*·​status_t_names[];​149 ····​static·​const·​char·​*·​status_t_names[];​
147 150
148 »   ​/​/​·​explanation·​text·​for·​errors151 »   ​/​/​·​explanation·​text·​for·​errors
179 »   ​bool·​AddPostScript(·​const·​char·​*cmd,​·​MyString·​&whynot·​)​;​182 »   ​bool·​AddPostScript(·​const·​char·​*cmd,​·​MyString·​&whynot·​)​;​
180 »   ​bool·​AddScript(·​bool·​post,​·​const·​char·​*cmd,​·​MyString·​&whynot·​)​;​183 »   ​bool·​AddScript(·​bool·​post,​·​const·​char·​*cmd,​·​MyString·​&whynot·​)​;​
181 184
185 »   ​void·​SetFinal(bool·​value)​·​{·​_final·​=·​value;​·​}
186 »   ​bool·​GetFinal()​·​const·​{·​return·​_final;​·​}
182 »   ​void·​SetNoop(·​bool·​value·​)​·​{·​_noop·​=·​value;​·​}187 »   ​void·​SetNoop(·​bool·​value·​)​·​{·​_noop·​=·​value;​·​}
183 »   ​bool·​GetNoop(·​void·​)​·​{·​return·​_noop;​·​}188 »   ​bool·​GetNoop(·​void·​)​·const·​{·​return·​_noop;​·​}
184 189
185 »   ​Script·​*·​_scriptPre;​190 »   ​Script·​*·​_scriptPre;​
186 »   ​Script·​*·​_scriptPost;​191 »   ​Script·​*·​_scriptPost;​
415 »   ​int·​GetPreSkip()​·​const;​420 »   ​int·​GetPreSkip()​·​const;​
416 »   ​421 »   ​
417 ····​/​**·​*/​·​CondorID·​_CondorID;​422 ····​/​**·​*/​·​CondorID·​_CondorID;​
418 ····​/​**·​*/​·​status_t·​_Status;​
419 423
420 ····​/​/​·​maximum·​number·​of·​times·​to·​retry·​this·​node424 ····​/​/​·​maximum·​number·​of·​times·​to·​retry·​this·​node
421 ····​int·​retry_max;​425 ····​int·​retry_max;​
487 »   ​»   ​/​/​·​(Note:​·​we·​may·​need·​to·​track·​the·​hold·​state·​of·​each·​proc·​in·​a491 »   ​»   ​/​/​·​(Note:​·​we·​may·​need·​to·​track·​the·​hold·​state·​of·​each·​proc·​in·​a
488 »   ​»   ​/​/​·​cluster·​separately·​to·​correctly·​deal·​with·​multi-proc·​clusters.​)​492 »   ​»   ​/​/​·​cluster·​separately·​to·​correctly·​deal·​with·​multi-proc·​clusters.​)​
489 »   ​int·​_jobProcsOnHold;​493 »   ​int·​_jobProcsOnHold;​
490 private:​
491 494
492 »   ​»   ​/​/​·Note:​·Init·moved·to·private·section·because·calling·int·more·than495 private:​
493 »   ​»   ​/​/​·​once·​will·​cause·​a·​memory·​leak.​··​wenger·​2005-06-24.​
494 »   ​void·​Init(·​const·​char*·​jobName,​·​const·​char·​*directory,​
495 »   ​»   ​»   ​»   ​const·​char*·​cmdFile·​)​;​
496 ··
497 »   ​»   ​/​/​·​Mark·​this·​node·​as·​failed·​because·​of·​an·​error·​in·​monitoring496 »   ​»   ​/​/​·​Mark·​this·​node·​as·​failed·​because·​of·​an·​error·​in·​monitoring
498 »   ​»   ​/​/​·​the·​log·​file.​497 »   ​»   ​/​/​·​the·​log·​file.​
499 ··»   ​void·​LogMonitorFailed()​;​498 ··»   ​void·​LogMonitorFailed()​;​
517 ··516 ··
518 ····​/​/​·​name·​given·​to·​the·​job·​by·​the·​user517 ····​/​/​·​name·​given·​to·​the·​job·​by·​the·​user
519 ····​char*·​_jobName;​518 ····​char*·​_jobName;​
519
520 ····​/​**·​*/​·​status_t·​_Status;​
520 ··521 ··
521 ····​/​*··​Job·​queues522 ····​/​*··​Job·​queues
522 »   ​····​NOTE:​·​indexed·​by·​queue_t523 »   ​····​NOTE:​·​indexed·​by·​queue_t
600 »   ​»   ​PRE_SKIP_MAX·​=·​0xff,​601 »   ​»   ​PRE_SKIP_MAX·​=·​0xff,​
601 »   ​»   ​NO_PRE_VALUE·​=·​-1602 »   ​»   ​NO_PRE_VALUE·​=·​-1
602 »   ​};​603 »   ​};​
604
605 »   ​/​/​·​whether·​this·​is·​a·​final·​job
606 »   ​bool·​_final;​
603 };​607 };​
604 608
605 /​**·​A·​wrapper·​function·​for·​Job:​:​Print·​which·​allows·​a·​NULL·​job·​pointer.​609 /​**·​A·​wrapper·​function·​for·​Job:​:​Print·​which·​allows·​a·​NULL·​job·​pointer.​
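
The `job.h` and `job.cpp` hunks add `STATUS_NOT_READY` at the front of both `status_t` and `status_t_names`, and rely on comments to keep the two in step. One way to have the compiler check the array length is sketched below. This is not what the commit does; it is an illustration under stated assumptions. `NodeStatus`, `kNodeStatusNames` and `statusName` are hypothetical names, only the four values visible in the diff are listed (the real enum has more), and the padding in the original name strings is dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, truncated copy of the status enum; Count is a sentinel
// that must stay last so it equals the number of real values.
enum class NodeStatus { NotReady, Ready, PreRun, Submitted, Count };

// One name per enum value, in the same order as the enum.
constexpr const char* kNodeStatusNames[] = {
    "STATUS_NOT_READY",
    "STATUS_READY",
    "STATUS_PRERUN",
    "STATUS_SUBMITTED",
};

// Fails to compile if a value is added to one list but not the other.
static_assert(sizeof(kNodeStatusNames) / sizeof(kNodeStatusNames[0]) ==
                  static_cast<std::size_t>(NodeStatus::Count),
              "kNodeStatusNames must have one entry per NodeStatus value");

// Returns the printable name for a status value.
std::string statusName(NodeStatus status) {
    const std::size_t index = static_cast<std::size_t>(status);
    assert(index < static_cast<std::size_t>(NodeStatus::Count));
    return kNodeStatusNames[index];
}
```

The `static_assert` only catches a length mismatch. Inserting a value in the middle of one list but at the end of the other would still compile, so the ordering comment in the header remains necessary.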
diff --git a/src/condor_dagman/parse.cpp b/src/condor_dagman/parse.cpp
index d864cac660b446b199909bdbbe9d09a0979886ff..f5ffc68e3feed377824665f8ff77f1c01f94bcef 100644
--- a/src/condor_dagman/parse.cpp
+++ b/src/condor_dagman/parse.cpp
236 »   ​»   ​»   ​»   ​»   ​···​"submitfile")​;​236 »   ​»   ​»   ​»   ​»   ​···​"submitfile")​;​
237 »   ​»   ​}237 »   ​»   ​}
238 238
239 »   ​»   ​/​/​·​Handle·​a·​SUBDAG·​spec
239 »   ​»   ​else·​if» ​(strcasecmp(token,​·​"SUBDAG")​·​==·​0)​·​{240 »   ​»   ​else·​if» ​(strcasecmp(token,​·​"SUBDAG")​·​==·​0)​·​{
240 »   ​»   ​»   ​parsed_line_successfu​lly·​=·​parse_subdag(·​dag,​·241 »   ​»   ​»   ​parsed_line_successfu​lly·​=·​parse_subdag(·​dag,​·
241 »   ​»   ​»   ​»   ​»   ​»   ​Job:​:​TYPE_CONDOR,​242 »   ​»   ​»   ​»   ​»   ​»   ​Job:​:​TYPE_CONDOR,​
242 »   ​»   ​»   ​»   ​»   ​»   ​token,​·​filename,​·​lineNumber,​·​tmpDirectory.​Value()​·​)​;​243 »   ​»   ​»   ​»   ​»   ​»   ​token,​·​filename,​·​lineNumber,​·​tmpDirectory.​Value()​·​)​;​
243 »   ​»   ​}244 »   ​»   ​}
244 245
246 »   ​»   ​/​/​·​Handle·​a·​FINAL·​spec
247 »   ​»   ​else·​if(strcasecmp(token,​·​"FINAL")​·​==·​0)​·​{
248 »   ​»   ​»   ​parsed_line_successfu​lly·​=·​parse_node(·​dag,​·
249 »   ​»   ​»   ​»   ​»   ​···​Job:​:​TYPE_CONDOR,​·​token,​
250 »   ​»   ​»   ​»   ​»   ​···​filename,​·​lineNumber,​·​tmpDirectory.​Value()​,​·​"",​
251 »   ​»   ​»   ​»   ​»   ​···​"submitfile"·​)​;​
252 »   ​»   ​}
253
245 »   ​»   ​/​/​·​Handle·​a·​SCRIPT·​spec254 »   ​»   ​/​/​·​Handle·​a·​SCRIPT·​spec
246 »   ​»   ​/​/​·​Example·​Syntax·​is:​··​SCRIPT·​(PRE|POST)​·​JobName·​ScriptName·​Args·​.​.​.​255 »   ​»   ​/​/​·​Example·​Syntax·​is:​··​SCRIPT·​(PRE|POST)​·​JobName·​ScriptName·​Args·​.​.​.​
247 »   ​»   ​else·​if·​(·​strcasecmp(token,​·​"SCRIPT")​·​==·​0·​)​·​{256 »   ​»   ​else·​if·​(·​strcasecmp(token,​·​"SCRIPT")​·​==·​0·​)​·​{
350 »   ​»   ​»   ​debug_printf(·​DEBUG_QUIET,​·​"%s·​(line·​%d)​:​·​"359 »   ​»   ​»   ​debug_printf(·​DEBUG_QUIET,​·​"%s·​(line·​%d)​:​·​"
351 »   ​»   ​»   ​»   ​"Expected·​JOB,​·​DATA,​·​SUBDAG,​·​SCRIPT,​·​PARENT,​·​RETRY,​·​"360 »   ​»   ​»   ​»   ​"Expected·​JOB,​·​DATA,​·​SUBDAG,​·​SCRIPT,​·​PARENT,​·​RETRY,​·​"
352 »   ​»   ​»   ​»   ​"ABORT-DAG-ON,​·​DOT,​·​VARS,​·​PRIORITY,​·​CATEGORY,​·​MAXJOBS,​·​"361 »   ​»   ​»   ​»   ​"ABORT-DAG-ON,​·​DOT,​·​VARS,​·​PRIORITY,​·​CATEGORY,​·​MAXJOBS,​·​"
353 »   ​»   ​»   ​»   ​"CONFIG,​·​SPLICE,​·​NODE_STATUS_FILE,​·​or·​PRE_SKIP·​token\n",​362 »   ​»   ​»   ​»   ​"CONFIG,​·​SPLICE,​·FINAL,​·​NODE_STATUS_FILE,​·​or·​PRE_SKIP·​token\n",​
354 »   ​»   ​»   ​»   ​filename,​·​lineNumber·​)​;​363 »   ​»   ​»   ​»   ​filename,​·​lineNumber·​)​;​
355 »   ​»   ​»   ​parsed_line_successfu​lly·​=·​false;​364 »   ​»   ​»   ​parsed_line_successfu​lly·​=·​false;​
356 »   ​»   ​}365 »   ​»   ​}
537 »   ​}546 »   ​}
538 547
539 »   ​/​/​·​looks·​ok,​·​so·​add·​it548 »   ​/​/​·​looks·​ok,​·​so·​add·​it
549 »   ​bool·​isFinal·​=·​strcasecmp(·​nodeTypeKeyword,​·​"FINAL"·​)​·​==·​MATCH;​
540 »   ​if(·​!AddNode(·​dag,​·​nodeType,​·​nodeName,​·​directory,​550 »   ​if(·​!AddNode(·​dag,​·​nodeType,​·​nodeName,​·​directory,​
541 »   ​»   ​»   ​submitFile,​·​NULL,​·​NULL,​·​noop,​·​done,​·​whynot·​)​·​)​551 »   ​»   ​»   ​»   ​submitFile,​·​NULL,​·​NULL,​·​noop,​·​done,​·isFinal,​·​whynot·​)​·​)​
542 »   ​{552 »   ​{
543 »   ​»   ​debug_printf(·​DEBUG_QUIET,​·​"ERROR:​·​%s·​(line·​%d)​:​·​%s\n",​553 »   ​»   ​debug_printf(·​DEBUG_QUIET,​·​"ERROR:​·​%s·​(line·​%d)​:​·​%s\n",​
544 »   ​»   ​»   ​»   ​»   ​··​dagFile,​·​lineNum,​·​whynot.​Value()​·​)​;​554 »   ​»   ​»   ​»   ​»   ​··​dagFile,​·​lineNum,​·​whynot.​Value()​·​)​;​
880 »   ​»   ​»   ​»   ​»   ​··​filename,​·​lineNumber,​·​jobNameOrig·​)​;​890 »   ​»   ​»   ​»   ​»   ​··​filename,​·​lineNumber,​·​jobNameOrig·​)​;​
881 »   ​»   ​return·​false;​891 »   ​»   ​return·​false;​
882 »   ​}892 »   ​}
893
894 »   ​if·​(·​job->GetFinal()​·​)​·​{
895 »   ​»   ​debug_printf(·​DEBUG_QUIET,​·
896 »   ​»   ​»   ​»   ​»   ​··​"ERROR:​·​%s·​(line·​%d)​:​·​Final·​job·​%s·​cannot·​have·​retries\n",​
897 »   ​»   ​»   ​»   ​»   ​··​filename,​·​lineNumber,​·​jobNameOrig·​)​;​
898 »   ​»   ​return·​false;​
899 »   ​}
883 »   ​900 »   ​
884 »   ​char·​*s·​=·​strtok(·​NULL,​·​DELIMITERS·​)​;​901 »   ​char·​*s·​=·​strtok(·​NULL,​·​DELIMITERS·​)​;​
885 »   ​if(·​s·​==·​NULL·​)​·​{902 »   ​if(·​s·​==·​NULL·​)​·​{
979 »   ​»   ​»   ​»   ​»   ​··​filename,​·​lineNumber,​·​jobNameOrig·​)​;​996 »   ​»   ​»   ​»   ​»   ​··​filename,​·​lineNumber,​·​jobNameOrig·​)​;​
980 »   ​»   ​return·​false;​997 »   ​»   ​return·​false;​
981 »   ​}998 »   ​}
999
1000 »   ​if·​(·​job->GetFinal()​·​)​·​{
1001 »   ​»   ​debug_printf(·​DEBUG_QUIET,​·
1002 »   ​»   ​»   ​»   ​»   ​··​"ERROR:​·​%s·​(line·​%d)​:​·​Final·​job·​%s·​cannot·​have·​ABORT-DAG-ON·​specification\n",​
1003 »   ​»   ​»   ​»   ​»   ​··​filename,​·​lineNumber,​·​jobNameOrig·​)​;​
1004 »   ​»   ​return·​false;​
1005 »   ​}
982 »   ​1006 »   ​
983 »   ​»   ​/​/​·​Node·​abort·​value.​1007 »   ​»   ​/​/​·​Node·​abort·​value.​
984 »   ​char·​*abortValStr·​=·​strtok(·​NULL,​·​DELIMITERS·​)​;​1008 »   ​char·​*abortValStr·​=·​strtok(·​NULL,​·​DELIMITERS·​)​;​
1315 »   ​»   ​}1339 »   ​»   ​}
1316 »   ​}1340 »   ​}
1317 1341
1342 »   ​if·​(·​job->GetFinal()​·​)​·​{
1343 »   ​»   ​debug_printf(·​DEBUG_QUIET,​·
1344 »   ​»   ​»   ​»   ​»   ​··​"ERROR:​·​%s·​(line·​%d)​:​·​Final·​job·​%s·​cannot·​have·​priority\n",​
1345 »   ​»   ​»   ​»   ​»   ​··​filename,​·​lineNumber,​·​jobNameOrig·​)​;​
1346 »   ​»   ​return·​false;​
1347 »   ​}
1348
1318 »   ​/​/​1349 »   ​/​/​
1319 »   ​/​/​·​Next·​token·​is·​the·​priority·​value.​1350 »   ​/​/​·​Next·​token·​is·​the·​priority·​value.​
1320 »   ​/​/​1351 »   ​/​/​
1409 »   ​»   ​}1440 »   ​»   ​}
1410 »   ​}1441 »   ​}
1411 1442
1443 »   ​if·​(·​job->GetFinal()​·​)​·​{
1444 »   ​»   ​debug_printf(·​DEBUG_QUIET,​·
1445 »   ​»   ​»   ​»   ​»   ​··​"ERROR:​·​%s·​(line·​%d)​:​·​Final·​job·​%s·​cannot·​have·​category\n",​
1446 »   ​»   ​»   ​»   ​»   ​··​filename,​·​lineNumber,​·​jobNameOrig·​)​;​
1447 »   ​»   ​return·​false;​
1448