Ticket #2122: Allow PRE script to skip node job

Logan from Botany wants a PRE script to be able to decide that a node job should not be run, while the node is still considered successful (if the output that would be generated by the node job already exists).

I'm thinking that we could do this by adding a new keyword that specifies an exit code from the PRE script that will skip the node job, like this:

  SKIP <node name> <exit code>


  PRE_SKIP <node name> <exit code>
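
A minimal sketch of how this might look in a DAG input file, using the PRE_SKIP spelling. The node names, submit file names, PRE script name, and the exit code 2 are assumptions chosen for illustration, not taken from Logan's workflow:

```
# Hypothetical example: file names and the exit code are placeholders.
JOB A a.sub
JOB B b.sub
SCRIPT PRE A check_output.sh
# If check_output.sh exits with 2, skip A's node job and treat A as successful.
PRE_SKIP A 2
PARENT A CHILD B
```

With this, a PRE script exit of 0 would run the node job as usual, an exit of 2 would skip it and mark the node successful, and any other non-zero exit would still count as a PRE script failure.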


I implemented the PRE_SKIP keyword, with test cases. I pushed it onto the branch V7_7-implement_pre_skip_key_word-branch and am sending it over to Kent for review.

2011-Jun-01 16:01:48 by wenger:
See TEMPTEMP comments in dag.cpp for info about possible problems w/ rescue DAGs.

Also, the test scripts (job_dagman_pre_skip-A.run, etc.) all fail for me.

2011-Jun-04 17:15:19 by nwp:
Fixed code to address concerns in previous remark, and fixed the tests.

2011-Jun-04 20:21:34 by nwp:
Added documentation in the user manual at c22e81a in CONDOR_DOC

2011-Jun-07 13:51:12 by wenger:
In test B and test C, the B_A and C_A node jobs are actually run even though the PRE script returns the skip value. Obviously, this needs to be fixed in DAGMan; also, the tests need to be fixed so that they fail if the job is run.

2011-Jun-07 13:57:00 by wenger:
The PRE_SKIP setting is not preserved in a rescue DAG.

2011-Jun-21 12:02:55 by smoler:
Documentation for this new feature is now on master branch -- for 7.7.1

2011-Jul-12 22:00:26 by wenger:
I just did a bit more cleanup on this. However, there is still some temporary code (marked with TEMPTEMP comments) in place, because the jobstate_log test fails, and I'm still playing a bit with how a skip should show up in the jobstate.log file.

2011-Jul-13 17:05:51 by wenger:
Okay, I've got the jobstate.log-related stuff fixed up -- go ahead and merge!


Type: enhance           Last Change: 2013-Jan-29 13:57
Status: resolved          Created: 2011-May-03 13:48
Fixed Version: v070701           Broken Version: v070600 
Priority:          Subsystem: Dag 
Assigned To: nwp           Derived From:  
Creator: wenger  Rust:  
Customer Group: other  Visibility: public 
Notify: wenger@cs.wisc.edu, psilord@cs.wisc.edu  Due Date:  

Derived Tickets:

#2966   Test combination of PRE_SKIP and recovery

Related Check-ins:

2011-Sep-01 15:07   Check-in [27070]: Gittrac #2057 (also kind of #2122): changed PRE_SKIP to skip the POST script, if any, since with the new "always run the POST script" feature there isn't much point to PRE_SKIP otherwise; changed job_dagman_pre_skip-B to make sure the POST script really is skipped. (By Kent Wenger )
2011-Jul-12 13:13   Check-in [22410]: Whitespace changes urged by Kent ===GT=== #2122 (By Nathan W. Panike )
2011-Jul-07 16:09   Check-in [22380]: Partway through fixing up the 'pre skip' (gittrac #2122) tests -- I think job_dagman_pre_skip-A is okay at this point; the others still need more work. (By Kent Wenger )
2011-Jun-22 17:03   Check-in [22274]: Renamed a couple of test files that somehow got committed with stars on the end of the names... This is for #2122 (By Kent Wenger )
2011-Jun-22 16:59   Check-in [22273]: Renamed a couple of test files that somehow got committed with stars on the end of the names... This is for #2122 (By Kent Wenger )
2011-Jun-21 12:00   Check-in [26136]: document new PRE_SKIP key word for DAGMan; gittrac #2122 (By Karen Miller )
2011-Jun-21 10:48   Check-in [26134]: removal of PRE_SKIP documentation; gittrac #2122; feature not ready for 7.7.0 (By Karen Miller )
2011-Jun-14 13:39   Check-in [22178]: Add another test to CMakeLists.txt ===GT=== #2122 (By Nathan W. Panike )
2011-Jun-14 11:57   Check-in [22177]: Test case to verify PRE_SKIP node gets written to rescue DAG ===GT=== #2122 (By Nathan W. Panike )
2011-Jun-14 11:34   Check-in [22176]: Write out the PRE_SKIP node in the rescue DAG ===GT=== #2122 (By Nathan W. Panike )
2011-Jun-14 10:30   Check-in [22173]: Fix up tests for PRE_SKIP key word ===GT=== #2122 (By Nathan W. Panike )
2011-Jun-14 10:25   Check-in [22172]: Use Dag::TerminateJob instead of Job::TerminateSuccess ===GT=== #2122 TerminateJob fixes up the queues properly for queuing the child jobs (By Nathan W. Panike )
2011-Jun-04 20:19   Check-in [26111]: Document new semantics of PRE_SKIP DAGman command #2122 (By Nathan W. Panike )
2011-Jun-04 16:59   Check-in [22103]: Fix test cases ===GT=== #2122 We expect to see failure in job_dagman_pre_skip-A. Revise command in job_dagman_pre_skip-B. Look for the correct files and check the .dag.dagman.out file for the PRE_SKIP indicator. (By Nathan W. Panike )
2011-Jun-04 16:57   Check-in [22102]: Implement changes from Kent's review ===GT=== #2122 Write to jobstate.log Call TerminateSuccess rather than setting noop (By Nathan W. Panike )
2011-Jun-01 15:59   Check-in [22069]: Gittrac #2122: review of Nathan's implementation of this. Changed some of the code to be more in the existing style (both in terms of formatting and higher-level structure); added some notes about possible problems with rescue DAGs caused by skip implementation. (By Kent Wenger )
2011-May-12 15:27   Check-in [21853]: Test to make sure PRE_SKIP kills POST script ===GT=== #2122 (By Nathan W. Panike )
2011-May-12 15:03   Check-in [21852]: Kill the POST script if PRE_SKIP kills the job ===GT=== #2122 (By Nathan W. Panike )
2011-May-12 14:49   Check-in [21851]: Add tests for PRE_SKIP node in DAG ===GT=== #2122 (By Nathan W. Panike )
2011-May-12 14:48   Check-in [21850]: Parse a PRE_SKIP node ===GT=== #2122 (By Nathan W. Panike )
2011-May-12 14:44   Check-in [21849]: Add code in dag class to handle PRE_SKIP ===GT=== #2122 When DAGman checks the return value of the PRE script, it ignores failure if the return value matches the PRE_SKIP value given in the DAG submit file. In that case, we mark the DAG job as a noop and move on. This does not handle the case where there [...] (By Nathan W. Panike )
2011-May-12 14:43   Check-in [21848]: Job class data and methods for PRE_SKIP ===GT=== #2122 Add a _preskip data member to the Job class. Add HasPreSkip and GetPreSkip to interrogate whether a job has a PRE_SKIP value, and what that value is. The AddPreSkip method adds a PRE_SKIP script to the job. (By Nathan W. Panike )
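
The Job-class additions and the PRE script exit-code check described in check-ins [21848] and [21849] can be sketched roughly as follows. This is a simplified stand-in, not the actual DAGMan code: the method names HasPreSkip, GetPreSkip, and AddPreSkip and the _preskip member come from the check-in comments, but their signatures, the NO_PRE_SKIP sentinel, the 1..255 range check, the PreResult enum, and the ClassifyPreExit helper are all assumptions made for illustration. Retries, rescue DAGs, and jobstate.log handling are left out.

```cpp
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for DAGMan's Job class; only the
// PRE_SKIP-related pieces named in the check-in comments are modeled.
class Job {
public:
    explicit Job(std::string name) : _name(std::move(name)) {}

    // Record the PRE_SKIP exit code for this node.
    // Assumption: valid values are 1..255, since 0 means the PRE script succeeded.
    bool AddPreSkip(int exitCode) {
        if (exitCode < 1 || exitCode > 255) {
            return false;
        }
        _preskip = exitCode;
        return true;
    }

    // True if a PRE_SKIP value has been set for this node.
    bool HasPreSkip() const { return _preskip != NO_PRE_SKIP; }

    // The PRE_SKIP value; only meaningful when HasPreSkip() is true.
    int GetPreSkip() const { return _preskip; }

    const std::string& GetName() const { return _name; }

private:
    static constexpr int NO_PRE_SKIP = -1;  // assumed sentinel for "not set"
    std::string _name;
    int _preskip = NO_PRE_SKIP;
};

// Possible outcomes after the PRE script exits (hypothetical enum).
enum class PreResult { RunJob, SkipJob, NodeFailed };

// Decide what to do with a node given its PRE script's exit code:
// 0 runs the node job, a match with PRE_SKIP skips it and treats the node
// as successful, and any other non-zero value is a PRE script failure.
PreResult ClassifyPreExit(const Job& job, int preExit) {
    if (preExit == 0) {
        return PreResult::RunJob;
    }
    if (job.HasPreSkip() && preExit == job.GetPreSkip()) {
        return PreResult::SkipJob;
    }
    return PreResult::NodeFailed;
}
```

In this sketch a caller would treat PreResult::SkipJob as node success, which loosely corresponds to the later switch from marking the job as a noop to terminating it successfully ([22102], [22172]).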