Page History
- 2017-Mar-23 11:38 wenger
 
When should I use one of these features?
Both external sub-DAGs and splices let you compose a large workflow from sub-pieces that are defined in individual DAG files. That is the basic motivation for using either feature: you want to build a single workflow from several DAG files, either because the smaller DAG files already exist or because the workflow is easier to manage in sub-parts. (For example, you might have sub-workflows that you want to combine in different ways to make different overall workflows.)
Some reasons to use external sub-DAGs or splices:
- Create a workflow from separate sub-workflows
- Dynamically create parts of the workflow (external sub-DAGs only)
- Retry multiple nodes as a unit (external sub-DAGs only)
- Short-circuit parts of the workflow (external sub-DAGs only)
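
As a rough illustration of the two features, here is a minimal sketch of a top-level DAG file that pulls in one sub-workflow as a splice and another as an external sub-DAG. All node and file names (setup.sub, analysis.dag, cleanup.dag) are hypothetical; the sketch assumes only the standard DAGMan JOB, SPLICE, SUBDAG EXTERNAL, and PARENT/CHILD commands.

```
# Hypothetical top-level DAG file; all names below are illustrative only.
JOB setup setup.sub

# Splice: the nodes of analysis.dag are merged into this DAG when it is parsed.
SPLICE analysis analysis.dag

# External sub-DAG: cleanup.dag is a single node here and is run by
# its own DAGMan instance.
SUBDAG EXTERNAL cleanup cleanup.dag

PARENT setup CHILD analysis
PARENT analysis CHILD cleanup
```

In this sketch the nodes of analysis.dag become part of the top-level workflow, while cleanup.dag appears in the top-level workflow as one node.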
 
Feature comparison
Here's a table comparing external sub-DAGs and splices. Note that the bold entries are the ones that are advantageous for a given feature.
| Feature | External sub-DAGs | Splices | Notes |
| --- | --- | --- | --- |
| Ability to incorporate separate sub-workflow files | yes | yes | |
| Rescue DAG(s) created upon failure | yes | yes | |
| DAG recovery (e.g., from submit machine crash) | yes | yes | |
| Creates multiple DAGMan instances in the queue | yes | no | |
| Possible combinatorial explosion of dependencies (see below) | no | yes | Until we implement socket nodes for splices | 
| Sub-workflow files must exist at submission | no | yes | |
| PRE/POST scripts allowed on sub-workflows | yes | no | Until we implement socket nodes for splices | 
| Ability to retry sub-workflows | yes | no | |
| Job/script throttling applies across entire workflow | no | yes | |
| Separate job/script throttles for each sub-workflow | yes | no | |
| Node categories can apply across entire workflow | no | yes | |
| Ability to set priority on sub-workflows as nodes | yes | no | |
| Ability to reduce workflow memory footprint | yes? | no | If used properly | 
| Ability to have separate final nodes in sub-workflows | yes | no | |
| Ability to abort sub-workflows individually | yes | no | |
| Ability to associate variables with sub-workflows | yes | no | |
| Ability to configure sub-workflows individually | yes | no | Can be good or bad | 
| Separate node status files, etc., for sub-workflows | yes | no | |
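
Several of the rows that favor external sub-DAGs follow from the fact that an external sub-DAG is a single node in the parent DAG, so node-level commands can be applied to it. The following is a minimal sketch, assuming the standard SCRIPT, RETRY, and PRIORITY commands; the node name, DAG file name, and script names are hypothetical.

```
# Hypothetical fragment of a top-level DAG file; all names are illustrative.
SUBDAG EXTERNAL inner inner.dag

# The PRE script runs before the sub-DAG is started. Here it is assumed
# to write inner.dag, so that file need not exist at submission.
SCRIPT PRE inner make_inner_dag.sh

# The POST script runs after the sub-DAG finishes.
SCRIPT POST inner check_inner.sh

# Retry the whole sub-workflow as a unit, up to 2 times, if it fails.
RETRY inner 2

# Set a node priority for the sub-workflow as a whole.
PRIORITY inner 10
```

None of these lines would be accepted in the same way for a splice, because a splice does not exist as a node of its own once its contents have been merged into the parent DAG.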
Should I use external sub-DAGs or splices?
The simple answer is: use splices unless you need one of the features that external sub-DAGs provide and splices do not (see the table above). Splices are generally simpler and have less overhead than external sub-DAGs (unless the workflow is specifically designed to minimize the external sub-DAG overhead). Also, workflow-wide throttling is generally more useful than separate throttles for sub-parts of the workflow.
How to use external sub-DAGs to reduce workflow memory footprint
(Coming soon!)
Note: This document is valid for HTCondor version 8.5.5.
