Page History
- 2017-Mar-23 11:38 wenger
When should I use one of these features?
Both external sub-DAGs and splices allow you to compose a large workflow from various sub-pieces that are defined in individual DAG files. This is the basic motivation for using either external sub-DAGs or splices: you want to create a single workflow from a number of DAG files, either because the smaller DAG files already exist, or because it's easier to deal with sub-parts of the workflow. (One use case might be that you have sub-workflows that you want to combine in different ways to make different overall workflows.)
Some reasons to use external sub-DAGs or splices:
- Create a workflow from separate sub-workflows
- Dynamically create parts of the workflow (external sub-DAGs only)
- Re-try multiple nodes as a unit (external sub-DAGs only)
- Short-circuit parts of the workflow (external sub-DAGs only)
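To make the first point concrete, the sketch below writes a top-level DAG that combines existing sub-workflow DAG files, either as `SPLICE` entries or as `SUBDAG EXTERNAL` nodes. The helper name `top_level_dag` and the file names `stage1.dag`/`stage2.dag` are hypothetical. The sketch assumes the sub-workflows simply run one after another. It also shows one of the external-only reasons from the list above: an external sub-DAG is a single node, so it can be given a `RETRY`, while a splice cannot be retried as a unit.

```python
"""Hypothetical helper: build a top-level DAG from existing sub-workflow files."""


def top_level_dag(sub_dags, mode="splice", retries=0):
    """Return the text of a top-level DAG that runs sub_dags in sequence.

    sub_dags: list of (name, dag_file) pairs, e.g. [("A", "stage1.dag")].
    mode: "splice" (SPLICE) or "subdag" (SUBDAG EXTERNAL).
    retries: RETRY count per sub-workflow; only allowed for external
        sub-DAGs, because a splice cannot be retried as a unit.
    """
    if mode not in ("splice", "subdag"):
        raise ValueError(f"unknown mode: {mode!r}")
    if retries and mode == "splice":
        raise ValueError("splices cannot be retried as a unit")

    lines = []
    for name, dag_file in sub_dags:
        if mode == "splice":
            lines.append(f"SPLICE {name} {dag_file}")
        else:
            lines.append(f"SUBDAG EXTERNAL {name} {dag_file}")
            if retries:
                lines.append(f"RETRY {name} {retries}")

    # Chain the sub-workflows: each one starts after the previous one finishes.
    names = [name for name, _ in sub_dags]
    for parent, child in zip(names, names[1:]):
        lines.append(f"PARENT {parent} CHILD {child}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    parts = [("A", "stage1.dag"), ("B", "stage2.dag")]
    print(top_level_dag(parts))                              # splice version
    print(top_level_dag(parts, mode="subdag", retries=2))    # external sub-DAG version
```

Both outputs refer to the same sub-workflow files; only the keywords differ. That is why switching between the two approaches is usually cheap at the top level.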
Feature comparison
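Here's a table comparing external sub-DAGs and splices. Note that "yes" is not always the advantageous answer: for rows such as creating multiple DAGMan instances, a possible combinatorial explosion of dependencies, or requiring sub-workflow files to exist at submission, "no" is better.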
Feature | External sub-DAGs | Splices | Notes |
Ability to incorporate separate sub-workflow files | yes | yes | |
Rescue DAG(s) created upon failure | yes | yes | |
DAG recovery (e.g., from submit machine crash) | yes | yes | |
Creates multiple DAGMan instances in the queue | yes | no | |
Possible combinatorial explosion of dependencies (see below) | no | yes | Until we implement socket nodes for splices |
Sub-workflow files must exist at submission | no | yes | |
PRE/POST scripts allowed on sub-workflows | yes | no | Until we implement socket nodes for splices |
Ability to retry sub-workflows | yes | no | |
Job/script throttling applies across entire workflow | no | yes | |
Separate job/script throttles for each sub-workflow | yes | no | |
Node categories can apply across entire workflow | no | yes | |
Ability to set priority on sub-workflows as nodes | yes | no | |
Ability to reduce workflow memory footprint | yes? | no | If used properly |
Ability to have separate final nodes in sub-workflows | yes | no | |
Ability to abort sub-workflows individually | yes | no | |
Ability to associate variables with sub-workflows | yes | no | |
Ability to configure sub-workflows individually | yes | no | Can be good or bad |
Separate node status files, etc., for sub-workflows | yes | no | |
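The "combinatorial explosion of dependencies" row comes down to simple arithmetic. This is a sketch assuming the usual splice semantics without socket nodes. When one splice is the parent of another, every terminal node of the parent splice (a node with no children inside it) becomes a parent of every initial node of the child splice (a node with no parents inside it). An external sub-DAG, by contrast, is a single node in the upper-level DAG. The function name `chain_dependencies` is hypothetical.

```python
def chain_dependencies(splice_shapes):
    """Count cross-sub-workflow dependencies for a chain S0 -> S1 -> ... -> Sk.

    splice_shapes: list of (initial_nodes, terminal_nodes) for each sub-workflow.
    Returns (edges_if_spliced, edges_if_external_subdags).
    """
    spliced = 0
    for (_, parent_terminal), (child_initial, _) in zip(splice_shapes, splice_shapes[1:]):
        # Assumed splice semantics: every terminal node of the parent splice
        # becomes a parent of every initial node of the child splice.
        spliced += parent_terminal * child_initial
    # Each external sub-DAG is one node, so each link in the chain is one edge.
    external = max(len(splice_shapes) - 1, 0)
    return spliced, external


# Ten sub-workflows in a chain, each with 1000 initial and 1000 terminal nodes.
print(chain_dependencies([(1000, 1000)] * 10))  # (9000000, 9)
```

With wide sub-workflows the spliced version grows with the product of the widths, while the external sub-DAG version grows only with the number of links.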
Should I use external sub-DAGs or splices?
The simple answer is that, unless you need one of the features that are available with external sub-DAGs but not with splices (see the table above), you should use splices. Splices are generally simpler and have less overhead than external sub-DAGs (unless the workflow is specifically designed to minimize the external sub-DAG overhead). Also, workflow-wide throttling is generally more useful than separate throttles for sub-parts of the workflow.
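The rule of thumb above can be written down as a small decision helper built from the table rows. This is only a sketch: the short feature identifiers and the function name `choose` are hypothetical labels, and features that both approaches support (rescue DAGs, recovery, and so on) are simply ignored because they do not affect the choice.

```python
# Table rows where only external sub-DAGs say "yes" (hypothetical labels).
EXTERNAL_ONLY = {
    "dynamic_sub_workflows",       # sub-workflow files need not exist at submission
    "pre_post_scripts",
    "retry_sub_workflows",
    "per_sub_workflow_throttles",
    "sub_workflow_priority",
    "reduce_memory_footprint",     # "yes?" in the table: only if used properly
    "separate_final_nodes",
    "abort_individually",
    "sub_workflow_variables",
    "per_sub_workflow_config",
    "separate_node_status_files",
    "short_circuit",
}

# Table rows where only splices say "yes".
SPLICE_ONLY = {
    "workflow_wide_throttles",
    "workflow_wide_categories",
}


def choose(needed):
    """Return (choice, given_up): use splices unless an external-only feature is needed.

    given_up lists any requested splice-only features that the chosen
    approach cannot provide at the same time.
    """
    needed = set(needed)
    if needed & EXTERNAL_ONLY:
        return "external sub-DAGs", sorted(needed & SPLICE_ONLY)
    return "splices", []


print(choose({"retry_sub_workflows", "workflow_wide_throttles"}))
```

Returning the features that would be given up makes the trade-off explicit when a workflow needs something from both columns of the table.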
How to use external sub-DAGs to reduce workflow memory footprint
(Coming soon!)
Note: This document is valid for HTCondor version 8.5.5.