This document is an explanation of why you might want to use the DAGMan _external sub-DAG_ and _splice_ features, and which one you might want to use in a particular situation. (See {link: http://research.cs.wisc.edu/htcondor/manual/v8.5/2_10DAGMan_Applications.html#SECTION003108900000000000000} and {link: http://research.cs.wisc.edu/htcondor/manual/v8.5/2_10DAGMan_Applications.html#SECTION0031081000000000000000} for detailed information about external sub-DAGs and splices, respectively.)

*When should I use one of these features?*

Both external sub-DAGs and splices allow you to compose a large workflow from various sub-pieces that are defined in individual DAG files. This is the basic motivation for using either external sub-DAGs or splices: you want to create a single workflow from a number of DAG files, either because the smaller DAG files already exist, or because it's easier to deal with sub-parts of the workflow. (One use case might be that you have sub-workflows that you want to combine in different ways to make different overall workflows.)

Some reasons to use external sub-DAGs or splices:

*: Create a workflow from separate sub-workflows
*: Dynamically create parts of the workflow (external sub-DAGs only)
*: Re-try multiple nodes as a unit (external sub-DAGs only)
*: Short-circuit parts of the workflow (external sub-DAGs only)

*Feature comparison*

Here's a table comparing external sub-DAGs and splices. Note that the bold entries are the ones that are advantageous for a given feature.
| *Feature* | *External sub-DAGs* | *Splices* | *Notes* |
| Ability to incorporate separate sub-workflow files | *yes* | *yes* | |
| Rescue DAG(s) created upon failure | *yes* | *yes* | |
| DAG recovery (e.g., from submit machine crash) | *yes* | *yes* | |
| Creates multiple DAGMan instances in the queue | yes | *no* | |
| Possible combinatorial explosion of dependencies (see below) | *no* | yes | Until we implement socket nodes for splices |
| Sub-workflow files must exist at submission | *no* | yes | |
| PRE/POST scripts allowed on sub-workflows | *yes* | no | Until we implement socket nodes for splices |
| Ability to retry sub-workflows | *yes* | no | |
| Job/script throttling applies across entire workflow | no | *yes* | |
| Separate job/script throttles for each sub-workflow | yes | no | |
| Node categories can apply across entire workflow | no | *yes* | |
| Ability to set priority on sub-workflows as nodes | *yes* | no | |
| Ability to reduce workflow memory footprint | *yes?* | no | If used properly |
| Ability to have separate final nodes in sub-workflows | *yes* | no | |
| Ability to abort sub-workflows individually | *yes* | no | |
| Ability to associate variables with sub-workflows | *yes* | no | |
| Ability to configure sub-workflows individually | yes | no | Can be good or bad |
| Separate node status files, etc., for sub-workflows | yes | *no* | |

*Should I use external sub-DAGs or splices?*

The simple answer is that, unless you need one of the features that's available with external sub-DAGs but not with splices (see the table above), you should use splices. Splices are generally simpler and have less overhead than external sub-DAGs (unless the workflow is specifically designed to minimize the external sub-DAG overhead). Also, workflow-wide throttling is generally more useful than separate throttles for sub-parts of the workflow.

*How to use external sub-DAGs to reduce workflow memory footprint*

(Coming soon!)
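
*Example: a splice and an external sub-DAG in one workflow*

To make the comparison concrete, here is a minimal sketch of a top-level DAG file that incorporates one sub-workflow as a splice and another as an external sub-DAG. It is an illustration only, not taken from the manual sections linked at the top of this page. All file and node names (=top.dag=, =analysis_a.dag=, =analysis_b.dag=, =make_analysis_b.sh=, and the =.sub= files) are hypothetical, and the sketch assumes the PRE script writes =analysis_b.dag= before that node is run.

```text
# top.dag -- hypothetical top-level workflow

JOB setup setup.sub

# Splice: the nodes of analysis_a.dag become part of this DAG,
# so analysis_a.dag must already exist when top.dag is submitted.
SPLICE analysisA analysis_a.dag

# External sub-DAG: analysis_b.dag is run as a single node of this DAG,
# handled by its own DAGMan instance.
SUBDAG EXTERNAL analysisB analysis_b.dag

# Features from the table that apply only to the external sub-DAG:
# a PRE script (assumed here to generate analysis_b.dag) and
# retrying the whole sub-workflow as a unit.
SCRIPT PRE analysisB make_analysis_b.sh
RETRY analysisB 2

JOB finish finish.sub

# Both sub-workflows run after setup and before finish.
PARENT setup CHILD analysisA analysisB
PARENT analysisA analysisB CHILD finish
```

Submitting this with =condor_submit_dag top.dag= would place one DAGMan instance in the queue for =top.dag= and, once =analysisB= starts, a second one for =analysis_b.dag=. The nodes from =analysis_a.dag= are handled by the top-level instance, so workflow-wide throttles cover them but not the nodes inside =analysis_b.dag=.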
Note: This document is valid for HTCondor version 8.5.5.