HTCondorWiki: Sub Dags Vs Splices


Note: This document is still being written (2016-06-14).

This document explains why you might want to use the DAGMan external sub-DAG and splice features, and which one to choose in a particular situation. (See http://research.cs.wisc.edu/htcondor/manual/v8.5/2_10DAGMan_Applications.html#SECTION003108900000000000000 and http://research.cs.wisc.edu/htcondor/manual/v8.5/2_10DAGMan_Applications.html#SECTION0031081000000000000000 for detailed information about external sub-DAGs and splices, respectively.)

Both external sub-DAGs and splices let you compose a large workflow from sub-pieces that are defined in individual DAG files. This is the basic motivation for using either feature: you want to build a single workflow from a number of DAG files, either because the smaller DAG files already exist or because the workflow is easier to manage as separate sub-parts. (For example, you might have sub-workflows that you want to combine in different ways to make different overall workflows.)
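As a minimal sketch of the two composition styles, the Python function below writes the text of a top-level DAG that runs existing DAG files one after another. Each piece is declared either with SUBDAG EXTERNAL (each piece is submitted as its own DAGMan job) or with SPLICE (the pieces are merged into the top-level DAG and run by a single DAGMan). The helper name outer_dag and the file names prep.dag and analyze.dag are hypothetical; see the manual sections linked above for the full SUBDAG EXTERNAL and SPLICE syntax, such as the optional DIR keyword.

```python
def outer_dag(pieces, use_splices):
    """Return the text of a top-level DAG that runs the given DAG files in order.

    pieces: list of (name, dag_file) pairs, e.g. [("prep", "prep.dag")].
    use_splices: True  -> declare each piece with SPLICE (one DAGMan overall);
                 False -> declare each piece with SUBDAG EXTERNAL
                          (one DAGMan instance per piece).
    """
    lines = []
    for name, dag_file in pieces:
        keyword = "SPLICE" if use_splices else "SUBDAG EXTERNAL"
        lines.append(f"{keyword} {name} {dag_file}")
    # Chain the pieces so each one finishes before the next one starts.
    for (parent, _), (child, _) in zip(pieces, pieces[1:]):
        lines.append(f"PARENT {parent} CHILD {child}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sub-workflow files; substitute your own DAG files.
    pieces = [("prep", "prep.dag"), ("analyze", "analyze.dag")]
    print(outer_dag(pieces, use_splices=False))
    print(outer_dag(pieces, use_splices=True))
```

Only the declaration keyword differs between the two outputs; the PARENT/CHILD lines are the same, which is why switching an existing workflow between the two features is often a small edit.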

Feature Comparison

Here's a table comparing external sub-DAGs and splices. Note that the bold entries are the ones that are advantageous for a given feature.

Feature	External sub-DAGs	Splices	Notes
Incorporate separate DAG files	yes	yes
Rescue DAGs	yes	yes
DAGMan recovery	yes	yes
Multiple DAGMan instances	yes	no
Possible combinatorial explosion of dependencies	no	yes	Until we implement socket nodes for splices
Dynamic creation of sub-workflows	yes	no
PRE/POST scripts on sub-workflows	yes	no	Until we implement socket nodes for splices
Retries of sub-workflows	yes	no
Workflow-wide throttling	no	yes
Per-sub-workflow throttling	yes	no
Workflow-wide node categories	no	yes
Node priorities on sub-workflows	yes	no
Reduce memory footprint of large workflows	yes?	no	If used properly
Per-sub-workflow final nodes	yes	no
Abort sub-workflows individually	yes	no
Variables associated with sub-workflows	yes	no
Separate configuration for sub-workflows	yes	no	Can be good or bad
One node status file, etc., for the entire workflow	no	yes
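The "combinatorial explosion" row comes from how DAGMan expands a PARENT ... CHILD ... line that names splices: every leaf node of the parent splice becomes a parent of every root (initial) node of the child splice. A splice with m leaves feeding a splice with n roots therefore adds m × n dependencies, while the same line between two external sub-DAG nodes adds exactly one. The sketch below only enumerates those node pairs; the function name and node names are hypothetical stand-ins for the splices' real leaf and root nodes.

```python
from itertools import product


def expand_parent_child(parent_nodes, child_nodes):
    """Return every (parent, child) node pair implied by one PARENT/CHILD line.

    For splices, pass the parent splice's leaf nodes and the child splice's
    root nodes. For external sub-DAGs, each side is a single node.
    """
    return list(product(parent_nodes, child_nodes))


if __name__ == "__main__":
    for n in (5, 50, 500):
        leaves = [f"leaf{i}" for i in range(n)]
        roots = [f"root{i}" for i in range(n)]
        splice_edges = len(expand_parent_child(leaves, roots))
        subdag_edges = len(expand_parent_child(["A"], ["B"]))
        print(f"n={n}: splices {splice_edges} edges, sub-DAGs {subdag_edges} edge")
```

With 500 leaves feeding 500 roots, the splice version already implies 250,000 dependencies, all of which a single DAGMan has to track; this is why the table flags it until socket nodes are implemented for splices.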

When should I use one of these features?

(add stuff here)

Should I use external sub-DAGs or splices?

The simple answer is that, unless you need one of the features that are available with external sub-DAGs but not with splices, you should use splices. Splices are generally simpler... (add more stuff here)
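The rule of thumb above amounts to a small lookup over the comparison table: if any feature you need is one that only external sub-DAGs provide, use external sub-DAGs; otherwise use splices. The helper below is a hypothetical illustration, not part of HTCondor, and its feature strings are simply the row labels from the table. Because it only applies that rule, it picks external sub-DAGs even when you also need a splice-only feature such as workflow-wide throttling; that trade-off still has to be weighed by hand.

```python
# Rows of the comparison table where only external sub-DAGs say "yes"
# (hypothetical helper data, copied from the table's feature labels).
SUBDAG_ONLY = {
    "multiple dagman instances",
    "dynamic creation of sub-workflows",
    "pre/post scripts on sub-workflows",
    "retries of sub-workflows",
    "per-sub-workflow throttling",
    "node priorities on sub-workflows",
    "reduce memory footprint of large workflows",  # "yes?" in the table
    "per-sub-workflow final nodes",
    "abort sub-workflows individually",
    "variables associated with sub-workflows",
    "separate configuration for sub-workflows",
}


def choose(required_features):
    """Apply the rule of thumb: splices unless a sub-DAG-only feature is needed."""
    needed = {feature.strip().lower() for feature in required_features}
    if needed & SUBDAG_ONLY:
        return "external sub-DAGs"
    return "splices"


if __name__ == "__main__":
    print(choose(["Workflow-wide throttling"]))
    print(choose(["Retries of sub-workflows"]))
```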

How to use external sub-DAGs to reduce workflow memory footprint

(Coming soon!)

Note: This document is valid for HTCondor version 8.5.5.
