*: Releases are now in /p/condor/public/binaries/contrib, named dmtcp_condor_integration-*-Any-Any.tar.gz where * is the version number.
*: The Git repository is in /p/condor/repository/dmtcp_condor.git/ and can be cloned with
{code}git clone /p/condor/repository/dmtcp_condor.git/{endcode}
+*: When making a new release, update the version number inside shim_dmtcp.
+*: DMTCP has a build option for Condor. The normal behavior is to try to checkpoint sockets. When built for Condor, DMTCP behaves like Condor's checkpointing support: checkpoint attempts are delayed until all sockets are closed.
+
+General system:
+*: The user submits a job. Notable changes to their submit file are:
+{code}
+# Submit shim_dmtcp instead of the user's normal job.
+executable=shim_dmtcp
+
+# These are IN ADDITION to the user's "real" binary and input files!
+transfer_input_files = dmtcp_checkpoint,dmtcp_coordinator,\
+ dmtcp_command,dmtcp_restart,dmtcphijack.so,libmtcp.so,\
+ mtcp_restart
+
+# Argument Meaning
+# --log log file name for actions in shim_dmtcp script,
+# if n/a use /dev/null
+# --stdin stdin file, if n/a use /dev/null
+# --stdout stdout file, if n/a use /dev/null
+# --stderr stderr file, if n/a use /dev/null
+# --ckptint checkpointing interval in seconds
+# 1 the executable name you should have transferred in
+# 2+ arguments to the executable
+#
+# Note that stdout/stderr files are for output from the "real"
+# binaries. The normal output/error will exclusively be
+# messages from shim_dmtcp and the DMTCP tools.
+arguments = --log shim_dmtcp.$(CLUSTER).$(PROCESS).log \
+ --stdin foo.py \
+ --stdout job.$(CLUSTER).$(PROCESS).out \
+ --stderr job.$(CLUSTER).$(PROCESS).err \
+ --ckptint 1800 \
+ ./REAL_BINARY example-argument-one example-argument-two
+
+
+# These are all required by DMTCP. JALIB is an internal DMTCP
+# library ("Jason's library"). If your job needs more
+# environment options set, just append them.
+environment=DMTCP_TMPDIR=./;JALIB_STDERR_PATH=/dev/null;\
+ DMTCP_PREFIX_ID=$(CLUSTER)_$(PROCESS)
+
+# On kill, tell our shim to checkpoint (signal 2 is SIGINT). You can
+# change this, but you will need to change shim_dmtcp as well.
+kill_sig = 2
+
+# If your pool isn't homogeneous (nearly identical distributions
+# and updates), your checkpoints may not be portable. The exact
+# options needed aren't yet known, but these may work. Note that
+# you'll need to identify the exact values yourself; these won't
+# work for you!
+Requirements = \
+ (CheckpointPlatform == "LINUX INTEL 2.6.x normal 0x40000000"\
+ && OSKernelRelease == "2.6.18-128.el5")
+{endcode}
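
The argument layout documented in the submit file above can be parsed with a few lines of Python. This is only a sketch: this page does not show how shim_dmtcp parses its arguments, and the function name =parse_shim_args=, the /dev/null defaults, and treating --ckptint as required are assumptions made for illustration.

```python
import argparse


def parse_shim_args(argv):
    """Parse a shim_dmtcp-style argument list (illustrative sketch only).

    Assumptions, not confirmed by the source: the file options default
    to /dev/null when omitted, and --ckptint is mandatory.
    """
    parser = argparse.ArgumentParser(prog="shim_dmtcp")
    parser.add_argument("--log", default="/dev/null",
                        help="log file for actions taken by the shim")
    parser.add_argument("--stdin", default="/dev/null",
                        help="stdin file for the real binary")
    parser.add_argument("--stdout", default="/dev/null",
                        help="stdout file for the real binary")
    parser.add_argument("--stderr", default="/dev/null",
                        help="stderr file for the real binary")
    parser.add_argument("--ckptint", type=int, required=True,
                        help="checkpointing interval in seconds")
    # Positional 1: the executable that was transferred in.
    parser.add_argument("executable")
    # Positional 2+: everything else is passed through to the executable.
    parser.add_argument("job_args", nargs=argparse.REMAINDER)
    return parser.parse_args(argv)
```

Given the example arguments from the submit file, =executable= would be =./REAL_BINARY= and =job_args= would hold the two example arguments.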
+
+*: The shim starts up the dmtcp_coordinator. It is given instructions to checkpoint at regular intervals.
+*: The shim starts up the user job, sneaking dmtcphijack.so into the runtime. (Probably using LD_PRELOAD?)
+*: Whe
+
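The submit file sets kill_sig = 2, so when Condor evicts the job the shim receives SIGINT and is expected to checkpoint. A minimal sketch of that hook in Python follows. How shim_dmtcp actually handles the signal is not shown on this page, and =request_checkpoint= is a hypothetical callback standing in for whatever the shim does to ask the DMTCP coordinator for a checkpoint.

```python
import signal

# kill_sig = 2 in the submit file corresponds to SIGINT.
CHECKPOINT_SIGNAL = signal.SIGINT


def make_checkpoint_handler(request_checkpoint):
    """Build a signal handler that calls request_checkpoint() on each signal.

    request_checkpoint is a hypothetical zero-argument callback; the real
    shim's checkpoint mechanism is not described in the source.
    """
    def handler(signum, frame):
        request_checkpoint()
    return handler


def install_checkpoint_handler(request_checkpoint, signum=CHECKPOINT_SIGNAL):
    """Register the handler and return the previously installed one."""
    return signal.signal(signum, make_checkpoint_handler(request_checkpoint))
```

If kill_sig is changed in the submit file, =signum= would need to change to match, which mirrors the note that shim_dmtcp must be edited as well.
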
+Contents of the DMTCP Condor integration repository (and tarballs at the moment):
+*: =Makefile= - Really just for development use. Collects DMTCP files, then builds and submits a small test program. Note that the path to mtcp_restart varies between distributions.
+*: testing and development tools
+*:: =foo.py= - Python script used for testing Python under DMTCP
+*:: =foo.c= - C program used for testing C under DMTCP