How to scavenge cycles from PBS

Known to work in Condor version: 7.0

Overview

Condor is designed, among many other things, to scavenge compute cycles on desktop workstations while their interactive users are idle. The same idea applies to scavenging cycles from another batch system running on the same machine: instead of configuring Condor to notice when an interactive user is idle, configure it to notice when the other batch system is idle. While the other system is idle, Condor is free to run jobs. As soon as the other batch system has work to do, Condor must preempt or checkpoint its current work. This page describes how to set this up with PBS, though the concept works for other batch systems as well.

Condor and PBS

First, configure the Condor startd to run jobs only while the attribute PBSRunning is False. PBS changes this attribute at run time with the condor_config_val -rset command.

On the worker nodes, add the following to the Condor configuration:

ENABLE_RUNTIME_CONFIG = TRUE
STARTD_SETTABLE_ATTRS_OWNER = PBSRunning
PBSRunning                      = False

# Only start jobs if PBS is not currently running a job
START_NOPBS = ( $(PBSRunning) == False )

# Parenthesize the existing START expression so that "&&" applies to all of it
START = ($(START)) && $(START_NOPBS)

With this in place, Condor starts a job only if the existing START expression is true and no PBS job is running on the node.
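To confirm that a run-time change has reached the startd, it can help to ask the daemon directly for its current value. The following is a minimal sketch and not part of the original recipe: the function name pbs_running_state, the CONDOR_CONFIG_VAL override variable, and the labels it prints are all hypothetical, and it assumes that condor_config_val -startd PBSRunning prints True or False for a value set at run time.

```shell
#!/bin/sh
# Hypothetical helper: report what the running startd believes about PBS.
# CONDOR_CONFIG_VAL is an assumed override so the function can be tried
# against a stub; the default path matches the one used on this page.
CONDOR_CONFIG_VAL="${CONDOR_CONFIG_VAL:-/opt/condor/bin/condor_config_val}"

pbs_running_state() {
    # Query the startd daemon itself rather than the config files on disk.
    value=$("$CONDOR_CONFIG_VAL" -startd PBSRunning 2>/dev/null) || {
        echo "unknown"
        return 1
    }
    # Normalize case, since the value may be written True, TRUE, or true.
    case $(printf '%s' "$value" | tr '[:upper:]' '[:lower:]') in
        true)  echo "pbs-busy" ;;
        false) echo "condor-may-start" ;;
        *)     echo "unknown"; return 1 ;;
    esac
}
```

Calling pbs_running_state on a worker node prints one of the three labels. An "unknown" result suggests that the startd is not reachable or that the attribute has not been defined.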

On the PBS side, again on the worker nodes, have PBS tell Condor when a PBS job is starting by adding the following to the PBS prologue. If Condor still has a job running on the node, the prologue also vacates it.

        if [ -x /opt/condor/bin/condor_config_val ]; then
                # Tell the startd that PBS is now using this node
                /opt/condor/bin/condor_config_val -rset -startd PBSRunning=True > /dev/null
                /opt/condor/sbin/condor_reconfig -startd > /dev/null
                sleep 2
                # If Condor still reports this host as Claimed, vacate the Condor job
                if ( /opt/condor/bin/condor_status -format '%s ' Name -format '%s \n' State $(hostname) 2> /dev/null | grep -q Claimed )
                then
                        /opt/condor/sbin/condor_vacate > /dev/null
                        sleep 2
                fi
        fi
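The Claimed check in the prologue can also be written as a small filter that compares the State column exactly instead of searching the whole line. This is a sketch under stated assumptions: the function name any_slot_claimed is hypothetical, and the input is assumed to be the two-column "Name State" text produced by the condor_status -format options shown above.

```shell
#!/bin/sh
# Hypothetical filter: reads "Name State" lines on stdin, as printed by
#   condor_status -format '%s ' Name -format '%s \n' State <host>
# and exits 0 if at least one line has State equal to Claimed, else 1.
any_slot_claimed() {
    awk '$2 == "Claimed" { found = 1 } END { exit found ? 0 : 1 }'
}

# Possible use inside the prologue (not executed here):
#   /opt/condor/bin/condor_status -format '%s ' Name -format '%s \n' State \
#       "$(hostname)" 2> /dev/null | any_slot_claimed \
#       && /opt/condor/sbin/condor_vacate > /dev/null
```

Matching on the second field keeps a machine name that happens to contain the word Claimed from triggering a vacate, at the cost of depending on the exact column layout.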

In the PBS epilogue, tell Condor that it is OK to use this machine again:

        if [ -x /opt/condor/bin/condor_config_val ]; then
                # Tell the startd that PBS has released this node
                /opt/condor/bin/condor_config_val -rset -startd PBSRunning=False > /dev/null
                /opt/condor/sbin/condor_reconfig -startd > /dev/null
        fi
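The prologue and epilogue share the same two steps: set PBSRunning, then reconfigure the startd. One way to keep them in sync is a common helper that both scripts source. The sketch below is only an illustration: the names set_pbs_running and run, and the CONDOR_HOME and DRY_RUN variables, are hypothetical, while the Condor commands and flags are the ones already used on this page.

```shell
#!/bin/sh
# Hypothetical shared helper for the PBS prologue and epilogue.
# CONDOR_HOME and DRY_RUN are assumptions made for this sketch.
CONDOR_HOME="${CONDOR_HOME:-/opt/condor}"

run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@" > /dev/null
    fi
}

set_pbs_running() {
    # $1 must be True (prologue) or False (epilogue).
    case "$1" in
        True|False) ;;
        *) echo "usage: set_pbs_running True|False" >&2; return 2 ;;
    esac
    # As in the original scripts, do nothing if Condor is not installed.
    if [ "${DRY_RUN:-0}" != 1 ] && [ ! -x "$CONDOR_HOME/bin/condor_config_val" ]; then
        return 0
    fi
    run "$CONDOR_HOME/bin/condor_config_val" -rset -startd "PBSRunning=$1" &&
    run "$CONDOR_HOME/sbin/condor_reconfig" -startd
}
```

The prologue would call set_pbs_running True before its Claimed check, and the epilogue would call set_pbs_running False. Running with DRY_RUN=1 prints the two commands without touching the startd, which is a low-risk way to check paths on a new node.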

Acknowledgments

This is based on a recipe from Preston Smith of Purdue University. Thanks Preston!