* WORK IN PROGRESS *

Condor can help manage GPUs (graphics processing units) in your pool of execute nodes, making them available to jobs that can use them through an API such as {link:http://www.khronos.org/opencl/ OpenCL} or {link:http://www.nvidia.com/object/cuda_home_new.html CUDA}.

Condor matches execute nodes (described by ClassAds) to jobs (also described by ClassAds). The general technique to manage GPUs is:

1. Advertise the GPU: Configure Condor so that execute nodes include information about available GPUs in their ClassAd.
2. Require a GPU: Jobs modify their Requirements to require a suitable GPU.
3. Identify the GPU: Jobs modify their arguments or environment to learn which GPU they may use.

{section: Advertising the GPU}

A key challenge of advertising GPUs is that a GPU can only be used by one job at a time.  If an execute node has multiple slots (a likely case!), you'll want each GPU to be advertised by only a single slot.

You have several options for advertising your GPUs.  In increasing order of complexity they are:

1. Static configuration
2. Automatic configuration
3. Dynamic advertising

This progression may be a useful way to do initial setup and testing.  Start with a static configuration to ensure everything works.  Move to an automatic configuration to develop and test partial automation.  Finally, a few small changes should make it possible to turn your automatic configuration into dynamic advertising.

{subsection: Static configuration}

If you have a small number of nodes, or perhaps a large number
of identical nodes, you can add static attributes manually using
{link:http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#16198 STARTD_ATTRS} on a {link:http://www.cs.wisc.edu/condor/manual/v7.6/3_13Setting_Up.html#37111 per slot basis}. In the simplest case, it might just be:

{code}
SLOT1_HAS_GPU=TRUE
SLOT1_GPU_DEV=0
STARTD_ATTRS=HAS_GPU,GPU_DEV
{endcode}

This limits the GPU to only being advertised by the first slot.  A job can use HAS_GPU to identify available slots with GPUs.  The job can use GPU_DEV to identify which GPU device to use.  (A job could use the presence of GPU_DEV to identify slots with GPUs instead of HAS_GPU, but "=HAS_GPU=" is a bit easier to read than "=(GPU_DEV=!=UNDEFINED)=".)
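
On the job side, a submit description file could use these two attributes roughly as follows.  This is only a sketch: the executable name and its =--device= option are hypothetical, and it assumes the slot advertises HAS_GPU and GPU_DEV as configured above.  The =$$(GPU_DEV)= substitution macro is filled in from the ClassAd of the matched slot.

{code}
# Hypothetical submit description file; my_gpu_program and --device are placeholders.
universe     = vanilla
executable   = my_gpu_program
# Only match slots that advertise HAS_GPU.
requirements = HAS_GPU
# $$(GPU_DEV) is replaced with the GPU_DEV value of the matched slot.
arguments    = --device $$(GPU_DEV)
queue
{endcode}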

If you have two GPUs, you might give the first two slots a GPU each.
{code}
SLOT1_HAS_GPU=TRUE
SLOT1_GPU_DEV=0
SLOT2_HAS_GPU=TRUE
SLOT2_GPU_DEV=1
STARTD_ATTRS=HAS_GPU,GPU_DEV
{endcode}

You can also provide more information about your GPUs so that a job can distinguish between different GPUs:

{code}
SLOT1_GPU_CUDA_DRV=3.20
SLOT1_GPU_CUDA_RUN=3.20
SLOT1_GPU_DEV=0
SLOT1_GPU_NAME="Tesla C2050"
SLOT1_GPU_CAPABILITY=2.0
SLOT1_GPU_GLOBALMEM_MB=2687
SLOT1_GPU_MULTIPROC=14
SLOT1_GPU_NUMCORES=32
SLOT1_GPU_CLOCK_GHZ=1.15
STARTD_ATTRS = GPU_DEV, GPU_NAME, GPU_CAPABILITY, GPU_GLOBALMEM_MB, \
  GPU_MULTIPROC, GPU_NUMCORES, GPU_CLOCK_GHZ, GPU_CUDA_DRV, \
  GPU_CUDA_RUN
{endcode}

(The above is from {link: https://lists.cs.wisc.edu/archive/condor-users/2011-March/msg00121.shtml Carsten Aulbert's post "RFC: Adding GPUs into Condor"}.)
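
A job could then select among GPUs using those attributes.  The following submit description file is an illustrative sketch and is not taken from the post: the executable name, the thresholds, and the GPU_DEVICE_INDEX environment variable are all made up, and the program itself would have to read that variable.  It also assumes that =$$()= substitution is applied to the environment setting in your Condor version.

{code}
# Hypothetical submit description file; my_cuda_program is a placeholder.
universe     = vanilla
executable   = my_cuda_program
# Require compute capability 2.0 or later and at least 2000 MB of GPU memory.
requirements = (GPU_CAPABILITY >= 2.0) && (GPU_GLOBALMEM_MB >= 2000)
# Among matching slots, prefer the GPU with the most memory.
rank         = GPU_GLOBALMEM_MB
# GPU_DEVICE_INDEX is a made-up variable name; the program must read it itself.
environment  = GPU_DEVICE_INDEX=$$(GPU_DEV)
queue
{endcode}

Slots that do not advertise these attributes leave the requirements expression undefined, so they should not match.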


{subsection:Automatic configuration}

You can write a program that generates your configuration file.  This approach still uses STARTD_ATTRS, but potentially scales better for mixed pools.  For an extended example, see {link: https://lists.cs.wisc.edu/archive/condor-users/2011-March/msg00121.shtml Carsten Aulbert's post "RFC: Adding GPUs into Condor"} in which he does exactly this.
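
One possible arrangement, sketched here with hypothetical file paths, is to have the generator write a separate file and list that file in LOCAL_CONFIG_FILE.  The second half of the block shows what the generated file might contain on a node where a single GPU was detected.  This is an assumption about layout, not the approach from the post.

{code}
# In the node's main configuration.  Both paths are hypothetical.
LOCAL_CONFIG_FILE = /etc/condor/condor_config.local, /etc/condor/gpu.config

# Possible contents of the generated /etc/condor/gpu.config
# on a node with one detected GPU:
SLOT1_HAS_GPU = TRUE
SLOT1_GPU_DEV = 0
# Append to any existing STARTD_ATTRS rather than replacing it.
STARTD_ATTRS = $(STARTD_ATTRS), HAS_GPU, GPU_DEV
{endcode}

The generator would need to run before the startd starts, or be followed by a reconfiguration, for the new values to take effect.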


{subsection:Dynamic advertising}

One step beyond automatic configuration is dynamic advertising.  Instead of relying on a static or generated configuration file, Condor itself can run your program and incorporate the information it reports.
This is {link: http://www.cs.wisc.edu/condor/manual/v7.6/4_4Hooks.html#sec:daemon-classad-hooks Condor's "Daemon ClassAd Hooks" functionality},
previously known as HawkEye and Condor Cron.  This is the route taken by the {link: http://sourceforge.net/projects/condorgpu/ condorgpu project}.  (Note that the condorgpu project has no affiliation with Condor.  We have not tested or reviewed that code and cannot promise anything about it!)
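
As a rough sketch of what such a hook configuration might look like, the following defines a startd cron job.  The job name GPU and the script path are hypothetical, and the exact set of STARTD_CRON settings should be checked against the manual for your Condor version.

{code}
# Hypothetical job name (GPU) and probe script path.
STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) GPU
STARTD_CRON_GPU_EXECUTABLE = /usr/local/libexec/condor_gpu_probe
# Run the probe every five minutes.
STARTD_CRON_GPU_MODE = Periodic
STARTD_CRON_GPU_PERIOD = 5m
# Assumption: limit the probe's output to slot 1 so only one slot advertises the GPU.
STARTD_CRON_GPU_SLOTS = 1

# The probe script would print ClassAd attributes to standard output, for example:
#   HAS_GPU = True
#   GPU_DEV = 0
{endcode}

With this approach the attributes come from the program's output, so they do not need to be listed in STARTD_ATTRS.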


{section: The Future}

The Condor team is working on various improvements in how Condor can manage GPUs.  If you have
TODO: condor-admin/condor-users, link to ticket when made public.

{section: Credits}

Several examples were drawn from {link: https://lists.cs.wisc.edu/archive/condor-users/2011-March/msg00121.shtml Carsten Aulbert's post "RFC: Adding GPUs into Condor"} sent to the {link: http://www.cs.wisc.edu/condor/mail-lists/ condor-users} mailing list on March 25th, 2011.