
* WORK IN PROGRESS *

Condor can help manage GPUs (graphics processing units) in your pool of execute nodes, making them available to jobs that can use them through an API such as OpenCL or CUDA.

Condor matches execute nodes (described by ClassAds) to jobs (also described by ClassAds). The general technique to manage GPUs is:

  1. Advertise the GPU: Configure Condor so that execute nodes include information about available GPUs in their ClassAd.
  2. Require a GPU: Jobs modify their Requirements to require a suitable GPU.
  3. Identify the GPU: Jobs modify their arguments or environment to learn which GPU they may use.
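
Steps 2 and 3 can be sketched as follows, assuming each GPU slot advertises a hypothetical attribute GPU_DEV holding its device index. The job's submit description would require a GPU with something like Requirements = (HAS_GPU =?= TRUE) and pass the index through with arguments = $$(GPU_DEV), where $$() substitutes a value from the matched machine's ClassAd. A small Python helper (hypothetical, not part of Condor) can then read the index and restrict CUDA to that device through the CUDA_VISIBLE_DEVICES environment variable:

```python
import os


def select_gpu(argv, environ=os.environ):
    """Read the GPU index from argv[1] and restrict CUDA to that device.

    Assumes the submit description passes the index with the hypothetical
    arguments = $$(GPU_DEV). Processes started afterwards inherit
    CUDA_VISIBLE_DEVICES from `environ`.
    """
    if len(argv) < 2:
        raise ValueError("expected the GPU index as the first argument")
    index = int(argv[1])  # raises ValueError if the argument is not a number
    if index < 0:
        raise ValueError(f"invalid GPU index: {index}")
    environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return index
```

A job wrapper would call select_gpu(sys.argv) before starting the real program, for example with subprocess, so the child inherits the restriction. An OpenCL program would instead pass the index to its own device-selection code.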

Advertising the GPU

You have several options for advertising your GPUs. In increasing order of complexity they are:

  1. Static configuration
  2. Automatic configuration
  3. Dynamic configuration

This progression may be a useful way to do initial setup and testing. Start with a static configuration to ensure everything works. Move to an automatic configuration to develop and test partial automation. Finally, a few small changes should turn your automatic configuration into a dynamic configuration.

Static configuration

If you have a small number of nodes, or perhaps a large number of identical nodes, you can add static attributes manually using STARTD_ATTRS. In the simplest case, it might just be:

HAS_GPU=TRUE
STARTD_ATTRS=HAS_GPU

You can also advertise much more detailed information about the GPUs, including slot-specific information. For an extended example, see Carsten Aulbert's post (listed under Credits).
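
To make slot-specific attributes concrete, the Python sketch below builds static configuration text for a node whose first slots each own one GPU. It assumes that a configuration value prefixed with SLOT<N>_ overrides an attribute listed in STARTD_ATTRS for slot N only; GPU_DEV and GPU_NAME are hypothetical attribute names. The printed text would be pasted into the node's local configuration:

```python
def static_gpu_config(gpu_names):
    """Build STARTD_ATTRS configuration text for one node.

    gpu_names[i] is the model of the GPU assigned to slot i + 1. Assumes
    SLOT<N>_ prefixes give per-slot values; attribute names are hypothetical.
    """
    lines = ["HAS_GPU = FALSE"]  # default for every slot: no GPU
    for dev, name in enumerate(gpu_names):
        slot = dev + 1  # slots are numbered from 1, GPU devices from 0
        lines.append(f"SLOT{slot}_HAS_GPU = TRUE")
        lines.append(f"SLOT{slot}_GPU_DEV = {dev}")
        lines.append(f'SLOT{slot}_GPU_NAME = "{name}"')
    # Advertise the attributes in the machine ClassAd, keeping any existing ones.
    lines.append("STARTD_ATTRS = $(STARTD_ATTRS) HAS_GPU GPU_DEV GPU_NAME")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(static_gpu_config(["Tesla C2050", "Tesla C2050"]), end="")
```

Defaulting HAS_GPU to FALSE keeps the remaining slots on a node with more cores than GPUs from matching GPU jobs.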

Automatic configuration

You can write a program that writes your configuration file automatically. This still uses STARTD_ATTRS, but scales better. This is actually how Carsten's configuration works; his post includes some example code.
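
A minimal sketch of such a generator, assuming NVIDIA hardware and the output format of nvidia-smi -L (lines such as "GPU 0: Tesla C2050 (UUID: ...)"; other driver versions may format it differently). The attribute names GPU_COUNT and GPU_NAMES are hypothetical. The file is written under a temporary name and renamed into place, so the startd never reads a half-written file:

```python
import os
import re
import subprocess
import tempfile

# Matches lines such as "GPU 0: Tesla C2050 (UUID: GPU-...)"; the trailing
# parenthesised part is optional and discarded.
GPU_LINE = re.compile(r"^GPU (\d+): (.+?)(?: \(.*\))?$")


def parse_nvidia_smi_list(text):
    """Return [(index, model), ...] parsed from `nvidia-smi -L` output."""
    gpus = []
    for line in text.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2)))
    return gpus


def render_config(gpus):
    """Build machine-wide STARTD_ATTRS configuration for the detected GPUs."""
    names = ", ".join(name for _, name in gpus)
    return (
        f"HAS_GPU = {'TRUE' if gpus else 'FALSE'}\n"
        f"GPU_COUNT = {len(gpus)}\n"
        f'GPU_NAMES = "{names}"\n'
        "STARTD_ATTRS = $(STARTD_ATTRS) HAS_GPU GPU_COUNT GPU_NAMES\n"
    )


def write_atomically(path, text):
    """Write a temporary file in the same directory, then rename it over
    `path`, so a reader sees either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
        os.chmod(tmp_path, 0o644)  # mkstemp creates owner-only files
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def regenerate(path):
    """Detect GPUs with `nvidia-smi -L` and rewrite the config file at `path`."""
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True, check=True)
    write_atomically(path, render_config(parse_nvidia_smi_list(result.stdout)))
```

For example, a boot script or cron job could call regenerate() on a hypothetical file such as /etc/condor/config.d/50-gpus.conf inside the node's LOCAL_CONFIG_DIR, and then run condor_reconfig so the startd rereads its configuration.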

Dynamic configuration

You can have Condor automatically run a program you provide to learn about the GPUs. This mechanism is called "Daemon ClassAd Hooks", previously known as HawkEye and Condor Cron; see http://www.cs.wisc.edu/condor/manual/v7.6/4_4Hooks.html#sec:daemon-classad-hooks. This is the route taken by the condorgpu project. Converting Carsten's scripts to work this way should be fairly easy.
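
A minimal hook script might look like the Python sketch below. It is an assumption, not the condorgpu project's code: it counts NVIDIA device nodes such as /dev/nvidia0 rather than querying the driver, and the attribute names GPU_COUNT and GPU_DEVICES and the script path are hypothetical. The startd runs the script periodically and merges each "Attribute = value" line it prints into the machine ClassAd. The configuration in the docstring uses STARTD_CRON_* variables from the manual section linked above; check the exact names and period syntax for your Condor version.

```python
#!/usr/bin/env python3
"""Sketch of a startd cron hook that reports NVIDIA GPUs as ClassAd attributes.

Assumed configuration (verify against the manual for your Condor version;
the script path is hypothetical):

    STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) GPUS
    STARTD_CRON_GPUS_EXECUTABLE = /usr/local/libexec/condor_gpu_hook.py
    STARTD_CRON_GPUS_PERIOD = 300
"""
import glob
import re
import sys


def find_gpu_devices(pattern="/dev/nvidia[0-9]*"):
    """Return sorted indices of NVIDIA device nodes such as /dev/nvidia0.

    The pattern skips control nodes like /dev/nvidiactl and /dev/nvidia-uvm.
    """
    indices = []
    for path in glob.glob(pattern):
        match = re.search(r"(\d+)$", path)
        if match:
            indices.append(int(match.group(1)))
    return sorted(indices)


def classad_lines(indices):
    """Format the hook's output: one 'Attribute = value' line per attribute."""
    devices = ",".join(str(i) for i in indices)
    return [
        f"HAS_GPU = {'TRUE' if indices else 'FALSE'}",
        f"GPU_COUNT = {len(indices)}",
        f'GPU_DEVICES = "{devices}"',
    ]


if __name__ == "__main__":
    # The startd reads these lines from stdout and merges them into the machine ad.
    sys.stdout.write("\n".join(classad_lines(find_gpu_devices())) + "\n")
```

Because the script is rerun every period, newly installed or failed GPUs show up in the machine ClassAd without editing the configuration by hand.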

Credits

Several examples were drawn from Carsten Aulbert's post "RFC: Adding GPUs into Condor" sent to the condor-users mailing list on March 25th, 2011.