* WORK IN PROGRESS *

Condor can help manage GPUs (graphics processing units) in your pool of execute nodes, making them available to jobs that can use them through an API like {link:http://www.khronos.org/opencl/ OpenCL} or {link:http://www.nvidia.com/object/cuda_home_new.html CUDA}.

Condor matches execute nodes (described by ClassAds) to jobs (also described by ClassAds). The general technique to manage GPUs is:

1. Advertise the GPU: Configure Condor so that execute nodes include information about available GPUs in their ClassAd.
2. Require a GPU: Jobs modify their Requirements to require a suitable GPU.
3. Identify the GPU: Jobs modify their arguments or environment to learn which GPU they may use.

{section: Advertising the GPU}

A key challenge of advertising GPUs is that a GPU can only be used by one job at a time. If an execute node has multiple slots (a likely case!), you'll want to advertise each GPU on only a single slot.

You have several options for advertising your GPUs. In increasing order of complexity they are:

1: Static configuration
2: Automatic configuration
3: Dynamic advertising

This progression may be a useful way to do initial setup and testing. Start with a static configuration to ensure everything works. Move to an automatic configuration to develop and test partial automation. Finally, a few small changes should make it possible to turn your automatic configuration into dynamic advertising.

{subsection: Static configuration}

If you have a small number of nodes, or perhaps a large number of identical nodes, you can add static attributes manually using {link:http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#16198 STARTD_ATTRS} on a {link:http://www.cs.wisc.edu/condor/manual/v7.6/3_13Setting_Up.html#37111 per slot basis}. In the simplest case, it might just be:

{code}
SLOT1_HAS_GPU=TRUE
SLOT1_GPU_DEV=0
STARTD_ATTRS=HAS_GPU,GPU_DEV
{endcode}

This limits the GPU to only being advertised by the first slot.
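
On the job side, the "require" and "identify" steps of the general technique might look like the following submit description file. This is only a sketch: the executable name =my_gpu_program= and its =--device= option are hypothetical, and it assumes the slot advertises =HAS_GPU= and =GPU_DEV= as configured above. The =$$(GPU_DEV)= syntax asks Condor to substitute the value of that attribute from the matched machine's ClassAd.

{code}
# Hypothetical submit description file for a GPU job
universe     = vanilla
executable   = my_gpu_program
# Identify the GPU: pass the matched slot's device number to the program
arguments    = --device $$(GPU_DEV)
# Require a GPU: only match slots that advertise HAS_GPU
requirements = HAS_GPU
queue
{endcode}
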

A job can use HAS_GPU to identify available slots with GPUs. The job can use GPU_DEV to identify which GPU device to use. (A job could use the presence of GPU_DEV to identify slots with GPUs instead of HAS_GPU, but "=HAS_GPU=" is a bit easier to read than "=(GPU_DEV=!=UNDEFINED)=".)

If you have two GPUs, you might give the first two slots a GPU each.

{code}
SLOT1_HAS_GPU=TRUE
SLOT1_GPU_DEV=0
SLOT2_HAS_GPU=TRUE
SLOT2_GPU_DEV=1
STARTD_ATTRS=HAS_GPU,GPU_DEV
{endcode}

You can also provide more information about your GPUs so that a job can distinguish between different GPUs:

{code}
SLOT1_GPU_CUDA_DRV=3.20
SLOT1_GPU_CUDA_RUN=3.20
SLOT1_GPU_DEV=0
SLOT1_GPU_NAME="Tesla C2050"
SLOT1_GPU_CAPABILITY=2.0
SLOT1_GPU_GLOBALMEM_MB=2687
SLOT1_GPU_MULTIPROC=14
SLOT1_GPU_NUMCORES=32
SLOT1_GPU_CLOCK_GHZ=1.15
STARTD_ATTRS = GPU_DEV, GPU_NAME, GPU_CAPABILITY, GPU_GLOBALMEM_MB, \
  GPU_MULTIPROC, GPU_NUMCORES, GPU_CLOCK_GHZ, GPU_CUDA_DRV, \
  GPU_CUDA_RUN
{endcode}

(The above is from {link: https://lists.cs.wisc.edu/archive/condor-users/2011-March/msg00121.shtml Carsten Aulbert's post "RFC: Adding GPUs into Condor"}.)

{subsection:Automatic configuration}

You can write a program to write your configuration file. This still uses STARTD_ATTRS, but potentially scales better for mixed pools. For an extended example, see {link: https://lists.cs.wisc.edu/archive/condor-users/2011-March/msg00121.shtml Carsten Aulbert's post "RFC: Adding GPUs into Condor"}, in which he does exactly this.

{subsection:Dynamic advertising}

One step beyond automatic configuration is dynamic advertising. Instead of relying on a static or automatically generated configuration, Condor itself can run your program and incorporate the information it reports into the slot's ClassAd. This is {link: http://www.cs.wisc.edu/condor/manual/v7.6/4_4Hooks.html#sec:daemon-classad-hooks Condor's "Daemon ClassAd Hooks" functionality}, previously known as HawkEye and Condor Cron.
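
As a rough sketch of what a hook configuration might look like, the fragment below asks the startd to run a probe program periodically. The path =/path/to/gpu_probe= is hypothetical, as is the assumption that the program prints attribute lines such as =HAS_GPU = TRUE= to its standard output; the job name =GPU= is arbitrary. Check the setting names against the Daemon ClassAd Hooks documentation for your Condor version before relying on them. Note that this sketch does not by itself limit each GPU to a single slot.

{code}
# Hypothetical example: run a GPU probe from the startd every 5 minutes
STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) GPU
STARTD_CRON_GPU_EXECUTABLE = /path/to/gpu_probe
STARTD_CRON_GPU_PERIOD = 5m
{endcode}

The details of how a hook's output is merged into the ClassAd are covered in the Daemon ClassAd Hooks documentation.
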
This is the route taken by the {link: http://sourceforge.net/projects/condorgpu/ condorgpu project}. (Note that the condorgpu project has no affiliation with Condor. We have not tested or reviewed that code and cannot promise anything about it!)

{section: The Future}

The Condor team is working on various improvements in how Condor can manage GPUs. If you have TODO: condor-admin/condor-users, link to ticket when made public.

{section: Credits}

Several examples were drawn from {link: https://lists.cs.wisc.edu/archive/condor-users/2011-March/msg00121.shtml Carsten Aulbert's post "RFC: Adding GPUs into Condor"}, sent to the {link: http://www.cs.wisc.edu/condor/mail-lists/ condor-users} mailing list on March 25th, 2011.