{section: Advertising the GPU}

A key challenge of advertising GPUs is that a GPU can only be used by one job at a time. If an execute node has multiple slots (a likely case!), you'll want each GPU to be advertised by only one slot.

You have several options for advertising your GPUs. In increasing order of complexity they are:

1: Static configuration
2: Automatic configuration
3: Dynamic advertising

This progression may be a useful way to do initial setup and testing. Start with a static configuration to ensure everything works. Move to an automatic configuration to develop and test partial automation. Finally, a few small changes should make it possible to turn your automatic configuration into dynamic advertising.

{subsection: Static configuration}

If you have a small number of nodes, or perhaps a large number of identical nodes, you can add static attributes manually using {link:http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#16198 STARTD_ATTRS} on a {link:http://www.cs.wisc.edu/condor/manual/v7.6/3_13Setting_Up.html#37111 per slot basis}. In the simplest case, it might just be:

{code}
SLOT1_HAS_GPU=TRUE
SLOT1_GPU_DEV=0
STARTD_ATTRS=HAS_GPU,GPU_DEV
{endcode}

This limits the GPU to being advertised only by the first slot. A job can use HAS_GPU to identify available slots with GPUs, and GPU_DEV to identify which GPU device to use.
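How a job actually uses GPU_DEV is up to the job. One possibility is a small wrapper script that receives the device number on its command line. The minimal sketch below assumes that the submit description file passes the matched slot's GPU_DEV value through `$$()` substitution, as shown in the comments. The names gpu_wrapper.py and my_cuda_app are hypothetical. The sketch also restricts the program with the CUDA_VISIBLE_DEVICES environment variable, which assumes a CUDA runtime that honors it. This wrapper, not Condor, sets that variable.

{code}
#!/usr/bin/env python
# gpu_wrapper.py: hypothetical job wrapper (a sketch, not part of Condor).
#
# Assumed submit description file lines (names are illustrative):
#   executable   = gpu_wrapper.py
#   arguments    = $$(GPU_DEV) ./my_cuda_app input.dat
#   requirements = (HAS_GPU =?= TRUE)
#
# Condor replaces $$(GPU_DEV) with the GPU_DEV value advertised by the
# slot the job matched, so the wrapper receives the device number first.
import os
import subprocess
import sys


def gpu_env(gpu_dev, base_env=None):
    """Return a copy of the environment that exposes only device gpu_dev.

    Assumes a CUDA runtime that honors CUDA_VISIBLE_DEVICES; the
    application then sees the assigned GPU as its device 0.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(int(gpu_dev))  # int() rejects junk values
    return env


def run_on_gpu(gpu_dev, command):
    """Run command (a list of strings) on one GPU and return its exit code."""
    if not command:
        raise ValueError("no program given to run")
    return subprocess.call(command, env=gpu_env(gpu_dev))


# Entry point when used as the job's executable, e.g.
#   gpu_wrapper.py 1 ./my_cuda_app input.dat
if __name__ == "__main__" and len(sys.argv) > 2:
    sys.exit(run_on_gpu(sys.argv[1], sys.argv[2:]))
{endcode}

The wrapper takes the device number as an explicit argument. It therefore does not depend on how the slot was configured, and should keep working with the automatic and dynamic approaches described below, as long as they still advertise GPU_DEV.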
(A job could instead use the presence of GPU_DEV to identify slots with GPUs, but "=HAS_GPU=" is a bit easier to read than "=(GPU_DEV=!=UNDEFINED)=".)

If you have two GPUs, you might give the first two slots a GPU each:

{code}
SLOT1_HAS_GPU=TRUE
SLOT1_GPU_DEV=0
SLOT2_HAS_GPU=TRUE
SLOT2_GPU_DEV=1
STARTD_ATTRS=HAS_GPU,GPU_DEV
{endcode}

You can also provide more information about your GPUs so that a job can distinguish between different GPUs:

{code}
SLOT1_HAS_GPU=TRUE
SLOT1_GPU_CUDA_DRV=3.20
SLOT1_GPU_CUDA_RUN=3.20
SLOT1_GPU_DEV=0
SLOT1_GPU_NAME="Tesla C2050"
SLOT1_GPU_CAPABILITY=2.0
SLOT1_GPU_GLOBALMEM_MB=2687
SLOT1_GPU_MULTIPROC=14
# Cores per multiprocessor (14 x 32 = 448 cores in total)
SLOT1_GPU_NUMCORES=32
SLOT1_GPU_CLOCK_GHZ=1.15
STARTD_ATTRS = HAS_GPU, GPU_DEV, GPU_NAME, GPU_CAPABILITY, GPU_GLOBALMEM_MB, \
               GPU_MULTIPROC, GPU_NUMCORES, GPU_CLOCK_GHZ, GPU_CUDA_DRV, \
               GPU_CUDA_RUN
{endcode}

(The above is adapted from {link: https://lists.cs.wisc.edu/archive/condor-users/2011-March/msg00121.shtml Carsten Aulbert's post "RFC: Adding GPUs into Condor"}.)

{subsection:Automatic configuration}

You can write a program that generates your configuration file.
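Such a generator can be quite small. The following is a minimal sketch under a few assumptions. First, you already have a list describing each GPU, gathered by your own query tool; that discovery step is left out. Second, the sample values are hypothetical. The sketch gives device N to slot N+1 and emits per-slot settings in the same form as the static examples. One way to use it is to write its output into a file named by LOCAL_CONFIG_FILE and then run condor_reconfig.

{code}
#!/usr/bin/env python
# Sketch of an "automatic configuration" generator (hypothetical, not part
# of Condor). GPU discovery is deliberately left out: supply a list with one
# dict of extra attributes per device, in device order.


def format_value(value):
    """Render a Python value as a literal for the Condor config file."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    return '"%s"' % value  # strings are quoted, e.g. "Tesla C2050"


def gpu_config(gpus):
    """Return config lines giving device N to slot N+1, plus STARTD_ATTRS."""
    if not gpus:
        return []  # no GPUs on this node: advertise nothing
    lines = []
    attrs = ["HAS_GPU", "GPU_DEV"]
    for dev, props in enumerate(gpus):
        prefix = "SLOT%d_" % (dev + 1)
        lines.append(prefix + "HAS_GPU=TRUE")
        lines.append(prefix + "GPU_DEV=%d" % dev)
        for name in sorted(props):
            lines.append("%s%s=%s" % (prefix, name, format_value(props[name])))
            if name not in attrs:
                attrs.append(name)  # list each attribute name only once
    lines.append("STARTD_ATTRS=" + ",".join(attrs))
    return lines


if __name__ == "__main__":
    # Hypothetical inventory of two identical cards.
    inventory = [
        {"GPU_NAME": "Tesla C2050", "GPU_CAPABILITY": 2.0},
        {"GPU_NAME": "Tesla C2050", "GPU_CAPABILITY": 2.0},
    ]
    print("\n".join(gpu_config(inventory)))
{endcode}

With the sample inventory, the output matches the two-GPU static example plus GPU_CAPABILITY and GPU_NAME for each slot. Its last line is STARTD_ATTRS=HAS_GPU,GPU_DEV,GPU_CAPABILITY,GPU_NAME. Keeping discovery separate from formatting is one way to prepare for the "few small changes" that lead to dynamic advertising.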
This approach still uses STARTD_ATTRS, but potentially scales better for pools with a mix of GPU hardware. For an extended example, see {link: https://lists.cs.wisc.edu/archive/condor-users/2011-March/msg00121.shtml Carsten Aulbert's post "RFC: Adding GPUs into Condor"}, in which he does exactly this.

{subsection:Dynamic advertising}

One step beyond automatic configuration is dynamic advertising. Instead of you maintaining a static or generated configuration file, Condor itself runs your program and incorporates the information it reports. This is {link: http://www.cs.wisc.edu/condor/manual/v7.6/4_4Hooks.html#sec:daemon-classad-hooks Condor's "Daemon ClassAd Hooks" functionality}, previously known as HawkEye and Condor Cron. This is the route taken by the {link: http://sourceforge.net/projects/condorgpu/ condorgpu project}. (Note that the condorgpu project has no affiliation with Condor. We have not tested or reviewed that code and cannot promise anything about it!)

{section: The Future}

The Condor team is working on various improvements to how Condor manages GPUs. If you have
TODO: condor-admin/condor-users, link to ticket when made public.

{section: Credits}