{section: Advertise the GPU}

-HTCondor contains a tool designed to assist in detecting GPUs and configuring HTCondor to advertise GPU information. This tool is
+The HTCondor =condor_gpu_discovery= tool is designed to assist in detecting GPUs and in configuring HTCondor to advertise GPU information.
+This tool detects CUDA and OpenCL devices, and outputs a list of GPU identifiers for all detected devices.

-    condor_gpu_discovery
-
-This tool will detect CUDA and OpenCL devices and output a list of GPU identifiers for all detected devices.
-
-HTCondor has a general mechanism for declaring user-defined slot resources. We will use this mechanism to define a resource called 'GPUs'. The resource type name 'GPUs' is case insensitive, but you must be consistent about the plural. HTCondor considers 'GPU' to be a different resource type than 'GPUs'. We recommend the use of 'GPUs' for GPU custom resources.
+HTCondor has a general mechanism for declaring user-defined slot resources. GPUs are a user-defined slot resource, so use this mechanism to define a resource called 'GPUs'. This resource type name, 'GPUs,' is case insensitive, but you must be consistent about the plural. HTCondor considers 'GPU' to be a different resource type than 'GPUs'. We recommend the use of 'GPUs' for GPU custom resources.

 To define 'GPUs' as a custom resource simply add the following statements to the configuration on your execute node.

@@ -25,7 +22,7 @@
 ENVIRONMENT_FOR_AssignedGPUs = CUDA_VISIBLE_DEVICES, GPU_DEVICE_ORDINAL

-The first line tells HTCondor to run the condor_gpu_discovery tool, and use it's output to define a custom resource called 'GPUs'.
+The first line tells HTCondor to run the condor_gpu_discovery tool, and use its output to define a custom resource called 'GPUs'.
 The second line tells HTCondor to publish the AssignedGPUs for a slot in the job's environment using the environment variables =CUDA_VISIBLE_DEVICES= and =GPU_DEVICE_ORDINAL=.
If you know for certain that your devices will be CUDA then you can omit =GPU_DEVICE_ORDINAL= in the configuration above. If you know for certain that your devices are OpenCL only, then you can omit =CUDA_VISIBLE_DEVICES=. In addition =AssignedGPUs= will always be published into the job's environment as =_CONDOR_AssignedGPUs=, so the second line above is not strictly necessary, but it is recommended.
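As a rough illustration of the job's side of this arrangement, the sketch below shows how a job script might find out which GPUs it was assigned. The three variable names come from the text above. The helper name =assigned_gpus=, the lookup order, and the assumption that each value is a comma-separated list of identifiers are illustrative choices, not documented HTCondor behavior.

```python
import os

# Variable names are taken from the text above.
# The lookup order is an illustrative choice, not an HTCondor rule.
GPU_ENV_VARS = ("CUDA_VISIBLE_DEVICES", "GPU_DEVICE_ORDINAL", "_CONDOR_AssignedGPUs")


def assigned_gpus(env=None):
    """Return (variable_name, identifiers) for the first GPU variable that is set.

    Returns (None, []) when none of the variables is present or non-empty.
    """
    env = os.environ if env is None else env
    for name in GPU_ENV_VARS:
        value = env.get(name, "").strip()
        if value:
            # Assumption: the value is a comma-separated list of identifiers.
            tokens = [tok.strip() for tok in value.split(",")]
            return name, [tok for tok in tokens if tok]
    return None, []


if __name__ == "__main__":
    name, gpus = assigned_gpus()
    if gpus:
        print(f"{name} lists {len(gpus)} GPU(s): {gpus}")
    else:
        print("No GPU assignment found in the environment")
```

Checking =_CONDOR_AssignedGPUs= last gives the job a fallback when the =ENVIRONMENT_FOR_AssignedGPUs= line is omitted from the configuration.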