[Wisdom dates from 2021-06-25]

We have access to at least one A100 in =coba2000.chtc.wisc.edu=, although that machine is frequently busy.  The machine =gpulab2004.chtc.wisc.edu= has four A100s.

----
For now, to get an A100 from AWS, you have to rent eight of them with the =p4d.24xlarge= instance type.  Use the "Deep Learning Base AMI (Amazon Linux 2)" to avoid having to deal with drivers and the like.

{term}
sudo yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo yum install https://research.cs.wisc.edu/htcondor/repo/current/htcondor-release-current.amzn2.noarch.rpm
sudo yum-builddep condor
{endterm}

Then set up your HTCondor build tree in the usual way.  (Don't forget the =-j BIGNUM=; this instance type has a _lot_ of cores.)

----
NVIDIA has a {link: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html MIG user guide}.  Of particular note:

*: =sudo nvidia-smi -i INDEX -mig 1= enables MIG but does not create any GPU instances.  Doing this step but not the next one is the (mis)configuration of relevance to {link: https://opensciencegrid.atlassian.net/browse/HTCONDOR-476 HTCONDOR-476}.
*: =sudo nvidia-smi mig -i 1 -cgi 19,19,19,19,19,19,19 -C= creates a 7-way split of the A100.
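
To check whether a machine is in the "MIG enabled but no GPU instances" state described above, something like the following may help.  This is only a sketch: =mig_enabled_gpus= is a made-up helper name, and it assumes that =nvidia-smi --query-gpu=index,mig.mode.current --format=csv,noheader= prints one =INDEX, MODE= line per GPU.  It lists the GPU instances on each MIG-enabled GPU; interpreting that listing (in particular, noticing that it is empty) is left to the reader.

```shell
# Print the index of every GPU whose current MIG mode is "Enabled".
# Reads "index, mode" lines on stdin, as assumed to be produced by:
#   nvidia-smi --query-gpu=index,mig.mode.current --format=csv,noheader
# (mig_enabled_gpus is a hypothetical helper, not part of any NVIDIA tool.)
mig_enabled_gpus() {
    awk -F', *' '$2 == "Enabled" { print $1 }'
}

# Only touch real hardware if nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    for idx in $(nvidia-smi --query-gpu=index,mig.mode.current \
                            --format=csv,noheader | mig_enabled_gpus); do
        echo "GPU ${idx}: MIG is enabled; GPU instances follow:"
        # List the GPU instances on this GPU; if none are listed, this is
        # the (mis)configuration described above.
        nvidia-smi mig -i "${idx}" -lgi
    done
fi
```

The filter is kept separate from the =nvidia-smi= calls so that it can be tried on canned input, for example =printf '0, Enabled\n1, Disabled\n' | mig_enabled_gpus=, on a machine without a GPU.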