[Wisdom dates from 2021-06-25]

We have access to at least one A100 on =coba2000.chtc.wisc.edu=, although that machine is frequently busy.

The machine =gpulab2004.chtc.wisc.edu= has four A100s.

----

For now, to get an A100 from AWS, you have to rent eight of them at once with the =p4d.24xlarge= instance type.  Use the "Deep Learning Base AMI (Amazon Linux 2)" to avoid having to deal with drivers and the like.

{term}
sudo yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo yum install https://research.cs.wisc.edu/htcondor/repo/current/htcondor-release-current.amzn2.noarch.rpm
sudo yum-builddep condor
{endterm}

Then set up your HTCondor build tree in the usual way.  (Don't forget the =-j BIGNUM=; this instance type has a _lot_ of cores.)
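
One way to pick =BIGNUM= is to ask the machine how many cores it has.  This is only a sketch: it assumes =nproc= (from GNU coreutils) is on the path and that your build tree is driven by =make=; the =JOBS= variable is just an illustrative name.

{term}
# Assumption: run from the top of an already-configured build tree.
JOBS="$(nproc)"          # number of cores available to this process
echo "make -j ${JOBS}"   # prints the command; drop the echo to actually build
{endterm}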

----

NVIDIA has a {link: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html MIG user guide}.  Of particular note:

*: =sudo nvidia-smi -i INDEX -mig 1= enables MIG mode on the GPU at index =INDEX= but does not create any GPU instances.  Doing this step but not the next one is the (mis)configuration of relevance to {link: https://opensciencegrid.atlassian.net/browse/HTCONDOR-476 HTCONDOR-476}.
*: =sudo nvidia-smi mig -i 1 -cgi 19,19,19,19,19,19,19 -C= creates a 7-way split of the A100 at GPU index 1: seven GPU instances of profile 19 (=1g.5gb= on the 40GB A100), with =-C= also creating the matching compute instances.
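
To check which of the two states above a machine is in, something like the following may help.  It is a sketch, not a tested procedure: =count_mig_devices= is a hypothetical helper, and it assumes that =nvidia-smi -L= lists each MIG device on an indented line beginning with =MIG=, as in the examples in the MIG user guide.

{term}
# Count MIG devices in the output of 'nvidia-smi -L' (read from stdin).
# Assumption: MIG devices appear as indented lines starting with "MIG ".
count_mig_devices() {
    # grep -c prints 0 and exits non-zero when nothing matches; ignore that.
    grep -c '^[[:space:]]*MIG ' || true
}

# Only query the hardware if nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    # Current MIG mode of each GPU.
    nvidia-smi --query-gpu=index,mig.mode.current --format=csv
    # A count of 0 on a GPU with MIG enabled suggests the
    # HTCONDOR-476 situation (MIG on, nothing created).
    nvidia-smi -L | count_mig_devices
fi
{endterm}

After the 7-way split on one GPU, the count would be expected to go up by seven.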