Assumptions

We assume you already have an AWS account.

Installation and Configuration

Martin Kandes wrote excellent instructions on installing and configuring condor_annex: http://www.t2.ucsd.edu/twiki2/bin/view/UCSDTier2/Condor_annex.

First-time users should:

Everyone should then follow the instructions below ("set the default launch configurations").

Set the default launch configurations.

  1. Go to your list of Auto Scaling group launch configurations: https://console.aws.amazon.com/ec2/autoscaling/home?region=us-east-1#LaunchConfigurations:
  2. If you don't have any Auto Scaling groups, you'll have to (temporarily) create one in order to create a Launch Configuration; click the Create Auto Scaling group button.
  3. Click the Create launch configuration button in the lower-right.
  4. Select "Community AMIs" from the tabs on the left. Enter "FIXME" and hit the Select button to choose the default AMI.
  5. Select an instance type from the list. If you just created a new account, the "t2.micro" type may be free. Otherwise, "m3.large" is a nice, simple choice.
  6. Click the Configure details button.
  7. For HTCondor to find this launch configuration, it must be named "HTCondorAnnex1".
  8. If you're not able to use AWS' free tier, the cheapest way to experiment is with Spot instances; see FIXME for details. The simplest way (because your instances won't vanish unexpectedly) is without Spot instances.
  9. Click the "5. Configure Security Group" link at the top.
  10. Click "Select an existing security group" and make sure the default security group (or the security group you created) is selected.
  11. Hit the Review button.
  12. Hit the Create launch configuration button.
  13. Select the EC2 keypair you created in step 5 of Martin Kandes' instructions.
  14. Click the acknowledgement and the Create launch configuration button.
  15. Now hit the "Cancel" link at the bottom -- condor_annex will create the autoscaling group(s) you need for you.

You only need one launch configuration to get started, but if you want to add another instance type later, repeat the steps above and change the name to "HTCondorAnnex2". condor_annex will recognize and use names up to "HTCondorAnnex8".
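
The naming rule above can be checked before you create a launch configuration. The following is a minimal sketch, assuming the names run from "HTCondorAnnex1" through "HTCondorAnnex8" and are case-sensitive (the case-sensitivity is an assumption; the text does not say). The helper name is_annex_launch_config is hypothetical and not part of HTCondor.

```shell
#!/usr/bin/env bash
# Succeeds if "$1" looks like a launch configuration name condor_annex would
# recognize, per the text: "HTCondorAnnex1" through "HTCondorAnnex8".
# is_annex_launch_config is a hypothetical helper, not part of HTCondor.
is_annex_launch_config() {
    [[ "$1" =~ ^HTCondorAnnex[1-8]$ ]]
}

# Try a few candidate names, including two that should be rejected.
for name in HTCondorAnnex1 HTCondorAnnex8 HTCondorAnnex9 htcondorannex1; do
    if is_annex_launch_config "$name"; then
        echo "$name: recognized"
    else
        echo "$name: ignored"
    fi
done
```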

Start an Annex

The minimal set of options to start an annex follows:

$ condor_annex --project-id MyFirstAnnex --instances 3 --expiry "2017-01-20 17:18:19"

[The collector address and the password file path are both extracted from the command environment's HTCondor configuration. The password file is uploaded to a private S3 bucket managed by condor_annex; the location is passed into the custom AMI via the usual instance contextualization methods.]

This command will return after HTCondor has set up the lease and requested that Amazon start 3 instances of the type(s) you specified.

[The tool allows you to create a set of default launch configurations, so you don't have to type them in every time, but specifying an example type on the command-line is a much faster way to get started.]

It will take a few minutes for the annex's slots to show up. Once they do, they will be assigned in the next negotiation cycle (which may also take a few minutes), and your jobs will start running shortly after that.
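
The --expiry value in the command above does not have to be typed by hand. The following is a rough sketch that computes a timestamp a few hours ahead in the same "YYYY-MM-DD HH:MM:SS" form. It assumes GNU date (for the -d option) and uses the local time zone; whether condor_annex reads the expiry as local time or UTC is not stated here, so treat that as an open question. The helper name annex_expiry is hypothetical.

```shell
#!/usr/bin/env bash
# Print a timestamp N hours from now (default 1) in the
# "YYYY-MM-DD HH:MM:SS" form used by the --expiry example.
# Requires GNU date for the -d option.
# annex_expiry is a hypothetical helper, not part of HTCondor.
annex_expiry() {
    local hours="${1:-1}"
    date -d "+${hours} hours" "+%Y-%m-%d %H:%M:%S"
}

expiry="$(annex_expiry 4)"
# Print the command instead of running it, so it can be reviewed first.
printf 'condor_annex --project-id MyFirstAnnex --instances 3 --expiry "%s"\n' "$expiry"
```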

Monitor your Annex

The following command line will print out the usual condor_status information for the annex you specify:

$ condor_status -annex MyFirstAnnex
ip-172-31-48-84.ec2.internal  LINUX      X86_64 Claimed   Busy      0.640 3767
ip-172-31-54-121.ec2.internal LINUX      X86_64 Claimed   Busy      0.880 3767
ip-172-31-56-45.ec2.internal  LINUX      X86_64 Claimed   Busy      0.600 3767

              Total Owner Claimed Unclaimed Matched Preempting Backfill  Drain
 X86_64/LINUX    11     0      11         0       0          0        0      0
        Total    11     0      11         0       0          0        0      0

[This is entirely equivalent to condor_status -const 'AnnexName =?= "MyFirstAnnex"' so it should be easy to implement.]

In this case, all three of the slots you requested have already started to run jobs.
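
If you would rather not read the listing by eye, the slot lines can be counted. This is a small sketch that counts lines whose State column is "Claimed" and whose Activity column is "Busy". The column positions (fourth and fifth) are taken from the sample output above and are an assumption about the layout in general. The helper name count_busy_slots is hypothetical.

```shell
#!/usr/bin/env bash
# Count slot lines whose State is "Claimed" and Activity is "Busy".
# Column positions (4 and 5) are read off the sample output in the text.
# count_busy_slots is a hypothetical helper, not part of HTCondor.
count_busy_slots() {
    awk '$4 == "Claimed" && $5 == "Busy" { n++ } END { print n + 0 }'
}

# Feed it the sample slot lines from the text rather than a live pool.
count_busy_slots <<'EOF'
ip-172-31-48-84.ec2.internal  LINUX      X86_64 Claimed   Busy      0.640 3767
ip-172-31-54-121.ec2.internal LINUX      X86_64 Claimed   Busy      0.880 3767
ip-172-31-56-45.ec2.internal  LINUX      X86_64 Claimed   Busy      0.600 3767
EOF
```

Against a live pool, the same function could read the output of the condor_status command shown above through a pipe.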

Stop an Annex

If you're already familiar with the condor_off command, you can use it to turn off HTCondor on the annex nodes; the default image is configured so that this will also shut down the machine. To shut down each machine in an annex, use the following command line:

$ condor_off -annex MyFirstAnnex

[This is entirely equivalent to condor_off -const 'AnnexName =?= "MyFirstAnnex"' so it should be easy to implement.]
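
The bracketed notes describe -annex as shorthand for a constraint on the AnnexName attribute. The sketch below builds that constraint for a given annex name, so the same string can be handed to both condor_status and condor_off. It assumes AnnexName is a string attribute, which is why the name is wrapped in double quotes inside the expression. The helper name annex_constraint is hypothetical, and the commands are only printed, not run.

```shell
#!/usr/bin/env bash
# Build the ClassAd constraint that the notes describe as equivalent to
# "-annex NAME". The name is quoted on the assumption that AnnexName is a
# string attribute.
# annex_constraint is a hypothetical helper, not part of HTCondor.
annex_constraint() {
    printf 'AnnexName =?= "%s"\n' "$1"
}

constraint="$(annex_constraint MyFirstAnnex)"
# Print the commands instead of running them.
echo "condor_status -const '$constraint'"
echo "condor_off -const '$constraint'"
```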