Introduction

The condor_annex tool rents computational resources from Amazon's cloud service and adds those resources to an HTCondor pool for your jobs to use. These instructions document how to use condor_annex for CHTC jobs. Some restrictions apply:

Working with these restrictions will be covered in the following instructions.

In these instructions, we've included sample output after the commands. Lines you should execute start with the dollar sign ($) character; do not include the dollar sign ($) when copying the line.

Overview

  1. Prepare to Add Resources for Your Jobs
    1. Grant Access to Your AWS Account
    2. Lay the Groundwork in AWS
    3. Check the Groundwork
  2. Submit a Test Job
  3. Add Resources for Your Jobs
  4. Run Jobs on Your Resources
  5. Clean Up (optional)

These instructions assume this is the first time you're using condor_annex on CHTC. You'll want to have two terminal windows open: one for running condor_annex commands (logged into annex-cm.chtc.wisc.edu) and another for submitting jobs (logged into submit-4.chtc.wisc.edu). If you've used condor_annex on CHTC before, skip ahead to section 1.3 ("Check the Groundwork").

1 Prepare to Add Resources for Your Jobs

Before you can add resources for your jobs, you must (a) give condor_annex access to your AWS account and (b) have it do some one-time set-up.

1.1 Grant Access to Your AWS Account

Like you, condor_annex needs an account to use AWS. You can grant condor_annex access to your account by acquiring a pair of security "keys" that function like a user name and password. Like a user name, the "access key" is (more or less) public information; the corresponding "secret key" is like a password and must be kept secret. To help keep both halves secret, you never tell condor_annex the keys themselves; instead, you put each key in its own protected file.

To create those two files, execute the following commands on annex-cm.chtc.wisc.edu:

$ mkdir -p ~/.condor
$ cd ~/.condor
$ touch publicKeyFile privateKeyFile
$ chmod 600 publicKeyFile privateKeyFile

The last command ensures that no user other than you can read or write to those files. (Like any other file on CHTC machines, these files will be readable by the CHTC administrative staff. If that bothers you, contact us for alternatives.)
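If you want to double-check the permissions, something like the following sketch may help. The function name check_key_file is hypothetical, and the sketch assumes GNU stat (the usual case on Linux machines); it is not part of condor_annex.

```shell
# Hypothetical helper (not part of condor_annex): report whether a file's
# permission bits are exactly 600, i.e. read/write for the owner only.
# Assumes GNU stat, which accepts -c '%a' to print the octal mode.
check_key_file() {
    mode=$(stat -c '%a' "$1") || return 1
    if [ "$mode" = "600" ]; then
        echo "$1: OK"
    else
        echo "$1: mode is $mode, expected 600"
        return 1
    fi
}
```

For example, run check_key_file ~/.condor/publicKeyFile and then check_key_file ~/.condor/privateKeyFile after defining the function.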

To fill the files you just created, go to the IAM console; log in if you need to. The following instructions assume you are logged in as a user with the privilege to create new users. (The 'root' user for any account has this privilege; other accounts may as well.)

  1. Click the "Add User" button.
  2. Enter a name in the "User name" box; "annex-user" is a fine choice.
  3. Click the check box labelled "Programmatic access".
  4. Click the button labelled "Next: Permissions".
  5. Select "Attach existing policies directly".
  6. Type "AdministratorAccess" in the box labelled "Filter".
  7. Click the check box on the single line that will appear below (labelled "AdministratorAccess").
  8. Click the "Next: review" button (you may need to scroll down).
  9. Click the "Create user" button.
  10. From the line labelled "annex-user", copy the value in the column labelled "Access key ID" to publicKeyFile.
  11. On the line labelled "annex-user", click the "Show" link in the column labelled "Secret access key"; copy the revealed value to privateKeyFile.
  12. Hit the "Close" button.

You have now granted condor_annex access to your AWS account.
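A stray space, a blank line, or a second pasted line in either file is an easy mistake to make when copying from a web page. The sketch below is one way to catch that; key_file_is_filled is a hypothetical name, and the rule it enforces (exactly one line, no whitespace) is an assumption about what a cleanly pasted key looks like, not a documented condor_annex requirement.

```shell
# Hypothetical helper: succeed only if the file is non-empty, holds exactly
# one line, and that line contains no whitespace (spaces, tabs, or a stray
# carriage return from a copy-and-paste).
key_file_is_filled() {
    [ -s "$1" ] || return 1                        # file must not be empty
    [ "$(grep -c '' "$1")" -eq 1 ] || return 1     # exactly one line
    ! grep -q '[[:space:]]' "$1"                   # no whitespace in it
}
```

Running key_file_is_filled ~/.condor/privateKeyFile && echo "looks fine" checks one file without printing the key itself.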

1.2 Lay the Groundwork in AWS

It takes a few minutes for condor_annex to lay the groundwork it needs at AWS. Since this groundwork doesn't cost you anything to keep around, you can just create it once and forget about it. Run the following command on annex-cm.chtc.wisc.edu; you should still have a terminal window logged in there from the previous step.

$ condor_annex -setup
Creating configuration bucket (this takes less than a minute)....... complete.
Creating Lambda functions (this takes about a minute)........ complete.
Creating instance profile (this takes about two minutes)................... complete.
Creating security group (this takes less than a minute)..... complete.
Setup successful.

1.3 Check the Groundwork

You can verify at this point (or any later time) that the groundwork was laid successfully by running the following command (also on annex-cm.chtc.wisc.edu).

$ condor_annex -check-setup
Checking for configuration bucket... OK.
Checking for Lambda functions... OK.
Checking for instance profile... OK.
Checking for security group... OK.

If you don't see four "OK"s, return to step 1.1 and try again. If you've done that once already, contact your research computing facilitator for assistance.
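If you would rather not count the "OK"s by eye, the check can be scripted. The following is a minimal sketch that assumes the output format shown above (one line per check, each ending in "OK."); setup_is_ok is a hypothetical name.

```shell
# Hypothetical helper: read `condor_annex -check-setup` output on stdin and
# succeed only if exactly four lines end in "OK.".
setup_is_ok() {
    # grep -c prints the count but exits non-zero when it is 0; "|| true"
    # keeps that case from being treated as a failure of the assignment.
    ok_count=$(grep -c 'OK\.$' || true)
    [ "$ok_count" -eq 4 ]
}
```

One possible use is condor_annex -check-setup | setup_is_ok && echo "Groundwork is in place."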

2 Submit a Test Job

It sounds a little strange, but if you submit a test job before you add resources for your jobs, you won't have to wait as long for it to start, which will save you both time and money. Use a second terminal window to log into submit-4.chtc.wisc.edu and create the following submit file:

annex-test.submit
executable              = /bin/sleep
transfer_executable     = false
should_transfer_files   = true
universe                = vanilla
arguments               = 600

log                     = sleep.log

# You MUST include this when submitting from CHTC to let the annex see the job.
+WantFlocking           = TRUE

# This is required, by default, to run a job in an annex.
+MayUseAWS              = TRUE

# The first clause requires this job to run on EC2; that's what makes it
# good as a test.  The second clause prevents CHTC from setting a
# requirement for OpSysMajorVer, allowing this job to run on any version.
requirements            = regexp( ".*\.ec2\.internal", Machine ) && (TRUE || TARGET.OpSysMajorVer)

queue 1

Submit this file to the queue (for example, with condor_submit annex-test.submit); the job won't run until after you've completed the next step.

3 Add Resources for Your Jobs

Entering the following on annex-cm.chtc.wisc.edu will add resources for your jobs to the pool. We call the set of resources you add an "annex". You have to supply a name for each annex you create; the example below uses 'MyFirstAnnex'.

When you run condor_annex, it will print out what it's going to do and then ask you if that's OK. You must type 'yes' (and hit enter) at the prompt to start an annex; if you do not, condor_annex will print out instructions for changing whatever you didn't like about its plan, and then exit.

The following command adds one resource (an "instance") for one hour; increase the duration if the job you want to run takes longer. Don't increase the number of resources until you've tested your job with condor_annex; you can easily add resources after you've verified that everything works.

$ condor_annex -count 1 -annex-name MyFirstAnnex -idle 1 -duration 1
Will request 1 m4.large on-demand instance for 1 hours.  Each instance will terminate after being idle for 1 hours.
Is that OK?  (Type 'yes' or 'no'): yes
Starting annex...
Annex started.  Its identity with the cloud provider is
'TestAnnex0_f2923fd1-3cad-47f3-8e19-fff9988ddacf'.  It will take about three minutes for the new machines to join the pool.

You won't need to know the annex's identity with the cloud provider unless something goes wrong.
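When choosing values for -count and -duration, it can help to see how much you are asking for. A request can consume at most count times duration instance-hours; instances that shut down early because they sat idle consume less. The sketch below does only that multiplication. It assumes whole numbers for both values, and max_instance_hours is a hypothetical name.

```shell
# Hypothetical helper: upper bound on the instance-hours one condor_annex
# request can consume, given -count ($1) and -duration in hours ($2).
# Shell arithmetic is integer-only, so both arguments must be whole numbers.
max_instance_hours() {
    echo $(( $1 * $2 ))
}
```

For the command above, max_instance_hours 1 1 prints 1; a request for 10 instances for 8 hours would give 80.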

Before starting the annex, condor_annex will check to make sure that the instances will be able to contact CHTC. Contact your machine's administrator if condor_annex reports a problem with this step.

Otherwise, wait a few minutes and run the following to make sure your annex has started up and joined the pool:

$ condor_annex status -annex MyFirstAnnex
Name                                OpSys      Arch   State     Activity LoadAv

slot1@ip-172-31-15-209.ec2.internal LINUX      X86_64 Unclaimed Idle      0.000
slot2@ip-172-31-15-209.ec2.internal LINUX      X86_64 Unclaimed Idle      0.000

               Machines Owner Claimed Unclaimed Matched Preempting  Drain

  X86_64/LINUX        2     0       0         2       0          0      0

         Total        2     0       0         2       0          0      0
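To wait for the annex from a script rather than rereading this table, you could count the slot lines. This is a sketch that assumes the output format shown above, where each slot line starts with "slot<N>@" and the machine name ends in ".ec2.internal"; count_annex_slots is a hypothetical name.

```shell
# Hypothetical helper: read `condor_annex status` output on stdin and print
# how many slot lines belong to EC2 machines.
count_annex_slots() {
    # grep -c still prints 0 when nothing matches, but exits non-zero;
    # "|| true" makes the function succeed in that case too.
    grep -c '^slot[0-9]*@.*\.ec2\.internal' || true
}
```

Piping the status command for your annex (here, MyFirstAnnex) into count_annex_slots prints 2 for the sample output above, and 0 while the instance is still starting up.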

An annex (by default) will only run jobs which (a) you submitted and (b) have MayUseAWS set to true. You can confirm this by running the following command:

$ condor_annex -annex MyFirstAnnex status -af:r START
(MayUseAWS == true) && stringListMember(Owner,"tlmiller")
(MayUseAWS == true) && stringListMember(Owner,"tlmiller")

In your output, your own login should appear where the sample shows tlmiller.

There are additional instructions for general annex use. For now, we'll move on to actually running on your new resource.

4 Run Jobs on Your Resources

It might take a while for submit-4.chtc.wisc.edu to try the other possibilities before giving the annex a chance to run the job. Run condor_q in the terminal logged in to that machine to keep track of the test job; it should eventually run. (Check its log if it's gone when you check; the job may have run and finished.)

You can make use of the annex resources for your own jobs in two ways: by submitting new jobs and by editing existing ones.

To submit new jobs, follow the example of the test job above; you'll need both the '+MayUseAWS = TRUE' line and the '+WantFlocking = TRUE' line. A reminder: this means these jobs will flock! For now, you shouldn't use condor_annex if you don't want your jobs flocking. (Your job, like the test job, can require that it run on a machine whose name ends in '.ec2.internal' by adding 'regexp( ".*\.ec2\.internal", Machine )' to its requirements expression, which keeps it from running anywhere but on an annex; but that's not a secure solution.) You will also need to add something to your requirements to address the issue that annex resources don't presently advertise OpSysMajorVer. A requirements expression like the following should do the trick. (Remember, the default resources rented by condor_annex run an EL6-like OS.)

requirements = (regexp( ".*\.ec2\.internal", Machine ) || IsCHTC) && (OpSysMajorVer =?= undefined || OpSysMajorVer == 6)
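To get a feel for which machine names the '.ec2.internal' pattern accepts, you can try it against a few names in the shell. In this sketch grep -E stands in for the ClassAd regexp() function; the two regular-expression engines differ in details, so treat the result as an approximation. name_matches_ec2 is a hypothetical name.

```shell
# Hypothetical helper: succeed if the given machine name matches the pattern
# used in the requirements expression. grep -E is only an approximation of
# ClassAd's regexp(); it is used here purely for illustration.
name_matches_ec2() {
    printf '%s\n' "$1" | grep -Eq '.*\.ec2\.internal'
}
```

With this definition, name_matches_ec2 ip-172-31-15-209.ec2.internal succeeds and name_matches_ec2 submit-4.chtc.wisc.edu fails. The test looks only at the name a machine advertises, which is why it should not be relied on for security.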

If you prefer, you can restrict by the name of the annex you requested (but be aware that any annex user can assign any name to their annex, including one that you're already using):

requirements = (AnnexName =?= "MyFirstAnnex" || IsCHTC) && (OpSysMajorVer =?= undefined || OpSysMajorVer == 6)

You can also edit existing jobs by using condor_qedit. More on using this tool will be forthcoming.

Once you've gotten one job running, you may want to add additional resources to your annex. While repeating the command from section 3 will add another instance to the "MyFirstAnnex" annex, for simplicity we recommend using another name. (Using an existing name updates the lease for all instances in the annex, existing and new; but only new instances will respect the new max idle time. If you've used the name of your annex in your job requirements, this might be worth the trouble.)

5 Clean Up (Optional)

The resources condor_annex rents for you from Amazon will, as we mentioned before, shut themselves down after the duration, or if they're idle for longer than the time-out. At that point, no more charges will accrue -- it costs you nothing to leave your account set up to use condor_annex.

If your jobs all finish early, you can run (on annex-cm.chtc.wisc.edu) condor_annex -annex MyFirstAnnex off to shut off all the resources you rented immediately.