{section: Introduction}

The =condor_annex= tool rents computational resources from Amazon's cloud service (AWS) and adds them to an HTCondor pool for your jobs to use. These instructions document how to use =condor_annex= for CHTC jobs. Some restrictions apply:

*: At the moment, you can only use =condor_annex= for jobs on =submit-4.chtc.wisc.edu=.
*: You will need a log-in on =annex-cm.chtc.wisc.edu=. (Ask your research computing facilitator about this.)
*: The jobs you want to run on AWS must have =MayUseAWS= set in their ads.
*: The jobs you want to run on AWS must have =WantFlocking= set in their ads. This means your jobs will flock! If you don't know what that means, don't use =condor_annex= until you've talked to your research computing facilitator.
*: Your jobs' requirements must allow them to run on Amazon's cloud. When =condor_annex= acquires resources from AWS, they will by default run an EL6-like operating system, but they won't have =OpSysMajorVer= set.
*: Intentionally or otherwise, other users of =condor_annex= on the CHTC may run jobs they don't own, including yours, on their resources. We're working on a solution to this, but if that possibility worries you, don't use =condor_annex= for now.

These restrictions will be covered in the following instructions. If you haven't run a particular job with =condor_annex= before, you should add just a single resource and make sure your job actually works there before adding a useful number of resources.

In these instructions, we've included sample output after the commands. Lines you should execute start with the dollar sign (=$=) character; do not include the dollar sign when copying the line.
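The =MayUseAWS= and =WantFlocking= restrictions above are easy to forget when writing a new submit file. The following is a minimal sketch of a pre-submission check; it is not part of =condor_annex= or HTCondor. The function name =check_annex_submit= is hypothetical, and the sketch assumes you set both attributes with the =+Attribute = TRUE= syntax used by the test job in section 2 (it won't recognize other ways of setting them).

```shell
# check_annex_submit FILE
# Hypothetical helper (not part of HTCondor): succeeds only if FILE sets
# both +MayUseAWS and +WantFlocking to TRUE. Matching is case-insensitive,
# since ClassAd attribute names and boolean literals are case-insensitive.
check_annex_submit() {
  submit_file=$1
  for attr in MayUseAWS WantFlocking; do
    # Look for a line like "+MayUseAWS = TRUE", allowing extra whitespace.
    if ! grep -Eiq "^[[:space:]]*[+]${attr}[[:space:]]*=[[:space:]]*true[[:space:]]*$" "$submit_file"; then
      echo "$submit_file: missing '+$attr = TRUE'" >&2
      return 1
    fi
  done
  return 0
}
```

For example, after defining the function in your shell on =submit-4.chtc.wisc.edu=, =check_annex_submit annex-test.submit && condor_submit annex-test.submit= submits the file only if the check passes.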
{section: Overview}

1: Prepare your AWS account
1:: Obtaining an Access Key
1:: Running the Set-Up Command
1:: Checking the Set-Up
2: Submit a Test Job
3: Add Resources to the Pool
4: Run Jobs on Those Resources
5: Clean Up (optional)

These instructions assume this is the first time you're using =condor_annex= on CHTC. You'll want to have two terminal windows open: one for running =condor_annex= commands (logged into =annex-cm.chtc.wisc.edu=) and another for submitting jobs (logged into =submit-4.chtc.wisc.edu=).

{section: 1 Prepare your AWS account}

The =condor_annex= tool includes a =-setup= command which will prepare your AWS account. If you're not sure whether you've done the set-up before, it won't hurt to repeat it, but you may save some time if you follow section 1.3 ("Checking the Set-Up") first. If the set-up checks out OK, great; if not, return here and start at section 1.1 ("Obtaining an Access Key").

{subsection: 1.1 Obtaining an Access Key}

In order to use AWS, =condor_annex= needs a pair of security tokens (like a user name and password). Like a user name, the "access key" is (more or less) public information; the corresponding "secret key" is like a password and must be kept secret. To help keep both halves secret, =condor_annex= (and HTCondor) are never told these keys directly; instead, you tell HTCondor which file to look in to find each one.

Log into =annex-cm.chtc.wisc.edu= and create those two files now; we'll tell you how to fill them in shortly. By convention, these files go in your =~/.condor= directory, which is where =condor_annex -setup= will store the rest of the data it needs. (The =-p= flag keeps =mkdir= from complaining if the directory already exists.)

{term}
$ mkdir -p ~/.condor
$ cd ~/.condor
$ touch publicKeyFile privateKeyFile
$ chmod 600 publicKeyFile privateKeyFile
{endterm}

The last command ensures that no user other than you can read or write those files. (Like any other file on CHTC machines, these files will be readable by the CHTC administrative staff.
If that bothers you, contact us for alternatives.)

To download a new pair of security tokens for =condor_annex= to use, go to the {link: https://console.aws.amazon.com/iam/home?region=us-east-1#/users IAM console}; log in if you need to. The following instructions assume you are logged in as a user with the privilege to create new users. (The 'root' user for any account has this privilege; other accounts may as well.)

1: Click the "Add User" button.
1: Enter a name in the *User name* box; "annex-user" is a fine choice.
1: Click the check box labelled "Programmatic access".
1: Click the button labelled "Next: Permissions".
1: Select "Attach existing policies directly".
1: Type "AdministratorAccess" in the box labelled "Filter".
1: Click the check box on the single line that will appear below (labelled "AdministratorAccess").
1: Click the "Next: Review" button (you may need to scroll down).
1: Click the "Create user" button.
1: From the line labelled "annex-user", copy the value in the column labelled "Access key ID" to =publicKeyFile=.
1: On the line labelled "annex-user", click the "Show" link in the column labelled "Secret access key"; copy the revealed value to =privateKeyFile=.
1: Hit the "Close" button.

The 'annex-user' now has full privileges to your account. We're working on creating a CloudFormation template that will create a user with only the privileges =condor_annex= actually needs.

{subsection: 1.2 Running the Set-Up Command}

The following command will set up your AWS account. It will create a number of persistent components, none of which will cost you anything to keep around. These components may take quite some time to create; =condor_annex= checks each for completion every ten seconds and prints an additional dot (past the first three) each time it does so, to let you know that everything's still working.

{term}
$ condor_annex -setup
Creating configuration bucket (this takes less than a minute)....... complete.
Creating Lambda functions (this takes about a minute)........ complete.
Creating instance profile (this takes about two minutes)................... complete.
Creating security group (this takes less than a minute)..... complete.
Setup successful.
{endterm}

{subsection: 1.3 Checking the Set-Up}

You can verify at this point (or any later time) that the set-up procedure completed successfully by running the following command.

{term}
$ condor_annex -check-setup
Checking for configuration bucket... OK.
Checking for Lambda functions... OK.
Checking for instance profile... OK.
Checking for security group... OK.
{endterm}

{section: 2 Submit a Test Job}

You haven't requested any resources yet, but if you submit a test job first, you'll spend less time and money waiting for it. Log into =submit-4.chtc.wisc.edu= and create the following submit file:

{file: annex-test.submit}
executable = /bin/sleep
transfer_executable = false
should_transfer_files = true
universe = vanilla
arguments = 600

log = sleep.log

# You MUST include this when submitting from CHTC to let the annex see the job.
+WantFlocking = TRUE

# This is required, by default, to run a job in an annex.
+MayUseAWS = TRUE

# The first clause requires this job to run on EC2; that's what makes it
# good as a test. The second clause always evaluates to TRUE, but it
# prevents CHTC from adding its own requirement for OpSysMajorVer, which
# annex machines don't advertise; this lets the job match them.
requirements = regexp( ".*\.ec2\.internal", Machine ) && (TRUE || TARGET.OpSysMajorVer)

queue 1
{endfile}

Submit this file to the queue with =condor_submit annex-test.submit=. It won't run until after you've completed the next step.

{section: 3 Add Resources to the Pool}

Entering the following on =annex-cm.chtc.wisc.edu= will add resources for your jobs to the pool. We call the set of resources you added an "annex". You have to supply a name for each annex you create; the example below uses 'MyFirstAnnex'. When you run =condor_annex=, it will print out what it's going to do, and then ask you if that's OK.
You must type 'yes' (and hit enter) at the prompt to start an annex; if you do not, =condor_annex= will print out instructions about how to change whatever you may not like about what it said it was going to do, and then exit.

{term}
$ condor_annex -count 1 -annex-name MyFirstAnnex -idle 1 -duration 1
Will request 1 m4.large on-demand instance for 1 hours.
Each instance will terminate after being idle for 1 hours.
Is that OK? (Type 'yes' or 'no'): yes
Starting annex...
Annex started. Its identity with the cloud provider is 'TestAnnex0_f2923fd1-3cad-47f3-8e19-fff9988ddacf'.
It will take about three minutes for the new machines to join the pool.
{endterm}

You won't need to know the annex's identity with the cloud provider unless something goes wrong.

Before starting the annex, =condor_annex= will check to make sure that the instances will be able to contact CHTC. Contact your machine's administrator if =condor_annex= reports a problem with this step. Otherwise, wait a few minutes and run the following to make sure your annex has started up and joined the pool:

{term}
$ condor_annex status -annex MyFirstAnnex
Name                                OpSys Arch   State     Activity LoadAv
slot1@ip-172-31-15-209.ec2.internal LINUX X86_64 Unclaimed Idle      0.000
slot2@ip-172-31-15-209.ec2.internal LINUX X86_64 Unclaimed Idle      0.000

             Machines Owner Claimed Unclaimed Matched Preempting Drain
X86_64/LINUX        2     0       0         2       0          0     0
       Total        2     0       0         2       0          0     0
{endterm}

By default, an annex will only run jobs which (a) you submitted and (b) have =MayUseAWS= set to true. You can confirm this by running the following command (your own user name will appear in place of =tlmiller=):

{term}
$ condor_annex -annex MyFirstAnnex status -af:r START
(MayUseAWS == true) && stringListMember(Owner,"tlmiller")
(MayUseAWS == true) && stringListMember(Owner,"tlmiller")
{endterm}

There are {wiki: HowToUseCondorAnnexWithOnDemandInstancesEightSevenFour additional instructions} for general annex use. For now, we'll move on to actually running on your new resources.
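If you start annexes regularly, you may want to script the =START= check above. The sketch below is hypothetical (the function name =check_annex_start= is ours, not HTCondor's) and assumes the one-expression-per-line output format shown above. It succeeds only if at least one slot was listed and every slot's =START= expression mentions both =MayUseAWS == true= and your quoted user name. Because it only matches text, treat it as a sanity check, not a security guarantee.

```shell
# check_annex_start USER
# Hypothetical helper: reads START expressions, one per line, on stdin
# (as printed by "condor_annex -annex NAME status -af:r START") and
# succeeds only if at least one slot was listed and every expression
# mentions both "MayUseAWS == true" and the quoted user name.
check_annex_start() {
  expected_user=$1
  slots=0
  # The "|| [ -n ... ]" also handles a final line with no trailing newline.
  while IFS= read -r start_expr || [ -n "$start_expr" ]; do
    case $start_expr in
      '')
        # Skip blank lines.
        continue ;;
      *"MayUseAWS == true"*"\"$expected_user\""*)
        slots=$((slots + 1)) ;;
      *)
        echo "unexpected START expression: $start_expr" >&2
        return 1 ;;
    esac
  done
  # Fail if no slots were listed at all (e.g., the annex hasn't joined yet).
  [ "$slots" -gt 0 ]
}
```

For example, on =annex-cm.chtc.wisc.edu= you might run =condor_annex -annex MyFirstAnnex status -af:r START | check_annex_start yourname=, replacing the placeholder =yourname= with the user name you submit jobs as.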
{section: 4 Run Jobs on Those Resources}

It might take a while for =submit-4.chtc.wisc.edu= to try the other possibilities before giving the annex a chance to run the job. Run =condor_q= in the terminal logged into that machine to keep track of the test job; it should eventually run. (If the job is gone when you check, look at its log; the job may have already run and finished.)

You can make use of the annex resources for your own jobs in two ways: by submitting new jobs and by editing existing ones.

To submit new jobs, you can follow the example of the test job, above; you'll need the '+MayUseAWS = TRUE' line and the '+WantFlocking = TRUE' line. A reminder: this means these jobs will flock! For now, you shouldn't use =condor_annex= if you don't want your jobs flocking.

Like the test job, your job can require that it run on a machine whose name ends in '.ec2.internal': add 'regexp( ".*\.ec2\.internal", Machine )' to your requirements expression if you want to make sure a job doesn't run anywhere but on an annex. This is not a secure solution, however, since it only checks the machine's name.

You will also need to add something to your requirements to address the fact that annex resources don't presently advertise =OpSysMajorVer=. By default, the resources =condor_annex= adds to your pool are EL6(-ish). A requirements expression like the following should do the trick, where the '6' at the end is whatever version you actually need:

{verbatim}
requirements = (regexp( ".*\.ec2\.internal", Machine ) || IsCHTC) && (OpSysMajorVer is undefined || OpSysMajorVer == 6)
{endverbatim}

You can also edit existing jobs by using =condor_qedit=. More on using this tool will be forthcoming.

{section: 5 Cleaning Up (Optional)}

The resources =condor_annex= rents for you from Amazon will, as we mentioned before, shut themselves down after the duration, or if they're idle for longer than the time-out. At that point, no more charges will accrue; it costs you nothing to leave your account set up to use =condor_annex=.
If your jobs all finish early, you can run =condor_annex -annex MyFirstAnnex off= (on =annex-cm.chtc.wisc.edu=) to immediately shut off all the resources you rented.