 {section: Introduction}
-The =condor_annex= tool rents computational resources from Amazon's cloud service and adds them to an HTCondor pool for your jobs to use. These instructions document how to use =condor_annex= for CHTC jobs. Some restrictions apply:
+The =condor_annex= tool rents computational resources from Amazon's cloud service and adds those resources to an HTCondor pool for your jobs to use. These instructions document how to use =condor_annex= for CHTC jobs. Some restrictions apply:
 *: At the moment, you can only use =condor_annex= for jobs on =submit-4.chtc.wisc.edu=.
 *: You will need a log-in on =annex-cm.chtc.wisc.edu=. (Ask your research computing facilitator about this.)
-*: The jobs you want to run on AWS must have =MayUseAWS= set in their ads.
-*: The jobs you want to run on AWS must have =WantFlocking= set in their ads. This means your jobs will flock! If you don't know what that means, don't use =condor_annex= until you've talked to your research computing facilitator.
-*: Your jobs' requirements must allow them to run on Amazon's cloud. When =condor_annex= acquires resources from AWS, they will by default run an EL6-like operating system, but they won't have =OpSysMajorVer= set.
+*: The jobs you want to run on AWS must have =MayUseAWS= set.
+*: The jobs you want to run on AWS must have =WantFlocking= set. This means your jobs will flock! If you don't know what that means, don't use =condor_annex= until you've talked to your research computing facilitator.
+*: The jobs you want to run on AWS must have =requirements= which match the resources acquired by =condor_annex=. By default, those resources will run an EL6-like operating system, but they won't have =OpSysMajorVer= set.
 *: Intentionally or otherwise, other users of =condor_annex= on the CHTC may run jobs they don't own, including yours, on their resources. We're working on a solution to this, but if that possibility worries you, don't use =condor_annex= for now.
 Working with these restrictions will be covered in the following instructions.
@@ -15,26 +15,26 @@
 {section: Overview}
-1: Prepare your AWS account
-1:: Obtaining an Access Key
-1:: Running the Set-Up Command
-1:: Checking the Set-Up
+1: Prepare to Add Resources for Your Jobs
+1:: Grant Access to Your AWS Account
+1:: Lay the Groundwork in AWS
+1:: Check the Groundwork
 2: Submit a Test Job
-3: Add Resources to the Pool
-4: Run Jobs on Those Resources
+3: Add Resources for Your Jobs
+4: Run Jobs on Your Resources
 5: Clean Up (optional)
-These instructions assume this is the first time you're using =condor_annex= on CHTC. You'll want to have two terminal windows open: one for running =condor_annex= commands (logged into =annex-cm.chtc.wisc.edu=) and another for submitting jobs (logged into =submit-4.chtc.wisc.edu=).
+These instructions assume this is the first time you're using =condor_annex= on CHTC. You'll want to have two terminal windows open: one for running =condor_annex= commands (logged into =annex-cm.chtc.wisc.edu=) and another for submitting jobs (logged into =submit-4.chtc.wisc.edu=). If you've used =condor_annex= on CHTC before, skip ahead to section 1.3 ("Check the Groundwork").
-{section: 1 Prepare your AWS account}
+{section: 1 Prepare to Add Resources for Your Jobs}
-The =condor_annex= tool includes a =-setup= command which will prepare your AWS account. If you're not sure if you've done the set-up before, it won't hurt to repeat it, but you may save some time if you follow section 1.3 ("Checking the Set-Up") first. If the set-up checks out OK, great; if not, return here and start at section 1.1 ("Obtaining an Access Key").
+Before you can add resources for your jobs, you must (a) give =condor_annex= access to your AWS account and (b) have it do some one-time set-up.
-{subsection: 1.1 Obtaining an Access Key}
+{subsection: 1.1 Grant Access to Your AWS Account}
-In order to use AWS, =condor_annex= needs a pair of security tokens (like a user name and password). Like a user name, the "access key" is (more or less) public information; the corresponding "secret key" is like a password and must be kept a secret. To help keep both halves secret, =condor_annex= (and HTCondor) are never told these keys directly; instead, you tell HTCondor which file to look in to find each one.
+Like you, =condor_annex= needs an account to use AWS. You can grant =condor_annex= access to your account by acquiring a pair of security "keys" that function like a user name and password. Like a user name, the "access key" is (more or less) public information; the corresponding "secret key" is like a password and must be kept a secret. To help keep both halves secret, you never tell =condor_annex= the keys themselves; instead, you put each key in its own protected file.
-Log into =annex-cm.chtc.wisc.edu= and create those two files now; we'll tell you how to fill them in shortly. By convention, these files exist in your =~/.condor= directory, which is where =condor_annex -setup= will store the rest of the data it needs.
+To create those two files, execute the following commands on =annex-cm.chtc.wisc.edu=:
 {term}
 $ mkdir ~/.condor
@@ -45,7 +45,7 @@
 The last command ensures that no user other than you can read or write to those files. (Like any other file on CHTC machines, these files will be readable by the CHTC administrative staff. If that bothers you, contact us for alternatives.)
-To donwload a new pair of security tokens for =condor_annex= to use, go to the {link: https://console.aws.amazon.com/iam/home?region=us-east-1#/users IAM console}; log in if you need to. The following instructions assume you are logged in as a user with the privilege to create new users. (The 'root' user for any account has this privilege; other accounts may as well.)
+To fill the files you just created, go to the {link: https://console.aws.amazon.com/iam/home?region=us-east-1#/users IAM console}; log in if you need to. The following instructions assume you are logged in as a user with the privilege to create new users. (The 'root' user for any account has this privilege; other accounts may as well.)
 1: Click the "Add User" button.
 1: Enter name in the *User name* box; "annex-user" is a fine choice.
@@ -60,11 +60,11 @@
 1: On the line labelled "annex-user", click the "Show" link in the column labelled "Secret access key"; copy the revealed value to =privateKeyFile=.
 1: Hit the "Close" button.
-The 'annex-user' now has full privileges to your account. We're working on creating a CloudFormation template that will create a user with only the privileges =condor_annex= actually needs.
+You have now granted =condor_annex= access to your AWS account.
-{subsection: 1.2 Running the Set-Up Command}
+{subsection: 1.2 Lay the Groundwork in AWS}
-The following command will set-up your AWS account. It will create a number of persistent components, none of which will cost you anything to keep around. These components may take quite some time to create; =condor_annex= checks each for completion every ten seconds and prints an additional dot (past the first three) when it does so, to let you know that everything's still working.
+It takes a few minutes for =condor_annex= to lay the groundwork it needs at AWS. Since this groundwork doesn't cost you anything to keep around, you can just create it once and forget about it. Run the following commands on =annex-cm.chtc.wisc.edu=; you should still have a terminal window logged in there from the previous step.
 {term}
 $ condor_annex -setup
@@ -75,9 +75,9 @@
 Setup successful.
 {endterm}
-{subsection: 1.3 Checking the Setup}
+{subsection: 1.3 Check the Groundwork}
-You can verify at this point (or any later time) that the set-up procedure completed successfully by running the following command.
+You can verify at this point (or any later time) that the groundwork was laid successfully by running the following command (also on =annex-cm.chtc.wisc.edu=).
 {term}
 $ condor_annex -check-setup
@@ -87,9 +87,11 @@
 Checking for security group... OK.
 {endterm}
+If you don't see four "OK"s, return to step 1.1 and try again. If you've done that once already, contact your research computing facilitator for assistance.
+
 {section: 2 Submit a Test Job}
-You haven't requested any resources yet, but if you submit a test job first, you'll spend less time and money waiting for it. Log into =submit-4.chtc.wisc.edu= and create the following submit file:
+It sounds a little strange, but if you submit a test job _before_ you add resources for your jobs, you won't have to wait as long for it to start, which will save you both time and money. Use a second terminal window to log into =submit-4.chtc.wisc.edu= and create the following submit file:
 {file: annex-test.submit}
 executable = /bin/sleep
@@ -114,9 +116,9 @@
 queue 1
 {endfile}
-Submit this file to the queue. It won't run until after you've completed the next step.
+Submit this file to the queue; it won't run until after you've completed the next step.
-{section: 3 Add Resources to the Pool}
+{section: 3 Add Resources for Your Jobs}
 Entering the following on =annex-cm.chtc.wisc.edu= will add resources for your jobs to the pool. We call the set of resources you added an "annex". You have to supply a name for each annex you create; the example below uses 'MyFirstAnnex'. When you run =condor_annex=, it will print out what it's going to do, and then ask you if that's OK. You must type 'yes' (and hit enter) at the prompt to start an annex; if you do not, =condor_annex= will print out instructions about how to change whatever you may not like about what it said it was going to do, and then exit. The following command adds one resource (an "instance") for one hour; you should increase that if the job you want to run takes longer.
 Don't increase the number of resources if you haven't tested your job with =condor_annex= yet; you can easily add resources after you've verified that everything should work.
@@ -161,7 +163,7 @@
 There are {wiki: HowToUseCondorAnnexWithOnDemandInstancesEightSevenFour additional instructions} for general annex use. For now, we'll move on to actually running on your new resource.
-{section: 4 Run Jobs on Those Resources}
+{section: 4 Run Jobs on Your Resources}
 It might take a while for =submit-4.chtc.wisc.edu= to try the other possibilities before giving the annex a chance to run the job. Run condor_q in the terminal logged in to that machine to keep track of the test job; it should eventually run. (Check its log if it's gone when you check; the job may have run and finished.)