One approach to using cloud resources to run HTCondor jobs is to use the =condor_annex= tool to expand an existing pool onto the cloud (see HowToUseCondorAnnexWithOnDemandInstances). Another approach, documented here, is to create a new HTCondor pool entirely in the cloud. The HTCondor team maintains an AWS Marketplace entry to help simplify the process[, but FIXME target-audience disclaimer]. These instructions assume you already have an AWS account and a key pair.

{section: Overview}

The general approach will be to use the Marketplace entry to start a _head node_, which will be the brains of the new HTCondor pool and the machine where you log in and submit jobs. Once the head node is up and running, you'll use =condor_annex= to add cloud resources to your new pool. Then you can start running jobs, and when they're done, shut everything down.

1: Start a Head Node
2: Add Cloud Resources to Your New Pool
2:: Log into your Head Node
2:: Obtain an Access Key
2:: Prepare Account
2:: Add Cloud Resources
3: Run Jobs
4: Clean Up
4:: The Cloud Resources
4:: The Head Node

{section: 1 Start a Head Node}

1: Open HTCondor's {link: https://aws.amazon.com/marketplace/pp/B073WHVRPR Marketplace entry} in another tab.
1: Click the orange 'Continue' button to the right. You may need to log in to AWS at this point.
1: This is a busy page, but there's only one thing you may have to change on it: the "key pair" setting, which is all the way down at the bottom. Change the selected key pair, if necessary, to be one whose private half you have.
1: This step is where you will start spending money. Scroll back up; there will be a section to the right titled "Price for your Selections." That's what Amazon will charge starting when you click the orange 'Launch with one-click' button. We'll remind you in these instructions, but you'll have to stop or terminate the instance you're about to start on your own.
1: On the next page, click on the "Your Software" link in the green box.
{section: 2 Add Cloud Resources to Your New Pool}

Your head node (by default) starts with two CPUs and 8 GiB of RAM. Immediately after you log in (see section 2.1), you can start submitting and running jobs (see section 3). However, with only two CPUs, you'll only be able to run two jobs at a time.

To add cloud resources to your pool, you'll use =condor_annex=. To do that, you'll need to obtain an access key for =condor_annex=, so it can interact with AWS on your behalf. You'll only need to do that once. Likewise, =condor_annex= has to do some one-time set-up for each account. Once that's done, you can run =condor_annex= to add cloud resources as often as you like.

{subsection: 2.1 Log into your Head Node}

1: Find the HTCondor entry and click the 'View Instances' button.
1: There should only be one instance; click on the "Manage in AWS Console" link. This will bring up the EC2 console with your head node selected.
1: Right-click on the selected instance and select 'Connect'. Follow the instructions, except replace =root@= with =ec2-user@=. [FIXME]

{subsection: 2.2 Obtain an Access Key}

Just being able to log into an EC2 instance doesn't give you the privilege to start additional EC2 instances. To add cloud resources to your new pool, HTCondor needs a pair of security tokens (like a user name and password). Like a user name, the "access key" is (more or less) public information; the corresponding "secret key" is like a password and must be kept a secret. To help keep both halves secret, you never tell HTCondor these keys directly; instead, you tell HTCondor which file to look in to find each one.

Create those two files now; we'll tell you how to fill them in shortly. By convention, these files exist in your =~/.condor= directory, which is where =condor_annex -setup= will store the rest of the data it needs.
{term}
$ cd ~/.condor
$ touch publicKeyFile privateKeyFile
$ chmod 600 publicKeyFile privateKeyFile
{endterm}

The last command ensures that only you can read or write to those files.

If you saved your security tokens from the last time you used =condor_annex=, copy them into the files you just created and skip to section 2.3.

To download a new pair of security tokens for =condor_annex= to use, go to the {link: https://console.aws.amazon.com/iam/home#users IAM console}; log in if you need to.

{subsubsection: 2.2.1 Privileged (Administrator) Accounts}

The following instructions assume you are logged in as a user with the privilege to create new users. (The 'root' user for any account has this privilege; other accounts may as well.) If your account has more limited privileges, skip to section 2.2.2.

1: Click the "Add User" button.
1: Enter a name in the *User name* box; "annex-user" is a fine choice.
1: Click the check box labelled "Programmatic access".
1: Click the button labelled "Next: Permissions".
1: Select "Attach existing policies directly".
1: Type "AdministratorAccess" in the box labelled "Filter".
1: Click the check box on the single line that will appear below (labelled "AdministratorAccess").
1: Click the "Next: review" button (you may need to scroll down).
1: Click the "Create user" button.
1: From the line labelled "annex-user", copy the value in the column labelled "Access key ID" to =publicKeyFile=. Also copy this value to your laptop or desktop computer; you'll want to have it if you use =condor_annex= again.
1: On the line labelled "annex-user", click the "Show" link in the column labelled "Secret access key"; copy the revealed value to =privateKeyFile=. Also copy this value to your laptop or desktop computer; you'll want to have it if you use =condor_annex= again.
1: Hit the "Close" button.

The 'annex-user' now has full privileges to your account.
We're working on creating a CloudFormation template that will create a user with only the privileges =condor_annex= actually needs. Skip to section 2.2.3.

{subsubsection: 2.2.2 Non-Privileged (Non-Administrator) Accounts}

If you're using an account with limited privileges, your administrator may have already given you the credentials. If not, you may be able to create credentials for yourself at the {link: https://console.aws.amazon.com/iam/home#users IAM console}.

1: Click on your user name.
1: Click on the "security credentials" tab.
1: Click the "Create access key" button.
1: Copy the value in the column labelled "Access key ID" to =publicKeyFile=. Also copy this value to your laptop or desktop computer; you'll want to have it if you use =condor_annex= again.
1: Click the "Show" link in the column labelled "Secret access key"; copy the revealed value to =privateKeyFile=. Also copy this value to your laptop or desktop computer; you'll want to have it if you use =condor_annex= again.
1: Hit the "Close" button.

{subsubsection: 2.2.3 Save the Access Key}

You should keep a copy of both files somewhere safe, so you don't have to recreate them every time you start a new pool in the cloud.

{subsection: 2.3 Prepare your Account}

The following command will prepare your AWS account for =condor_annex=. It will create a number of persistent components, none of which will cost you anything to keep around. These components can take quite some time to create; =condor_annex= checks each for completion every ten seconds and prints an additional dot (past the first three) when it does so, to let you know that everything's still working.

{term}
$ condor_annex -setup
Creating configuration bucket (this takes less than a minute)....... complete.
Creating Lambda functions (this takes about a minute)........ complete.
Creating instance profile (this takes about two minutes)................... complete.
Creating security group (this takes less than a minute)..... complete.
Setup successful.
{endterm}

{subsubsection: 2.3.1 Verify Account Preparation}

You can verify at this point (or any later time) that the account-preparation procedure completed successfully by running the following command.

{term}
$ condor_annex -check-setup
Checking for configuration bucket... OK.
Checking for Lambda functions... OK.
Checking for instance profile... OK.
Checking for security group... OK.
Your setup looks OK.
{endterm}

{subsection: 2.4 Add Cloud Resources}

Run the following command; if you type 'yes', it will add ten instances to the pool for no more than 24 hours:

{term}
$ condor_annex -count 10 -duration 24 -annex-name MyFirstAnnex
Will request 10 m4.large on-demand instance for 24 hours. Each instance will terminate after being idle for 0.25 hours.
Is that OK? (Type 'yes' or 'no'): yes
Starting annex...
Annex started. Its identity with the cloud provider is 'MyFirstAnnex_f2923fd1-3cad-47f3-8e19-fff9988ddacf'. It will take about three minutes for the new machines to join the pool.
{endterm}

Read the {link: https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToUseCondorAnnexWithOnDemandInstancesEightSevenTwo complete introduction} for more information; skip the "Using HTCondor Annex for the First Time" section, since you already have. {link: http://research.cs.wisc.edu/htcondor/manual/v8.7/6_Cloud_Computing.html Complete documentation} is also available.

{section: 3 Run Jobs}

To run on your new resources, a job's submit file must contain the following line:

{verbatim}
+MayUseAWS = TRUE
{endverbatim}

The new resources do _not_ share a file system with the head node, so you'll need to use file transfer:

{verbatim}
should_transfer_files = TRUE
{endverbatim}

{section: 4 Clean Up}

{subsection: 4.1 The Cloud Resources}

One of the benefits of using =condor_annex= is that it will automatically terminate instances after a certain amount of time (24 hours in the example above).
This happens even if the instance is running a job at the time, to make sure that misbehaving jobs don't cause you to spend more than you intended. Additionally, if at any time it's been too long (15 minutes by default) since an instance ran a job, it will shut itself down to save you money.

However, if you'd like to shut down the instances early, you can do so using the =condor_off= command, replacing =MyFirstAnnex= with the name of the annex you'd like to shut down:

{term}
$ condor_off -annex MyFirstAnnex
{endterm}

{subsection: 4.2 The Head Node}

As noted above, you'll need to clean the head node up yourself. If you don't want to keep any of your changes, then you should "terminate" the head node to avoid paying for storage. If you just want to save money and pick up where you left off a bit later, you should instead "stop" the head node; you'll pay to keep its disk around until you start it again later. Both options are under "Instance State" if you right-click on the instance in the EC2 console.
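
If you'd rather not click through the console, the same two "Instance State" actions can be issued with the AWS command-line tool. The following is only a sketch, and it makes several assumptions that aren't part of the instructions above: that the AWS CLI is installed and configured with credentials allowed to stop or terminate instances, that you've looked up your head node's instance ID in the EC2 console (the ID shown is a placeholder), and that the helper name =head_node_cmd= is made up for this example. The helper only prints the command, so you can review it before running anything.

```shell
# head_node_cmd is a hypothetical helper, not part of HTCondor or the AWS CLI.
# It only prints the AWS CLI command; nothing changes until you run that command.
head_node_cmd() {
    action="$1"        # "stop" or "terminate"
    instance_id="$2"   # your head node's instance ID, from the EC2 console

    if [ -z "$instance_id" ]; then
        echo "usage: head_node_cmd stop|terminate INSTANCE_ID" >&2
        return 1
    fi

    case "$action" in
        stop|terminate)
            echo "aws ec2 ${action}-instances --instance-ids ${instance_id}"
            ;;
        *)
            echo "usage: head_node_cmd stop|terminate INSTANCE_ID" >&2
            return 1
            ;;
    esac
}

# Placeholder instance ID; substitute your own.
head_node_cmd stop i-0123456789abcdef0
```

Once the printed command looks right, paste it into your shell to run it. As in the console, "stop" keeps the disk (and its storage charge) so you can start the head node again later, while "terminate" is for when you don't want to keep your changes.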