_[If you're using v8.7.1, see the {wiki: UsingCondorAnnexForTheFirstTimeEightSevenOne v8.7.1 instructions}. These instructions are for v8.7.2. If you're using v8.7.3, see the {wiki: UsingCondorAnnexForTheFirstTimeEightSevenThree v8.7.3 instructions}.]_

This guide assumes that you already have an AWS account, as well as a log-in account on a Linux machine with a public address and a system administrator who's willing to open a port for you. All the terminal commands (shown on a grey background) and file edits (shown on a green background) take place on the Linux machine. You can perform the web-based steps from wherever is convenient, although it will save you some copying if you can run a browser on the Linux machine.

Before using =condor_annex= for the first time, you'll have to do three things:

1: install a personal Condor
1: prepare your AWS account
1: configure =condor_annex=

Instructions for each follow.

{section: Install a personal Condor}

We recommend that you install a personal Condor to make use of =condor_annex=; it's simpler to configure that way. These instructions assume that it's OK to create a directory named =condor-8.7.2= in your home directory; adjust accordingly if you want to install HTCondor somewhere else.

Start by {link: https://research.cs.wisc.edu/htcondor/downloads/ downloading} the 8.7.2 release from the "tarballs" section that matches your Linux version. (If you don't know your Linux version, ask your system administrator.) These instructions assume that the file you downloaded is located in your home directory on the Linux machine, so copy it there if necessary.

Then do the following:

{term}
$ mkdir ~/condor-8.7.2; cd ~/condor-8.7.2; mkdir local
$ tar -z -x -f ~/condor-8.7.2-*-stripped.tar.gz
$ ./condor-8.7.2-*-stripped/condor_install --local-dir `pwd`/local --make-personal-condor
$ . ./condor.sh
$ condor_master
{endterm}

{subsection: Testing}

Give HTCondor a few seconds to spin up and then try a few commands to make sure the basics are working. Your output will vary depending on the time of day, the name of your Linux machine, and its core count, but it should generally be pretty similar to the following.

{term}
$ condor_q
-- Schedd: submit-3.batlab.org : <127.0.0.1:12815?... @ 02/03/17 13:57:35
OWNER    BATCH_NAME    SUBMITTED   DONE   RUN    IDLE   TOTAL JOB_IDS

0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
$ condor_status -any
MyType             TargetType         Name

Negotiator         None               NEGOTIATOR
Collector          None               Personal Condor at 127.0.0.1@submit-3.bat
Machine            Job                slot1@submit-3.batlab.org
Machine            Job                slot2@submit-3.batlab.org
Machine            Job                slot3@submit-3.batlab.org
Machine            Job                slot4@submit-3.batlab.org
Machine            Job                slot5@submit-3.batlab.org
Machine            Job                slot6@submit-3.batlab.org
Machine            Job                slot7@submit-3.batlab.org
Machine            Job                slot8@submit-3.batlab.org
Scheduler          None               submit-3.batlab.org
DaemonMaster       None               submit-3.batlab.org
Accounting         none
{endterm}

You should also try to submit a job; create the following file

{file: ~/condor-annex/sleep.submit}
executable = /bin/sleep
arguments = 600
queue
{endfile}

and submit it:

{term}
$ condor_submit ~/condor-annex/sleep.submit
Submitting job(s).
1 job(s) submitted to cluster 1.
$ condor_reschedule
{endterm}

After a little while:

{term}
$ condor_q
-- Schedd: submit-3.batlab.org : <127.0.0.1:12815?... @ 02/03/17 13:57:35
OWNER    BATCH_NAME       SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
tlmiller CMD: /bin/sleep  2/3  13:56      _      1      _      1 3.0

1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
{endterm}

{subsection: Configure public interface}

The default personal Condor uses the "loopback" interface, which basically just means it won't talk to anyone other than itself. For =condor_annex= to work, your personal Condor needs to use the Linux machine's public interface.
In most cases, that's as simple as adding the following lines to =~/condor-8.7.2/local/condor_config.local=.

{file: ~/condor-8.7.2/local/condor_config.local}
NETWORK_INTERFACE = *
CONDOR_HOST = $(FULL_HOSTNAME)
{endfile}

Restart HTCondor to force the changes to take effect:

{term}
$ condor_restart
Sent "Restart" command to local master
{endterm}

Repeat the steps under "Testing" to make sure that this configuration works for you, and then proceed to the next section.

{subsection: Configure a pool password}

In this section, you'll configure your personal Condor to use a pool password. This is a simple but effective method of securing Condor's communications to AWS.

Add the following lines to =~/condor-8.7.2/local/condor_config.local=.

{file: ~/condor-8.7.2/local/condor_config.local}
SEC_PASSWORD_FILE = $(LOCAL_DIR)/condor_pool_password
SEC_DAEMON_INTEGRITY = REQUIRED
SEC_DAEMON_AUTHENTICATION = REQUIRED
SEC_DAEMON_AUTHENTICATION_METHODS = PASSWORD
SEC_NEGOTIATOR_INTEGRITY = REQUIRED
SEC_NEGOTIATOR_AUTHENTICATION = REQUIRED
SEC_NEGOTIATOR_AUTHENTICATION_METHODS = PASSWORD
SEC_CLIENT_AUTHENTICATION_METHODS = FS, PASSWORD
ALLOW_DAEMON = condor_pool@*
{endfile}

You also need to run the following command, which prompts you to enter a password:

{term}
$ condor_store_cred -c add -f `condor_config_val SEC_PASSWORD_FILE`
Enter password:
{endterm}

Enter a password. (For more details, see HowToEnablePoolPassword.)

{subsection: Tell HTCondor about the open port}

By default, HTCondor will use port 9618. If the Linux machine doesn't already have HTCondor installed, and the admin is willing to open that port, then you don't have to do anything. Otherwise, you'll need to add a line like the following to =~/condor-8.7.2/local/condor_config.local=, replacing '9618' with whatever port the administrator opened for you.
{file: ~/condor-8.7.2/local/condor_config.local}
COLLECTOR_HOST = $(FULL_HOSTNAME):9618
{endfile}

{subsection: Activate the new configuration}

Force HTCondor to read the new configuration by restarting it:

{term}
$ condor_restart
{endterm}

{section: Prepare your AWS account}

The =condor_annex= tool now includes a =-setup= command which will prepare your AWS account.

{subsection: Obtaining an Access Key}

In order to use AWS, =condor_annex= needs a pair of security tokens (like a user name and password). Like a user name, the "access key" is (more or less) public information; the corresponding "secret key" is like a password and must be kept a secret. To help keep both halves secret, =condor_annex= (and HTCondor) are never told these keys directly; instead, you tell HTCondor which file to look in to find each one.

Create those two files now; we'll tell you how to fill them in shortly. By convention, these files exist in your =~/.condor= directory, which is where =condor_annex -setup= will store the rest of the data it needs.

{term}
$ mkdir ~/.condor
$ cd ~/.condor
$ touch publicKeyFile privateKeyFile
$ chmod 600 publicKeyFile privateKeyFile
{endterm}

The last command ensures that only you can read or write to those files.

To download a new pair of security tokens for =condor_annex= to use, go to the {link: https://console.aws.amazon.com/iam/home?region=us-east-1#/users IAM console}; log in if you need to. The following instructions assume you are logged in as a user with the privilege to create new users. (The 'root' user for any account has this privilege; other accounts may as well.)

1: Click the "Add User" button.
1: Enter a name in the *User name* box; "annex-user" is a fine choice.
1: Click the check box labelled "Programmatic access".
1: Click the button labelled "Next: Permissions".
1: Select "Attach existing policies directly".
1: Type "AdministratorAccess" in the box labelled "Filter".
1: Click the check box on the single line that will appear below (labelled "AdministratorAccess").
1: Click the "Next: review" button (you may need to scroll down).
1: Click the "Create user" button.
1: From the line labelled "annex-user", copy the value in the column labelled "Access key ID" to =publicKeyFile=.
1: On the line labelled "annex-user", click the "Show" link in the column labelled "Secret access key"; copy the revealed value to =privateKeyFile=.
1: Hit the "Close" button.

The 'annex-user' now has full privileges to your account. We're working on creating a CloudFormation template that will create a user with only the privileges =condor_annex= actually needs.

{subsection: Running the Setup Command}

The following command will set up your AWS account. It will create a number of persistent components, none of which will cost you anything to keep around. These components can take quite some time to create; =condor_annex= checks each for completion every ten seconds and prints an additional dot (past the first three) when it does so, to let you know that everything's still working.

{term}
$ condor_annex -setup
Creating configuration bucket (this takes less than a minute)....... complete.
Creating Lambda functions (this takes about a minute)........ complete.
Creating instance profile (this takes about two minutes)................... complete.
Creating security group (this takes less than a minute)..... complete.
Setup successful.
{endterm}

{subsection: Checking the Setup}

You can verify at this point (or any later time) that the setup procedure completed successfully by running the following command.

{term}
$ condor_annex -check-setup
Checking for configuration bucket... OK.
Checking for Lambda functions... OK.
Checking for instance profile... OK.
Checking for security group... OK.
_[If you're using v8.7.1 or earlier, see the {wiki: HowToUseCondorAnnexWithOnDemandInstancesEightSevenOne v8.7.1 instructions}.
These instructions are for v8.7.2.]_

{section: Using HTCondor Annex for the First Time}

We assume you already have an AWS account, as well as a log-in account on a Linux machine with a public IP address and an administrator who's willing to open a port for you. If so, you can follow the instructions for {wiki: UsingCondorAnnexForTheFirstTimeEightSevenTwo using HTCondor Annex for the first time}.

If you're not sure whether you've configured =condor_annex= before, you may enter the following to check.

{term}
$ . ~/condor-8.7.2/condor.sh
$ condor_annex -check-setup
Checking for configuration bucket... OK.
Checking for Lambda functions... OK.
Checking for instance profile... OK.
Checking for security group... OK.
Your setup looks OK.
{endterm}

If you see anything else, follow the instructions above.

{section: What You'll Need to Know}

To create an HTCondor annex with on-demand instances, you'll need to know two things:

1: A name for it. "MyFirstAnnex" is a fine name for your first annex.
1: How many instances you want. For your first annex, when you're checking to make sure things work, you may only want one instance.

{section: Start an Annex}

Entering the following will start an annex named "MyFirstAnnex" with one instance. =condor_annex= will print out what it's going to do, and then ask you if that's OK. You must type 'yes' (and hit enter) at the prompt to start an annex; if you do not, =condor_annex= will print out instructions about how to change whatever you may not like about what it said it was going to do, and then exit.

{term}
$ condor_annex -count 1 -annex-name MyFirstAnnex
Will request 1 m4.large on-demand instance for 0.83 hours. Each instance will terminate after being idle for 0.25 hours.
Is that OK? (Type 'yes' or 'no'): yes
Starting annex...
Annex started. Its identity with the cloud provider is 'TestAnnex0_f2923fd1-3cad-47f3-8e19-fff9988ddacf'. It will take about three minutes for the new machines to join the pool.
{endterm}

You won't need to know the annex's identity with the cloud provider unless something goes wrong.

Before starting the annex, =condor_annex= will check to make sure that the instances will be able to contact your pool. Contact your machine's administrator if =condor_annex= reports a problem with this step.

{subsection: instance types}

Each {link: https://aws.amazon.com/ec2/instance-types/ instance type} provides a different number (and/or type) of CPU cores, amount of RAM, local storage, and the like. If you're not sure, we recommend starting with 'm4.large', which has 2 CPU cores and 8 GiB of RAM.

As noted in the example output above, you can specify an instance type with the =-aws-on-demand-instance-type= flag.

{subsection: leases}

By default, =condor_annex= arranges for your annex's instances to be terminated after =0.83= hours (50 minutes) have passed. Once it's in place, this lease doesn't depend on your machine, but it's only checked every five minutes, so give your deadlines a lot of cushion to make sure you don't get charged for an extra hour. The lease is intended to help you conserve money by preventing the annex instances from accidentally running forever.

As noted in the example output above, you can specify a lease duration (in decimal hours) with the =-duration= flag.

If you need to adjust the lease for a particular annex, you may do so by specifying an annex name and a duration, but not a count. When you do so, the new duration is set starting at the current time. For example, if you'd like "MyFirstAnnex" to expire eight hours from now:

{term}
$ condor_annex -annex-name MyFirstAnnex -duration 8
Lease updated.
{endterm}

{subsection: idle time}

By default, =condor_annex= will configure your annex's instances to terminate themselves after being idle for =0.25= hours (fifteen minutes). This is intended to help you conserve money in case of problems or an extended shortage of work.
As noted in the example output above, you can specify a max idle time (in decimal hours) with the =-idle= flag. =condor_annex= considers an instance idle if it's {link: http://research.cs.wisc.edu/htcondor/manual/v8.6/3_7Policy_Configuration.html#37887 unclaimed}, so it won't get tricked by jobs with long quiescent periods.

{subsection: multiple annexes}

You may have up to fifty (or fewer, depending on what else you're doing with your AWS account) differently-named annexes running at the same time. Running =condor_annex= again with the same annex name before stopping that annex will both add instances to it and change its duration. Only instances which start up after an invocation of =condor_annex= will respect that invocation's max idle time. That may include instances still starting up from your previous (first) invocation of =condor_annex=, so be sure your instances have all joined the pool before running =condor_annex= again with the same annex name if you're changing the max idle time. Each invocation of =condor_annex= requests a fixed number of instances of a given type; you may specify either or both with each invocation, but neither will change the count or type of a previous request.

{section: Monitor your Annex}

You can find out if that instance has successfully joined the pool in the following way.

{term}
$ condor_status -annex MyFirstAnnex
slot1@ip-172-31-48-84.ec2.internal  LINUX     X86_64 Unclaimed Idle 0.640 3767
slot2@ip-172-31-48-84.ec2.internal  LINUX     X86_64 Unclaimed Idle 0.640 3767

                 Total Owner Claimed Unclaimed Matched Preempting Backfill  Drain

    X86_64/LINUX     2     0       0         2       0          0        0      0
           Total     2     0       0         2       0          0        0      0
{endterm}

This example shows that the annex instance you requested has joined your pool. (The default annex image configures one static slot for each CPU it finds on start-up.)
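If you'd rather not re-run =condor_status= by hand while the instance boots, the check above can be wrapped in a small polling loop. The following is only a sketch: the function names, the default polling interval, and the default timeout are illustrative assumptions, not part of HTCondor. It relies solely on the =condor_status -annex= invocation shown above and counts output lines that begin with a slot name.

```shell
#!/bin/sh
# Sketch: wait until at least WANT slots from an annex show up in the pool.
# Function names, the 30-second interval, and the 600-second timeout are
# illustrative assumptions.

# Count slot lines (e.g. "slot1@ip-...") in condor_status output on stdin.
count_annex_slots() {
    grep -c '^slot[0-9_]*@' || true   # grep exits non-zero when the count is 0
}

# Usage: wait_for_annex ANNEX_NAME [WANT] [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_annex() {
    annex_name=$1
    want=${2:-1}
    timeout=${3:-600}
    interval=${4:-30}
    waited=0
    while [ "$waited" -le "$timeout" ]; do
        have=$(condor_status -annex "$annex_name" 2>/dev/null | count_annex_slots)
        if [ "$have" -ge "$want" ]; then
            echo "$have slot(s) from $annex_name have joined the pool."
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "Timed out waiting for $annex_name." >&2
    return 1
}
```

For example, after sourcing this file, =wait_for_annex MyFirstAnnex 2= would wait for both slots of a single 'm4.large' instance, since the default image creates one slot per CPU.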
You can also get a report about the instances which have not joined your pool:

{term}
$ condor_annex -annex MyFirstAnnex -status
STATE          COUNT
pending            1
TOTAL              1

Instances not in the pool, grouped by state:
pending i-06928b26786dc7e6e
{endterm}

{section: Run a Job}

Starting in v8.7.1, the default behaviour for an annex instance is to run only jobs submitted by the user who ran the =condor_annex= command. If you'd like to allow other users to run jobs, list them (separated by commas; be sure to include yourself) as arguments to the =-owner= flag when you start the instance. If you're creating an annex for general use, use the =-no-owner= flag to run jobs from anyone.

Also starting in v8.7.1, the default behaviour for an annex instance is to run only jobs which have the =MayUseAWS= attribute set (to true). To submit a job with =MayUseAWS= set to true, add =+MayUseAWS = TRUE= to the submit file somewhere before the =queue= command. To allow an existing job to run in the annex, use =condor_qedit=. For instance, if you'd like cluster 1234 to run on AWS:

{term}
$ condor_qedit 1234 "MayUseAWS = TRUE"
Set attribute "MayUseAWS" for 21 matching jobs.
{endterm}

{section: Stop an Annex}

The following command shuts HTCondor off on each instance in the annex; if you're using the default annex image, doing so causes each instance to shut itself down.

{term}
$ condor_off -annex MyFirstAnnex
Sent "Kill-Daemon" command for "master" to master ip-172-31-48-84.ec2.internal
{endterm}

{section: Advanced Usage}

The information in this section is for advanced users and may not apply (or make sense) to everyone.

{subsection: Configure the Annex}

You can customize the configuration of your annex. If you pass the full path to a directory (for example, =/home/annex/config.d=) to =condor_annex= using the =-config-dir= option, =condor_annex= will copy the files in that directory to the HTCondor config directory on each annex instance.
This does _not_ replace the customization that =condor_annex= is already doing to configure security and tell the annex instances which pool to join; those changes will be laid down on top of (a temporary copy of) the directory you specified before being transferred to the instances.
Your setup looks OK.
{endterm}

{subsection: Undoing the Setup Command}

There is not as yet a way to undo the setup command automatically, but it won't cost you anything extra to leave your account set up for =condor_annex= indefinitely. If, however, you want to be tidy, you may delete the components the setup command created by going to the {link: https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks?filter=active CloudFormation console} and deleting the entries whose names begin with 'HTCondorAnnex-'. The setup procedure also creates an SSH key pair which may be useful for debugging; the private key was stored in =~/.condor/HTCondorAnnex-KeyPair.pem=. To remove the corresponding public key from your AWS account, go to the {link: https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#KeyPairs:sort=keyName key pair console} and delete the 'HTCondorAnnex-KeyPair' key.

You're ready to run =condor_annex=! Return to HowToUseCondorAnnexWithOnDemandInstancesEightSevenTwo.
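As an aside on "Undoing the Setup Command": if you prefer a terminal to the web consoles, the same cleanup could be reviewed with the AWS command-line interface. This is a sketch under several assumptions: that the AWS CLI is installed and separately configured with credentials (it does not read =publicKeyFile= or =privateKeyFile=), that the components live in 'us-east-1' as the console links suggest, and that the function names below are purely illustrative. It only prints the delete commands so you can inspect them before running any yourself.

```shell
#!/bin/sh
# Sketch: list what the setup command left behind and print (not run) the
# AWS CLI commands that would remove it. Assumes the AWS CLI is installed
# and configured; function names are illustrative, not part of HTCondor.

REGION=${REGION:-us-east-1}   # region used by the console links above

# Print one CloudFormation stack name per line for stacks whose names
# begin with 'HTCondorAnnex-'.
list_annex_stacks() {
    aws cloudformation describe-stacks --region "$REGION" \
        --query "Stacks[?starts_with(StackName, 'HTCondorAnnex-')].StackName" \
        --output text | tr '\t' '\n'
}

# Print the commands that would delete those stacks and the SSH key pair.
print_cleanup_commands() {
    list_annex_stacks | while read -r stack; do
        if [ -n "$stack" ]; then
            echo "aws cloudformation delete-stack --region $REGION --stack-name $stack"
        fi
    done
    echo "aws ec2 delete-key-pair --region $REGION --key-name HTCondorAnnex-KeyPair"
}
```

After sourcing the file, running =print_cleanup_commands= shows one line per stack plus one for the key pair; nothing is deleted unless you copy and run those lines.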