 *: At the moment, you can only use =condor_annex= for jobs on =submit-4.chtc.wisc.edu=.
 *: You will need a log-in on =annex-cm.chtc.wisc.edu=. (Ask your RCF about this.)
-*: Your jobs must have =MayUseAWS= set in their ads.
-*: Your jobs must have =WantFlocking= set in their ads.
+*: The jobs you want to run on AWS must have =MayUseAWS= set in their ads.
+*: The jobs you want to run on AWS must have =WantFlocking= set in their ads. Be careful! This means your jobs will flock! Do not use =condor_annex= for now if you don't want that to happen. (Ask your RCF if you don't know what that means.)
 *: Your jobs' requirements must, of course, allow them to run on Amazon's cloud. These restrictions will be covered in the following instructions.
+In these instructions, we've included sample output after the commands. Lines you should execute start with the dollar sign (=$=) character; do not type the dollar sign itself.
+
 {section: Overview}
 
 1: Prepare your AWS account
@@ -17,11 +19,11 @@
 1:: Running the Set-Up Command
 1:: Checking the Set-Up
 2: Submit a Test Job
-3: Run =condor_annex=
-4: Running Jobs at Amazon
-5: Cleaning Up (optional)
+3: Add Resources to the Pool
+4: Run Jobs on Those Resources
+5: Clean Up (optional)
 
-These instructions assume this is the first time you're using =condor_annex= on CHTC.
+These instructions assume this is the first time you're using =condor_annex= on CHTC. You'll want to have two terminal windows open: one for running =condor_annex= commands (logged into =annex-cm.chtc.wisc.edu=) and another for submitting jobs (logged into =submit-4.chtc.wisc.edu=).
 
 {section: 1 Prepare your AWS account}
 
@@ -86,13 +88,93 @@
 
 {section: 2 Submit a Test Job}
 
-You haven't requested any resources yet, but ...
+You haven't requested any resources yet, but if you submit a test job first, you'll spend less time and money waiting for it.
+Log into =submit-4.chtc.wisc.edu= and create the following submit file:
+
+{file: annex-test.submit}
+executable = /bin/sleep
+transfer_executable = false
+should_transfer_files = true
+universe = vanilla
+arguments = 600
+
+log = sleep.log
+
+# You MUST include this when submitting from CHTC to let the annex see the job.
++WantFlocking = TRUE
+
+# This is required, by default, to run a job in an annex.
++MayUseAWS = TRUE
+
+# The first clause requires this job to run on EC2; that's what makes it
+# good as a test. The second clause prevents CHTC from setting a
+# requirement for OpSysMajorVer, allowing this job to run on any.
+requirements = regexp( ".*\.ec2\.internal", Machine ) && (TRUE || TARGET.OpSysMajorVer)
+
+queue 1
+{endfile}
+
+Submit this file to the queue. It won't run until after you've completed the next step.
+
+{section: 3 Add Resources to the Pool}
+
+Entering the following on =annex-cm.chtc.wisc.edu= will add resources for your jobs to the pool. We call the set of resources you added an "annex". You have to supply a name for each annex you create; the example below uses 'MyFirstAnnex'. When you run =condor_annex=, it will print out what it's going to do, and then ask you if that's OK. You must type 'yes' (and hit enter) at the prompt to start an annex; if you do not, =condor_annex= will print out instructions about how to change whatever you may not like about what it said it was going to do, and then exit.
+
+{term}
+$ condor_annex -count 1 -annex-name MyFirstAnnex -idle 1 -duration 1
+Will request 1 m4.large on-demand instance for 1 hours. Each instance will terminate after being idle for 1 hours.
+Is that OK? (Type 'yes' or 'no'): yes
+Starting annex...
+Annex started. Its identity with the cloud provider is
+'TestAnnex0_f2923fd1-3cad-47f3-8e19-fff9988ddacf'. It will take about three minutes for the new machines to join the pool.
+{endterm}
+
+You won't need to know the annex's identity with the cloud provider unless something goes wrong.
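Going back to the test job's submit file for a moment: the following is a minimal Python sketch of what the first clause of its =requirements= expression accepts. It assumes that the ClassAd =regexp()= function behaves like an unanchored search, which is modeled here with Python's =re.search=. The helper name =first_clause_matches= and the machine names are hypothetical, chosen only for illustration; none of this is part of HTCondor.

```python
import re

# Same pattern as the first clause of the test job's requirements expression.
EC2_MACHINE = re.compile(r".*\.ec2\.internal")


def first_clause_matches(machine: str) -> bool:
    """Return True if the machine name would satisfy the EC2 clause.

    Assumption: the ClassAd regexp() call is modeled as an unanchored
    search, so the name only has to contain '.ec2.internal'.
    """
    return EC2_MACHINE.search(machine) is not None


if __name__ == "__main__":
    # Illustrative machine names, not taken from a real pool.
    for name in ("ip-172-31-15-209.ec2.internal", "e0001.chtc.wisc.edu"):
        print(name, first_clause_matches(name))
```

In this sketch only the machine name decides the match, because the second clause, =(TRUE || TARGET.OpSysMajorVer)=, is written so that it is always satisfied.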
+
+Before starting the annex, =condor_annex= will check to make sure that the instances will be able to contact CHTC. Contact your machine's administrator if =condor_annex= reports a problem with this step.
+
+Otherwise, wait a few minutes and run the following to make sure your annex has started up and joined the pool:
+
+{term}
+$ condor_annex status -annex MyFirstAnnex
+Name                                OpSys      Arch   State     Activity LoadAv
+
+slot1@ip-172-31-15-209.ec2.internal LINUX      X86_64 Unclaimed Idle      0.000
+slot2@ip-172-31-15-209.ec2.internal LINUX      X86_64 Unclaimed Idle      0.000
+
+                     Machines Owner Claimed Unclaimed Matched Preempting  Drain
+
+        X86_64/LINUX        2     0       0         2       0          0      0
+
+               Total        2     0       0         2       0          0      0
+{endterm}
+
+An annex (by default) will only run jobs which (a) you submitted and (b) have =MayUseAWS= set to true. You can confirm this by running the following command:
+
+{term}
+$ condor_annex -annex MyFirstAnnex status -af:r START
+(MayUseAWS == true) && stringListMember(Owner,"tlmiller")
+(MayUseAWS == true) && stringListMember(Owner,"tlmiller")
+{endterm}
+
+There are {wiki: HowToUseCondorAnnexWithOnDemandInstancesEightSevenFour additional instructions} for general annex use. For now, we'll move on to actually running on your new resources.
+
+{section: 4 Run Jobs on Those Resources}
+
+It might take a while for =submit-4.chtc.wisc.edu= to try the other possibilities before giving the annex a chance to run the job. Run =condor_q= in the terminal logged in to that machine to keep track of the test job; it should eventually run. (Check its log if it's gone when you check; the job may have run and finished.)
+
+You can make use of the annex resources for your own jobs in two ways: by submitting new jobs and by editing existing ones.
+
+To submit new jobs, you can follow the example of the test job, above; you'll need the '+MayUseAWS = TRUE' line and the '+WantFlocking = TRUE' line. A reminder: this means these jobs will flock!
+For now, you shouldn't use =condor_annex= if you don't want your jobs flocking. (Your job, like the test job, can require that it be run on a machine whose name ends in '.ec2.internal', but that's not a secure solution.) You may also add 'regexp( ".*\.ec2\.internal", Machine )' to your requirements expression if you want to make sure a job doesn't run anywhere but on your annex. You will also need to add something to your requirements to address the issue that annex resources don't presently advertise =OpSysMajorVer=. A requirements expression like the following should do the trick, where the '7' at the end is whatever version you actually need.
+
+{verbatim}
+requirements = (regexp( ".*\.ec2\.internal", Machine ) || IsCHTC) && (OpSysMajorVer =?= undefined || OpSysMajorVer == 7)
+{endverbatim}
 
-{section: 3 Run condor_annex}
-{section: 4 Altering your Existing Jobs}
+You can also edit existing jobs by using =condor_qedit=. More on using this tool will be forthcoming.
 
 {section: 5 Cleaning Up (Optional)}
 
 The resources =condor_annex= rents for you from Amazon will, as we mentioned before, shut themselves down after the duration, or if they're idle for longer than the time-out. At that point, no more charges will accrue -- it costs you nothing to leave your account set-up to use =condor_annex=.
 
-If, however, you want to be tidy, you may delete the components setup created by going to the {link: https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks?filter=active CloudFormation console} and deleting the entries whose names begin with 'HTCondorAnnex-'. The setup procedure also creates an SSH key pair which may be useful for debugging; the private key was stored in =~/.condor/HTCondorAnnex-KeyPair.pem=. To remove the corresponding public key from your AWS account, go to the {link: https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#KeyPairs:sort=keyName key pair console} and delete the 'HTCondorAnnex-KeyPair' key.
+If your jobs all finish early, you can run =condor_annex -annex MyFirstAnnex off= on =annex-cm.chtc.wisc.edu= to immediately shut off all the resources you rented.