HTCondorWiki: Experimental Annex Daemon

description

The annex daemon will be the production implementation of the condor_annex tool. (See ExperimentalCondorAnnex.) At present, it provides only the ability to provision leased AWS instances efficiently (in bulk).

The lease implementation requires an AWS Lambda function. Rather than upload the function on every invocation, the annex daemon must be given the function's ARN. This may be automated in the future; see the installation instructions below for the manual process.

installation

Install the pre-release package(s) as normal.

Add the following lines to your HTCondor configuration (for example, in config.d/80annexd). If you don't want your instances to show up in us-east-1 by default, edit the URLs in the last three lines appropriately.

# Turn the annex daemon on.
DAEMON_LIST = $(DAEMON_LIST) ANNEXD

# Optional: configure the default endpoints.  All three endpoints need to be in the same region.
ANNEX_DEFAULT_EC2_URL = https://ec2.us-east-1.amazonaws.com
ANNEX_DEFAULT_CWE_URL = https://events.us-east-1.amazonaws.com
ANNEX_DEFAULT_LAMBDA_URL = https://lambda.us-east-1.amazonaws.com
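
Because all three endpoints must point at the same region, it can help to check the URLs before restarting HTCondor. The following is a minimal sketch, not part of the annex daemon; the helper names `endpoint_region` and `same_region` are hypothetical, and it assumes every endpoint has the form https://<service>.<region>.amazonaws.com, as the defaults above do.

```python
from urllib.parse import urlparse


def endpoint_region(url):
    """Return the region label from a URL shaped like https://<service>.<region>.amazonaws.com."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    # Expect exactly four labels: service, region, "amazonaws", "com".
    if len(labels) != 4 or labels[2:] != ["amazonaws", "com"]:
        raise ValueError(f"unexpected endpoint host: {host!r}")
    return labels[1]


def same_region(urls):
    """True when every endpoint URL names the same region."""
    return len({endpoint_region(u) for u in urls}) == 1


# The default values from the configuration fragment above.
endpoints = {
    "ANNEX_DEFAULT_EC2_URL": "https://ec2.us-east-1.amazonaws.com",
    "ANNEX_DEFAULT_CWE_URL": "https://events.us-east-1.amazonaws.com",
    "ANNEX_DEFAULT_LAMBDA_URL": "https://lambda.us-east-1.amazonaws.com",
}

if not same_region(endpoints.values()):
    raise SystemExit("the three annex endpoints are not in the same region")
```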

Lambda function

The examples directory includes a CloudFormation template (the file template-3.json) you can use to create the necessary AWS Lambda function (and grant it the permissions necessary for it to do its job); instructions follow for readers who haven't created a stack from a template file before. After logging into the AWS web console, do the following for each region you intend to use:

Switch to the region. (The second drop-down box in from the upper right.)
Switch to CloudFormation. (In the Services menu, under Management.)
Click the "Create Stack" button.
Upload the template using the "Browse..." button.
Click the "Next" button.
Name the stack; "HTCondorLeaseImplementation" is a good name.
Click the "Next" button. (You don't need to change anything on the options screen.)
Check the box next to "I acknowledge" (down near the bottom) and click the "Create" button (where the "Next" button was).
AWS should return the list of stacks; select the one you just created and select the "outputs" tab.
Copy the long string labelled "LeaseFunctionARN"; you'll need it for each invocation of condor_annex. It may take some time for that string to appear (you may need to reload the page as well). Wait for the stack to enter the 'CREATE_COMPLETE' state before using the LeaseFunctionARN (see below).
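
If you would rather read the stack output from a script than from the console, the last two steps can be sketched as below. This is an illustration under stated assumptions, not something the prototype ships: the function name `lease_function_arn` is hypothetical, the input is assumed to be a dictionary in the shape returned by CloudFormation's DescribeStacks call, and the ARN in the sample is a made-up placeholder.

```python
def lease_function_arn(response, output_key="LeaseFunctionARN"):
    """Return the named stack output, or None if the stack is not ready yet."""
    stack = response["Stacks"][0]
    # Only trust the outputs once stack creation has finished.
    if stack["StackStatus"] != "CREATE_COMPLETE":
        return None
    for output in stack.get("Outputs", []):
        if output["OutputKey"] == output_key:
            return output["OutputValue"]
    return None


# Sample response with a placeholder ARN (not a real function).
sample = {
    "Stacks": [
        {
            "StackName": "HTCondorLeaseImplementation",
            "StackStatus": "CREATE_COMPLETE",
            "Outputs": [
                {
                    "OutputKey": "LeaseFunctionARN",
                    "OutputValue": "arn:aws:lambda:us-east-1:123456789012:function:ExampleLeaseFunction",
                }
            ],
        }
    ]
}

print(lease_function_arn(sample))
```

With the boto3 library installed, a response of this shape should be obtainable from `boto3.client("cloudformation").describe_stacks(StackName="HTCondorLeaseImplementation")`; a result of None means the stack is still being created or has no such output.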

usage

In this prototype, the condor_annex tool requires a JSON file describing the kind of instances you'd like. The easiest way to generate this JSON file is using the AWS web console; when you request a Spot Fleet, on the last page before you submit the request, there's a button in the upper-right labelled "JSON config"; click it to download a file. Save the file (the default name is config.json, which is fine). FIXME: If you're not familiar with Spot Fleet, the prototype includes a reasonable example config.json you can use to get started. It uses an image maintained by HTCondor (Amazon Linux with HTCondor pre-installed).

After you save the config, open it in your favorite text editor and remove the two lines containing "ValidFrom" and "ValidUntil". (FIXME: If condor_annex were to just ignore these entries, would it need to support additional CLI flags to set them?)
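
Removing the two entries can also be scripted. The sketch below assumes that "ValidFrom" and "ValidUntil" are top-level keys of the downloaded JSON object; the function names and the sample values are hypothetical, so check your own config.json before relying on it.

```python
import json


def strip_validity(config):
    """Return a copy of the config without the top-level ValidFrom/ValidUntil keys."""
    return {k: v for k, v in config.items() if k not in ("ValidFrom", "ValidUntil")}


def strip_validity_file(path):
    """Rewrite the JSON file at `path` in place, dropping the two validity keys."""
    with open(path) as f:
        config = json.load(f)
    with open(path, "w") as f:
        json.dump(strip_validity(config), f, indent=2)


# Illustrative input only; a real Spot Fleet config has more fields.
example = {
    "TargetCapacity": 2,
    "ValidFrom": "2017-01-01T00:00:00Z",
    "ValidUntil": "2018-01-01T00:00:00Z",
    "LaunchSpecifications": [],
}

print(json.dumps(strip_validity(example), indent=2))
```

Calling `strip_validity_file("config.json")` would then edit the saved file in place.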

Now you're ready to run condor_annex. If you've been using the example's filenames, the command will look something like the following.

# FIXME: should either be -access/-secret or -public/-private.
# FIXME: Leaving -lease-function-arn out entirely doesn't produce a reasonable error message.
# FIXME: The daemon crashes on restart after the above failure.
# FIXME: Add a CL flag to override config.json's count.
condor_annex \
   -public <public (access) key file> -secret <secret (private) access key file> \
   -lease-duration <seconds> \
   -lease-function-arn <LeaseFunctionARN string> \
   config.json

The tool will print out the Spot Fleet request ID generated by the daemon if it was successful.
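
If you invoke condor_annex from a script, assembling the argument list programmatically avoids quoting mistakes. The following is a rough sketch that uses only the flags shown in the command above; the function name `annex_command`, the key file names, and the ARN are placeholders, and nothing here actually runs the tool.

```python
import shlex


def annex_command(public_key_file, secret_key_file, lease_seconds,
                  lease_function_arn, config_file="config.json"):
    """Build the condor_annex argument list from the flags described above."""
    return [
        "condor_annex",
        "-public", public_key_file,
        "-secret", secret_key_file,
        "-lease-duration", str(lease_seconds),
        "-lease-function-arn", lease_function_arn,
        config_file,
    ]


# Placeholder values; substitute your own key files and LeaseFunctionARN.
argv = annex_command(
    "publicKeyFile",
    "secretKeyFile",
    3600,
    "arn:aws:lambda:us-east-1:123456789012:function:ExampleLeaseFunction",
)

# Print the command in a form that can be pasted into a shell.
print(shlex.join(argv))
```

On a machine with the pre-release package installed, the list could be handed to `subprocess.run(argv, check=True)`.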

options

Like most other HTCondor tools, you can specify which daemon condor_annex should contact using the -pool and -name options. This is probably not useful in this version.
You may specify the endpoints with the -service-url, -events-url, and -lambda-url options.
If you'd like to change (or specify) the user data from the command line, use the -[default-]user-data[-file] options. The -user-data flag sets the user data for each instance to the value passed on the command line; appending -file sets the user data for each instance to the data in the file, instead. Prepending -default only sets the user data for instances which would otherwise lack it. You may set default user data from a file.
The tool also supports the -debug and -help flags.
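
The -[default-]user-data[-file] notation expands to four options: -user-data, -user-data-file, -default-user-data, and -default-user-data-file. The sketch below models the behaviour described above for a single instance; it is not the tool's implementation, the function name `resolve_user_data` is hypothetical, and it assumes an instance "lacks" user data when its value is None.

```python
from pathlib import Path

FLAGS = ("user-data", "user-data-file", "default-user-data", "default-user-data-file")


def resolve_user_data(existing, flag, value):
    """Return the user data one instance would end up with.

    existing: user data already present for the instance, or None.
    flag:     one of FLAGS (the option name without its leading dash).
    value:    literal user data, or a file name for the -file variants.
    """
    if flag not in FLAGS:
        raise ValueError(f"unknown option: -{flag}")
    # The default- variants leave instances alone if they already have user data.
    if flag.startswith("default-") and existing is not None:
        return existing
    # The -file variants treat the argument as a file name and read its contents.
    if flag.endswith("-file"):
        return Path(value).read_text()
    return value
```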

advanced usage

The usage above should suffice for the efficient provisioning of many instances using the images you already use. If you'd like to use the prototype to start annexes, the procedure is somewhat involved. The basic idea is as follows: when HTCondor starts up (this may be replaced by "when the OS finishes booting" in future releases), it runs a script which looks at the permissions which have been granted to the instance. (Obviously, this fails if the instance hasn't been granted permission to look at its own permissions.) If one of those permissions is read access to a specific file in a specific S3 bucket, the script downloads the file into the HTCondor config.d directory, or, if the file is a tarball, untars it there. Because this mechanism is entirely independent of the usual userdata-dependent contextualization methods, it can be used to dynamically configure HTCondor regardless of how an instance otherwise might configure itself. We expect this ability to generally be used to configure HTCondor to join one specific pool or another.
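
The final step of that mechanism (copy the downloaded file into config.d, or unpack it there if it is a tarball) can be sketched as follows. This is only an illustration of the idea, not the attached 49ec2-instance.sh script; the function name `install_config` is hypothetical, and the sketch assumes the file has already been downloaded from S3 to a local path.

```python
import shutil
import tarfile
from pathlib import Path


def install_config(downloaded, config_dir):
    """Place a downloaded file in config_dir, unpacking it if it is a tarball."""
    downloaded = Path(downloaded)
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    if tarfile.is_tarfile(str(downloaded)):
        # A tarball (compressed or not) is extracted directly into config.d.
        with tarfile.open(str(downloaded)) as archive:
            archive.extractall(str(config_dir))
    else:
        # Anything else is treated as a single configuration file.
        shutil.copy(str(downloaded), str(config_dir / downloaded.name))
```

Only unpack archives from a bucket you control, since extracting a tarball writes whatever paths the archive contains.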

We have provided a second CloudFormation template to help construct this mechanism. Actually, because it was easier, we have provided a script which -- given the S3 bucket name and file -- outputs a CloudFormation template which creates the corresponding IAM role and instance profile. ....

Attachments:

template-3.json 3993 bytes added by tlmiller on 2016-Dec-30 21:09:32 UTC.
This template creates the lease infrastructure for your AWS account.

generate-role 2006 bytes added by tlmiller on 2016-Dec-30 21:11:58 UTC.
This script generates a template. When instantiated, that template provides an instance profile. Instances run under that profile can introspectively discover that they were granted permission to download a specific file in S3.

49ec2-instance.sh 4334 bytes added by tlmiller on 2017-Jan-04 20:07:23 UTC.
This script sets EC2PublicIP and EC2InstanceID (for later configuration to use) and also downloads and extracts the configuration file pointed to by the instance's role. (It also turns off nonroot access to the metadata, so that user jobs don't get access to the role's privileges.)