description

The annex daemon will be the production implementation of the condor_annex tool. (See ExperimentalCondorAnnex.) At present, it provides only the ability to provision leased AWS instances efficiently (in bulk).

The lease implementation requires an AWS Lambda function; for efficiency, rather than upload the function for every request, the annex daemon must be provided with the function's ARN. This may be automated in the future; see the installation instructions, below, for the manual process.

installation

Install the pre-release package(s) as normal.

Add the following lines to your HTCondor configuration (for example, in config.d/80annexd). If you don't want your instances to show up in us-east-1 by default, edit the URLs in the last three lines appropriately.

# Turn the annex daemon on.
DAEMON_LIST = $(DAEMON_LIST) ANNEXD

# Optional: configure the default endpoints.  All three endpoints must be in the same region.
ANNEX_DEFAULT_EC2_URL = https://ec2.us-east-1.amazonaws.com
ANNEX_DEFAULT_CWE_URL = https://events.us-east-1.amazonaws.com
ANNEX_DEFAULT_LAMBDA_URL = https://lambda.us-east-1.amazonaws.com

Start (or reconfigure) HTCondor to make sure that the annex daemon is running.

Lambda function

The attachments include a CloudFormation template, template-3.json, that you can use to create the necessary AWS Lambda function (and grant it the permissions necessary for it to do its job); instructions follow for readers who haven't created a stack from a template file before. After logging into the AWS web console, do the following for each region you intend to use (start with us-east-1, since that has the example AMI):

  1. Switch to the region. (The second drop-down box in from the upper right.)
  2. Switch to CloudFormation. (In the Services menu, under Management.)
  3. Click the "Create Stack" button.
  4. Upload the template using the "Browse..." button.
  5. Click the "Next" button.
  6. Name the stack; "HTCondorLeaseImplementation" is a good name.
  7. Click the "Next" button. (You don't need to change anything on the options screen.)
  8. Check the box next to "I acknowledge" (down near the bottom) and click the "Create" button (where the "Next" button was).
  9. AWS should return to the list of stacks; select the one you just created and switch to its "Outputs" tab.
  10. Copy the long string labelled "LeaseFunctionARN"; you'll need it for each invocation of condor_annex. The string may take some time to appear (you may need to reload the page), and you should wait for the stack to enter the 'CREATE_COMPLETE' state before using the LeaseFunctionARN (see below).

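If you'd rather not click through the console, the same stack can be created with the AWS CLI. This is a sketch, not the documented procedure: it assumes you have the AWS CLI installed and credentials configured, that template-3.json is in the current directory, and it borrows the suggested stack name from step 6.

```shell
# Create the stack from the attached template; because the template creates
# IAM resources, CloudFormation requires the CAPABILITY_IAM acknowledgement
# (the CLI equivalent of checking the "I acknowledge" box).
aws --region us-east-1 cloudformation create-stack \
    --stack-name HTCondorLeaseImplementation \
    --template-body file://template-3.json \
    --capabilities CAPABILITY_IAM

# Block until the stack reaches the CREATE_COMPLETE state.
aws --region us-east-1 cloudformation wait stack-create-complete \
    --stack-name HTCondorLeaseImplementation

# Print the LeaseFunctionARN output (step 10).
aws --region us-east-1 cloudformation describe-stacks \
    --stack-name HTCondorLeaseImplementation \
    --query 'Stacks[0].Outputs[?OutputKey==`LeaseFunctionARN`].OutputValue' \
    --output text
```

Repeat with a different --region for each additional region you intend to use.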
usage

In this prototype, the condor_annex tool requires a JSON file describing the kind of instances you'd like. The easiest way to generate this JSON file is using the AWS web console. Start a Spot Fleet request for the instance types and other configuration you'd like; on the last page before you submit the request, there's a button in the upper-right labelled "JSON config"; click it and save the file (the default name is config.json, which is fine).

After you save the config, open it in your favorite text editor and remove the two lines containing "ValidFrom" and "ValidUntil".
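The edit can also be scripted. This is a minimal sketch that assumes each of the two keys appears on its own line (as the console's "JSON config" download writes them); the config.json created here is a stand-in for the real file, included only so the example is self-contained.

```shell
# Stand-in for the file saved from the Spot Fleet "JSON config" button.
cat > config.json <<'EOF'
{
  "SpotFleetRequestConfig": {
    "ValidFrom": "2017-06-01T00:00:00Z",
    "ValidUntil": "2017-06-02T00:00:00Z",
    "TargetCapacity": 2
  }
}
EOF

# Drop the two validity lines, then make sure the result is still valid
# JSON (removing a line can orphan a trailing comma).
grep -v -e '"ValidFrom"' -e '"ValidUntil"' config.json > config-trimmed.json
python3 -m json.tool config-trimmed.json
```

If the validation step succeeds, rename config-trimmed.json back to config.json (or pass it to condor_annex directly).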

Now you're ready to run condor_annex. If you've been using the example's filenames, the command will look something like the following. The -count flag is optional; condor_annex will otherwise request the target capacity you specify in the JSON file.

condor_annex \
   -access <access key file> -secret <secret key file> \
   -lease-duration <seconds> \
   -lease-function-arn <LeaseFunctionARN string> \
   [-count <target-capacity>] \
   config.json

If it was successful, the tool will print out the Spot Fleet request ID generated by the daemon.

options

advanced usage

The usage above should suffice for the efficient provisioning of many instances of the images you're already using. If you'd like to use the prototype to start annexes, the procedure is somewhat more involved.

The basic idea is as follows: when HTCondor starts up (this may be replaced by "when the OS finishes booting" in future releases), it runs a script which looks at the permissions which have been granted to the instance. (Obviously, this fails if the instance hasn't been granted permission to look at its own permissions.) If one of those permissions is read access to a specific file in a specific S3 bucket, the script downloads the file into the HTCondor config.d directory, or, if the file is a tarball, untars it there. Because this mechanism is entirely independent of the usual userdata-based contextualization methods, it can be used to dynamically configure HTCondor regardless of how an instance otherwise configures itself. We expect this ability to generally be used to configure HTCondor to join a specific pool at run-time (as opposed to at image-creation time).
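In IAM terms, the grant the script looks for is simply read access to one S3 object. A hypothetical policy of the sort attached to the instance's role might look like the following; bucketName and fileName are placeholders, and the actual policy produced for you (see the generate-role script below) may differ in its details.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::bucketName/fileName"
    }
  ]
}
```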

We have provided a second CloudFormation template to help construct this mechanism. Actually, because we hope it's easier than editing the template, we have provided a script, attached as generate-role, which takes the S3 bucket name and file name and outputs a CloudFormation template which creates the corresponding IAM role and instance profile. Run the script as follows: generate-role bucketName fileName > role.json, and then follow the instructions above (under "Lambda function") to create the corresponding stack (you'll have to give it a different name). The bucketName must be a bucket you can write to, and the fileName should probably be something like config.tar.gz. Note: you don't need to, and probably don't want to, make the configuration file readable in S3; the role provides the necessary authorization, which keeps fileName private. For this stack, the output will be called "InstanceConfigurationProfile".

You'll need to create some configuration to upload. For testing purposes, any valid configuration will do; you just want to be able to check that HTCondor is using the configuration. (For example, you could create a file named 17custom, with the following contents:

isCustomConfig = TRUE
STARTD_ATTRS = $(STARTD_ATTRS) isCustomConfig

and then tar -z -c -f config.tar.gz 17custom in order to generate the file to upload.)
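Spelled out as commands, the example looks like this (using the file names from the paragraph above):

```shell
# Create the example configuration file.
printf '%s\n' 'isCustomConfig = TRUE' \
    'STARTD_ATTRS = $(STARTD_ATTRS) isCustomConfig' > 17custom

# Bundle it into the tarball to upload; listing the contents back out
# is a cheap sanity check.
tar -z -c -f config.tar.gz 17custom
tar -z -t -f config.tar.gz
```

Then upload config.tar.gz to the bucketName and fileName you gave generate-role; for instance, aws s3 cp config.tar.gz s3://bucketName/fileName, if you have the AWS CLI configured.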

The example AMI (ami-aacfc2bd, in us-east-1) is already configured to take advantage of profiles like this one, so if you start an instance with the profile you just created, you should be able to SSH into it and, if you followed the example, run condor_config_val -startd isCustomConfig and get back 'TRUE'. (By querying a specific daemon, you ensure that you're checking the live configuration, not any configuration change made on disk after HTCondor started.)

The configuration script (49ec2-instance.sh) is attached. Add the following line to your base condor_config, if it isn't there already, to use it:

include ifexist command into $(LOCAL_CONFIG_DIR)/49ec2-instance.config : \
        /etc/condor/config.d/49ec2-instance.sh

This, of course, assumes that you've installed the script as /etc/condor/config.d/49ec2-instance.sh.

Attachments: