10,000 foot view

When someone sends email to condor-admin or condor-support, it gets entered into RUST. RUST sends them email telling them we got their message, assigning them a ticket number, and telling them to direct all replies to condor-admin or condor-support. The ticket then shows up in one of the two RUST queues (condor-admin is unpaid support, condor-support is paid support).

Every ticket has a bunch of info associated with it:

  1. A number and subject line
  2. The email address of the person who sent the problem report (the "user", in RUST-speak)
  3. The email address of the person dealing with the problem (the "owner", in RUST-speak)
  4. A list of everyone who should get copies of any email sent about the problem.
  5. A "status" ("new" = no one has done anything with it yet, "open" = someone has replied to it, "resolved" = problem solved, "source" = request for source, etc. See the section at the bottom of this on "RUST Status Settings" for the whole story).

All RUST email goes through condor-admin or condor-support. You never send a message directly to the user, and the user never sends email directly to the owner. Every time RUST sends a message to anyone, it sets the Reply-To field to point to itself, so if RUST sends you email for some reason, you should just reply to it and RUST will do the rest. The reason for this is that RUST maintains the list of all people who should get email about a given ticket. If anyone (owners or users) ever CC's someone when they're sending a message to RUST, that email address will get added to the list. So, you never have to remember to CC anyone (like you do with the CSL's setup), since RUST deals with all of that for you.

Someone (usually the RUST jockey for the week) queries the RUST queues periodically, and assigns tickets out to people on the team. When this happens, RUST will send a copy of the original message to the new ticket owner. When you've solved the problem, you should "resolve" the ticket, by changing its status.

If you've discovered that you have a ticket really destined for the CAE, this is how you move it there:

  /p/condor/home/bin/tocae.pl <queuename> <ticket number>

Example:

  /p/condor/home/bin/tocae.pl condor-admin 1234

10 foot view

You can access the HTCondor RUST queue from any dept. Linux workstation. To set up, add /p/condor/home/rust/bin/sys to your PATH. If you don't have it already, you probably also want to add /p/condor/home/bin to your PATH, since there are some handy scripts in there to help deal with RUST.

To manage the RUST queues from a consolidated GUI tool, fire up /p/condor/home/bin/xrust. The rest of this section details the RUST command line interface, which is ultimately more powerful.

To see all active items in our two queues:

  query --queue condor-admin
  query --queue condor-support

To see all resolved items in our queues:

  query --queue condor-admin --status resolved
  query --queue condor-support --status resolved

To see active items owned by jbasney:

  query --queue condor-admin --owner jbasney
  query --queue condor-support --owner jbasney

To see resolved items owned by jbasney:

  query --queue condor-admin --owner jbasney --status resolved
  query --queue condor-support --owner jbasney --status resolved

For all of the above commands, the "query --queue condor-*" part lives in the "qa" and "qs" scripts (for condor-admin and condor-support, respectively). So, you can just use:

  qa --status resolved
  qs
  qs --owner wright
  ...
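
Most questions ("what does jbasney own?") span both queues, so it can be handy to run the same query against each in turn. Here is a minimal sketch that uses only the query flags shown above. The name qboth is made up for illustration; it is not one of the scripts in /p/condor/home/bin.

```shell
# Hypothetical helper (not part of /p/condor/home/bin): run the same
# query against both queues, passing any extra flags straight through.
qboth() {
    for q in condor-admin condor-support; do
        echo "== $q =="
        query --queue "$q" "$@"
    done
}
```

With that defined in your shell, "qboth --owner jbasney --status resolved" would list jbasney's resolved tickets from both queues, one queue after the other.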

To see the entire contents of RUST item #1 in the condor-admin queue:

  query --queue condor-admin --show 1

Or, you can just use the following script:

  showa 1

There's also a version for condor-support:

  shows 23

To resolve ticket #1 in the condor-admin queue:

  action --queue condor-admin --num 1 --set status --value resolved

There are aliases installed in /p/condor/home/bin to help with this:

  resa <number>		# resolves <number> from condor-admin
  ress <number>		# resolves <number> from condor-support

To assign ticket #1 in the condor-admin queue to jbasney:

  action --queue condor-admin --num 1 --assign jbasney

Again, there are aliases in /p/condor/home/bin to help with this:

  assigna <number> <user>	# for condor-admin
  assigns <number> <user>	# for condor-support

In general, to use the action command with the --queue and --num arguments handled for you, you can use the aliases:

  acta <number>	<other action flags>
  acts <number>	<other action flags>

You can also add comments to a ticket from the command line. This allows you to enter text into the ticket without it being sent to the ticket user. For example, when I was going through RUST, discovering old tickets, I sent email to various HTCondor folks asking them where a ticket stood, since I couldn't tell from the ticket itself. I've just entered their responses into the appropriate tickets as a comment. It's pretty slick.

Here's how it works: You either need to use the acta/acts scripts, or provide --queue and --num arguments to "action", but the last thing is just "--comment". After that, it reads from stdin whatever comment you want, until it sees an EOF. So, you can either type it in right there, and hit ^D, or you can redirect stdin from a file, pipe, whatever.

So, for example:

  cat comment.txt | acta 2383 --comment

or:

  % acta 2383 --comment
  Adding comments to condor-admin request 2383

  Enter comments (End with ^D):
  typing in comments here...
  ^D
  %
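
Because --comment simply reads stdin until EOF, short notes are easy to script. The sketch below wraps that in a small function. The name rust_note and the date prefix are assumptions made for illustration, not an existing tool; only the "acta/acts <number> --comment" usage comes from the description above.

```shell
# Hypothetical helper: add a dated one-line comment to a ticket.
#   $1   = which alias to use (acta or acts)
#   $2   = ticket number
#   rest = the comment text
rust_note() {
    note_cmd=$1
    note_ticket=$2
    shift 2
    # --comment reads the comment from stdin until EOF, so a pipe works.
    printf '%s: %s\n' "$(date +%Y-%m-%d)" "$*" | "$note_cmd" "$note_ticket" --comment
}
```

For example, "rust_note acta 2383 asked Todd, still waiting to hear back" would record that note in condor-admin ticket 2383 without sending anything to the ticket user.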

Flagging Spam

If you're looking at spam that you want to permanently purge from RUST (some spam inevitably gets through), use spama:

  spama 12345

spama sets the ETA on a ticket to be 'spam', and then resolves it. We can later look through and calculate how much of our ticket traffic is spam. There's an equivalent spams command for condor-support.

If it looks like a given spammer is hitting us repeatedly with the same source email address, consider adding them to /p/condor/home/rust/scripts/filter_mail. Search for "spam" to find them. See the filter_mail section for more on how to edit filter_mail.

Merging tickets

If, instead of responding to the last message you sent them, a user decides to send a new RUST query pertaining to an already-open ticket, you can merge the two:

  mergea 6283 6289

or

  merges 711 713

These aliases merge the first ticket given into the second. They are equivalent to

  action --queue condor-admin --merge --num1 X --num2 Y

or

  action --queue condor-support --merge --num1 X --num2 Y

Resend a ticket to yourself

To get rust to resend the e-mail for a rust ticket without assigning it to someone else and then re-assigning it to yourself:

  [acta | acts] <admin rust #> --send email-address

for example:

  acta 1822 --send zmiller@cs.wisc.edu

Copying off-list email to an existing ticket

Peter C. provided the scripts "rustmea" and "rustmes" (in /p/condor/home/bin) to take an e-mail message from stdin, and bounce it into condor-admin or condor-support, respectively. If you provide a RUST ticket number as an argument, they will append the message to that existing ticket.

for example:

  cat msg | rustmea

or

  cat msg | rustmea 4242

Adding people to a Rust CC list

You can add users to the CC list using appenda and appends:

  appenda 9444 another.person@globus.example.com

X-Rust header commands

Also, when you're sending email to RUST, you can do certain actions with "X-Rust headers". These are special mail headers you can put in your message to tell RUST to do certain things. I have hacked RUST here so that these "mail headers" don't really have to be headers, for the benefit of all the poor souls using Eudora or Microshaft's mail program that can't add their own email headers to their outgoing email. Now, not only does RUST scan through your email headers looking for an X-Rust: header, you can also have such a line anywhere in your message body (so long as it's at the very beginning of a line). When your message goes to RUST, RUST will rip that line out of the message (so no one will even see it) and perform the requested actions.

There are a number of things you can do with an X-Rust header:

  X-Rust: resolve

resolves the ticket

  X-Rust: assign user

assigns the ticket to user

  X-Rust: set header value

sets the RUST header named header to value, for example:

  X-Rust: set status pending

would change the status from "open" (or whatever it is now) to "pending".

You can also do multiple commands at once, stringing them together with commas:

  X-Rust: set status pending, set eta March 15, assign stanis

This would set the status to pending, the estimated time of arrival (action?) to March 15, and assign the ticket to Tom (stanis). That's probably way more complicated than most of you need to do... mostly you just need a way to resolve tickets (and maybe re-assign them to someone else).
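
To make the comma rule concrete, here is a small sketch of how such a line breaks down into individual commands. It is only an illustration of the syntax described above, not RUST's actual parser. The function name split_xrust is made up, and the sketch assumes (as the text says) that the line must start at the very beginning of a line.

```shell
# Illustrative only (not RUST's real parser): print one X-Rust command
# per line. Lines that do not begin with "X-Rust:" produce no output.
split_xrust() {
    printf '%s\n' "$1" |
        sed -n 's/^X-Rust:[[:space:]]*//p' |
        tr ',' '\n' |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}
```

Running split_xrust on the example line above prints "set status pending", "set eta March 15" and "assign stanis" on separate lines. Note that in this sketch a value that itself contains a comma (say, an eta with a year after the day) would be split in two; whether RUST handles that case is not something this page documents, so it is safest to avoid commas inside values.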

More help

For more info:

  query --help
  action --help

Also, system documentation is at /p/condor/home/rust/docs/HOWTO.

Searching RUST

(As of 2012-10-09, the commands mentioned below don't exist.)

To ease searching of RUST, Alain Roy added a prototype glimpse search. Because of the size of the index (150 MB as of June 20, 2003), it's on the local disk of chopin. You'll need to log into chopin to do searches. Use /scratch/roy/rustglimpse/search-rust to return the names of files that include your search term. Use /scratch/roy/rustglimpse/search-rust-long to return the actual lines that match (a la grep).

(Unfinished as of June 4 when Alain released his prototype: 1. Move the index into AFS? It's big and AFS might be slow. 2. Regenerate the index nightly, probably through cron. The indexing takes an hour or so.)

RUST etiquette

1) When to respond

If you have a condor-support ticket assigned to you, that pretty much means you need to stop everything you're doing and deal with that RUST. We're under contract to reply within a day or something, and have a solution or work-around within 3 days. So, if you have condor-support RUST, that has to be the highest priority thing on your plate. Condor-admin tickets are generally a little more low-key, but you should at least try to respond ASAP and let people know the status of their request. Condor-admin RUST from local users in CS should be highest priority. After that, there are some important sites for us, via the Alliance, PACS, etc. Then, the lowest priority are all the other random users and pools out there.

2) How to respond

RUST includes a lot of extra stuff in messages when it gets sent to ticket owners for their own info. This is potentially confusing for users, and certainly lots of noise, so PLEASE delete that stuff if you include an original message in your reply. B/c all email in and out of RUST is stored in an archive, and sent to lots of people, it's a good idea to trim out extra junk. In particular, users will often send their entire config file, or log files, etc. DO NOT include the whole thing in your reply, only cite the important part they need to look at to solve their problem. In general, include AS LITTLE from the original message as you can to make your point.

3) Resolve as you go

Once you're done with a ticket, please resolve it, so we don't have to go back through old RUST, trying to resolve old tickets that no longer need to be open. It's much easier that way, for everyone.

4) If you can't resolve, set the status accordingly

We have a lot of valid status settings we use, to help keep our tickets sorted. If you can't resolve a ticket relatively quickly, please set its status to the appropriate one described in the section below.

5) Don't hesitate to reply with "RTFM" (Read The Friggin' Manual)

The only things to watch out for here are try to be nice about it, and try to point people to the specific section of the manual they need to read. We put a lot of work into our docs so that we don't have to spend as much time writing RUST about the same things over and over again. We should probably put a FAQ in the manual, too, but that's another story.

6) Dealing with LCG Savannah tickets

As of 5/2005, HTCondor-related tickets from the LCG Savannah (think Bugzilla) server are re-posted into RUST. Our RUST server keeps things in sync with Savannah using the savannah_repost and savannah_tag scripts in /p/condor/home/rust/scripts. The only thing you need to do differently is update the states in both RUST and Savannah. They have a much more complex state model and there's no good way to do a 1->1 mapping. All changes can be made by following the URL in the ticket. If you don't have an LCG Savannah account you can sign up for one (should be a link on the left) and email Erwin Laure (Erwin.Laure@cern.ch) to get write access.

When a Savannah ticket comes in (it should say it's from LCG Savannah in the first line) it's in the "none" state. After taking a quick look you should change the state to either "accepted" or "rejected". If you accept the ticket, move it to "in progress" when you start working on it. Don't do this too soon -- if tickets are in progress for a long time people will start asking questions. When everything's done you mark the ticket "ready for integration" and resolve the RUST.

RUST Status Settings

We handle a lot of RUST. It's much easier to view the status of a ticket and get a sense of the deal than to have to read through the whole ticket log to figure out what's going on. So, it's very important that everyone working with RUST make a big effort to set the appropriate status for tickets assigned to them. This will make everyone's life easier.

Here are all the possibilities:

The main status settings:

new

open

pending

Special status settings (all of these require human intervention to get a ticket into one of these states):

bug

feature

release

manual

port

source

1 foot view (ticket lock files, editing header files, etc)

<disclaimer>

Potentially dangerous information is included in this section. Please only use this knowledge if you really need it, and do so with great caution.

</disclaimer>

There are a few cases that call for really low-level mucking with RUST. In particular, you might have to remove a RUST lock file, if RUST doesn't properly clean up after itself in certain fatal conditions. You might also need to edit the ticket meta-data directly, also known as the ticket "headers". This is all the stuff you see at the beginning of the message when you show a ticket, under the heading "Ticket Information". This info is kept in a separate file from the ticket history itself.

The lock files and header files are kept in the same location, described below. After that, there are separate sections which describe how to actually manipulate these files.

Location of Lock and Header Files

RUST headers and lock files are kept in the following directory tree:

  /p/condor/home/rust/headers

In there, there are subdirectories for each queue. Within those directories, there is one more level of directories, to try to prevent there from being single directories with too many files. So, the full path to a given ticket's headers would be something like:

  /p/condor/home/rust/headers/condor-(admin|support|mm)/X/Y

The "Y" is the actual ticket number (e.g. 1697). The "X" is a subdirectory (e.g. 1000). Tickets 1-999 go into "1", 1000-1999 go into "1000", and so on.

The lock files live in the same directory with the ticket headers, but are named "Y.lock". So for example, the lock file for condor-admin ticket #1697 would be:

  /p/condor/home/rust/headers/condor-admin/1000/1697.lock
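
The directory rule can be written down as a couple of small shell functions. This is a sketch, not an existing RUST tool: the function names and the RUST_HEADERS variable are made up here, and it assumes "and so on" means 2000-2999 go into "2000", 3000-3999 into "3000", etc.

```shell
# Root of the header tree; override RUST_HEADERS to experiment elsewhere.
RUST_HEADERS=${RUST_HEADERS:-/p/condor/home/rust/headers}

# Hypothetical helper: print the header file path for a ticket.
#   $1 = queue name (e.g. condor-admin), $2 = ticket number
rust_header_path() {
    bucket=$(( $2 / 1000 * 1000 ))   # 1697 -> 1000, 2383 -> 2000
    if [ "$bucket" -eq 0 ]; then
        bucket=1                     # tickets 1-999 live in "1"
    fi
    printf '%s/%s/%s/%s\n' "$RUST_HEADERS" "$1" "$bucket" "$2"
}

# Hypothetical helper: the lock file is the header path plus ".lock".
rust_lock_path() {
    printf '%s.lock\n' "$(rust_header_path "$1" "$2")"
}
```

For example, "rust_lock_path condor-admin 1697" prints the lock file path shown above.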

Manipulating Lock Files

Lock files don't contain any information, they only exist or don't exist. If the lock file exists, RUST won't do anything to the ticket. Sometimes you'll see lock files left around by RUST (processes or machines die in the middle of doing a RUST action or something), and nothing will work on that ticket again until you remove the lock file. If, when trying to do something to a ticket, you see something like:

  wright@perdita%  statusa 1511 bug
  Try # 1 to lock...
  Try # 2 to lock...
  Try # 3 to lock...
  ...

chances are good you've got to go delete the lock file. Nothing RUST does should leave a lock around for more than a second... if you're up to try #3, you're almost certainly in a situation with a lock file leak.
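
Since a healthy lock lives for about a second, any lock file that is minutes old is a good candidate for a leak. Here is a minimal sketch for listing them. The function name and the 5-minute default are arbitrary choices for illustration, and -mmin is a GNU/BSD find extension rather than POSIX.

```shell
# Hypothetical helper: list lock files older than N minutes.
#   $1 = directory to search (defaults to the RUST headers tree)
#   $2 = age threshold in minutes (defaults to 5)
stale_locks() {
    find "${1:-/p/condor/home/rust/headers}" -type f -name '*.lock' -mmin "+${2:-5}"
}
```

This only prints the paths; look at each one before removing it by hand with rm.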

The other reason you might want to manipulate a lock file is to create a lock while you edit the header file directly (which is described below). To create a lock, simply touch the appropriate file. Just remember to remove the lock when you're done editing the ticket headers.
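
The "remember to remove the lock" part is easy to forget, so one option is to wrap the edit in a function that always cleans up. This is a sketch under assumptions: edit_locked is a made-up name, it takes the full path of the header file, and it relies only on the fact stated above that a lock is just a file named "Y.lock" next to the headers.

```shell
# Hypothetical helper: hold the ticket's lock file while editing its headers.
#   $1 = full path to the header file, e.g. .../condor-admin/1000/1697
edit_locked() {
    hdr=$1
    lock=$hdr.lock
    # noclobber (set -C) makes the redirect fail if the lock already exists,
    # so we never take over a lock that RUST (or a teammate) is holding.
    if ! ( set -C; : > "$lock" ) 2>/dev/null; then
        echo "already locked: $lock" >&2
        return 1
    fi
    ${EDITOR:-vi} "$hdr"
    rc=$?
    rm -f "$lock"    # release the lock whether or not the editor succeeded
    return "$rc"
}
```

If the function reports that the ticket is already locked, wait a few seconds and retry; only if the lock is clearly stale should you remove it by hand. Killing the shell in the middle of an edit will still leak the lock, so the usual caution applies.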

Manipulating Header Files

The most common reason to edit the header files directly is to change who RUST thinks should get email about a given ticket, usually to remove people from the "Cc" line. Header fields are of the form:

  attribute_name::value

For example, the ticket user (the person who created the ticket) is listed in the line:

  user::email@address

The people who RUST will CC are listed in the line:

  email::email1@address1, email2@address2, etc

If no one is getting CC'ed, you'd just have a blank entry:

  email::

You may need to edit either of these fields, especially if someone CC'ed lab@cs.wisc.edu on their ticket, since having the two email robots talking to each other is a bad thing. It's possible you'd want to edit another field, but generally, there are better ways to do that (using assign*, status*, etc) than direct editing. There's no tool for manipulating the user and email fields though, so that's fairly common.
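
Because the format is just "attribute_name::value", one line per attribute, standard text tools can read it. The sketch below shows one way to look up a field and to preview a header file with one address dropped from the "email" line. Both function names are made up, no such tool ships with RUST, and the second function only prints to stdout rather than touching the file.

```shell
# Hypothetical helper: print the value of one attribute.
#   $1 = header file, $2 = attribute name (e.g. user, email)
rust_header_get() {
    awk -v key="$2" '
        index($0, key "::") == 1 { print substr($0, length(key) + 3); exit }
    ' "$1"
}

# Hypothetical helper: print the header file with one address removed
# from the "email::" (CC) line. The file itself is not modified.
#   $1 = header file, $2 = address to drop
rust_cc_without() {
    awk -v drop="$2" '
        index($0, "email::") == 1 {
            n = split(substr($0, 8), addrs, /,[ \t]*/)
            out = ""
            for (i = 1; i <= n; i++)
                if (addrs[i] != "" && addrs[i] != drop)
                    out = out (out == "" ? "" : ", ") addrs[i]
            print "email::" out
            next
        }
        { print }
    ' "$1"
}
```

A cautious workflow would be to run rust_cc_without on the header file with lab@cs.wisc.edu as the address, compare its output against the original with diff, and only then make the same change in the real file.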

So if you ever need to change something about a ticket and there's no other good way to do it, just edit the headers file. Just BE CAREFUL. :) RUST will start to do bad things if you mess up something in there. If you want to be extra paranoid^H^H^H^H^H^H^H^Hcareful, you could lock the ticket before you edit it, just in case RUST is trying to edit it too (a new message comes in from the user, someone on the team is trying to assign the ticket to someone else, etc). Creating a ticket lock is described in the previous section.

If you don't want to be that careful (I rarely am), just be quick... i.e. don't leave a ticket header open in an emacs buffer for days on end.

Never automatically add a CS email address to the CC list

If someone is CCed on a Rust message, Rust automatically adds those addresses to the CC list (the "email" value) for the ticket. This is usually the Right Thing, but occasionally is wrong when the CCed person is a mailing list or something similar. (Also note that anyone who is a Rust user, that is, anyone who can be assigned tickets and run the Rust commands, is also exempt from this behavior.)

To avoid this for cs.wisc.edu addresses, see /p/condor/home/rust/etc/dont_make/*. Copy an existing entry like "csl" to the name of the recipient you don't want to email, say, "condor-world". Edit the file so the REAL_NAME is more descriptive. Check the file in ("cvs add newfilename; cvs ci newfilename") and you should be set.

Adding a Rust user

Go to /p/condor/home/rust/etc/auth, copy one of the existing files and name it with the new user's login name, then edit the file to update all information that's specific to that user (REAL_NAME, and the comments that reference a login name). You must also add their login name to the list in /p/condor/home/rust/etc/RUST.names. Finally, please commit any changes you make to these files into CVS, since all of the RUST configuration files are controlled by CVS. A typical sequence of commands might look like this:

  cd /p/condor/home/rust/etc/auth
  cp adesmet yourusername
  vim yourusername
  cvs add yourusername
  cd ..
  vim RUST.names
  cvs ci RUST.names auth/yourusername

condor-mm and req

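Mail sent to condor-mm uses req instead of rust. To use it, copy the sample reqrc into place. After that, the qm and showm scripts (by their names, the condor-mm counterparts of qa/qs and showa/shows) and the gtkreq GUI are available:

  cp /p/condor/home/req/reqrc ~/.reqrc
  qm
  showm <number>
  /p/condor/home/req/bin/gtkreq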

Emergency maintenance

filter_mail

condor-admin@cs.wisc.edu is a global alias that points to condor-admin@chopin.cs.wisc.edu:

  % getalias condor-admin
  condor-admin aliased to:
          condor-admin@chopin.cs.wisc.edu

on chopin, condor-admin is locally overridden by the lab's configuration scripts:

  % getalias condor-admin
  condor-admin aliased to:
          "|/p/condor/home/rust/scripts/filter_mail"

filter_mail is a script that Nate M wrote. It steers mail into the condor-admin, condor-support, or vdt-support queues. It also pulls out the download logs from the binary CGIs and the HTCondor pool reports (one of the data sources for the HTCondor world maps). It also runs SpamAssassin on the incoming mail and drops spam. It logs to /tmp/filter_mail.log on chopin.

A broken filter_mail is very bad. Rust will bounce. Always test your changes. You can test that it compiles at all by using "perl -c filter_mail"; if that fails complaining about Mail::Audit, temporarily uncomment the line "use lib 'test-lib/site-perl';". You can also try sending a message to condor-admin from a non-CS address and verify that it creates a new ticket.

filter_mail is currently in CVS and should be checked in after any changes.

Open issue: How is Mail::Audit ending up in filter_mail's path when invoked by the CSL?

Breaking mail loops

A quick way to break mail loops is to change filter_mail to drop messages with a known sender, which is what I did. This way, filter_mail never invokes the RUST program, so no autoresponse is generated. See the filter_mail section for more on editing filter_mail.
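
For orientation, here is the shape of that check, written as a shell sketch. It is not the code in filter_mail (which is a Perl script built on Mail::Audit); the function name, the variable, and the example address are all made up. The idea is simply: read the From: header, and if it matches a known looping sender, stop before RUST is ever invoked.

```shell
# Illustrative only: filter_mail is Perl; this just shows the logic.
# Space-separated list of senders to drop (hypothetical address).
BLOCKED_SENDERS="looping-robot@example.com"

# Reads a mail message on stdin. Returns 0 if the From: header (in the
# header block only, i.e. before the first blank line) names a blocked sender.
sender_is_blocked() {
    from=$(sed -n '/^$/q; s/^From:[[:space:]]*//p' | head -n 1)
    for s in $BLOCKED_SENDERS; do
        case $from in
            *"$s"*) return 0 ;;
        esac
    done
    return 1
}
```

A delivery script built this way would exit quietly when sender_is_blocked succeeds and hand the message on to RUST otherwise, which is why no autoresponse is generated for the looping sender.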

Another way, which I've never done, is to set

  do_initial_autoreply=0

in /p/condor/home/rust/etc/rust_configs but that squashes everything for that queue.

(chopin's /etc/mail/aliases file is a little bit wrong, since it's got two entries for condor-mm. This could potentially be a problem after an upgrade. The correct one is an alias for the cndr-rst user, which uses procmail to decide where to send messages. cndr-rst has an AFS token, so it can insert into postgres. Currently, filter_mail runs without an AFS token, which is why /p/condor/home/rust/active is net:cs write)

RUST script setup

More recent information from epaulson

We're most of the way to having SpamAssassin in front of RUST. We've reached the point where I don't really know what I'm doing with procmail, and am not sure how much of the existing infrastructure we want to keep, so I'd like to have someone else involved with it.

Right now, there are 4 RUST queues: condor-admin, condor-support, condor-mm, and vdt-support.

Tickets, for the most part, enter RUST via email to those addresses. RUST stores all of its data in files in AFS. Each address is really an alias to a script:

  % ssh chopin getalias condor-admin
  condor-admin aliased to:
          "|/p/condor/home/rust/scripts/filter_mail"

(condor-admin and condor-mm are locally delivered to chopin, and then chopin runs the script. condor-support and vdt-support are still on perdita.)

Because the filter_mail script is forked off directly by sendmail, it runs without an AFS token. This is why we have 'net:cs write' permission on the RUST data.

To get rid of the net:cs permissions, we need to have the filter_mail script be run by a real user. The lab set up a user for us, cndr-rst, which will automatically get a new Kerberos ticket on chopin every hour.

I had the lab switch the condor-mm queue to point to the cndr-rst user. condor-admin is moved to chopin but running the old way under sendmail at the moment - by having it on chopin already, we can update the alias file in one place during the day and test it, instead of waiting for the CSL scripts to update everything at 4am and hoping for the best for a few hours.

cndr-rst has the following .procmailrc:

  export LOGFILE=/scratch/cndr-rst/logs/procmaillog

  :0
  * ^To.*agt-acct*
  | cd /u/c/n/cndr-rst/accounting-files/incoming; /s/std/bin/munpack
  :0fw:/scratch/cndr-rst/logs/spamassain.lock
  | /s/std/bin/runauth /afs/cs.wisc.edu/u/t/a/tannenba/public/spamass/usr/bin/spamassassin
  :0w:/scratch/cndr-rst/logs/procmail.lock
  * ^To.*condor-mm*
  | /s/std/bin/runauth /p/condor/home/rust/scripts/condor-mm

(as an aside, Todd's spam assassin is out of date - it only scores the diploma spam a 4.8 :( )

The /p/condor/home/rust/scripts/condor-mm script dumps the message into RUST. It's a much simpler version of the script that runs condor-admin and condor-support. The condor-mm script looks for the X-Spam-Flag header, and if it sees it, it doesn't put the message into RUST (which in turn means that there is no autoresponse).

The condor-admin/support script does a bunch of stuff to figure out which queue it's going to, and also if it's license information from the Italian mirror site or weekly updates from collectors in the wild, so we probably want to keep that script in place, and just have spam assassin sit in front of the script. It could either drop things on the floor it believes are spam, or set the X-Spam-Flag and let filter_mail kill it. I don't know which way is better.

Also, my procmailrc script will need to be a bit smarter - currently, it will not figure out that mail to condor-mm is meant for condor-mm unless it's explicitly To: condor-mm - if you CC condor-mm, it doesn't figure that out. I'm not sure that just looking for condor-mm in the header is the right thing to do, we've been burned by things like this before.

If you want to edit the cndr-rst user's stuff, it's a bit of jumping through hoops. In order to have cndr-rst always have a Kerberos ticket, we can't know what the password is. (We could change our mind, and just stash a condor ticket for cndr-rst every 30 days, but it's better if we let something automatically generate this ticket.)

condor:condor-admin has permission to edit the cndr-rst user's home directory. If you want to get an actual cndr-rst token:

  ksu
  su - cndr-rst
  (probably figure out what the KRB5CCNAME is supposed to be and set it)
  k52token
  source ~/.cshrc

or

  ksu
  su - cndr-rst
  /s/std/bin/runauth /bin/tcsh