BOSCO Smoke Test

How to run a basic smoke test of remote batch submission over ssh (formerly known as BOSCO). This test is one of the steps performed for each new release of HTCondor.

You need two machines: the submit machine (containing the new HTCondor release being tested) and the batch system machine (running SLURM or a similar batch system). The submit machine must be able to ssh to the batch system machine with an ssh key and without multi-factor authentication. In CHTC, we have configured this to be possible going from submit1.chtc.wisc.edu to hpclogin3.chtc.wisc.edu. (Note that the initial setup will require Duo one time, but later logins with the ssh key should not.) We will use these machine names in the following instructions.

First, log in to the batch system machine and clean out any previous BOSCO installation.

batch% rm -rf ~/bosco
batch%

Next, log in to the submit machine, copy the release tarball, and set up a personal HTCondor.

submit% scp moria.cs.wisc.edu:/p/condor/public/html/htcondor/tarball/10/10.x/10.4.2/rc/condor-10.4.2-x86_64_CentOS7-stripped.tar.gz .
condor-10.4.2-x86_64_CentOS7-stripped.tar.gz  100%   18MB  18.8MB/s   00:00
submit% tar xf condor-10.4.2-x86_64_CentOS7-stripped.tar.gz
submit% cd condor-10.4.2-0.642084-x86_64_CentOS7-stripped
submit% ./bin/make-personal-from-tarball
submit% . condor.sh
submit% condor_master
submit% cd ..
submit%

Next, install the remote submission files onto the batch system machine from the submit machine.

submit% condor_remote_cluster --add hpclogin3.chtc.wisc.edu slurm
Cluster hpclogin3.chtc.wisc.edu already installed
Reinstalling on hpclogin3.chtc.wisc.edu
Enter the password to copy the ssh keys to hpclogin3.chtc.wisc.edu:
Password:
Duo two-factor login for jamesfrey

Enter a passcode or select one of the following options:

 1. Duo Push to XXX-XXX-7614

Passcode or option (1-1): 1
Downloading rc build for hpclogin3.chtc.wisc.edu.
Unpackingtar: condor-10.4.2-0.642084-x86_64_AlmaLinux8-stripped/etc/condor/config.d/10-stash-plugin.conf: implausibly old time stamp 1969-12-31 18:00:00
tar: condor-10.4.2-0.642084-x86_64_AlmaLinux8-stripped/usr/share/doc/condor-stash-plugin-6.10.0/README.md: implausibly old time stamp 1969-12-31 18:00:00
tar: condor-10.4.2-0.642084-x86_64_AlmaLinux8-stripped/usr/share/doc/condor-stash-plugin-6.10.0/LICENSE.txt: implausibly old time stamp 1969-12-31 18:00:00
.tar: condor-10.4.2-0.642084-x86_64_AlmaLinux8-stripped/usr/libexec/condor/stash_plugin: implausibly old time stamp 1969-12-31 18:00:00

Installing on cluster hpclogin3.chtc.wisc.edu
Installation complete
The cluster hpclogin3.chtc.wisc.edu has been added for remote submission
It is available to run jobs submitted with the following values:
> universe = grid
> grid_resource = batch slurm hpclogin3.chtc.wisc.edu
submit%
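
As an optional sanity check before submitting, the sketch below asks condor_remote_cluster to list the clusters it knows about and looks for the host name. The helper name cluster_is_installed is hypothetical, and the sketch assumes only that the --list output contains the host name somewhere; the exact format of that output is not shown on this page.

```shell
# Hypothetical helper (not an HTCondor tool): succeeds if the given host
# appears anywhere in the output of "condor_remote_cluster --list".
cluster_is_installed() {
    # grep -F matches the host name literally; -q suppresses grep's output.
    if condor_remote_cluster --list | grep -qF "$1"; then
        echo "$1 is registered for remote submission"
    else
        echo "$1 not found in condor_remote_cluster --list" >&2
        return 1
    fi
}
```

After defining the function on the submit machine, it could be run as: cluster_is_installed hpclogin3.chtc.wisc.edu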

Now, make a simple submit description file and job test script. Type the contents shown after each cat command, then press Ctrl-D to end the input.

submit% cat >job.sh
#!/bin/bash
/bin/date
/bin/hostname
sleep $1
echo Goodbye
submit% chmod a+rx job.sh
submit% cat >bosco.desc
universe = grid
grid_resource = batch slurm hpclogin3.chtc.wisc.edu
executable = job.sh
arguments = 60
log = job.log
output = job.out
error = job.err
queue 1
submit%

Now, submit your test job.

submit% condor_submit bosco.desc
Submitting job(s).
1 job(s) submitted to cluster 1.
submit%
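
Rather than polling by hand, one way to block until the job is done is condor_wait, which watches the job log named in the submit description file. The sketch below wraps it in a hypothetical helper, wait_for_job; the 600-second default timeout is an arbitrary choice for illustration. Note that condor_wait also returns success when a job is aborted, so the output should still be checked afterwards.

```shell
# Hypothetical helper: wait for all jobs in an HTCondor user log to finish.
# Usage: wait_for_job LOGFILE [TIMEOUT_SECONDS]
wait_for_job() {
    log="$1"
    timeout="${2:-600}"   # seconds; 600 is an arbitrary default for this sketch
    # condor_wait exits 0 once the jobs in the log have completed or been
    # aborted, and non-zero on timeout or if the log cannot be read.
    if condor_wait -wait "$timeout" "$log"; then
        echo "job finished"
    else
        echo "job did not finish within ${timeout}s (or the log could not be read)" >&2
        return 1
    fi
}
```

With the submit description file above, this could be invoked as: wait_for_job job.log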

Assuming your job finishes successfully, verify the output.

submit% cat job.out
Tue May  2 08:47:33 CDT 2023
spark-a061
Goodbye
submit%
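
If you want to script this check instead of reading the file by eye, a minimal sketch follows. It relies only on what job.sh prints: three lines, the last of which is "Goodbye". The helper name check_job_output is hypothetical.

```shell
# Hypothetical helper: verify that the job's output file looks as expected.
# job.sh prints the date, the hostname, and finally "Goodbye".
check_job_output() {
    out="$1"
    # -s is true only if the file exists and is not empty.
    [ -s "$out" ] || { echo "FAIL: $out is missing or empty" >&2; return 1; }
    lines=$(grep -c '' "$out")   # number of lines in the file
    last=$(tail -n 1 "$out")     # final line of the file
    if [ "$lines" -eq 3 ] && [ "$last" = "Goodbye" ]; then
        echo "PASS"
    else
        echo "FAIL: unexpected contents in $out" >&2
        return 1
    fi
}
```

For the job above, this could be run as: check_job_output job.out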

Finally, shut down and remove your personal HTCondor.

submit% condor_off -master
Sent "Kill-Daemon" command for "master" to local master
submit% rm -rf condor-10.4.2-0.642084-x86_64_CentOS7-stripped
submit%

You're done!