HTCondorWiki: Test Add Howto

About the test suite

The HTCondor test suite lives in src/condor_tests.

There is no top-level Makefile (src/Imakefile) rule that builds and runs all of this. To build and run the test suite, see the section above. We will get into variations later.

The Imake rules know which compilers exist on each platform. For each compiler they find, they create a directory under condor_tests named after that compiler and populate it with an image of everything else in condor_tests, made mostly of symlinks plus some copies. They then go through each compiler subdirectory and build each test with that compiler, usually both linked with the syscall library and as a vanilla version. Finally, they build any non-Standard Universe tests in the top-level directory.

By default, batch_test.pl assumes you want to test with the daemons found in your PATH, and that they are already running with your current configuration. This lets you keep a daemon in a debugger during testing if needed.

batch_test.pl can also create a new testing personal HTCondor from the binaries in your PATH, using slightly modified generic configuration files. It creates the directory condor_tests/TestingPersonalCondor and sets itself up to run there. To restart testing, you may sometimes need to remove this directory so that batch_test.pl knows to restart the personal HTCondor.

By default, batch_test.pl runs tests serially, one after another.

Variations within batch_test.pl

To run the entire test suite with your existing daemons:

./batch_test.pl

To set up the environment (including the personal HTCondor) and then run the entire suite:

./batch_test.pl -b

To set up a test environment in condor_tests/TestingPersonalCondor and run the quick class:

./batch_test.pl -f list_quick -b

batch_test.pl accepts the following options:

  -b <build&test>  Use generic config files from HTCondor examples
                   and start a personal HTCondor.
  -d <directory>   Run the tests in this directory, e.g. "-d gcc/"
  -s <filename>    Skip all the tests listed in <filename>
  -f <filename>    Run all of the tests listed in <filename>
  -a <count>       Run each test this many times
  -t <test>        Run just this test
  -p <pretest>     Just set up the testing personal HTCondor
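To make the option list concrete, here is a hypothetical Perl sketch of how a script could parse these flags with the core Getopt::Std module. It is not batch_test.pl's actual code: the helper name parse_batch_args, treating -b and -p as plain switches, and the default of one run per test are all assumptions.

```perl
use strict;
use warnings;
use Getopt::Std qw(getopts);

# Hypothetical parser for the options listed above (not batch_test.pl's code).
# -b and -p are treated as plain switches; -d, -s, -f, -a and -t take a value.
sub parse_batch_args {
    local @ARGV = @_;
    my %opt;
    getopts('bpd:s:f:a:t:', \%opt) or die "unknown option\n";
    $opt{a} = 1 unless defined $opt{a};    # assume one run per test by default
    return \%opt;
}
```

For example, parse_batch_args('-d', '.', '-t', 'job_condorc_ab_van', '-a', '5') yields a hash with d set to ".", t set to the test name and a set to 5.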

Run all the top-level tests (the standard way on Windows) with a pristine personal HTCondor:

./batch_test.pl -d . -b

Run just one test, and then run the same test five times:

./batch_test.pl -d . -t job_condorc_ab_van
./batch_test.pl -d . -t job_condorc_ab_van -a 5

The files given to -s and -f list one test per line: the tests to skip or to run, respectively.
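Handling the -f and -s files takes only a few lines of Perl. This sketch assumes plain text with one test name per line and ignores blank lines (an assumption); read_test_list and select_tests are hypothetical names, not functions from batch_test.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: read one test name per line from an open filehandle,
# trimming surrounding whitespace and ignoring blank lines.
sub read_test_list {
    my ($fh) = @_;
    my @tests;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        push @tests, $line if length $line;
    }
    return @tests;
}

# Hypothetical helper: keep the tests from the -f list that are not in the -s list.
sub select_tests {
    my ($run, $skip) = @_;
    my %skip = map { $_ => 1 } @$skip;
    return grep { !$skip{$_} } @$run;
}
```

Using a hash for the skip list keeps each lookup constant-time, so the order of the run list is preserved while skipped entries are dropped.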

Wrapped tests (all tests are wrapped since 7.9.5)

We wrap tests so that a test can change its environment (configuration settings) while it runs without affecting the other tests that are running now, or will run later, in the main personal HTCondor. Wrapping also gives each test a log directory showing only what happened during that test; with many tests running in the main personal HTCondor, it can otherwise be hard to see what is going on. All tests are wrapped by the HTCondor Perl test module. To change the HTCondor configuration for your test, wrap the test yourself.

To do this, write the test so that it starts its own personal HTCondor and then runs its test there.

The most basic test run in a personal HTCondor is "example_personal_condor_test".

General expectations

"Tests" are Perl scripts whose names end in .run. batch_test.pl takes the list of tests it is supposed to run and forks them off. It sets an 18-hour timer within which each test must complete; otherwise the test exits automatically. The test's stdout is redirected to <testprogram>.out, where testprogram includes the .run suffix but not the compiler (that is implied by the directory the test runs in). As each test exits, batch_test.pl collects its return code and decides whether the test succeeded: an exit status of 0 is "successful", and anything else is a failure.
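The fork-and-collect step described above might look roughly like this in Perl. This is a hypothetical sketch, not batch_test.pl's code: the helper name run_one_test, the -1 timeout result and the 128 + signal convention are all assumptions. Passing 18 * 60 * 60 as the timeout would mirror the 18-hour limit.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper, not batch_test.pl's real code: run one test with its
# stdout redirected to $outfile and return its exit status. A test killed by
# a signal is reported as 128 + signal number; a timeout is reported as -1.
sub run_one_test {
    my ($cmd, $outfile, $timeout) = @_;
    local $| = 1;    # flush our own pending output so the child can't repeat it
    my $pid = fork();
    die "fork failed: $!\n" unless defined $pid;
    if ($pid == 0) {
        # Child: send stdout to the .out file, then become the test program.
        open(STDOUT, '>', $outfile) or POSIX::_exit(127);
        exec(@$cmd) or POSIX::_exit(127);
    }
    my $status;
    eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($timeout);
        waitpid($pid, 0);
        $status = $?;
        alarm(0);
    };
    alarm(0);
    if (!defined $status) {
        die $@ unless $@ eq "timeout\n";
        kill('KILL', $pid);    # the test ran past its time limit
        waitpid($pid, 0);
        return -1;
    }
    return 128 + ($status & 127) if $status & 127;
    return $status >> 8;
}
```

A caller would loop over the selected tests, call run_one_test for each one in turn (matching the serial default), and count a return value of 0 as success. Checking the signal bits separately matters: a test that dumps core would otherwise look like exit status 0.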

batch_test.pl prints to stdout a list of all the tests it is running and whether each one succeeded or failed. It looks like this:

  submitting gcc tests.......
  no_supp            succeeded
  loop               succeeded
  bigenv-scheduler   succeeded
  syscalltester      succeeded
  coredump           succeeded
  printer            succeeded
  big                succeeded

  submitting g++ tests........
  no_supp            succeeded
  loop               succeeded
  bigenv-scheduler   succeeded
  lseek              (job never checkpointed)
  syscalltester      succeeded
  coredump           succeeded
  printer            succeeded
  big                succeeded

  14 successful, 1 failed

The exit status of batch_test.pl is the number of tests that failed, so it exits with 0 if everything passed. It also writes two files, successful_tests and failed_tests, which list (as compiler/testname) the tests that passed and the tests that failed.
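The end-of-run bookkeeping could be sketched like this. summarize_results is a hypothetical helper that takes a hash mapping compiler/testname to exit status; it writes successful_tests and failed_tests into the given directory, prints the summary line, and returns the failure count for the caller to pass to exit(). Keep in mind that a process exit status only holds values from 0 to 255.

```perl
use strict;
use warnings;

# Hypothetical helper: $results maps "compiler/testname" to an exit status.
# Writes successful_tests and failed_tests into $dir, prints a summary line,
# and returns the number of failures.
sub summarize_results {
    my ($results, $dir) = @_;
    my @ok   = grep { $results->{$_} == 0 } sort keys %$results;
    my @fail = grep { $results->{$_} != 0 } sort keys %$results;
    for my $pair (['successful_tests', \@ok], ['failed_tests', \@fail]) {
        my ($name, $list) = @$pair;
        open(my $fh, '>', "$dir/$name") or die "cannot write $dir/$name: $!\n";
        print {$fh} "$_\n" for @$list;
        close($fh) or die "cannot close $dir/$name: $!\n";
    }
    printf "%d successful, %d failed\n", scalar(@ok), scalar(@fail);
    return scalar(@fail);
}
```

Usage: exit(summarize_results(\%results, '.'));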

A test can give a failure reason for batch_test.pl to display by including a line of the form FAILURE (failure reason) in its stdout.
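Here is a minimal sketch of both sides of that convention. fail_test and failure_reason are hypothetical helpers, and the sketch assumes the reason is simply the text after the word FAILURE on that line.

```perl
use strict;
use warnings;

# Test side (hypothetical helper): report why the test failed, then exit non-zero.
sub fail_test {
    my ($reason) = @_;
    print "FAILURE $reason\n";
    exit(1);
}

# Harness side (hypothetical helper): return the first failure reason found in
# a test's captured stdout lines, or undef if there is none.
sub failure_reason {
    for my $line (@_) {
        return $1 if $line =~ /^FAILURE\s*(.*?)\s*$/;
    }
    return;
}
```

Exiting non-zero right after printing the line keeps the two signals consistent: the exit status marks the test as failed, and the FAILURE line says why.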

Adding a new test

READ the README files! They set out rules for naming tests concisely, which helps everyone understand what a test is doing. Also, understand test classes! The default class run by batch_test.pl is list_quick. On Windows, the default list of tests comes from the file Windows_list.

Adding a new test to the test suite takes three steps:

Write the code for the test program itself
Decide how your test should be built
Write a .run file that can run your test and collect the output.

Actually writing your test code is very easy. The first thing to think about is what you are trying to test. Most of the compiler-directory tests in the current test suite are designed to test HTCondor's remote system calls or checkpointing. The file testmethods.txt in condor_tests explains what many of the tests do.

Standard Universe compiler-directory highlights: syscall_tester calls a number of generic POSIX functions with both valid and invalid arguments and makes sure that it gets sane answers back; it is designed to test the HTCondor syscall library. floats does some floating-point calculations, forces itself to checkpoint, and outputs the state of the floating-point registers after the checkpoint, to ensure that we can safely checkpoint floating-point registers. Most of the top-level tests check other aspects of HTCondor's behavior and are named to indicate what they test.

There are very few restrictions on what your program has to do or how it has to be written. It is polite for it to run in a fairly short time and to be deterministic. If you are going to use the Standard Universe, be aware of the ckpt_and_exit() call, which lets your program directly force itself to save its state and exit. When the program is running under HTCondor, it then goes back into the queue to be run again, and it is restarted as though it had just returned from the ckpt_and_exit() call.

Once you have written your program, you have to decide how it is built and where to place it. Most non-Standard Universe tests are added to the Imakefile. Standard Universe tests are added to the test suite's CImakefile, CPPImakefile and FImakefile, which are what actually build those programs. The Imakefile in src/condor_tests is used for building the testing infrastructure (picking out compilers, etc.) and for most of the tests. In each compiler subdirectory, we create a symlink to the CImakefile and treat it as that subdirectory's Imakefile. condor_imake is invoked on it, so all of our standard Imake rules are available. The most-used rule is:

BUILD( $(CC), <testname>, c )

That rule builds testname (from a source file ending in .c) with every compiler on the platform (usually the vendor C and C++ compilers and the GNU C and C++ compilers). It builds two versions of your program for each compiler, testname.remote and testname.vanilla, which are Standard and vanilla universe jobs respectively. By default it also places symlinks to the test's .run file and to the .cmd file used to submit the job to HTCondor. If you know that your test will not run under the Standard Universe, build it with BUILD_SPECIFIC_VANILLA, which does not try to relink it with the syscall library.

If your test program is a Perl script, or anything else that does not need to be "built", you can use all_symlink_target() to have it linked into the compiler subdirectories so that batch_test.pl will run it.

Now that your test program is represented by a binary and a .run file, it is time to run your test. The first thing to remember is that the .run file is a Perl script. It is responsible for running your test and deciding whether it passed or failed; technically it does not have to use HTCondor for any of that, and we could add unit tests that batch_test.pl simply runs. The only oddity is that such tests still have to appear in a "compiler" subdirectory. However, nearly all tests are run as HTCondor jobs, which is why the default BUILD rule symlinks a <testname>.cmd file for you in the compiler subdirectory.

Because most tests submit and run HTCondor jobs, they generally use the CondorTest.pm Perl module, which in turn builds on the Condor.pm Perl module. Condor.pm provides a Perl interface for submitting HTCondor jobs and receiving callbacks when events happen. (It discovers events by monitoring the UserLog file that the job's submit file specifies.) CondorTest.pm adds several functions on top of Condor.pm, mostly for managing expected output. Again, a .run file is just a Perl script, and as such does not have to use the HTCondor Perl modules, or even HTCondor, at all.
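For a feel of what monitoring the UserLog involves, here is a simplified Perl sketch. It is not Condor.pm's code: scan_user_log and the callback names are hypothetical, and it reads a finished log rather than following a growing one as a real monitor must. It relies on the classic user log format, in which each event begins with a three-digit event code followed by (cluster.proc.subproc), and events are separated by "..." lines.

```perl
use strict;
use warnings;

# A few classic user-log event codes (not a complete list).
my %EVENT_NAME = (
    '000' => 'submit',
    '001' => 'execute',
    '004' => 'evicted',
    '005' => 'terminated',
    '009' => 'aborted',
);

# Hypothetical, simplified watcher: read a user log from an open filehandle
# and call $callbacks->{event_name}->($cluster, $proc) for each event header.
# Event body lines and the "..." separators are ignored.
sub scan_user_log {
    my ($fh, $callbacks) = @_;
    while (my $line = <$fh>) {
        next unless $line =~ /^(\d{3}) \((\d+)\.(\d+)\.(\d+)\)/;
        my ($code, $cluster, $proc) = ($1, $2, $3);
        my $name = $EVENT_NAME{$code} or next;
        my $cb = $callbacks->{$name} or next;
        $cb->($cluster + 0, $proc + 0);    # numify "000" style fields
    }
}
```

Keeping the callbacks in a hash keyed by event name means a test only registers the events it cares about; everything else is skipped.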

Sample Test

Here is an example .run file for dirtest.c. dirtest.c is designed to check that the directory-entry (dirent) functions still work, so it creates a few directories and looks at them:

  #!/usr/bin/env perl

  use CondorTest;

  $cmd = 'dirtest.cmd';
  $testname = 'getdirentries Test';

  @CondorTest::expected_output = ( 'ENTRY: .',
  'ENTRY: ..',
  'ENTRY: first',
  'ENTRY: second',
  'ENTRY: third',
  'completed successfully' );

  CondorTest::RegisterExitedSuccess( $testname,
                                   \&CondorTest::DefaultOutputTest );

  if( CondorTest::RunTest($testname, $cmd, 0) ) {
    print "$testname: SUCCESS\n";
    exit(0);
  } else {
    die "$testname: CondorTest::RunTest() failed\n";
  }

Sample test walkthrough

The first thing to notice is the @CondorTest::expected_output array: this is the text that dirtest.c will output, and that we should look for. If your test produces lines that vary from run to run (say you print the time of day at the beginning of your test), you can register those lines as "lines to be skipped":

@CondorTest::skipped_output_lines = ( 4, 5, 6);

means don't look at lines 4, 5, and 6.

CondorTest::RegisterExitedSuccess means "call this function when HTCondor says the job has exited and left the queue." CondorTest::DefaultOutputTest is a predefined function that compares the text in the expected_output array with the stdout your job produced and ensures that they are the same. It also complains if there is any data in the stderr file. Your test is free to define its own function to be called; for tests that don't need to check the output, a common trick is:

CondorTest::RegisterExitedSuccess( $testname, sub { return 1 } );
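The kind of comparison DefaultOutputTest performs, together with skipped_output_lines, can be pictured with this sketch. It is not CondorTest.pm's implementation: output_matches is a hypothetical name, and it assumes skipped line numbers are 1-based positions in the job's stdout that are removed before the remaining lines are compared one by one with the expected list.

```perl
use strict;
use warnings;

# Hypothetical sketch of an expected-output check in the spirit of
# CondorTest::DefaultOutputTest. Assumption: skipped line numbers are
# 1-based positions in the job's stdout, dropped before comparing.
sub output_matches {
    my ($stdout, $expected, $skipped, $stderr) = @_;
    return 0 if @$stderr;                  # any stderr data counts as a failure
    my %skip = map { $_ => 1 } @$skipped;
    my @kept;
    my $lineno = 0;
    for my $line (@$stdout) {
        $lineno++;
        next if $skip{$lineno};
        (my $text = $line) =~ s/\r?\n\z//;  # compare without line endings
        push @kept, $text;
    }
    return 0 unless @kept == @$expected;   # wrong number of lines
    for my $i (0 .. $#kept) {
        return 0 unless $kept[$i] eq $expected->[$i];
    }
    return 1;
}
```

A custom checker passed to RegisterExitedSuccess could follow the same shape but loosen the comparison, for example matching lines against patterns instead of exact strings.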

RegisterExitedSuccess has a companion, CondorTest::RegisterExitedAbnormal(), which provides a callback when a job exits with some error condition; see coredump.run for an example. Other callbacks are also available through the CondorTest interface: see the source of CondorTest.pm, and look in the function RunTest to see everything you can register.

CondorTest::RunTest submits the job to HTCondor and blocks until the job has left the queue. Its arguments are the name of the test, the submit file to use, and whether HTCondor should try to force a checkpoint. If you can't use ckpt_and_exit() but your test needs a checkpoint to be valid, set this flag to 1. Be aware, however, that your job will then need to run "long enough" for all of the following to happen:

The Condor.pm module sees that your job is executing
It has a chance to send the 'condor_vacate' to the remote machine
The remote machine has a chance to get the 'condor_vacate' command, and decide that you're authorized to send a 'condor_vacate' to it
The remote machine has a chance to actually send the SIGUSR2 to the job

Oftentimes, the job will finish before it successfully checkpoints. This sort of testing in a distributed system is hard.

Added tidbits:

There are many examples to look at in src/condor_tests/Imakefile. We also have a Perl module called CondorPersonal.pm, which lets us set up tests that need a modified environment without changing the environment of the personal HTCondor running all of the other tests. It can emulate most pool configurations. Look at the job_flocking_to, job_condorc_ab_van and cmd_status_shows-avail tests for more complicated examples.

Setting up for testing on Windows

Install the Windows MSI
Add the paths to the HTCondor binaries (bin and sbin) to the PATH environment variable
Install Cygwin with Cygwin Perl
Install ActivePerl
Bring down the sources for condor_examples, condor_scripts and condor_tests to c:\
Copy *.pm and batch_test.pl from condor_scripts to condor_tests
cd to condor_tests
Set the following in the HTCondor configuration:
CONDOR_HOST = 127.0.0.1
NETWORK_INTERFACE = 127.0.0.1
If you are using a personal HTCondor in your PATH, run chmod go+rX on <bindir and contents> so that Perl's -x file test works as you expect.
./batch_test.pl -d . -b

Controlling Windows Testing

When you run the command above, Windows testing uses the tests listed in the file "Windows_list", which contains all the currently approved Windows tests.

For NMI, you can do a shorter run of tests. These tests are selected from the file Windows_shortlist.

To get the shorter run, add --test-args=short to the condor_nmi_submit call.