HTCondorWiki: Continuous Build And Tests In Nmi

Overview

In addition to our nightly build and test, we run a "continuous" build and test on a subset of platforms. The goals of these builds are to:

Provide quick feedback if a commit breaks the builds or tests (without waiting until the next day to view the nightly results)
Provide historical coverage to help trace back and find which commit broke tests/builds. It is often hard to track backwards from a broken test to the commit that broke it. The more frequent the tests are, the smaller the window of commits in which to find the guilty commit.
Help spot race conditions in tests that fail sporadically, especially when subsequent runs of the test produce different results against the same code
Eventually we would like to run a build and test for each commit to Git. We don't currently have the CPU cycles available, largely because the tests take so long to run.

Current implementation

The continuous runs are submitted from nmi-s006.cs.wisc.edu in the /home/cndrauto/continuous/ directory. A single crondor job exists for each platform. This job simply calls condor_nmi_submit to do the actual work.

Occasionally, someone will run condor_rm cndrauto, forgetting about these jobs. When that happens, the crondor jobs are removed too, which stops the continuous builds.

The crondor schedule is set up in the submit files to run about once an hour, and we try to bias the hours toward those when developers are most likely to be committing. Note that condor_nmi_submit sets the JobPrio such that the most recently submitted build (and test) has the highest priority. Because tests are slow, they usually fall behind in the course of a day, but the most recently submitted ones are started first. The goal is to set the schedule so that all the tests "catch up" overnight.

Even if a test run takes overnight to finish, it can still be very valuable, because it is often hard to track backwards from a broken test to the commit that broke it. The more frequent the tests are, the smaller the window of commits in which to find the guilty commit, even if it takes a long time for the test to finally finish.
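
As a rough illustration of the kind of schedule described above, a crondor submit file might look something like the sketch below. This is not the actual file from the template directory: the universe, the hours, and the weekday range are assumptions chosen for the example. Only the cron_* and on_exit_remove submit commands themselves are standard HTCondor.

```
# Hypothetical crondor submit file for one platform (values are illustrative)
universe         = local
executable       = run.sh

# Run at the top of the hour, biased toward working hours on weekdays
# (assumed values; the real schedule may differ)
cron_minute      = 0
cron_hour        = 8-18
cron_day_of_week = 1-5

# Keep the job in the queue after each run so that it fires again
on_exit_remove   = false

queue
```

Because the job stays in the queue between runs, it is also the job that disappears when someone removes all of cndrauto's jobs.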

Add a new continuous run

Log in to nmi-s006 and sudo to the cndrauto user
Copy the template directory from Git. It is in the CONDOR_SRC/nmi_tools/continuous directory. Put the directory into /home/cndrauto/continuous/PLATFORM, where PLATFORM is the name of the platform you are testing against. It should match the NMI platform name, e.g. x86_64_rhap_5.
Replace "PLATFORM" in the files run.sh and submit with the platform name
Submit the Condor-Cron job: condor_submit submit
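
The copy-and-replace steps above could be scripted roughly as follows. This is a minimal sketch, not an existing tool: the function name prepare_continuous_dir is hypothetical, and it assumes the placeholder appears as the literal string PLATFORM and that the platform name contains no characters special to sed (such as / or &).

```shell
#!/bin/sh
# Hypothetical helper: copy the template directory and fill in the platform name.
# Arguments: template directory, destination root, NMI platform name.
prepare_continuous_dir() {
    template_dir=$1
    dest_root=$2
    platform=$3
    dest="$dest_root/$platform"

    # Refuse to overwrite an existing platform directory.
    if [ -e "$dest" ]; then
        echo "error: $dest already exists" >&2
        return 1
    fi

    cp -R "$template_dir" "$dest" || return 1

    # Replace the literal placeholder PLATFORM in both files.
    for f in run.sh submit; do
        sed "s/PLATFORM/$platform/g" "$dest/$f" > "$dest/$f.tmp" || return 1
        cat "$dest/$f.tmp" > "$dest/$f"   # overwrite in place, keeping the file mode
        rm -f "$dest/$f.tmp"
    done
}

# Usage on nmi-s006 as cndrauto (shown for illustration, not run here):
#   prepare_continuous_dir "$CONDOR_SRC/nmi_tools/continuous" \
#       /home/cndrauto/continuous x86_64_rhap_5
#   cd /home/cndrauto/continuous/x86_64_rhap_5 && condor_submit submit
```

Writing the substituted text back with cat rather than mv is deliberate: it keeps the execute bit on run.sh without relying on the non-portable sed -i option.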

Modify a continuous run

Log in to nmi-s006 and switch to the cndrauto user
Stop the appropriate job using condor_rm. You can run condor_q | grep run.sh to figure out which job to remove.
Modify run.sh or submit as appropriate and then resubmit to Condor: condor_submit submit
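
Since every platform's job runs a file called run.sh, the grep above will list all of them. One possible way to pick out a single platform's job is sketched below. The helper name find_continuous_cluster is hypothetical, and the sketch assumes two things: that the installed condor_q supports the -af (autoformat) option, and that each job was submitted from its platform directory, so that its Iwd attribute ends in /continuous/PLATFORM.

```shell
#!/bin/sh
# Hypothetical helper: read "ClusterId Iwd" pairs on stdin and print the
# cluster ids whose working directory ends in /continuous/<platform>.
find_continuous_cluster() {
    platform=$1
    awk -v p="/continuous/$platform" '
        # n is the position where the suffix would have to start
        { n = length($2) - length(p) + 1 }
        n > 0 && substr($2, n) == p { print $1 }
    '
}

# Usage on nmi-s006 as cndrauto (shown for illustration, not run here):
#   condor_q cndrauto -af ClusterId Iwd | find_continuous_cluster x86_64_rhap_5
#   condor_rm <cluster id printed above>
```

Removing the job by cluster id, rather than by user name, leaves the continuous jobs for the other platforms in the queue.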