From Kerry Creeron at the Laboratory for Molecular and Computational Genomics here at UW:
I got R to successfully run on HTCondor. Here are my results:
First off, Nathan suggested using the 'tar' command to bundle the R binary executable and all the libraries your program needs into a single archive, and then using a shell script to untar the archive on the remote machine. I extended the concept to preserve the directory structure of your input files as well.
Secondly, when invoking R, I use
R --vanilla < inputfile.r
to run the program. Apparently R doesn't support the standard way of passing arguments on the command line:
R arg1 arg2 arg3
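The untar-then-run idea described above can be sketched as a small wrapper script. This is only an illustration under stated assumptions: the function names unpack_bundle and run_r_script are hypothetical, and the archive and R binary are passed in as arguments rather than hard-coded, since the real r.sh is not shown here.

```shell
#!/bin/sh
# Hypothetical helpers in the spirit of r.sh; names are illustrative only.

# Unpack the archive that was shipped with the job into the current
# directory, preserving whatever directory structure it contains.
unpack_bundle() {
    archive="$1"
    tar -xf "$archive"
}

# Run an R program by feeding it to the interpreter on standard input,
# i.e. the "R --vanilla < inputfile.r" form.
run_r_script() {
    r_bin="$1"
    script="$2"
    "$r_bin" --vanilla < "$script"
}
```

On the remote machine this might be called as: unpack_bundle r.tar && run_r_script ./R inputfile.r (assuming the archive is named r.tar and the R executable unpacks to ./R).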
1. I needed to compile the R binary from source. For whatever reason, simply
taking the binary from the /usr/bin/ directory did not work. I used version
2.11.1 because it appears to be the latest that our version of CentOS
supports, though there isn't necessarily any reason that a newer version of
R couldn't be used. I also chose 2.11.1 because installing MCMCpack, one of
the libraries that your program requires, from CRAN (similar to CPAN for
Perl) needs R 2.10 or later.
2. Next, I turned to getting the libraries to work. I initially investigated
using environment variables like R_LIBS_USER to tell the remote computer
where to look for libraries, but that did not work. So, instead, I inserted
the following line into the R program (test.r) to make the library search
path check the current directory first:
.libPaths(c(".","/usr/lib64/R/library","/usr/share/R/library"))
3. Next, you'll need to make sure that all the libraries are included in the
tar file. The command, kept in the file tarcmd.txt, is as follows:
tar -cf r.tar ./R -C /usr/lib64/R/library/ MASS/ coda/ MCMCpack/ lattice/ -C /exports/scratch/<user>/04-2010/ ch01/refsigs/sig104_A.txt ch01/sigsnp/sigint104_A.txt
4. The next issue was that the output directories don't exist on the remote
machine, so I included a few mkdir -p commands in r.sh (the shell script) to
create them. The output files themselves don't have to exist on the remote
machine beforehand, just the directories.
Just a side note: <user>, your R files currently have all the file paths
specified with backslashes ("\\"). On Linux, this will not work. The
filenames will need forward slashes instead.
5. The output files aren't currently being transferred from the remote machine back to the submit machine. The fix is to list the output files in the job file so that they are transferred when the job terminates. I will insert this code when I get the chance.
6. The job file also needs a constraint that restricts the job to 64-bit machines. I tested the R binary on a 32-bit machine and it didn't work, and I assume the architecture mismatch is why.
7. All told, running the job took about 5 minutes on the remote machine. I
don't know how that compares to what you've tried on <machine>.
Kerry
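
Items 5 and 6 above describe two changes to the job file that are not shown. A rough sketch of what such a submit description might contain follows, written as a shell function that generates the file. The submit commands themselves (transfer_output_files, requirements, when_to_transfer_output and so on) are standard HTCondor, but the function name write_submit_file, the results output directory, and the r.out, r.err and r.log file names are all assumptions made for illustration, not taken from the actual job.

```shell
#!/bin/sh
# Write a hypothetical HTCondor submit description to the path given
# as the first argument. File and directory names are placeholders.
write_submit_file() {
    cat > "$1" <<'EOF'
universe                = vanilla
executable              = r.sh
transfer_input_files    = r.tar, test.r
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
# Hypothetical output directory, brought back when the job exits (item 5)
transfer_output_files   = results
# Only match 64-bit machines (item 6)
requirements            = (Arch == "X86_64")
output                  = r.out
error                   = r.err
log                     = r.log
queue
EOF
}
```

It could be used as: write_submit_file r.sub && condor_submit r.sub. The transfer_output_files entry would need to be replaced with whatever files or directories the R program really writes.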