How to make VMware jobs run

  1. Have a script in the correct run level linked to one in /etc/init.d which will do your job if it is present but allow the machine to come up for adjustments if not. (sample scripts below)

  2. Have a second disk configured in your VM which is used to place input data and extract output data

  3. Have patience as the job happens in a black box.

In order for this all to work you must have a VM which is rich enough to have "mkfs" which has the "-t vfat" option to make a windows vat32 partition.

Given that you have such a second disk created. The trick now is to use a combination of gemu-img and mtools to build a disk you can mount from within the VM and based on it's contents, run the job or simply bring up the VM for inspection.

Qemu-img can be downloaded from "http://www.qemu.org/download.html".

Mtools can be downloaded from "http://www.gnu.org/software/mtools/download.html".

Once you have a VmWare raw disk you can use mtools on it to add things to the disk or pull things from it. However note that every use of the disk makes it larger so you should retain a pristine copy of the original empty disk.

Mtools requires an rc file to define drives. A sample is here:

drive n: file="/home/bt/SOAR/sources/sscc_cbf/v1/code/DSLstata-0.raw" offset=32256 mtools_skip_check=1

One places such information into a file which you point to with the environment variable "MTOOLSRC".

When HTCondor completes a VmWare job its returns all the files of the VM to the run directory allowing one to restart the VM and look inside. However if you use the procedures here, you will want to copy in an empty second disk or it will startup and run the job again or error as it did the last time depending where you are in creating your job within the VM.

A simple submit file for HTCondor which has the VmWare files in a directory called vmware_dir looks like this and note that the executable is simply the name of the job that condor_q uses to display it in the queue:

  universe = vm
  vm_type = vmware
  vmware_dir = vmware_dir
  executable = sscc_cbf_vm
  log = sscc_cbf_vm.log
  vmware_should_transfer_files = YES
  vmware_snapshot_disk = false
  vm_memory = 1000
  notification = never
  queue

Below are some Perl functions to use the above tools:

  #####################################################
  #
  #   VM Universe code for VmWare
  #
  #   Initializing the VM for a job and accessing the
  #   results are done by including a suitable second
  #   VmWare disk. This disk is created for dynamic
  #   allocation and created in a VM which can use fdisk
  #   to create a fat32 partition and must have an
  #   mkfs which allows a "-t vfat" to complete creation
  #   of the disk.
  #
  #   Once one has this disk it can be used as a template
  #   second drive for any VmWare VM. Note keep the core
  #   empty disk and copy it when needed as it gets
  #   larger with every use.
  #
  #   We rely on "mtools" and "qemu" for the rest.
  #   "qemu-img" is used to convert the disk back and forth
  #   between vmdk and raw formats. "mcopy, mdir etc"
  #   are used to place and fetch files off the raw
  #   disk.
  #
  #####################################################

  sub VMwareAddFile
  {
      my $file = shift;
      runcmd("mcopy $file n:");
      runcmd("mdir n:");
  }

  sub VMwareFetchByPattern
  {
      my $pattern = shift;
      runcmd("mcopy n:\*$pattern* .");
      runcmd("mdir n:");
  }

  sub VMwareRemakeDataDisk
  {
      my $data = shift;
      my $raw = shift;

      my $cmd = "qemu-img convert -f raw $raw -O vmdk $data";
      runcmd($cmd);
  }

  sub VMwareCreateRawDisk
  {
      my $empty = shift;
      my $raw = shift;

      my $cmd = "qemu-img convert -f vmdk $empty -O raw $raw";
      runcmd($cmd);
  }

  sub VMwareCreateMTOOLSRC
  {
      my $rawdisk = shift;
      my $top = getcwd();
      my $baseentry = "drive n: file=\"$top/$rawdisk\" offset=32256 mtools_skip_check=1";
      print "Making MTOOLSRC here<$top>\n";
      open(MTRC,">$top/mtoolsrc") or die "Can not open mtoolsrc in <$top>:$!\n";
      print MTRC "$baseentry\n";
      close(MTRC);
      $ENV{MTOOLSRC} = "$top/mtoolsrc";
      runcmd("mdir n:");
  }

Below is a suitable script for mounting a second disk at the right run level and deciding if a job should be run or the VM simply brought up for changes or inspection.

  #!/bin/sh

  PATH=/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/stata

  case "$1" in
    start)
          echo "Testing VM Job Status"
          mount /dev/hdb1 /vmjob
          ls /vmjob
          if test -f "/vmjob/statajob"
          then
                  /vmjob/runstatajob.pl /vmjob statajob
                  shutdown -h now
          else
                  echo "Bring up for non-vm work"
                  sleep 20
          fi
          echo "."
          ;;
  esac

  exit 0

Here is a perl script which mounts the second disk and looks for a file holding a list of Stata jobs to run.

  #!/usr/bin/env perl

  my $dir = $ARGV[0];
  my $file = $ARGV[1];

  chdir("/vmjob");
  my $dbinstalllog =  "stataVM.LOG";
  print "Trying to open logfile... $dbinstalllog\n";
  open(OLDOUT, ">&STDOUT");
  open(OLDERR, ">&STDERR");
  open(STDOUT, ">>$dbinstalllog") or die "Could not open $dbinstalllog: $!";
  open(STDERR, ">&STDOUT");
  select(STDERR);
   $| = 1;
  select(STDOUT);
   $| = 1;

  print "runstatajob.pl called with dir<$dir> file<$file>\n";

  system("mkdir /tmp/vmjob");
  system("cp * /tmp/vmjob");
  #chdir($dir);
  chdir("/tmp/vmjob");

  print "Lets look at disk availability\n";
  system("pwd");
  system("df -h");
  system("ls -l");
  system("which stata-se");
  print "Do the following job:\n";
  print "*****************************\n\n";
  system("cat $file");
  print "*****************************\n\n";

  # ado directory has to go in a stata "sysdir"
  # one such location is /usr/local/stata

  # extract out ado.tar.gz
  system("tar -zxvf ado.tar.gz");
  system("chmod -R 777 ado");
  system("cd ado");
  system("mkdir /usr/local/ado");
  # place in one of many sysdir directories
  # visible by typing "sysdir" at the stata prompt
  system("cp -r plus/* /usr/local/ado");
  system("cp -r personal/* /usr/local/ado");
  system("cd /tmp/vmjob");

  open(JOB,"<$file") or die "Can not open <$file>:$!\n";
  my $line = "";
  while(<JOB>) {
          chomp();
          $line = $_;
          print "Running stata on $line\n";
          system("stata-se -b $line");
          print "done\n";
          system("ls");
  }
  close(JOB);
  print "Copy results back to /vmjob\n";
  system("cp *.log /vmjob");