{section: Goal}
 
-Any given build done via NMI should be reproducible for the foreseeable future.  This includes having all of the necessary input files and build scripts and being able to regenerate the resulting output files.  NMI should be able to do so even if the upstream provider of a given package disappears.
+If the Condor Project were to disappear, Condor's work, in the form of buildable software, should survive.  NMI should have all of the input files necessary to rebuild a given release, independent of the existence of upstream Condor.
+
+It is not necessary to be able to reproduce a given build exactly.  In particular, the platforms that Condor was originally built on may no longer be part of the NMI pool.
 
 {section: Current Situation}
 
@@ -8,16 +10,28 @@
 
 The Condor source and documentation are pulled together on an NMI submit node.  The externals are pulled onto each individual build machine on an as-needed basis, completely outside of NMI management.  If the externals cache is being used, an external's input file may never be fetched at all.
 
-Because each machine pulls together externals on its own, it's hard to say canonically how much disk space a given set of inputs takes.  The Condor source is about 20MB while the documentation is about 1MB.  The externals, excluding glibc, are about 107MB; a given build may not use all of the externals, but different builds typically share externals.  glibc is complex, only platforms supporting the standard universe include it, and different platforms may use different glibc.  A given glibc source is about 163MB, and we might use 2 or 3 across all of our builds.  This gives a total size of about 20+1+107+(163*3)= 617MB.  These are all compressed tarballs, further compression is unlikely to provide significant savings.
+Because each machine pulls together externals on its own, it's hard to say canonically how much disk space a given set of inputs takes.  The Condor source is about 20MB, while the documentation is about 1MB.  The externals, excluding glibc, are about 107MB; a given build may not use all of the externals, but different builds typically share externals.  glibc is more complicated: only platforms supporting the standard universe include it, and different platforms may use different glibc versions.  A given glibc source is about 163MB, and we might use 2 or 3 versions across all of our builds.  This gives a total size of about 20+1+107+(163*3) = 617MB.  These are all compressed tarballs, so further compression is unlikely to provide significant savings.  Assuming we archive two builds per night (one stable series, one developer series), we're looking at about 440GB of storage per year.  Assuming a single backup, we're close to needing 1TB per year.  These numbers ignore ad-hoc builds submitted by individual developers.
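+
+To make the arithmetic above easy to check, here is a minimal sketch of the estimate; the sizes and build counts are the rough figures from this section, not measured values:
+
+    # Back-of-the-envelope storage estimate for archived build inputs.
+    per_build_mb = 20 + 1 + 107 + 163 * 3      # source + docs + externals + ~3 glibc variants = 617 MB
+    builds_per_year = 2 * 365                  # one stable series and one developer series build per night
+    archive_gb = per_build_mb * builds_per_year / 1024.0
+    print(round(archive_gb))                   # ~440 GB of archives per year
+    print(round(archive_gb * 2))               # ~880 GB per year with a single backup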
+
+{section: Open Questions}
+
+*: Do we archive every build, or just "important" builds?
+
+*:: If we're not archiving every build, do we still use the archive system for every build (for consistency), but mark non-important builds for deletion "soon"?
 
-{section: Thoughts}
+*: Do we move old archives to offline or otherwise "slow" storage?
 
-*: Do we archive every build, or just "important" builds.  Archiving every build seems ideal, but will necessitate a significant disk investment as a single build could easily have 617MB of input.
+*: Do we maintain backups?
 
-*: Do we need to consider moving old archives to offline or otherwise "slow" storage?
+*: Condor's full(ish) revision history is available via Git.  Should an archive contain the entire repository?  That chews up more space, but would likely be more useful in the event that upstream disappears.  Something clever like a shared Git repository would theoretically save space, but that ignores the risk that the upstream repository could become corrupted, damaging our past copies.  (A possible approach is sketched below.)
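+
+One possible shape for the "entire repository" option, as a rough sketch only; the upstream URL and the archive path are placeholders, not real NMI locations:
+
+    # Hypothetical sketch: give each archive a standalone copy of the repository,
+    # so rebuilding does not depend on the upstream repo still existing.
+    import os, subprocess
+
+    def archive_repository(upstream_url, archive_dir):
+        # "git clone --mirror" copies every branch and tag; the result can be
+        # cloned from later even if upstream_url has disappeared.
+        subprocess.check_call(["git", "clone", "--mirror", upstream_url,
+                               os.path.join(archive_dir, "condor.git")])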
+
+{section: Complications}
 
 *:Condor pulls externals directly onto build machines, completely outside of NMI's knowledge.  Logically this fetching should be moved onto the NMI submit machine, where the files can be archived easily and where duplicates can be merged.  However, this increases the disk and network utilization of the submit nodes, which have historically been overloaded.
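+
+One way the "duplicates can be merged" step could work, as a rough sketch under the assumption that all archives live on a single volume; the paths and the hard-link approach are illustrative, not an existing NMI mechanism:
+
+    # Hypothetical sketch: replace byte-identical external tarballs across
+    # archives with hard links, keeping a single copy on disk.
+    import hashlib, os
+
+    def merge_duplicates(archive_root):
+        first_copy = {}                              # content hash -> first path seen
+        for dirpath, _, filenames in os.walk(archive_root):
+            for name in filenames:
+                path = os.path.join(dirpath, name)
+                with open(path, "rb") as f:
+                    digest = hashlib.sha256(f.read()).hexdigest()
+                if digest in first_copy:
+                    os.unlink(path)
+                    os.link(first_copy[digest], path)  # hard link to the kept copy
+                else:
+                    first_copy[digest] = path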
 
-*:Input files will typically be compressed tarballs, so there isn't much advantage to futher trying to package them up.  It's easier to just give each archive its own directory.
+{section: Plans}
+
+*:Input files will typically be compressed tarballs, so there is little advantage in trying to package them up further.  It's easier to just give each archive its own directory (see the layout sketch at the end of this section).
+
+*:If we are archiving a build, we should do the build out of the archive, exactly as if we were rebuilding from the archive at some future date.  This ensures that the archive actually works and simplifies the code, since we always follow the same code path.  It will, however, increase disk I/O on whatever machine the build is done on, likely the already heavily loaded NMI submit node.
 
-*:If we are archiving a build, we should do the build out of the archive, identically to if we were trying to rebuild from the archive in the future.  This ensures that the archive works and simplifies the code as we always follow the same code path.
+*:Proposal: add a new machine, the "packager," which makes read-only disk space available.  A submitted build first runs on the packager, which collects all of the input into the archive location and then submits the "real" jobs to the NMI submit point.  Initially the packager would expose the archive location over NFS or similar to the submit point, so everything downstream looks like a "normal" build.  This needlessly routes data through the submit node, but should work without Metronome changes.  As Condor's ability to pull input files from the web or other sources grows, individual execute nodes could fetch them directly from the packager.
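+
+A rough sketch of the collection step on the proposed packager; the archive root, directory naming, and manifest file are invented for illustration:
+
+    # Hypothetical sketch: gather one build's inputs into a per-build archive
+    # directory on the packager, then export that directory read-only.
+    import os, shutil, time
+
+    def archive_inputs(input_files, archive_root="/archive"):
+        # One directory per build, e.g. /archive/condor-YYYY-MM-DD_HHMM/
+        build_dir = os.path.join(archive_root, time.strftime("condor-%Y-%m-%d_%H%M"))
+        os.makedirs(build_dir)
+        for path in input_files:                    # source, docs, and externals tarballs
+            shutil.copy(path, build_dir)
+        names = sorted(os.path.basename(p) for p in input_files)
+        with open(os.path.join(build_dir, "MANIFEST"), "w") as manifest:
+            manifest.write("\n".join(names) + "\n")
+        return build_dir    # exposed (e.g. over NFS) to the submit point; builds run out of it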