A recent PLoS Computational Biology article (W.S. Noble, "A Quick Guide to Organizing Computational Biology Projects", PLoS Comput Biol 5(7), 2009) offers one person's suggestions for organizing computational projects. I particularly enjoyed the sections on experiment structure and on scripts.
Many of the scripting goals discussed in the article might be accomplished by using makefiles to drive analyses. Makefile rules could implement building-block operations, and dependencies could be used to ensure that every downstream step is rerun whenever the data change. A properly crafted Makefile may also enable "easy" parallelization: for example, GNU make can run independent build steps in parallel (make -j) where the dependency graph allows it.
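As a rough illustration of the idea, here is a minimal sketch of such a Makefile. The file names and the clean.py, analyze.R, and plot.R scripts are hypothetical placeholders, not anything taken from Noble's article.

    # Hypothetical analysis pipeline driven by make. Each rule is a
    # building-block operation; recipe lines must begin with a tab character.
    all: figure.png

    # Derive a cleaned dataset from the raw data.
    clean.csv: raw.csv clean.py
            python clean.py raw.csv > clean.csv

    # These two analyses are independent, so "make -j 2" can run them in parallel.
    stats_a.csv: clean.csv analyze.R
            Rscript analyze.R a clean.csv stats_a.csv

    stats_b.csv: clean.csv analyze.R
            Rscript analyze.R b clean.csv stats_b.csv

    # Combine the two result files into a figure.
    figure.png: stats_a.csv stats_b.csv plot.R
            Rscript plot.R stats_a.csv stats_b.csv figure.png

    .PHONY: all

Touching raw.csv and rerunning make rebuilds only the targets that depend on it, and make -j 2 runs the two independent analyses concurrently because nothing in the dependency graph forces them to be serialized.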
I also got to wondering how virtualization might be used to maintain snapshots of an experiment. For example, a delta VM backed by an existing base VM could be created when an experiment is started. Changes (creation or modification of files, etc.) are all that would be stored in the delta VM, and subsequent changes to state would branch off from the base VM. Using a VM allows one to return to an experiment with the entire "machine" in the same state it was in when the experiment was originally performed. Contextual changes, such as updated datasets or changes to the OS, can optionally be brought forward. One also gains the ability to migrate long-running experiments in the face of required system maintenance. These benefits come at the cost of a small amount of extra disk space and minimal performance loss.
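Here is a rough sketch of the delta-VM idea using QEMU's copy-on-write disk images; the image names are hypothetical, and other hypervisors (VirtualBox, VMware) expose similar snapshot facilities.

    # Create a "delta" image backed by the existing base VM image.
    # Only blocks written during the experiment are stored in the delta file;
    # base.qcow2 itself is never modified.
    qemu-img create -f qcow2 -b base.qcow2 experiment-2009-07.qcow2

    # Boot the experiment from the delta image.
    qemu-system-x86_64 -hda experiment-2009-07.qcow2 -m 2048

    # A later experiment simply branches again from the same base image.
    qemu-img create -f qcow2 -b base.qcow2 experiment-2009-09.qcow2

Returning to an experiment later is just a matter of booting its delta image again, with the whole machine exactly as it was left. (Newer versions of qemu-img may also ask for the backing format to be stated explicitly with -F qcow2.)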
If you've tried either of these approaches, or have other suggestions for organizing computational experiments, please share your experiences.