Lazy programmers often prefer to substitute computing effort for programming effort. I am just such a programmer. For my research, I often need to design and run algorithms over large datasets that range into the terabytes. As a fellow at the NIH, I have access to Biowulf, a cluster with more than 100,000 processors, so it's usually not worth spending a ton of time optimizing single-threaded performance for a single experiment when I can just run a big MapReduce.
Last year, we announced Squiggle, an algorithm for intuitive DNA sequence visualization and analysis, along with an accompanying software package for interactive exploration of DNA sequences. The software works well for small sequences, but runs into performance issues as the length and number of DNA sequences increase. Furthermore, users must have the software installed on their computer, which may not be optimal for analyzing sequences in parallel.
In the course of our genomics research at Lab41, we often look at raw DNA sequences. Unfortunately, DNA sequences don’t come to us looking like a double helix.