An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib
An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib
An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
79<br />
preparation should further induce tolerance to employ the resource sequencing less sparingly.<br />
While most current studies due to economical considerations rely on no more than two replicate<br />
experiments, gathering extensive experimental evidence in the form of many replicate data sets<br />
could become st<strong>and</strong>ard in the future. Combined, these factors may entail improved statistical<br />
power from quantitative sequencing experiments, while at the same time facilitating statistical<br />
modeling of the sequencing <strong>and</strong> sample preparation process.<br />
To this eect, the fundamental approaches to quantitative sequencing data analysis should<br />
require rather gradual modication compared to genome structure <strong>and</strong> variation analysis. Reference<br />
sequence-based analysis as implemented in our modules SHORE peak or SHORE count will<br />
probably remain the preferred strategy in the near future, although it may become more commonplace<br />
to generate appropriate reference genomes de-novo along with the quantitative data. A<br />
medium-term goal might be complementing the available algorithms for quantitative data with<br />
methods that allow to better leverage the statistical power acquired by increasing the number of<br />
replicate experiments. Short-term improvements could possibly be achieved by investing in more<br />
sophisticated models <strong>and</strong> algorithms to capture <strong>and</strong> estimate an increasing number of parameters<br />
of <strong>lib</strong>rary preparation, sequencing <strong>and</strong> data analysis. However, the characteristics of current<br />
data sets may be the result of superimposition <strong>and</strong> complex interplay of a variety of eects, <strong>and</strong><br />
are furthermore subject to constant change as protocols <strong>and</strong> sequencing chemistry evolve.<br />
With progress with respect to sequencing read length, sequence assembly <strong>and</strong> automated<br />
genome annotation algorithms, reference-based variation detection as implemented by<br />
SHORE consensus or SHORE qVar modules could be superseded by fundamentally dierent strategies.<br />
However, SHORE <strong>and</strong> <strong>lib</strong>shore may still constitute a valuable framework for implementation<br />
of such future methods. While increasing read length should automatically resolve many of<br />
the ambiguities that current analysis algorithms are confronted with, the biological questions<br />
to address by future algorithms are likely to gain complexity. They should therefore pose<br />
computationally intensive tasks that benet from application programming frameworks oriented<br />
towards the ecient processing of large amounts of biological sequence data.<br />
While the focus of this work was on the technical <strong>and</strong> algorithmic aspects of obtaining primary<br />
analysis results from various types of high-throughput DNA sequencing data set, a future<br />
challenge will lie with moving from elucidation of isolated properties of the cellular mechanism<br />
to building consistent theoretical frameworks through integration of the data made accessible by<br />
dierent sequencing applications <strong>and</strong> further complementary technologies.