28.02.2014 Views

An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib

An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib

An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

79<br />

preparation should further induce tolerance to employ the resource sequencing less sparingly.<br />

While most current studies due to economical considerations rely on no more than two replicate<br />

experiments, gathering extensive experimental evidence in the form of many replicate data sets<br />

could become st<strong>and</strong>ard in the future. Combined, these factors may entail improved statistical<br />

power from quantitative sequencing experiments, while at the same time facilitating statistical<br />

modeling of the sequencing <strong>and</strong> sample preparation process.<br />

To this eect, the fundamental approaches to quantitative sequencing data analysis should<br />

require rather gradual modication compared to genome structure <strong>and</strong> variation analysis. Reference<br />

sequence-based analysis as implemented in our modules SHORE peak or SHORE count will<br />

probably remain the preferred strategy in the near future, although it may become more commonplace<br />

to generate appropriate reference genomes de-novo along with the quantitative data. A<br />

medium-term goal might be complementing the available algorithms for quantitative data with<br />

methods that allow to better leverage the statistical power acquired by increasing the number of<br />

replicate experiments. Short-term improvements could possibly be achieved by investing in more<br />

sophisticated models <strong>and</strong> algorithms to capture <strong>and</strong> estimate an increasing number of parameters<br />

of <strong>lib</strong>rary preparation, sequencing <strong>and</strong> data analysis. However, the characteristics of current<br />

data sets may be the result of superimposition <strong>and</strong> complex interplay of a variety of eects, <strong>and</strong><br />

are furthermore subject to constant change as protocols <strong>and</strong> sequencing chemistry evolve.<br />

With progress with respect to sequencing read length, sequence assembly <strong>and</strong> automated<br />

genome annotation algorithms, reference-based variation detection as implemented by<br />

SHORE consensus or SHORE qVar modules could be superseded by fundamentally dierent strategies.<br />

However, SHORE <strong>and</strong> <strong>lib</strong>shore may still constitute a valuable framework for implementation<br />

of such future methods. While increasing read length should automatically resolve many of<br />

the ambiguities that current analysis algorithms are confronted with, the biological questions<br />

to address by future algorithms are likely to gain complexity. They should therefore pose<br />

computationally intensive tasks that benet from application programming frameworks oriented<br />

towards the ecient processing of large amounts of biological sequence data.<br />

While the focus of this work was on the technical <strong>and</strong> algorithmic aspects of obtaining primary<br />

analysis results from various types of high-throughput DNA sequencing data set, a future<br />

challenge will lie with moving from elucidation of isolated properties of the cellular mechanism<br />

to building consistent theoretical frameworks through integration of the data made accessible by<br />

dierent sequencing applications <strong>and</strong> further complementary technologies.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!