An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib
CHAPTER 4. CLOSING REMARKS
algorithms. With the sample preparation procedure leading up to the sequencing device being application-specific, weakly standardized, involving significant amounts of manual work and thus ultimately highly variable, the development of robust statistical models has proved intricate. Furthermore, strategies for result validation are usually laborious, cost-intensive or have yet to be developed. In many cases, the specificity or sensitivity of predictions can be improved by the intersection or union, respectively, of results obtained through partially complementary algorithms. For example, with the SHORE mapflowcell utility, we provide a unified interface that allows the results of a variety of available read mapping algorithms to be readily obtained and combined in consistent and immediately comparable output formats. Implementing further wrapper infrastructure to similarly integrate third-party analysis algorithms for applications such as variation calling or enrichment profiling might be a future direction, yielding a valuable resource for determining individual algorithms' strengths and weaknesses through large-scale comparison. However, the various algorithms available for a specific high-throughput sequencing application often share similar fundamental ideas, and thus suffer from corresponding systematic issues.
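As a minimal sketch of the intersection/union idea described above (the call sets and their tuple layout are hypothetical illustrations, not SHORE output): intersecting the calls of two partially complementary callers favors specificity, while taking their union favors sensitivity.

```python
# Combining variant calls from two partially complementary algorithms.
# Each call is a hypothetical (chromosome, position, alternative allele)
# tuple; real pipelines would parse these from the callers' output files.

calls_a = {("chr1", 1042, "A"), ("chr1", 5310, "T"), ("chr2", 77, "G")}
calls_b = {("chr1", 1042, "A"), ("chr2", 77, "G"), ("chr3", 901, "C")}

# Intersection: a call must be supported by both callers -> higher specificity.
high_specificity = calls_a & calls_b

# Union: support by either caller suffices -> higher sensitivity.
high_sensitivity = calls_a | calls_b

print(sorted(high_specificity))  # consensus calls only
print(sorted(high_sensitivity))  # all calls with any supporting evidence
```

The same set semantics carry over directly to tools operating on standardized call formats, which is one motivation for emitting immediately comparable output formats in the first place.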
The initially rapid pace at which innovations improving throughput, read lengths or error rates were achieved now seems to be diminishing for the current generation of sequencing technologies. However, conceptually different high-throughput sequencing approaches such as nanopore sequencing are being investigated, and might in the medium term account for further drastic changes in the field. With cost reduced by a further order of magnitude and improved reliability and automation of sample preparation, DNA sequencing could find its way into routine medical application. However, beyond cost per base, changes in the properties of the data should prove most relevant for scientific application and data analysis.
Sequencing read length has long been perceived as a critical factor for genome assembly and the assessment of genome structure. It seems likely that a technology able to deliver read lengths of multiple tens of kilobases at competitive accuracy and price would bring about a major shift towards multiple whole-genome comparison strategies. In addition to whole-genome topology, such approaches will implicitly yield traits like copy number variation as well as single nucleotide polymorphisms and all other types of localized, small-scale variation, and may render the corresponding reference-based resequencing approaches irrelevant. However, to fully leverage the advantages over reference sequence-guided analysis, significant further challenges with respect to data analysis and representation will have to be overcome. Reasonable relations must be devised to capture homology, replacing the linear reference genome coordinate system. Providing a manageable breakdown and visualization of such data will be essential, and will likely require a variety of different approaches depending on the respective focus of the investigation. To assess the functional implications of primary analysis results, gene annotation and annotation transfer algorithms will have to be simplified and improved. Finally, ensuring immediate comparability of results across different studies should prove challenging.
Technological innovations that could potentially replace or drastically improve the power of current deep sequencing approaches such as whole-genome enrichment or expression profiling are more difficult to envision. Single-molecule sequencing methods, however, may in the future permit amplification- and ligation-free sequencing of samples from small amounts of starting material. Eliminating these steps of the sample preparation procedure might bring about a significant reduction of sequence-specific biases. Furthermore, this simplification should present opportunities for standardization and automation of sequencing library production to further reduce technical variability. With the amount of required DNA and tissue minimized, sample composition should correspondingly become more controllable. Spike-in of standard quantities of DNA oligomers seems to have been abandoned by the scientific community, but might in future refined and standardized incarnations provide an additional resource for data normalization and estimation of measurement variance. Dropping sequencing cost and simplification of library