

algorithms. With the sample preparation procedure leading up to the sequencing device being application-specific, weakly standardized, involving significant amounts of manual work and thus ultimately highly variable, development of robust statistical models has proved intricate. Furthermore, strategies for result validation are usually laborious, cost-intensive or have yet to be developed. In many cases, the specificity of predictions can be improved by intersecting, and their sensitivity by taking the union of, results obtained through partially complementary algorithms. For example, with the SHORE mapflowcell utility, we provide a unified interface that makes it possible to readily obtain and combine the results of a variety of available read mapping algorithms in consistent and immediately comparable output formats. Implementing further wrapper infrastructure to similarly integrate third-party analysis algorithms for applications such as variation calling or enrichment profiling might be a future direction, yielding a valuable resource for determining individual algorithms' strengths and weaknesses through large-scale comparison. However, the various algorithms available for a specific high-throughput sequencing application often share similar fundamental ideas, and thus suffer from corresponding systematic issues.
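
To make this trade-off concrete, the following Python sketch combines two hypothetical variant call sets by intersection and by union; the caller names, coordinates and alleles are invented for illustration and do not correspond to any particular SHORE output format.

# Illustrative sketch only: hypothetical variant calls from two callers,
# each represented as a set of (chromosome, position, alternative allele) tuples.
calls_a = {("chr1", 1042, "A"), ("chr1", 5310, "T"), ("chr2", 77, "G")}
calls_b = {("chr1", 1042, "A"), ("chr2", 77, "G"), ("chr3", 914, "C")}

# Intersection keeps only variants supported by both callers; false positives
# unique to one caller are discarded, improving specificity.
high_confidence = calls_a & calls_b

# Union keeps every variant reported by either caller; calls missed by one
# tool may be rescued by the other, improving sensitivity.
high_sensitivity = calls_a | calls_b

print(len(high_confidence), "consensus calls;", len(high_sensitivity), "combined calls")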

The initially rapid pace at which innovations improving throughput, read lengths or error rates have been achieved now seems to be diminishing for the current generation of sequencing technologies. However, conceptually different high-throughput sequencing approaches such as nanopore sequencing are being investigated, and might in the medium term account for further drastic changes in the field. With cost reduced by a further order of magnitude and improved reliability and automation of sample preparation, DNA sequencing could find its way into routine medical application. However, beyond cost per base, changes in the properties of the data should prove most relevant for scientific application and data analysis.

Sequencing read length has long been perceived as a critical factor for genome assembly or assessment of genome structure. It seems likely that a technology able to deliver read lengths of multiple tens of kilobases at competitive accuracy and price would bring about a major shift towards multiple whole-genome comparison strategies. In addition to whole-genome topology, such approaches will implicitly yield traits like copy number variation as well as single nucleotide polymorphisms and all other types of localized, small-scale variation, and possibly render the corresponding reference-based resequencing approaches irrelevant. However, to fully leverage the advantages compared to reference sequence-guided analysis, significant further challenges with respect to data analysis and representation will have to be overcome. Reasonable relations must be devised to capture homology, replacing the linear reference genome coordinate system. Providing manageable breakdown and visualization of such data will be essential, and likely require a variety of different approaches depending on the respective focus of the investigation. To assess the functional implications of primary analysis results, gene annotation or annotation transfer algorithms will have to be simplified and improved. Finally, ensuring immediate comparability of results across different studies should prove challenging.
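
As a purely illustrative aside, one conceivable replacement for linear reference coordinates is a segment graph in which homologous sequence stretches become shared nodes and each genome is a path over them. The minimal Python sketch below uses invented segment identifiers and sequences and is not a representation proposed in this work.

# Minimal sketch of a non-linear, graph-based coordinate system: homologous
# segments are shared nodes, each genome is an ordered path of node identifiers.
from dataclasses import dataclass, field

@dataclass
class SegmentGraph:
    segments: dict = field(default_factory=dict)  # node id -> sequence
    paths: dict = field(default_factory=dict)     # genome name -> ordered node ids

    def add_segment(self, node_id, sequence):
        self.segments[node_id] = sequence

    def add_genome(self, name, node_ids):
        self.paths[name] = list(node_ids)

    def shared_segments(self, genome_a, genome_b):
        # Homology between two genomes is expressed simply as the set of
        # segment nodes that their paths have in common.
        return set(self.paths[genome_a]) & set(self.paths[genome_b])

g = SegmentGraph()
g.add_segment("s1", "ACGTACGT")   # shared upstream segment
g.add_segment("s2", "TTGA")       # insertion present only in genomeB
g.add_segment("s3", "GGCCGG")     # shared downstream segment
g.add_genome("genomeA", ["s1", "s3"])
g.add_genome("genomeB", ["s1", "s2", "s3"])

print(sorted(g.shared_segments("genomeA", "genomeB")))  # -> ['s1', 's3']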

Technological innovations that could potentially replace or drastically improve the power of current deep sequencing approaches such as whole-genome enrichment or expression profiling are more difficult to envision. Single-molecule sequencing methods, however, may in the future permit amplification- and ligation-free sequencing of samples from small amounts of starting material. Eliminating such steps of the sample preparation procedure might significantly reduce sequence-specific biases. Furthermore, this simplification should present opportunities for standardization and automation of sequencing library production to further reduce technical variability. With minimization of the amount of required DNA and tissue, sample composition should correspondingly be rendered more controllable. Spike-in of standard quantities of DNA oligomers seems to have been abandoned by the scientific community, but might in future refined and standardized incarnations provide an additional resource for data normalization and estimation of measurement variance. Dropping sequencing cost and simplification of library
