28.02.2014 Views

An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib




DNA templates, and instead single DNA molecules and polymerases suffice for obtaining a recognizable signal. DNA polymerases are immobilized in ZMW detection volumes, creating a slide suitable for the parallel observation of polymerase activity. Fluorescently labeled dNTPs are employed as the substrate, where nucleotides being incorporated are detected via laser excitation and a CCD detector. The fluorophore itself is displaced upon nucleotide incorporation. [14]

As opposed to sequencing-by-synthesis reliant on DNA polymerase, SOLiD sequencing constitutes a sequencing-by-ligation technique that exploits DNA ligase enzymes for sequence decoding. Probe oligomers partially consisting of degenerate and universal bases [16] are ligated to interrogate di-nucleotides at a spacing of three base pairs. When reaching the full read length, the template is reset and the ligation process is repeated five times at different offsets, and thus each of the template's bases is measured twice through different probe oligomers. Four different fluorophores serve to label the 16 types of probe to obtain an ambiguous color space code. The properties of the dibase color encoding are such that for any base, the ambiguity may be resolved as long as the identity of the preceding base is known. [12]
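The dibase color encoding described above can be illustrated with a minimal sketch. It assumes the commonly described scheme in which the four colors correspond to the XOR of two-bit base codes (A=0, C=1, G=2, T=3); the numeric color labels and the function names `encode_colors` and `decode_colors` are illustrative choices, not taken from the text.

```python
# Sketch of dibase (color space) encoding, assuming the common convention
# A=0, C=1, G=2, T=3 and color = XOR of the two adjacent base codes.
# Function names and numeric color labels are hypothetical.
BASES = "ACGT"


def encode_colors(sequence):
    """Map each pair of adjacent bases to one of four colors (0-3)."""
    codes = [BASES.index(base) for base in sequence]
    return [left ^ right for left, right in zip(codes, codes[1:])]


def decode_colors(first_base, colors):
    """Recover the base sequence from colors, given the known first base.

    Each color is ambiguous on its own (four dinucleotides share it), but
    together with the preceding base it determines the next base uniquely.
    """
    code = BASES.index(first_base)
    decoded = [first_base]
    for color in colors:
        code ^= color  # preceding base XOR color yields the next base
        decoded.append(BASES[code])
    return "".join(decoded)


colors = encode_colors("ATGGC")
print(colors)                      # [3, 1, 0, 3]
print(decode_colors("A", colors))  # ATGGC
print(decode_colors("C", colors))  # CGTTA
```

Decoding with the wrong starting base yields a different sequence throughout, which shows why knowledge of the preceding base is required to resolve the ambiguity.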

The following discussion centers on Illumina sequencing technology as the primary focus of the present work, although it applies to varying extents to several or all of the currently available high-throughput sequencing devices.

1.3 Applications of High-Throughput Sequencing

While high-throughput sequencing devices in general excel in the cost-effective generation of massive amounts of sequence data, the new technologies are still associated with weaknesses regarding error rates and maximal achievable sequencing read length (section 1.4). These weaknesses limit the capacity to generate base-by-base readouts of entire genome sequences. Nonetheless, concerted application of complementary technologies including Sanger sequencing has served to greatly accelerate the pace at which novel reference assemblies for manifold model organisms are being generated.

Certainly, despite the reduction in cost-per-base that has been achieved, decoding larger, repeat-rich genome sequences remains a massive task with current DNA sequencing technology. Genomic inter-individual and inter-species variation may, however, be assessed while circumventing the issue of global-scale sequence topology (section 1.3.1), and this has thus evolved as one of the primary applications of high-throughput sequencing. In turn, interpretation of data generated with the aim of such consensus-based characterization of sequence variation is greatly facilitated by the availability of a suitable reference genome assembly.

Similarly, the resequencing scenario facilitates data interpretation for an extensive family of deep sequencing and genome-wide screening applications. As throughput of sequencing devices has grown to enable routine sequencing even of large genomes to multiple depths of coverage, its potential for quantitative measurement has been leveraged for the assessment of diverse genomic properties. With a stock genome sequencing protocol, deep sequencing allows estimation of quantifiable statistics such as gene copy number [17–19], application to forward genetic screens [20, 21] or dissection of environmental samples [22–24]. To exploit its potential for quantification in a wide range of further research settings, high-throughput sequencing is complemented by a host of different sample and library preparation protocols acting as adapters between the property of interest and DNA sequencing technology.
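The quantitative use of sequencing depth mentioned above can be made concrete with a small sketch. It is a deliberately naive illustration, not the method of the cited works: expected coverage is taken as total sequenced bases divided by genome size, and copy number is estimated as the ratio of a region's mean depth to the median background depth, scaled by an assumed ploidy. The function names, the default `ploidy=2`, and the example figures are assumptions; biases such as GC content or mappability are ignored.

```python
# Naive depth-based quantification sketch. Names, the default ploidy and the
# example numbers are hypothetical; real analyses correct for GC content,
# mappability and other biases that are ignored here.
from statistics import median


def expected_coverage(n_reads, read_length, genome_size):
    """Average depth of coverage: total sequenced bases per genome base."""
    return n_reads * read_length / genome_size


def estimate_copy_number(region_depth, background_depths, ploidy=2):
    """Scale the region's depth by the median depth of background regions."""
    baseline = median(background_depths)
    if baseline <= 0:
        raise ValueError("background depth must be positive")
    return ploidy * region_depth / baseline


# One million 100 bp reads on a 5 Mbp genome give 20-fold coverage.
print(expected_coverage(1_000_000, 100, 5_000_000))        # 20.0

# A region at 45x against a 30x diploid background suggests three copies.
print(estimate_copy_number(45.0, [28, 30, 31, 30, 29]))    # 3.0
```

Using the median rather than the mean for the baseline keeps a few strongly amplified or deleted background regions from skewing the estimate.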

RNA conversion protocols facilitate investigation of transcript structure and abundance [25–29] and address structure, expression and distribution of specific types of transcript such as long non-coding RNA or small RNA [27, 30–33]. In ChIP-Seq, MeDIP-Seq or HITS-CLIP, immunoprecipitation-based sequence enrichment protocols provide insight into diverse
