An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib
An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib
An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
1.3. APPLICATIONS OF HIGH-THROUGHPUT SEQUENCING 3<br />
DNA templates, <strong>and</strong> instead single DNA molecules <strong>and</strong> polymerases suce for obtaining a recognizable<br />
signal. DNA polymerases are immobilized in ZMW detection volumes, creating a slide<br />
suitable for the parallel observation of polymerase activity. Fluorescently labeled dNTPs are employed<br />
as the substrate, where nucleotides being incorporated are detected via laser excitation<br />
<strong>and</strong> a CCD detector. The uorophore itself is displaced upon nucleotide incorporation. [14]<br />
As opposed to sequencing-by-synthesis reliant on DNA polymerase, SOLiD sequencing constitutes<br />
a sequencing-by-ligation technique that exploits DNA ligase enzymes for sequence decoding.<br />
Probe oligomers partially consisting of degenerate <strong>and</strong> universal bases [16] are ligated to interrogate<br />
di-nucleotides at a spacing of three base pairs. When reaching the full read length, the<br />
template is reset <strong>and</strong> the ligation process is repeated ve times at dierent osets, <strong>and</strong> thus<br />
each of the template's bases is measured twice through dierent probe oligomers. Four dierent<br />
uorophores serve to label the 16 types of probe to obtain an ambiguous color space code. The<br />
properties of the dibase color encoding are such that for any base, the ambiguity may be resolved<br />
as long as the identity of the preceding base is known. [12]<br />
The following discussion centers on Illumina sequencing technology as the primary focus of<br />
the present work, although applicable to varying extent to multiple or all of the currently<br />
available high-throughput sequencing devices.<br />
1.3 Applications of High-Throughput Sequencing<br />
While high-throughput sequencing devices in general excel in the cost-eective generation of massive<br />
amounts of sequence data, the new technologies are still associated with weaknesses regarding<br />
error rates <strong>and</strong> maximal achievable sequencing read length (section 1.4). These weaknesses<br />
limit the capacity to generate base-by-base readouts of entire genome sequences. Nonetheless,<br />
concerted application of complementary technologies including Sanger sequencing has served to<br />
greatly accelerate the pace at which novel reference assemblies for manifold model organisms are<br />
being generated.<br />
Certainly, despite the reduction in cost-per-base that has been achieved, decoding larger,<br />
repeat-rich genome sequences remains a massive task with current DNA sequencing technology.<br />
Genomic inter-individual <strong>and</strong> inter-species variation may however be assessed circumventing the<br />
issue of global-scale sequence topology (section 1.3.1), which has thus evolved as one of the<br />
primary applications of high-throughput sequencing. In turn, interpretation of data generated<br />
with the aim of such consensus-based characterization of sequence variation is greatly facilitated<br />
by the availability of a suitable reference genome assembly.<br />
Similarly, the resequencing scenario facilitates data interpretation for an extensive family of<br />
deep sequencing <strong>and</strong> genome-wide screening applications. As throughput of sequencing devices<br />
has grown to enable routine sequencing even of large genomes to multiple depths of coverage, its<br />
potential for quantitative measurement has been leveraged for the assessment of diverse genomic<br />
properties. With stock genome sequencing protocol, deep sequencing allows estimation of quantiable<br />
statistics such as gene copy number [1719], application to forward genetic screens [20, 21]<br />
or dissection of environmental samples [2224]. To exploit its potential for quantication in a<br />
wide range of further research settings, high-throughput sequencing is complemented by a host<br />
of dierent sample <strong>and</strong> <strong>lib</strong>rary preparation protocols acting as adapters between the property of<br />
interest <strong>and</strong> DNA sequencing technology.<br />
RNA conversion protocols facilitate investigation of transcript structure <strong>and</strong> abundance<br />
[2529] <strong>and</strong> address structure, expression <strong>and</strong> distribution of specic types of transcript<br />
such as long non-coding RNA or small RNA [27, 3033]. In ChIP-Seq, MeDIP-Seq or HITS-<br />
CLIP, immunoprecipitation-based sequence enrichment protocols provide insight into diverse