28.02.2014 Views

An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib


1.4. PROPERTIES OF SEQUENCING DATA

cost-per-base further differentiated considerations including data handling and analysis cost as
well as the unique capabilities of either respective technology are increasingly moving into focus.
Differences between the various technologies also imply different behavior regarding important
parameters such as read length, the frequency, type and randomness of sequencing errors, or
sequence-dependent depth-of-coverage biases. The impact of each property varies with the targeted
field of application. Table 1.1 provides a comparison of read length, throughput and run time of
sequencing instruments. Although the specifications of these young technologies are in constant flux,
the overview highlights the technologies' strengths and weaknesses discussed in the following.

Reversible chain terminator as well as sequencing-by-ligation approaches enforce in-phase
measurement of all DNA templates being sequenced. Therefore, these devices provide a fixed,
user-selectable read length. Although they currently provide the highest level of parallelism and of
per-run throughput in terms of sequenced bases, they offer a significantly shorter maximal read
length than the other methods discussed.

Sequencing-by-synthesis methods that, as in the case of 454 and Ion Torrent instruments,
imply measurement of the homopolymer sequence of the respective DNA templates generate sequence
reads of variable length determined by the actual nucleotide composition of the molecules.
Whereas the minimum read length equals the number of sequencing cycles, i.e. one quarter of the
number of dNTP flows, the average read length thus depends on the type of DNA being sequenced.
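
The dependence of read length on nucleotide composition can be illustrated with a minimal, idealized sketch of flow-based sequencing. The function name `bases_read` and the cyclic flow order "TACG" are assumptions made for illustration only; the model ignores signal noise and incomplete extension, and simply lets each flow extend the strand across a complete homopolymer run.

```python
def bases_read(template: str, n_flows: int, flow_order: str = "TACG") -> int:
    """Count template bases incorporated after n_flows nucleotide flows.

    Idealized model (an assumption, not an instrument specification):
    nucleotides are offered cyclically in flow_order, and a matching flow
    extends the strand across the whole homopolymer run at once.
    """
    pos = 0
    for i in range(n_flows):
        nuc = flow_order[i % len(flow_order)]
        # Incorporate every consecutive template base matching this flow.
        while pos < len(template) and template[pos] == nuc:
            pos += 1
    return pos


if __name__ == "__main__":
    # Same number of flows, different read lengths depending on composition.
    for template, flows in [("GCATGCAT", 8), ("TACGTACG", 8), ("TTTTAAAACCCCGGGG", 4)]:
        print(template, flows, bases_read(template, flows))
```

In this toy model a template that runs against the flow order yields only one base per cycle, whereas a template that follows the flow order, or one that consists of long homopolymer runs, yields considerably more bases from the same number of flows.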

By contrast, no phasing of strand elongation is enforced by the single molecule real time (SMRT)
sequencing approach, neither on the single-base nor on the homopolymer level. The method instead
attempts near-continuous tracking of the strand elongation process. SMRT devices produce
variable-length reads, with the read length largely dependent on the binding affinity of the
polymerase enzyme utilized by the technology. The method currently achieves the longest average
read length, albeit with a broad spectrum of read lengths and a below-average level of parallelism.

The dideoxynucleotide chain-termination sequencing method has undergone a considerable
time span of continued refinement. Therefore, although next-generation instruments are improving
steadily with more and more sophisticated chemistry and signal processing, Sanger sequencing
technology still defines the gold standard in terms of single read accuracy. Apart from stochastic
measurement fluctuations, all present technologies feature an error profile with an increase in
error frequency towards the end of the read. Typical sources of systematic decline in sequence
quality include deterioration of the sequencing reagents over time or loss of phasing for methods
relying on clonally amplified DNA. Orthogonally, overall read quality is influenced by factors like
cross-talk between adjacent spots, mixed clusters or optical distortion. [63]
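
Why loss of phasing degrades quality towards the end of a read can be sketched with a simple geometric model. The model is an assumption introduced here for illustration: each template copy in a clonal cluster independently falls out of phase with a fixed probability per cycle. The function name `in_phase_fraction` and the example failure rate are hypothetical.

```python
def in_phase_fraction(per_cycle_failure: float, cycles: int) -> float:
    """Expected fraction of template copies still in phase after `cycles` cycles.

    Assumes (for illustration only) that every copy independently drops out
    of phase with the same probability in each cycle and never recovers.
    """
    if not 0.0 <= per_cycle_failure <= 1.0:
        raise ValueError("per_cycle_failure must lie in [0, 1]")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return (1.0 - per_cycle_failure) ** cycles


if __name__ == "__main__":
    # The in-phase signal shrinks with every cycle, so late bases are
    # measured against a growing out-of-phase background.
    for n in (1, 25, 50, 100):
        print(n, round(in_phase_fraction(0.01, n), 3))
```

Under this assumption the usable signal decays monotonically with read position, which is consistent with an error frequency that rises towards the end of the read.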

Due to phased probing of the template DNA, measurement error with the Illumina and SOLiD
instruments mostly induces base substitution errors. Conversely, 454 and Ion Torrent
instruments are primarily susceptible to over- or underestimation of homopolymer lengths, which
results in an elevated frequency of base insertion and deletion errors. SMRT suffers from an
elevated frequency of nucleotide deletion errors.
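
How a misjudged homopolymer length turns into an insertion or deletion can be shown with a small run-length sketch. The helper names `to_runs`, `from_runs` and `miscall_run` are hypothetical and serve only this illustration; the sketch does not model any instrument's actual signal processing.

```python
from itertools import groupby


def to_runs(seq: str) -> list[tuple[str, int]]:
    """Run-length encode a sequence into (base, homopolymer length) pairs."""
    return [(base, len(list(group))) for base, group in groupby(seq)]


def from_runs(runs: list[tuple[str, int]]) -> str:
    """Decode (base, length) pairs back into a sequence."""
    return "".join(base * length for base, length in runs)


def miscall_run(seq: str, run_index: int, delta: int) -> str:
    """Return seq with one homopolymer length over- or underestimated by delta."""
    runs = to_runs(seq)
    base, length = runs[run_index]
    # A length estimate cannot drop below zero bases.
    runs[run_index] = (base, max(length + delta, 0))
    return from_runs(runs)


if __name__ == "__main__":
    true_seq = "ACCCGT"
    print(to_runs(true_seq))
    print(miscall_run(true_seq, 1, -1))  # underestimated run: one-base deletion
    print(miscall_run(true_seq, 1, +1))  # overestimated run: one-base insertion
```

Because the measured quantity is the run length rather than the identity of each base, an error in this representation changes the number of bases in the read and leaves the identity of the remaining bases untouched.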

All platforms, including Sanger sequencing, are furthermore subject to sequence-specific biases,
where both the probability of sampling specific sequences and single base accuracy can be
strongly influenced by the local sequence context. Besides the implications for sequencing data
analysis and interpretation, uneven error rates and sampling probabilities result in an increase
of required average sequence coverage and thus elevated cost-per-base.
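
The link between uneven sampling and required average coverage can be made concrete with a back-of-the-envelope sketch. It rests on assumptions introduced here: each region has a relative sampling rate normalized so that the mean over all regions is 1, and only expected depths are considered, with random fluctuation ignored. The function name `required_average_coverage` and the example values are hypothetical.

```python
def required_average_coverage(min_depth: float, relative_rates: list[float]) -> float:
    """Average coverage needed so the least-sampled region reaches min_depth.

    relative_rates: expected depth of each region divided by the average
    depth (assumed normalized to a mean of 1). Only expectations are
    modeled; sampling noise would raise the requirement further.
    """
    worst = min(relative_rates)
    if worst <= 0:
        raise ValueError("a region that is never sampled cannot reach min_depth")
    return min_depth / worst


if __name__ == "__main__":
    print(required_average_coverage(30, [1.0, 1.0, 1.0]))  # uniform sampling
    print(required_average_coverage(30, [1.0, 0.5, 1.5]))  # one region at half rate
```

In this simplified picture a region sampled at half the average rate doubles the average coverage, and hence the number of sequenced bases, needed to reach the same minimum depth everywhere.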

Sampling bias may be introduced not only during the actual steps of the sequencing procedure,
but also during the weakly standardized sequencing library preparation, with e.g. PCR amplification
being an important and frequently cited source of bias [64]. Sequencing bias is classified either
according to the introducing reaction or step of the library preparation, such as ligation or
fragmentation bias, or according to the sequence properties it is attributed to, such as base composition or end
