An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib
1.4. PROPERTIES OF SEQUENCING DATA
cost-per-base, more differentiated considerations, including data handling and analysis costs as well as the unique capabilities of each technology, are increasingly moving into focus. Differences between the various technologies also imply different behavior with respect to important parameters such as read length; the frequency, type and randomness of sequencing errors; and sequence-dependent depth-of-coverage biases. The impact of each property varies with the targeted field of application. Table 1.1 compares the read length, throughput and run time of sequencing instruments. Although the specifications of these young technologies are in constant flux, the overview highlights the strengths and weaknesses of each technology, which are discussed in the following.
Reversible chain terminator and sequencing-by-ligation approaches enforce in-phase measurement of all DNA templates being sequenced. These devices therefore provide a fixed, user-selectable read length. Although they currently offer the highest level of parallelism and the highest per-run throughput in terms of sequenced bases, their maximum read length is significantly shorter than that of the other methods discussed.
Sequencing-by-synthesis methods that measure the homopolymer sequence of the DNA templates, as in the case of 454 and Ion Torrent instruments, generate reads of variable length determined by the actual nucleotide composition of the molecules. While the minimum read length equals the number of sequencing cycles, i.e. one quarter of the number of dNTP flows, the average read length depends on the type of DNA being sequenced.
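Why read length depends on nucleotide composition, and why the number of cycles is a lower bound, can be illustrated with an idealized, error-free simulation. This is a minimal sketch, assuming a repeating four-nucleotide flow order (the default `"TACG"` is a hypothetical choice) and treating the input string as the strand being synthesized; `flowgram` and `read_length` are illustrative names, not part of any instrument software. In each flow, the polymerase extends through the entire homopolymer run matching the offered nucleotide, so at least one base is read per full four-flow cycle while the sequence lasts.

```python
from itertools import cycle, islice


def flowgram(sequence: str, n_flows: int, flow_order: str = "TACG") -> list[tuple[str, int]]:
    """Idealized, error-free flow-based sequencing of `sequence`.

    Hypothetical model: a repeating flow order; each flow incorporates the
    whole homopolymer run matching the offered nucleotide (possibly zero bases).
    Returns (nucleotide, homopolymer length) for every flow.
    """
    pos = 0
    signals = []
    for nucleotide in islice(cycle(flow_order), n_flows):
        run = 0
        while pos < len(sequence) and sequence[pos] == nucleotide:
            pos += 1
            run += 1
        signals.append((nucleotide, run))
    return signals


def read_length(sequence: str, n_flows: int, flow_order: str = "TACG") -> int:
    """Number of bases read within `n_flows` flows."""
    return sum(run for _, run in flowgram(sequence, n_flows, flow_order))


if __name__ == "__main__":
    # Worst case: the next base always comes "behind" the current flow,
    # so only one base is read per four-flow cycle.
    print(read_length("GCATGCAT", 8))  # 2 (= 8 flows / 4)
    # Best case: the sequence follows the flow order, one base per flow.
    print(read_length("TACGTACG", 8))  # 8
```

Sequences between these extremes, including long homopolymers that are read in a single flow, yield intermediate or longer reads for the same number of flows.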
By contrast, the single molecule real time (SMRT) sequencing approach enforces no phasing of strand elongation, neither on the single-base nor on the homopolymer level. Instead, the method attempts near-continuous tracking of the strand elongation process. SMRT devices produce variable-length reads, with the read length largely dependent on the binding affinity of the polymerase enzyme used by the technology. The method currently achieves the longest average read length, albeit with a broad spectrum of read lengths and a below-average level of parallelism.
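Why a process whose length is limited by the polymerase leaving the template produces a broad spread of read lengths can be sketched with a deliberately simple model. The sketch assumes, hypothetically, that the polymerase stops with a constant probability `p_stop` at every incorporated base, independent of sequence; read lengths are then geometrically distributed with mean `1 / p_stop` and a standard deviation of similar magnitude. The function name, parameter and value are illustrative assumptions, not a description of the actual SMRT read-length distribution.

```python
import math
import random
import statistics


def simulate_read_lengths(p_stop: float, n_reads: int, seed: int = 42) -> list[int]:
    """Sample read lengths under a hypothetical constant per-base stop probability.

    Inverse-transform sampling of a geometric distribution on {1, 2, ...}:
    P(length > k) = (1 - p_stop) ** k.
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_reads):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        lengths.append(int(math.log(u) / math.log(1.0 - p_stop)) + 1)
    return lengths


if __name__ == "__main__":
    # Hypothetical stop probability giving a mean of about 5000 bases.
    lengths = simulate_read_lengths(p_stop=1 / 5000, n_reads=10_000)
    mean = statistics.mean(lengths)
    sd = statistics.stdev(lengths)
    # The standard deviation is of the same order as the mean: a broad spectrum.
    print(f"mean={mean:.0f} sd={sd:.0f} min={min(lengths)} max={max(lengths)}")
```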
The dideoxynucleotide chain-termination sequencing method has undergone a considerable period of continued refinement. Therefore, although next-generation instruments are improving steadily with increasingly sophisticated chemistry and signal processing, Sanger sequencing technology still defines the gold standard in terms of single-read accuracy. Apart from stochastic measurement fluctuations, all present technologies feature an error profile in which the error frequency increases towards the end of the read. Typical sources of this systematic decline in sequence quality include deterioration of the sequencing reagents over time or, for methods relying on clonally amplified DNA, loss of phasing. Independently of the position within the read, overall read quality is influenced by factors such as cross-talk between adjacent spots, mixed clusters or optical distortion. [63]
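How loss of phasing degrades the signal of clonally amplified templates towards the end of the read can be illustrated with a simple per-cycle model. This sketch assumes, hypothetically, that in every cycle each strand of a cluster independently fails to extend with probability `p_lag`, extends by two bases with probability `p_lead`, and otherwise extends by exactly one base. The fraction of strands still at the expected position, and hence the purity of the measured signal, then shrinks with every cycle. The function names, parameters and values are illustrative, not instrument specifications.

```python
from collections import defaultdict


def position_distribution(n_cycles: int, p_lag: float, p_lead: float) -> dict[int, float]:
    """Distribution of strand extension lengths within a cluster after n_cycles.

    Hypothetical model: per cycle a strand extends by 0 bases (p_lag),
    by 2 bases (p_lead), or by exactly 1 base (remaining probability).
    """
    p_ok = 1.0 - p_lag - p_lead
    dist = {0: 1.0}
    for _ in range(n_cycles):
        new_dist: dict[int, float] = defaultdict(float)
        for pos, prob in dist.items():
            new_dist[pos] += prob * p_lag
            new_dist[pos + 1] += prob * p_ok
            new_dist[pos + 2] += prob * p_lead
        dist = new_dist
    return dict(dist)


def in_phase_fraction(n_cycles: int, p_lag: float, p_lead: float) -> float:
    """Fraction of strands whose extension length equals n_cycles (still in phase)."""
    return position_distribution(n_cycles, p_lag, p_lead).get(n_cycles, 0.0)


if __name__ == "__main__":
    # Hypothetical per-cycle probabilities; the in-phase fraction decays with cycle number.
    for n in (1, 50, 100, 150):
        print(n, round(in_phase_fraction(n, p_lag=0.005, p_lead=0.002), 3))
```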
Because the Illumina and SOLiD instruments probe the template DNA in phase, measurement errors mostly manifest as base substitutions. Conversely, 454 and Ion Torrent instruments are primarily susceptible to over- or underestimation of homopolymer lengths, which results in an elevated frequency of base insertion and deletion errors. SMRT suffers from an elevated frequency of nucleotide deletion errors.
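The contrast between substitution-dominated and indel-dominated error profiles follows from how bases are called. The sketch uses two deliberately simplified, hypothetical callers: a phased caller that picks the brightest of four channel intensities in each cycle, and a flow-based caller that rounds each flow signal to the nearest integer homopolymer length (with the hypothetical flow order `"TACG"`). Noise in the phased case swaps one base for another but keeps the read length; noise in a flow signal changes the number of bases called.

```python
from itertools import cycle


def call_phased(intensities: list[tuple[float, float, float, float]]) -> str:
    """Hypothetical phased caller: one base per cycle, brightest channel of (A, C, G, T)."""
    bases = "ACGT"
    return "".join(
        bases[max(range(4), key=lambda i: channels[i])] for channels in intensities
    )


def call_flows(signals: list[float], flow_order: str = "TACG") -> str:
    """Hypothetical flow caller: homopolymer length = signal rounded to nearest integer."""
    return "".join(
        nucleotide * max(0, round(signal))
        for nucleotide, signal in zip(cycle(flow_order), signals)
    )


if __name__ == "__main__":
    # True sequence ACG; in the second cycle, noise makes T brighter than C.
    phased = [(0.9, 0.1, 0.0, 0.1), (0.1, 0.4, 0.0, 0.5), (0.0, 0.1, 0.8, 0.1)]
    print(call_phased(phased))  # prints ATG: substitution, length unchanged

    # True sequence TAACG gives ideal flow signals [1, 2, 1, 1] for T, A, C, G.
    print(call_flows([1.05, 2.6, 0.9, 1.1]))  # prints TAAACG: overcall, insertion
    print(call_flows([1.05, 1.4, 0.9, 1.1]))  # prints TACG: undercall, deletion
```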
All platforms, including Sanger sequencing, are furthermore subject to sequence-specific biases, where both the probability of sampling specific sequences and single-base accuracy can be strongly influenced by the local sequence context. Besides the implications for sequencing data analysis and interpretation, uneven error rates and sampling probabilities increase the required average sequence coverage and thus the cost-per-base.
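The effect of uneven sampling on the required coverage can be quantified under an idealized model. This sketch assumes that per-base coverage is Poisson-distributed, as in the classic Lander-Waterman idealization, and describes bias by a hypothetical split of the genome into region types, each with a weight and a relative sampling rate. It then searches by bisection for the mean coverage at which a target fraction of bases is covered at least `k` times. The function names, region weights and rates are illustrative assumptions, not measured values.

```python
import math


def prob_at_least(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # now P(X = i + 1)
    return 1.0 - cdf


def covered_fraction(mean_cov: float, k: int, regions: list[tuple[float, float]]) -> float:
    """Fraction of bases with coverage >= k.

    `regions` lists hypothetical (genome fraction, relative sampling rate) pairs;
    rates are rescaled so that the genome-wide mean coverage equals `mean_cov`.
    """
    norm = sum(weight * rate for weight, rate in regions)
    return sum(weight * prob_at_least(k, mean_cov * rate / norm) for weight, rate in regions)


def required_mean_coverage(
    k: int, target: float, regions: list[tuple[float, float]], upper: float = 1000.0
) -> float:
    """Smallest mean coverage reaching the `target` covered fraction (bisection)."""
    lower = 0.0
    for _ in range(100):
        mid = (lower + upper) / 2
        if covered_fraction(mid, k, regions) >= target:
            upper = mid
        else:
            lower = mid
    return upper


if __name__ == "__main__":
    uniform = [(1.0, 1.0)]
    # Hypothetical bias: 20% of the genome is sampled at a quarter of the rate.
    biased = [(0.8, 1.0), (0.2, 0.25)]
    print(round(required_mean_coverage(1, 0.99, uniform), 2))  # 4.61 (= ln 100)
    print(round(required_mean_coverage(1, 0.99, biased), 2))  # 10.19
```

Under these assumptions, leaving a fifth of the genome under-sampled by a factor of four more than doubles the mean coverage needed for 99% of bases to be covered at least once.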
Sampling bias may be introduced not only during the actual sequencing procedure, but also during the weakly standardized sequencing library preparation, with, e.g., PCR amplification an important and frequently cited source of bias [64]. Sequencing bias is classified either according to the reaction or library preparation step that introduces it, such as ligation or fragmentation bias, or according to the sequence properties it is attributed to, such as base composition or end