An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib
CHAPTER 1. INTRODUCTION<br />
Instrument | Max. Average Read Length | Read Pairs / Flowcell | Run Time<br />
Illumina MiSeq | 2x250bp | 16M | 39h<br />
Illumina Genome Analyzer IIx | 2x150bp | 300M | 14 days<br />
Illumina HiSeq | 2x100bp | 1400M | 11 days<br />
454 Life Sciences GS Junior | ∼400bp | 100000 | 10h<br />
454 Life Sciences GS FLX+ | ∼700bp | 1M | 23h<br />
Life Technologies Ion PGM | up to 400bp | 100000–5M | 3.7h–7.3h<br />
Life Technologies Ion Proton | up to 200bp | ∼70M | 2h–4h<br />
Life Technologies 5500W (SOLiD) | 1x75bp / 2x50bp | ∼1600M | 6 days<br />
Pacific Biosciences PacBio RS | ∼4.5kbp | ∼22000 | 2h<br />
Table 1.1: DNA Sequencer Specifications According to Device Vendors (3/2013)<br />
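The figures in Table 1.1 can be combined into a rough nominal yield per run. The following Python sketch is illustrative only and is not part of the original work: the function names are hypothetical, and it assumes that every read reaches the listed length, that one flowcell is sequenced per run, and that the approximate PacBio values are exact. Real yields will be lower than these upper bounds.

```python
# Nominal per-run yield derived from Table 1.1 (vendor figures, 3/2013).
# Assumptions (not from the source): one flowcell per run, and every read
# reaches the listed maximum average read length.

def bases_per_run(read_length_bp, reads, ends=1):
    """Upper-bound number of sequenced bases in one run."""
    return read_length_bp * reads * ends

def bases_per_hour(total_bases, run_time_h):
    """Average sequencing rate over the whole run."""
    return total_bases / run_time_h

# (instrument, read length in bp, reads or read pairs, ends per fragment, run time in hours)
INSTRUMENTS = [
    ("Illumina MiSeq", 250, 16_000_000, 2, 39),
    ("Illumina HiSeq", 100, 1_400_000_000, 2, 11 * 24),
    ("PacBio RS", 4500, 22_000, 1, 2),
]

def summarize(instruments=INSTRUMENTS):
    """Return (name, total bases, bases per hour) for each instrument."""
    rows = []
    for name, length, reads, ends, hours in instruments:
        total = bases_per_run(length, reads, ends)
        rows.append((name, total, bases_per_hour(total, hours)))
    return rows

if __name__ == "__main__":
    for name, total, rate in summarize():
        print(f"{name}: {total / 1e9:.2f} Gbp per run, {rate / 1e6:.0f} Mbp per hour")
```

Under these assumptions the short-read instruments produce orders of magnitude more sequence per run than the long-read PacBio RS, while their run times are much longer.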
a sequencing library. Following deep sequencing, the tags precipitated in this way facilitate identification<br />
and localization of putative protein binding sites.<br />
Immunoprecipitation is complemented by techniques that seek to deplete segments of the<br />
DNA that are not protein-bound, e. g. using exonucleases like micrococcal nuclease (MNase) to<br />
digest the unbound parts of the DNA fragments following cross-linking and sonication.<br />
MNase-Seq is used on its own to investigate nucleosome positioning [39, 57], but can also be combined<br />
with ChIP-Seq for improved binding site resolution.<br />
When available, immunoprecipitation of the DPC is achieved using a specific antibody raised<br />
against the protein of interest. Alternatively, the experiment is performed using transgenic<br />
organisms where the protein is tagged by an appropriate adapter protein, e. g. GFP [58], for<br />
which a high quality antibody is available. However, this fusion protein approach comes with<br />
the drawback of possible tag-induced alteration of protein function or binding affinity.<br />
Apart from its application to the identification of transcription factor targets [34, 35],<br />
ChIP-Seq is most prominently applied for genome-wide screening of histone modifications [36]. Using<br />
5-methylcytosine-specific antibodies, immunoprecipitation is further applicable for generating<br />
genome-wide maps of DNA methylation (MeDIP-Seq) [37].<br />
As RNA analogues of the ChIP-Seq strategy, high-throughput sequencing of RNA isolated<br />
by crosslinking immunoprecipitation (HITS-CLIP) [38] and photoactivatable-ribonucleoside-enhanced<br />
crosslinking and immunoprecipitation (PAR-CLIP) [59] are used in a similar<br />
fashion to investigate in vivo binding of RNA-associated proteins such as the miRNA-binding protein<br />
ARGONAUTE.<br />
1.4 Properties of Sequencing Data<br />
Modifications to both the instruments themselves and the sequencing chemistry utilized<br />
have led to vast improvements in per-run output since the first iteration of next-generation sequencers,<br />
a trend that has frequently drawn comparisons to Moore's Law for the development<br />
of integrated circuits [60, 61].<br />
A further reduction of overall sequencing cost remains a primary objective in the ongoing<br />
development of the technologies, often illustrated by the catch phrase "1000 dollar [human]<br />
genome" [13, 62]. Efforts directed at this goal include the pursuit of new sequencing<br />
technologies as well as streamlining key operational aspects such as ease of library preparation,<br />
run time and quantization of throughput, i.e. the minimal amount of sequence that must<br />
be generated per sequencing run to achieve acceptable cost-per-base. However, with dropping<br />