28.02.2014 Views

An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib


Contents

1 Introduction

1.1 High-Throughput DNA Sequencing

1.2 Approaches to Parallel DNA Sequencing

1.3 Applications of High-Throughput Sequencing

1.3.1 Genome and Transcriptome Sequencing

1.3.2 ChIP-Seq and further Immunoprecipitation Protocols

1.4 Properties of Sequencing Data

1.5 Sequencing Data Analysis

1.5.1 Base Calling and Read Quality Assessment

1.5.2 Assembly

1.5.3 Short Read Alignment

1.5.4 Variation and Genotype Calling

1.5.5 Quantification by Deep Sequencing

1.5.6 ChIP-Seq Analysis

1.6 Storage and Representation of Sequencing Data

1.6.1 Storage Formats for Next-Generation Sequencing Data

1.6.2 Accelerated Queries on Sequencing Data

1.7 Contributions of this Work

2 A High-Throughput DNA Sequencing Data Analysis Suite

2.1 Overview

2.2 Efficient Storage of High-Throughput Sequencing Data Using Text-Based File Formats

2.2.1 Overview

2.2.2 Widely Compatible Indexed Block-Wise Compression

2.2.3 Efficient Queries on Text Files

2.2.4 Improved Compression of Read Mapping Data

2.2.5 Compression Results

2.2.6 Data Storage Considerations

2.3 A Non-Destructive Read Filtering and Partitioning Framework

2.4 A Flexible Sequencing Read Demultiplexing System

2.4.1 Overview

2.4.2 A Format for Description of Multiplexing Setups

2.4.3 Barcode Recognition and Sample Resolution

2.5 Versatile Oligomer Detection and Read Clipping

2.5.1 Overview

2.5.2 Dynamic Programming Alignment and Backtracing Pipeline

