28.02.2014

An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib

2.8. VISUALIZATION OF SEQUENCING READ AND ALIGNMENT DATA 53

[Plot: x-axis Cycle [bp]; y-axis Quality Score | Nucleotide Count; curves: Median Base Quality, Inner Quality Quartiles, Average Quality ±SD, Support, nucleotide composition (A, C, G, T); Length Distribution; dashed: Unique Sequences]

Figure 2.7: Read and Quality Statistics for a Paired-End Sequencing Run Following Quality-Based Read Trimming
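The per-cycle quality statistics shown in Figure 2.7 (median, inner quartiles, mean ±SD and support) can be computed for reads of unequal length after trimming roughly as follows. This is a minimal sketch, not the implementation behind the figure: it assumes each read's quality string has already been decoded into a list of integer Phred scores, uses the "inclusive" quartile method of Python's `statistics` module, and reports the population standard deviation. The function name `per_cycle_quality` is hypothetical.

```python
from statistics import mean, median, pstdev, quantiles


def per_cycle_quality(quals):
    """Summarise Phred scores per sequencing cycle (hypothetical helper).

    `quals` is assumed to be a list of reads, each a list of integer
    Phred scores; reads may differ in length after trimming.
    """
    max_len = max((len(q) for q in quals), default=0)
    summary = []
    for cycle in range(max_len):
        # Support: only reads long enough to reach this cycle contribute.
        column = [q[cycle] for q in quals if len(q) > cycle]
        if len(column) >= 2:
            # Inner quartiles (Q1, Q3); "inclusive" is an assumed method choice.
            q1, _, q3 = quantiles(column, n=4, method="inclusive")
        else:
            q1 = q3 = column[0]  # a single observation is its own quartile
        summary.append({
            "cycle": cycle + 1,    # 1-based, matching the cycle axis
            "support": len(column),
            "median": median(column),
            "q1": q1,
            "q3": q3,
            "mean": mean(column),
            "sd": pstdev(column),  # population SD (assumption)
        })
    return summary
```

Because only reads that reach a given cycle contribute to it, support shrinks toward later cycles after trimming, and the statistics for those cycles rest on fewer observations.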

likelihood of reads originating from duplicates having the same nucleotide sequence decreases with increasing read length (sections 1.4, 2.7.2). Nevertheless, the frequency and nucleotide composition of unique read sequences can still provide valid clues to ChIP-Seq library quality and complexity, and can reveal important sample properties in small RNA sequencing applications. Support and nucleotide composition after pruning of redundant read sequences are therefore indicated by a corresponding dashed line for each property.
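The dashed "unique sequences" variant amounts to computing the same per-cycle support and nucleotide composition twice, once for all reads and once after collapsing identical sequences. A minimal sketch, assuming reads are plain strings over A, C, G, T (possibly with N) and that duplicates are defined as identical full-length sequences; both function names are hypothetical.

```python
from collections import Counter


def per_cycle_composition(seqs):
    """Per-cycle support and A/C/G/T fractions (hypothetical helper)."""
    max_len = max((len(s) for s in seqs), default=0)
    result = []
    for cycle in range(max_len):
        bases = Counter(s[cycle] for s in seqs if len(s) > cycle)
        support = sum(bases.values())  # includes N or other symbols
        fractions = {b: bases[b] / support for b in "ACGT"}
        result.append({"support": support, **fractions})
    return result


def all_and_unique(seqs):
    """Composition for all reads and for unique sequences only."""
    # Assumption: duplicates are reads with an identical full sequence.
    unique = list(dict.fromkeys(seqs))  # first occurrence kept, order preserved
    return per_cycle_composition(seqs), per_cycle_composition(unique)
```

Bases other than A, C, G and T (e.g. N) count toward support in this sketch but not toward any of the four fractions.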

Finally, the distribution of read lengths is difficult to discern visually from the support distributions alone. For easier assessment, the read length distributions are therefore additionally indicated by short vertical bars at the bottom of the figure.
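Why lengths are hard to read off the support curve can be made concrete: if support at cycle c is the number of reads of length at least c, then the number of reads of length exactly c is the drop in support from cycle c to c + 1. The sketch below converts between the two representations; it assumes every read has length at least 1, and the function names are hypothetical.

```python
from collections import Counter


def support_from_lengths(lengths):
    """support[c - 1] = number of reads reaching cycle c (length >= c)."""
    max_len = max(lengths, default=0)
    counts = Counter(lengths)
    support, remaining = [], len(lengths)
    for cycle in range(1, max_len + 1):
        support.append(remaining)
        remaining -= counts[cycle]  # reads of exactly this length end here
    return support


def lengths_from_support(support):
    """Recover the read-length histogram as the drop in support per cycle."""
    padded = list(support) + [0]  # no read reaches the cycle after the last
    return {c + 1: padded[c] - padded[c + 1]
            for c in range(len(support)) if padded[c] > padded[c + 1]}
```

A length distribution thus shows up in the support curve only as the size of its downward steps, which is easy to miss when the steps are small or spread across many cycles.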

While superimposing the quality and nucleotide composition curves may in some cases impair the interpretability of each individual property, it makes multivariate aspects immediately apparent, such as correlations between read quality issues and skews in nucleotide composition, or the effectiveness of read trimming algorithms.
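As a numerical complement to this visual appraisal, one could, for instance, correlate the per-cycle mean quality with the per-cycle fraction of a single nucleotide. The Pearson coefficient below is a swapped-in choice of statistic, not a method described in the text, and the function name is hypothetical; it assumes both series are equally long lists with one value per cycle.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two per-cycle series (hypothetical helper)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either series is constant
    return sxy / sqrt(sxx * syy)
```

A value near +1 or -1 would flag a systematic co-variation between quality and composition that is worth inspecting in the plot itself.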
