28.02.2014 Views

An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib


2.5. VERSATILE OLIGOMER DETECTION AND READ CLIPPING

The configuration depicted in the first column indicates the type of end overhang that should not be penalized by the alignment algorithm's scoring method, with the lower line defined as reference (ref) and the upper line as query (qry). The different end alignment modes form a hierarchy of constraints, with dangling_any a subset of local, dangling_qry and dangling_ref subsets of dangling_any, and global a common subset of dangling_qry and dangling_ref.
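
The subset hierarchy can be made concrete with a small sketch. It assumes, purely for illustration, that each mode is modelled as the set of overhang situations it leaves unpenalized at one alignment end. The labels `qry`, `ref` and `both`, as well as the helper `is_submode`, are hypothetical and not SHORE identifiers.

```python
# Hypothetical model: each end alignment mode is represented by the set of
# overhang situations it leaves unpenalized at one end of the alignment.
# "both" stands for unaligned overhangs of query and reference at the same end.
FREE_OVERHANGS = {
    "global": frozenset(),
    "dangling_qry": frozenset({"qry"}),
    "dangling_ref": frozenset({"ref"}),
    "dangling_any": frozenset({"qry", "ref"}),
    "local": frozenset({"qry", "ref", "both"}),
}


def is_submode(a: str, b: str) -> bool:
    """Return True if every overhang left free by mode a is also free in mode b."""
    return FREE_OVERHANGS[a] <= FREE_OVERHANGS[b]


# The hierarchy stated in the text, expressed as set inclusion.
print(is_submode("global", "dangling_qry"))        # True
print(is_submode("dangling_ref", "dangling_any"))  # True
print(is_submode("dangling_qry", "dangling_ref"))  # False
```

Under this model, dangling_qry and dangling_ref are not comparable with each other, which matches their description as two distinct subsets of dangling_any.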

The term query will always be used in the following to indicate the sequencing read; reference may thus refer to a possibly much shorter oligomer sequence. Alignment of a short read to a part of a genomic reference sequence constitutes an example of overlap configuration 11, defined by the keyword pair (dangling_ref;dangling_ref). Detection of sequencing adapters or 3′ bar codes in small RNA sequencing data corresponds to configurations 6 or 7, depending on whether the read contains the entire adapter sequence or only part of it. This subset of configurations is defined by the pair (dangling_qry;dangling_any). Detection of bar codes in 5′ bar-coded sequences is represented by case 2 (global;dangling_qry).
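
How a keyword pair translates into scoring can be illustrated with textbook free-end-gap dynamic programming. The following is a minimal sketch and not the SHORE implementation: the function name `align_score`, the linear gap model and the score values are assumptions chosen for brevity. The left mode decides which leading overhang is initialized at zero cost, and the right mode decides which border cells may terminate the alignment.

```python
# Modes that leave a query or reference overhang unpenalized (assumed model).
QRY_FREE = {"dangling_qry", "dangling_any", "local"}
REF_FREE = {"dangling_ref", "dangling_any", "local"}


def align_score(qry, ref, left, right, match=1, mismatch=-1, gap=-2):
    """Optimal score of qry against ref under a (left;right) end mode pair."""
    n, m = len(qry), len(ref)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    # Left end: a leading overhang costs nothing if the mode leaves it free.
    for i in range(1, n + 1):
        h[i][0] = 0 if left in QRY_FREE else i * gap
    for j in range(1, m + 1):
        h[0][j] = 0 if left in REF_FREE else j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if qry[i - 1] == ref[j - 1] else mismatch
            best = max(h[i - 1][j - 1] + sub, h[i - 1][j] + gap, h[i][j - 1] + gap)
            # A local left end may restart the alignment at any cell.
            h[i][j] = max(best, 0) if left == "local" else best
    # Right end: collect the cells in which the alignment is allowed to stop.
    ends = [h[n][m]]
    if right in QRY_FREE:
        ends += [h[i][m] for i in range(n + 1)]   # trailing query overhang
    if right in REF_FREE:
        ends += [h[n][j] for j in range(m + 1)]   # trailing reference overhang
    if right == "local":
        ends += [v for row in h for v in row]
    return max(ends)


read = "ACGTACGTTTGGA"   # toy read ending in the first four adapter bases
adapter = "TGGAATTC"     # toy adapter sequence
print(align_score(read, adapter, "dangling_qry", "dangling_any"))  # 4
```

In the toy example only part of the adapter is contained in the read, so the unpenalized reference overhang at the right end is what allows the four matching bases to score without gap costs.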

For Cre/lox-type recombinant mate pair sequencing, varied overlap constraints may be appropriate depending on the desired sensitivity-specificity tradeoff. Conservative detection of linker sequences in valid 454-type mate pairs corresponds to configuration 6. Illumina Cre/lox mate pair protocols obtain read pairs where the linker sequence is expected towards the end of either read (configuration 6 or 7). Occasionally, one of the enzymatic cuts of the circular DNA may also occur inside the linker DNA. Such cases are described by configuration 10. The subset of configurations 6, 7 and 10 constitutes a case of mutually dependent end alignment configurations. Thus, it cannot be accounted for by a single keyword pair, but requires the two pairs (dangling_qry;dangling_any) and (dangling_ref;dangling_qry) (or alternatively, (dangling_any;dangling_qry) and (dangling_qry;dangling_ref)).

For each sequencing read, the SHORE oligo-match utility can select the optimal alignment or alignments out of multiple pairs of end alignment modes, multiple reference oligomers and, optionally, their reverse complemented sequences.
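
The selection step can be sketched as an exhaustive search over oligomers, strands and mode pairs. This is an illustrative outline only: `best_hits`, `reverse_complement` and the pluggable `score_fn` are hypothetical names, and the stand-in scorer used in the example is not an alignment algorithm.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_hits(read, oligos, mode_pairs, score_fn, use_revcomp=True):
    """Return the best score and all (oligo name, strand, mode pair) reaching it.

    score_fn(read, oligo_seq, left_mode, right_mode) is a caller-supplied
    scorer (hypothetical interface).
    """
    candidates = []
    for (name, seq), modes in product(oligos.items(), mode_pairs):
        strands = [("+", seq)]
        if use_revcomp:
            strands.append(("-", reverse_complement(seq)))
        for strand, strand_seq in strands:
            score = score_fn(read, strand_seq, *modes)
            candidates.append((score, name, strand, modes))
    best = max(c[0] for c in candidates)
    return best, [c[1:] for c in candidates if c[0] == best]


# Stand-in scorer for demonstration: exact substring match, scored by length.
def toy_score(read, oligo, left, right):
    return len(oligo) if oligo in read else 0


oligos = {"linker": "CCGT", "adapter": "GATTACA"}
pairs = [("dangling_qry", "dangling_any")]
print(best_hits("AAACCGTAAA", oligos, pairs, toy_score))
# (4, [('linker', '+', ('dangling_qry', 'dangling_any'))])
```

Returning every candidate that reaches the best score, rather than a single winner, keeps ties visible so that a later filter can reject reads with an ambiguous oligomer assignment.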

By utilizing a fully customizable 16x16 scoring matrix, the program enables adjustable handling of ambiguous IUPAC nucleotide codes as well as asymmetric base mismatch penalties with respect to the direction of the match.
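
One way such a matrix could be organized is sketched below. The 4-bit encoding of IUPAC codes, the use of index 0 for a gap symbol, and the scoring rule (a match whenever the reference code covers every base the query code may denote) are assumptions made for illustration; they are not taken from SHORE's defaults.

```python
# IUPAC nucleotide codes as 4-bit masks (A=1, C=2, G=4, T=8); 0 is used here
# for a gap symbol, giving 16 possible values per sequence position.
IUPAC = {
    "-": 0, "A": 1, "C": 2, "M": 3, "G": 4, "R": 5, "S": 6, "V": 7,
    "T": 8, "W": 9, "Y": 10, "H": 11, "K": 12, "D": 13, "B": 14, "N": 15,
}


def build_matrix(match=1, mismatch=-1):
    """Build a 16x16 matrix indexed as matrix[reference code][query code].

    Illustrative rule: a pairing counts as a match only if the reference code
    covers all bases the query code may stand for, so the matrix is asymmetric.
    """
    matrix = [[mismatch] * 16 for _ in range(16)]
    for ref in range(1, 16):
        for qry in range(1, 16):
            if qry & ~ref == 0:  # query base set is a subset of the reference set
                matrix[ref][qry] = match
    return matrix


def pair_score(matrix, ref_code, qry_code):
    return matrix[IUPAC[ref_code]][IUPAC[qry_code]]


m = build_matrix()
print(pair_score(m, "N", "A"))  # 1: an N in the oligomer accepts any read base
print(pair_score(m, "A", "N"))  # -1: an N in the read does not confirm an A
```

Because every cell can be overwritten independently, the same layout accommodates other policies, for example a reduced penalty for specific base substitutions in one direction only.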

A pair-wise sequence mapping is composed of two pairs of end coordinates as well as the alignment describing the actual base pairings and sequence gaps. To increase the specificity of read clipping and splitting operations, it is desirable to ensure that optimal pair-wise mappings feature a unique pair of end coordinates, whereas potential alternative alignments can be considered irrelevant. Our alignment algorithm can generate an exhaustive list of all possible alignments, a list featuring one representative for each distinct pair of end coordinates, or just a single representative for each pair-wise mapping, which is optionally assessed for uniqueness of end coordinates.
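
The uniqueness check on end coordinates can be illustrated without enumerating full alignments. The sketch below is a simplified stand-in, not the authors' backtracing module: it covers only the case in which the whole oligomer lies inside the read (query overhang free at both ends), uses arbitrary scores, and propagates the set of possible start offsets alongside each optimal score. The function name `optimal_end_pairs` is hypothetical.

```python
def optimal_end_pairs(qry, ref, match=1, mismatch=-1, gap=-2):
    """Return the best score and all (start, end) query coordinates reaching it.

    Coordinates are half-open offsets into qry. The mapping is considered
    unique if exactly one pair is returned.
    """
    n, m = len(qry), len(ref)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    starts = [[set() for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        starts[i][0] = {i}            # the alignment may begin at any read offset
    for j in range(1, m + 1):
        score[0][j] = j * gap         # leading reference bases are penalized
        starts[0][j] = {0}
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if qry[i - 1] == ref[j - 1] else mismatch
            options = [
                (score[i - 1][j - 1] + sub, starts[i - 1][j - 1]),
                (score[i - 1][j] + gap, starts[i - 1][j]),
                (score[i][j - 1] + gap, starts[i][j - 1]),
            ]
            best = max(value for value, _ in options)
            score[i][j] = best
            # Merge the start offsets of all co-optimal predecessors.
            starts[i][j] = set().union(*(s for value, s in options if value == best))
    best = max(score[i][m] for i in range(n + 1))
    pairs = sorted(
        (s, i) for i in range(n + 1) if score[i][m] == best for s in starts[i][m]
    )
    return best, pairs


print(optimal_end_pairs("AAACGTAAA", "CGT"))   # (3, [(3, 6)])
print(optimal_end_pairs("ACGTTTCGTA", "CGT"))  # (3, [(1, 4), (6, 9)])
```

In the second call the oligomer occurs twice with equal score, so a filter requiring unique end coordinates would reject the mapping instead of clipping at an arbitrary position.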

The utility provides filters for pair-wise mappings with respect to uniqueness of oligomer selection and end coordinates. Alignments that remain valid after filtering may be comprehensively reported and applied to clipping or splitting sequencing reads at either or both ends of the detected oligomer.
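
Once the end coordinates of an oligomer within a read are known, clipping and splitting reduce to slicing. The helpers below are a minimal sketch with hypothetical names (`split_read`, `clip_read`) and half-open, zero-based coordinates assumed for convenience.

```python
def split_read(read, start, end):
    """Split a read at both ends of an oligomer occupying read[start:end]."""
    return read[:start], read[start:end], read[end:]


def clip_read(read, start, end, keep="left"):
    """Remove the oligomer and everything on the far side of it.

    keep="left" retains the sequence before the oligomer (e.g. a 3' adapter),
    keep="right" retains the sequence after it (e.g. a 5' bar code).
    """
    left, _, right = split_read(read, start, end)
    if keep == "left":
        return left
    if keep == "right":
        return right
    raise ValueError("keep must be 'left' or 'right'")


read = "ACGTACGTTTGGA"
print(split_read(read, 9, 13))          # ('ACGTACGTT', 'TGGA', '')
print(clip_read(read, 9, 13, "left"))   # ACGTACGTT
```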

2.5.2 Dynamic Programming Alignment and Backtracing Pipeline

Our alignment pipeline proceeds in three successive passes. First, dynamic programming alignment of the sequencing read is performed against each oligomer and for each pair of provided end alignment modes. The optimal alignment or alignments are then passed to the backtracing module. Finally, the generated traces are filtered and may subsequently be applied to read manipulation.
