An Integrated Data Analysis Suite and Programming ... - TOBIAS-lib
2.5. VERSATILE OLIGOMER DETECTION AND READ CLIPPING<br />
The configuration depicted in the first column indicates the type of end overhang that should<br />
not be penalized by the alignment algorithm's scoring method, with the lower line defined as<br />
reference (ref) and the upper line as query (qry). The different end alignment modes form a<br />
hierarchy of constraints, with dangling_any a subset of local, dangling_qry and dangling_ref<br />
subsets of dangling_any, and global a common subset of dangling_qry and dangling_ref.<br />
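The subset relations stated above can be checked mechanically if each mode is modelled as the set of end situations it accepts without penalty. This is only an illustrative sketch: the situation labels (flush, qry_overhang, ref_overhang, both_overhang) are assumptions introduced here and are not SHORE terminology.

```python
# Each end alignment mode is modelled as the set of end situations that it
# accepts without penalty. The situation names are hypothetical labels.
ALLOWED = {
    "global":       {"flush"},
    "dangling_qry": {"flush", "qry_overhang"},
    "dangling_ref": {"flush", "ref_overhang"},
    "dangling_any": {"flush", "qry_overhang", "ref_overhang"},
    "local":        {"flush", "qry_overhang", "ref_overhang", "both_overhang"},
}


def is_subset(mode_a, mode_b):
    """True if every end situation accepted by mode_a is accepted by mode_b."""
    return ALLOWED[mode_a] <= ALLOWED[mode_b]
```

Under this model, `is_subset("global", "dangling_qry")` and `is_subset("dangling_any", "local")` hold, whereas dangling_qry and dangling_ref are not subsets of one another.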
In the following, the term query always denotes the sequencing read, and<br />
reference may thus refer to a possibly much shorter oligomer sequence. Alignment of a short<br />
read to a part of a genomic reference sequence constitutes an example of overlap configuration 11,<br />
defined by the keyword pair (dangling_ref;dangling_ref). Detection of sequencing adapters<br />
or 3′ bar codes in small RNA sequencing data corresponds to configurations 6 or 7, depending<br />
on whether the read contains the entire adapter sequence or only part of it. This subset of<br />
configurations is defined by the pair (dangling_qry;dangling_any). Detection of bar codes in<br />
5′ bar-coded sequences is represented by case 2 (global;dangling_qry).<br />
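One way to read a keyword pair is as a rule for which leading and trailing overhangs are free in a dynamic programming alignment. The following minimal sketch makes that reading concrete. It is not the SHORE implementation: the function name, the simple match/mismatch/gap scores, and the example sequences are assumptions, and the local mode is left out.

```python
FREE_QRY = {"dangling_qry", "dangling_any"}  # query overhang is not penalized
FREE_REF = {"dangling_ref", "dangling_any"}  # reference overhang is not penalized


def end_mode_align(qry, ref, left, right, match=1, mismatch=-1, gap=-2):
    """Return (score, qry_end, ref_end) of the best alignment under the
    (left; right) end alignment modes. The local mode is not handled."""
    n, m = len(qry), len(ref)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Left end: a leading overhang costs nothing only if the mode allows it.
    for i in range(1, n + 1):
        score[i][0] = 0 if left in FREE_QRY else i * gap
    for j in range(1, m + 1):
        score[0][j] = 0 if left in FREE_REF else j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if qry[i - 1] == ref[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Right end: collect every cell where the alignment may stop for free.
    ends = [(score[n][m], n, m)]
    if right in FREE_QRY:  # reference fully aligned, query tail dangles
        ends += [(score[i][m], i, m) for i in range(n)]
    if right in FREE_REF:  # query fully aligned, reference tail dangles
        ends += [(score[n][j], n, j) for j in range(m)]
    return max(ends)


# Hypothetical read: a 9 nt insert followed by the first 12 nt of an adapter.
adapter = "TGGAATTCTCGGGTGCCAAGG"
read = "ACGTACGTT" + adapter[:12]
partial_hit = end_mode_align(read, adapter, "dangling_qry", "dangling_any")
barcode_hit = end_mode_align("ACGTTTTT", "ACGT", "global", "dangling_qry")
```

In the first call the adapter overhangs the read at the right end, which matches the partial-adapter case described above. The second call anchors a 5′ bar code flush with the read start and lets the rest of the read dangle.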
For Cre/lox-type recombinant mate pair sequencing, different overlap constraints may be appropriate<br />
depending on the desired sensitivity-specificity tradeoff. Conservative detection of<br />
linker sequences in valid 454-type mate pairs corresponds to configuration 6. Illumina Cre/lox<br />
mate pair protocols yield read pairs where the linker sequence is expected towards the end<br />
of either read (configuration 6 or 7). Occasionally, one of the enzymatic cuts of the circular<br />
DNA may also occur inside the linker DNA. Such cases are described by configuration<br />
10. The subset of configurations 6, 7 and 10 constitutes a case of mutually dependent end<br />
alignment configurations. Thus, it cannot be expressed by a single keyword pair, but requires the<br />
two pairs (dangling_qry;dangling_any) and (dangling_ref;dangling_qry) (or alternatively,<br />
(dangling_any;dangling_qry) and (dangling_qry;dangling_ref)).<br />
For each sequencing read, the SHORE oligo-match utility can select the optimal<br />
alignment or alignments out of multiple pairs of end alignment modes, multiple reference<br />
oligomers and optionally their reverse complemented sequences.<br />
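The selection step described here amounts to a search over the product of oligomers, strands and keyword pairs, keeping every candidate tied for the best score. The sketch below illustrates that idea only. The function names are hypothetical, and the scoring function is passed in by the caller, so nothing here reflects how SHORE scores alignments internally.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def best_hits(read, oligos, mode_pairs, score_fn, use_revcomp=True):
    """Return (best_score, hits), where hits lists every
    (oligo_name, strand, mode_pair) combination tied for the best score.
    score_fn(read, oligo_seq, left_mode, right_mode) is caller-supplied."""
    candidates = []
    for (name, seq), (left, right) in product(oligos.items(), mode_pairs):
        strands = [("+", seq)]
        if use_revcomp:
            strands.append(("-", reverse_complement(seq)))
        for strand, strand_seq in strands:
            value = score_fn(read, strand_seq, left, right)
            candidates.append((value, name, strand, (left, right)))
    best = max(c[0] for c in candidates)
    return best, [c[1:] for c in candidates if c[0] == best]


# Toy scorer for demonstration only: exact substring match, modes ignored.
def toy_score(read, oligo, left, right):
    return len(oligo) if oligo in read else 0
```

Returning all tied candidates rather than a single winner keeps ambiguous oligomer assignments visible to later filtering.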
By utilizing a fully customizable 16x16 scoring matrix, the program enables adjustable handling<br />
of ambiguous IUPAC nucleotide codes as well as asymmetric base mismatch penalties with<br />
respect to the direction of the match.<br />
A pair-wise sequence mapping is composed of two pairs of end coordinates as well as the<br />
alignment describing actual base pairings and sequence gaps. To increase specificity of read<br />
clipping and splitting operations, it is desirable to ensure that optimal pair-wise mappings feature<br />
a unique pair of end coordinates, whereas potential alternative alignments can be considered<br />
irrelevant. Our alignment algorithm is capable of generating either an exhaustive list of all<br />
possible alignments, a list featuring one representative for each distinct pair of<br />
end coordinates, or just a single representative for each pair-wise mapping, which is optionally<br />
assessed for uniqueness of end coordinates.<br />
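The difference between the three reporting levels can be illustrated on a list of co-optimal alignments. In this sketch each alignment is a hypothetical (coords, trace) tuple, with coords = (qry_start, qry_end, ref_start, ref_end) and a CIGAR-like trace string. Both the data layout and the function names are assumptions.

```python
def representatives(alignments):
    """One alignment per distinct set of end coordinates (first one seen)."""
    by_coords = {}
    for coords, trace in alignments:
        by_coords.setdefault(coords, trace)
    return list(by_coords.items())


def single_representative(alignments):
    """A single alignment plus a flag telling whether its end
    coordinates are shared by all co-optimal alignments."""
    reps = representatives(alignments)
    return reps[0], len(reps) == 1


# Hypothetical co-optimal alignments of one read against one oligomer.
co_optimal = [
    ((0, 5, 0, 4), "2M1I2M"),
    ((0, 5, 0, 4), "3M1I1M"),   # same end coordinates, different gap placement
    ((1, 5, 0, 4), "4M"),       # different query start
]
```

The exhaustive level corresponds to `co_optimal` itself. `representatives` collapses alignments that differ only in gap placement, and `single_representative` additionally reports that the end coordinates are not unique in this example, which is the condition a clipping step would want to know about.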
The utility provides filters for pair-wise mappings with respect to uniqueness of oligomer<br />
selection and end coordinates. Alignments that remain valid after filtering may be comprehensively<br />
reported and applied to clipping or splitting sequencing reads at either end or at both ends of the<br />
detected oligomer.<br />
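Once the query coordinates of a detected oligomer are known, clipping and splitting reduce to string slicing. The sketch below assumes half-open query coordinates and uses hypothetical action names. It does not reproduce the options of the actual utility.

```python
def apply_hit(read, start, end, action):
    """Manipulate a read given the half-open query interval [start, end)
    covered by the detected oligomer. Action names are illustrative."""
    if action == "clip_3prime":    # drop the oligomer and everything after it
        return read[:start]
    if action == "clip_5prime":    # drop the oligomer and everything before it
        return read[end:]
    if action == "split":          # cut at both ends, keep both flanks
        return read[:start], read[end:]
    raise ValueError(f"unknown action: {action}")
```

For example, a 3′ adapter found at positions 9 to 21 of a 21 nt read leaves the first 9 bases after `clip_3prime`, while `split` on an internal linker returns the two flanking sequences of a mate pair.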
2.5.2 Dynamic Programming Alignment and Backtracing Pipeline
Our alignment pipeline proceeds in three successive passes. First, the sequencing read is aligned<br />
by dynamic programming to each oligomer and for each pair of provided end<br />
alignment modes. The optimal alignment or alignments are then passed to the backtracing module.<br />
Finally, the generated traces are filtered and may subsequently be applied to read manipulation.
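The three passes can be sketched as follows. To keep the example short it uses plain global alignment with fixed scores instead of configurable end alignment modes, recovers only one trace per optimal alignment, and filters by a minimum score. All names, the trace alphabet (M, X, I, D) and the filter criterion are assumptions.

```python
def fill(qry, ref, match=1, mismatch=-1, gap=-2):
    """Pass 1: fill a global dynamic programming score matrix."""
    n, m = len(qry), len(ref)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * gap
    for j in range(1, m + 1):
        d[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if qry[i - 1] == ref[j - 1] else mismatch
            d[i][j] = max(d[i - 1][j - 1] + sub, d[i - 1][j] + gap, d[i][j - 1] + gap)
    return d


def backtrace(d, qry, ref, match=1, mismatch=-1, gap=-2):
    """Pass 2: recover one optimal trace. M = match, X = mismatch,
    I = base present in the query only, D = base present in the reference only."""
    i, j, ops = len(qry), len(ref), []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            same = qry[i - 1] == ref[j - 1]
            if d[i][j] == d[i - 1][j - 1] + (match if same else mismatch):
                ops.append("M" if same else "X")
                i, j = i - 1, j - 1
                continue
        if i > 0 and d[i][j] == d[i - 1][j] + gap:
            ops.append("I")
            i -= 1
        else:
            ops.append("D")
            j -= 1
    return "".join(reversed(ops))


def pipeline(read, oligos, min_score):
    """Run the three passes for one read against a dict of named oligomers."""
    matrices = {name: fill(read, seq) for name, seq in oligos.items()}
    scores = {name: d[-1][-1] for name, d in matrices.items()}
    best = max(scores.values())
    # Only the optimal alignment or alignments are backtraced.
    traces = [(name, best, backtrace(matrices[name], read, oligos[name]))
              for name in oligos if scores[name] == best]
    # Pass 3: filter the generated traces.
    return [t for t in traces if t[1] >= min_score]
```

Backtracing only the best-scoring candidates avoids reconstructing traces for alignments that would be discarded anyway.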