29.07.2013 Views

Computational tools and Interoperability in Comparative ... - CBS


Conservation of regulatory elements

[Figure 3.6, four panels. Panel titles: (a) "P1: Raw combined scores, −10, −35, UP (E. coli) (N=63)"; (b) "P1: Adjusted combined scores, −10, −35, UP (E. coli) (N=63)"; (c) "P2: Raw combined scores, −10, −35, UP (E. coli) (N=63)"; (d) "P2: Adjusted combined scores, −10, −35, UP (E. coli) (N=63)". Horizontal axis: position relative to the 16S gene start (−500 to 0); vertical axis: Ri.]

Figure 3.6: Profiles showing the maximum Ri(tot) scores of the initial weight matrices applied to E. coli and Shigella: unadjusted P1 scores (a), adjusted P1 scores (b), unadjusted P2 scores (c), and adjusted P2 scores (d).

3.3.2 Iterating weight matrix frequencies

The program iscan was developed to query a given DNA sequence and, for every position in this sequence, calculate the maximum Ri(tot) that can be obtained by trying different spacing configurations within a specified window. The iscan algorithm aligns the first matrix with the query (in this case the −10 hexamer) and tries all distances between 13 and 19 nucleotides towards the −35 hexamer, using 16 nucleotides as the center. The program then locks the optimal of those distances and continues with the next box (in this case the UP element) until all elements have been included. For source code, see appendix D.5. The spacing configuration of the two models is shown in figure ??.
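The greedy search described above can be illustrated with a minimal Python sketch. This is not the iscan source from appendix D.5: the names `box_score` and `greedy_scan`, the matrix layout (one dictionary of per-base weights in bits per column), and the simplification that Ri(tot) is the plain sum of the box scores are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: one {base: weight in bits} dictionary per matrix column.
Matrix = List[Dict[str, float]]


def box_score(matrix: Matrix, seq: str, start: int) -> float:
    """Sum the column weights of `matrix` laid over seq[start:start + len(matrix)]."""
    if start < 0 or start + len(matrix) > len(seq):
        return float("-inf")  # the box would fall off the sequence
    return sum(col.get(seq[start + i], 0.0) for i, col in enumerate(matrix))


def greedy_scan(
    seq: str,
    anchor: Matrix,
    upstream_boxes: Sequence[Tuple[Matrix, Sequence[int]]],
) -> List[Tuple[float, List[int]]]:
    """For every anchor position return (total score, locked gaps).

    `upstream_boxes` lists (matrix, allowed gaps) pairs in the order they are
    placed, each upstream of the previously placed box.
    """
    results = []
    for pos in range(len(seq)):
        total = box_score(anchor, seq, pos)
        edge = pos  # left edge of the most recently placed box
        gaps: List[int] = []
        for matrix, gap_range in upstream_boxes:
            # Try every allowed gap, then lock the best one before
            # moving on to the next box (greedy, not exhaustive).
            best_gap, best = max(
                ((g, box_score(matrix, seq, edge - g - len(matrix))) for g in gap_range),
                key=lambda pair: pair[1],
            )
            total += best
            edge = edge - best_gap - len(matrix)
            gaps.append(best_gap)
        results.append((total, gaps))
    return results
```

With the −10 hexamer as `anchor` and `(minus35_matrix, range(13, 20))` as the first upstream box, the gaps tried are 13 to 19 nucleotides, as in the text. Because each gap is locked before the next box is considered, the result is not guaranteed to be the global optimum over all spacing combinations. The sketch also does not model the preference for the 16-nucleotide center; ties simply go to the first gap tried.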

The maximum Ri(tot) values of all operons were stacked, and the average and standard deviation were plotted as a function of position. Because the distance between P1/P2 and the 16S gene varies slightly, the unadjusted plots appear noisy. Shifting each plot slightly, so that it aligns to the local maximum around P1 or P2, renders the P1 and P2 model scores sharper (see figure 3.6).
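The stacking and alignment step can be sketched as follows. The function names, the `expected` position, and the `window` size are hypothetical; the thesis does not state how large a shift was allowed, so the window is an assumption, as is the use of the sample standard deviation.

```python
from statistics import mean, stdev
from typing import List, Optional, Sequence, Tuple


def offset_to_local_max(profile: Sequence[float], expected: int, window: int) -> int:
    """Offset of the highest score within expected +/- window, relative to `expected`."""
    lo = max(0, expected - window)
    hi = min(len(profile), expected + window + 1)
    peak = max(range(lo, hi), key=lambda i: profile[i])
    return peak - expected


def shifted(profile: Sequence[float], offset: int) -> List[Optional[float]]:
    """Shift a profile so that index i holds profile[i + offset]; gaps become None."""
    n = len(profile)
    return [profile[i + offset] if 0 <= i + offset < n else None for i in range(n)]


def stack(
    profiles: Sequence[Sequence[Optional[float]]],
) -> Tuple[List[Optional[float]], List[Optional[float]]]:
    """Per-position mean and sample standard deviation, ignoring None entries."""
    means: List[Optional[float]] = []
    sds: List[Optional[float]] = []
    for column in zip(*profiles):
        values = [v for v in column if v is not None]
        means.append(mean(values) if values else None)
        sds.append(stdev(values) if len(values) > 1 else None)
    return means, sds


def adjusted_stack(profiles, expected: int, window: int):
    """Align every profile to its local maximum near `expected`, then stack."""
    aligned = [shifted(p, offset_to_local_max(p, expected, window)) for p in profiles]
    return stack(aligned)
```

Calling `stack` on the raw profiles corresponds to the unadjusted panels, and `adjusted_stack` to the adjusted ones. When promoter positions differ by a few nucleotides between operons, the peak is smeared across neighbouring columns in the first case and concentrated in one column in the second.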

3.3.3 Refining E. coli and Shigella models

All peaks of Ri(tot) around the regions of P1 and P2 were collected, and the P1 and P2 models were refined by adjusting the matrix parameters according to the observed base frequencies in the hits obtained. The logo plots are shown in figure 3.7.
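One way to recompute a weight matrix from the collected hits is sketched below. It assumes the common individual-information weighting, 2 + log2 f(b, l) bits for base b at position l with equiprobable background bases. The pseudocount and the omission of a small-sample correction are simplifications made here, not details taken from the thesis, and `weights_from_hits` is a hypothetical name.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Sequence


def weights_from_hits(
    hits: Sequence[str],
    pseudocount: float = 1.0,
    alphabet: str = "ACGT",
) -> List[Dict[str, float]]:
    """Build one {base: weight in bits} dictionary per column from aligned hits."""
    if not hits:
        raise ValueError("at least one hit is required")
    length = len(hits[0])
    if any(len(h) != length for h in hits):
        raise ValueError("all hits must have the same length")

    total = len(hits) + pseudocount * len(alphabet)
    matrix = []
    for i in range(length):
        counts = Counter(h[i] for h in hits)
        # The pseudocount keeps unseen bases from producing log2(0).
        matrix.append(
            {b: 2.0 + log2((counts[b] + pseudocount) / total) for b in alphabet}
        )
    return matrix
```

A base observed at the background frequency of 0.25 receives a weight of 0 bits, over-represented bases receive positive weights, and under-represented bases negative ones. Rescanning with the new matrix and recollecting the hits would give one further refinement round.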

