29.07.2013

Computational tools and Interoperability in Comparative ... - CBS


Conservation of regulatory elements

values.

$$R_{b,p} = \log_2(4) + \log_2\left(\frac{n_{b,p}}{N}\right), \qquad R_{tot} = \sum_{p=1}^{L} R_{B,p} \tag{3.1}$$

where b ∈ {A, T, G, C} iterates through the four bases, p denotes the position in the alignment, L is the length of the alignment (or width of the matrix), $n_{b,p}$ is the number of bases b at position p, and B denotes the nucleotide at position p in the query sequence. Shultzaberger and co-workers account for the helical facing by introducing the accessibility, n(d) (equation 3.2), and the gap surprisal, GS(d) (equation 3.3).
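The weight matrix and query score of equation 3.1 can be sketched in a few lines of Python. This is only an illustration, not the code used in this work: the function names are hypothetical, N is assumed to be the number of aligned sequences (its definition is not part of this excerpt), and bases never observed at a position are assumed to score minus infinity rather than receive a pseudocount.

```python
import math

BASES = "ATGC"


def weight_matrix(sites):
    """Return one dict per position p holding R[b] = log2(4) + log2(n_bp / N).

    `sites` is a list of equal-length aligned binding sites.
    Assumption: N is the number of aligned sites.
    """
    n_sites = len(sites)   # N (assumed meaning, see lead-in)
    length = len(sites[0])  # L, the width of the matrix
    matrix = []
    for p in range(length):
        column = {}
        for b in BASES:
            count = sum(1 for site in sites if site[p] == b)  # n_{b,p}
            if count:
                column[b] = math.log2(4) + math.log2(count / n_sites)
            else:
                # Assumption: no pseudocount, so unseen bases score -inf.
                column[b] = float("-inf")
        matrix.append(column)
    return matrix


def total_information(matrix, query):
    """Sum R_{B,p} over all positions, where B is the query base at position p."""
    return sum(matrix[p][query[p]] for p in range(len(matrix)))


# Toy alignment of four hexamers (illustrative data, not from the cited studies).
example_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
example_matrix = weight_matrix(example_sites)
example_score = total_information(example_matrix, "TATAAT")
```

A position where every aligned site carries the same base contributes the maximum of 2 bits, while a base seen in one of four sites contributes 0 bits.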

$$n(d) = 1 + \cos\left[\frac{2\pi}{w}(d - c)\right] \tag{3.2}$$

where c is the center distance between two binding sites (e.g. when optimally spaced), d is the query distance, and w = 10.6 bp is the length of one helical turn of B-form DNA. This gives GS(d) as follows:

$$GS(d) = \log_2\left(\frac{n(d)}{N}\right) \tag{3.3}$$

where N is the sum of all n(d) (see equation 3.4). The sign of GS(d) was changed from the original equation described by Shultzaberger and co-workers to allow for combining all scores by addition.

$$N = \sum_{d=min}^{max} n(d) \tag{3.4}$$

where min and max are the boundaries of the window examined. Finally, summing all $R_i$ and GS(d) values gives the total information of all motifs and all spacers (see figure 3.5):

$$R_i(tot) = R_i(m_1) + GS(d, m_1) + R_i(m_2) + \dots + GS(d, m_{n-1}) + R_i(m_n) \tag{3.5}$$
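Equations 3.2 to 3.5 can be illustrated with a minimal Python sketch. The function names are hypothetical, distances are assumed to be whole base pairs, and the window from min to max is assumed to be inclusive; none of these choices are stated in the text.

```python
import math

HELIX_TURN = 10.6  # w, one helical turn of B-form DNA in base pairs


def accessibility(d, c, w=HELIX_TURN):
    """n(d) of equation 3.2: 2 when d is in phase with c, near 0 half a turn away."""
    return 1.0 + math.cos(2.0 * math.pi / w * (d - c))


def gap_surprisal(d, c, d_min, d_max, w=HELIX_TURN):
    """GS(d) of equation 3.3, using the sign convention of this text."""
    # N of equation 3.4; assumption: integer distances, inclusive window.
    total = sum(accessibility(x, c, w) for x in range(d_min, d_max + 1))
    n_d = accessibility(d, c, w)
    if n_d <= 0.0:
        return float("-inf")  # guard against log2(0) from rounding
    return math.log2(n_d / total)


def total_score(motif_scores, gap_scores):
    """Equation 3.5: add every motif score Ri(m) and every spacer score GS(d, m)."""
    if len(gap_scores) != len(motif_scores) - 1:
        raise ValueError("expected one gap score between each pair of motifs")
    return sum(motif_scores) + sum(gap_scores)


# Illustrative numbers only: two motifs, optimal spacing 17, window 15..19.
best_gap = gap_surprisal(17, 17, 15, 19)
off_gap = gap_surprisal(15, 17, 15, 19)
combined = total_score([10.0, 8.0], [best_gap])
```

Because n(d)/N never exceeds 1, GS(d) is zero or negative under this sign convention, so a poorly phased spacer lowers the combined score when the terms are added.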

3.3.1 Modeling the P1 and P2 in selected enterics

Existing experimentally verified –10 and –35 hexamers (Huerta & Collado-Vides, 2003) were converted into $R_{b,p}$ matrices together with data for known UP elements (Estrem et al., 1998) and FIS binding sites (Hengen et al., 1997). Figure 3.4 shows logo plots of the information content of these studies. The initial weight matrices formed the basis for iteratively building the final information model of the P1 and P2 promoter structure, using the following procedure:

1. Obtain the E. coli and Shigella genomes
2. Find the rRNA genes and extract their upstream sequences
3. Apply models based on literature weight matrices
4. Refine weight matrices according to observations
5. Formulate final model
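As an illustration of step 2, the sketch below extracts the region upstream of a gene once its coordinates are known. It is not the pipeline used in this work: the function names are hypothetical, coordinates are assumed to be 0-based and half-open on the forward strand, and the genome is treated as linear, so the region is simply truncated at the sequence ends.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_sequence(genome, start, end, strand, length):
    """Return up to `length` bases upstream of the gene [start, end).

    Assumptions: 0-based half-open coordinates on the forward strand,
    linear genome (no wrap-around for circular chromosomes).
    """
    if strand == "+":
        return genome[max(0, start - length):start]
    if strand == "-":
        # Upstream of a reverse-strand gene lies after `end` on the forward
        # strand and is read on the opposite strand.
        return reverse_complement(genome[end:end + length])
    raise ValueError("strand must be '+' or '-'")


# Toy sequence for illustration only.
toy_genome = "AAACCCGGGTTT"
forward_upstream = upstream_sequence(toy_genome, 6, 9, "+", 3)
reverse_upstream = upstream_sequence(toy_genome, 6, 9, "-", 3)
```

The extracted regions could then be scored with the literature-based weight matrices (step 3).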
