Computational tools and Interoperability in Comparative ... - CBS
Conservation of regulatory elements
values.
$$R_{b,p} = \log_2(4) + \log_2\frac{n_{b,p}}{N}, \qquad R_{tot} = \sum_{p=1}^{L} R_{B,p} \tag{3.1}$$
where b ∈ {A, T, G, C} iterates over the four bases, p denotes the position in the
alignment, L is the length of the alignment (or width of the matrix), $n_{b,p}$ is the
number of occurrences of base b at position p, and B denotes the nucleotide at position p in the query
sequence. Shultzaberger and co-workers account for the helical facing by introducing the
accessibility, n(d) (equation 3.2), and the gap surprisal, GS(d) (see equation 3.3).
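As an illustration of equation 3.1, the following minimal sketch builds an $R_{b,p}$ matrix from a set of aligned sites and scores a query sequence against it. It assumes that N is the number of aligned sites and that no small-sample correction or pseudocount is applied; the function names `weight_matrix` and `score` are hypothetical and not taken from the original work.

```python
import math

BASES = "ACGT"


def weight_matrix(sites):
    """Return R[p][b] = log2(4) + log2(n_bp / N) for equal-length aligned sites.

    Assumption: N is the number of aligned sites, and a base never observed
    at a position scores -inf (no pseudocount is added).
    """
    n_sites = len(sites)
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for p in range(length):
        column = {}
        for b in BASES:
            count = sum(1 for s in sites if s[p] == b)
            column[b] = 2.0 + math.log2(count / n_sites) if count else float("-inf")
        matrix.append(column)
    return matrix


def score(matrix, query):
    """Return R_tot: the sum over positions p of R[p][B], B being the query base at p."""
    if len(query) != len(matrix):
        raise ValueError("query length must equal the matrix width")
    return sum(column[base] for column, base in zip(matrix, query))


if __name__ == "__main__":
    m = weight_matrix(["AA", "AC"])
    print(score(m, "AA"))  # 2 bits at position 1 plus 1 bit at position 2
```

A fully conserved position contributes log2(4) = 2 bits, while a position where the query base occurs in half of the sites contributes 1 bit.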
$$n(d) = 1 + \cos\left[\frac{2\pi}{w}(d - c)\right] \tag{3.2}$$
where c is the center distance between two binding sites (e.g. the optimal spacing), d is
the query distance, and w = 10.6 is the length, in base pairs, of one helical turn of B-form DNA. Finally,
this gives GS(d) as follows:
$$GS(d) = \log_2\frac{n(d)}{N} \tag{3.3}$$
where N is the sum of all n(d) (see equation 3.4). The sign of GS(d) was changed
relative to the original equation described by Shultzaberger and co-workers so that
all scores can be combined by addition.
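The accessibility and gap surprisal can be sketched in a few lines. The example below assumes integer distances d and takes N as the sum of n(d) over an inclusive window of candidate distances; the names `accessibility`, `gap_surprisal`, `d_min` and `d_max` are hypothetical.

```python
import math

HELIX_TURN = 10.6  # w: base pairs per helical turn of B-form DNA


def accessibility(d, c, w=HELIX_TURN):
    """n(d) = 1 + cos[(2*pi / w) * (d - c)]; equals 2 when d is the center distance c."""
    return 1.0 + math.cos(2.0 * math.pi * (d - c) / w)


def gap_surprisal(d, c, d_min, d_max, w=HELIX_TURN):
    """GS(d) = log2(n(d) / N), with N the sum of n(k) for k in [d_min, d_max].

    The sign follows the convention used here (not the original one), so the
    value is never positive and can be added directly to the motif scores.
    """
    total = sum(accessibility(k, c, w) for k in range(d_min, d_max + 1))
    n_d = accessibility(d, c, w)
    return math.log2(n_d / total) if n_d > 0.0 else float("-inf")


if __name__ == "__main__":
    # Spacers of 15 to 19 bp with an assumed center distance of 17 bp.
    for d in range(15, 20):
        print(d, round(gap_surprisal(d, 17, 15, 19), 3))
```

Within the window, the penalty is smallest at d = c and grows as the second site rotates away from the same face of the helix.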
$$N = \sum_{d=\min}^{\max} n(d) \tag{3.4}$$
where min and max are the boundaries of the window examined. Finally, summing
all $R_i$ and GS(d) values gives the total information of all motifs and all spacers (see
figure 3.5):
$$R_i(tot) = R_i(m_1) + GS(d, m_1) + R_i(m_2) + \dots + GS(d, m_{n-1}) + R_i(m_n) \tag{3.5}$$
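Equation 3.5 alternates motif scores and spacer scores, so n motifs are joined by n − 1 gap terms. A minimal sketch of that bookkeeping, assuming the individual scores have already been computed (the function name `total_information` and the numbers in the example are purely illustrative):

```python
def total_information(motif_scores, gap_scores):
    """Return the sum of all motif scores R_i(m_k) and gap surprisals GS(d, m_k).

    n motifs are separated by exactly n - 1 spacers, so the two lists must
    differ in length by one.
    """
    if len(gap_scores) != len(motif_scores) - 1:
        raise ValueError("expected exactly one gap score between adjacent motifs")
    return sum(motif_scores) + sum(gap_scores)


if __name__ == "__main__":
    # Two motifs joined by one spacer; illustrative values in bits.
    print(total_information([8.0, 6.5], [-3.0]))  # 11.5
```

Because the sign of GS(d) is chosen so that it is never positive, an unfavourable spacing lowers the total rather than raising it.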
3.3.1 Modeling the P1 and P2 in selected enterics
Existing experimentally verified –10 and –35 hexamers (Huerta & Collado-Vides, 2003)
were converted into $R_{b,p}$ matrices, together with data for known UP elements (Estrem
et al., 1998) and FIS binding sites (Hengen et al., 1997). Figure 3.4 shows logo plots of
the information content derived from these studies. These initial weight matrices formed the basis
for iteratively building the final information model of the P1 and P2 promoter structure,
using the following procedure:
1. Start from the E. coli and Shigella genomes
2. Find the rRNA genes and extract their upstream sequences
3. Apply models based on literature weight matrices
4. Refine weight matrices according to observations
5. Formulate final model
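Steps 2 and 3 of the procedure above can be sketched as follows. This is only an illustration under stated assumptions: the genome is treated as a linear string with 0-based, half-open gene coordinates, the weight matrix is a list of per-position dictionaries, and the helper names `reverse_complement`, `upstream` and `best_hit` are hypothetical rather than taken from the original pipeline.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def upstream(genome, start, end, strand, length):
    """Return up to `length` bases upstream of a gene at [start, end).

    Assumption: the genome is linear, so the region is truncated at the ends
    instead of wrapping around a circular chromosome.
    """
    if strand == "+":
        return genome[max(0, start - length):start]
    return reverse_complement(genome[end:end + length])


def best_hit(matrix, seq):
    """Slide the matrix along seq and return (offset, score) of the best window."""
    width = len(matrix)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the matrix")
    hits = []
    for offset in range(len(seq) - width + 1):
        window = seq[offset:offset + width]
        hits.append((offset, sum(col[b] for col, b in zip(matrix, window))))
    return max(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Toy two-column matrix that favours the dinucleotide "AT".
    toy = [{"A": 2, "C": -1, "G": -1, "T": -1},
           {"A": -1, "C": -1, "G": -1, "T": 2}]
    region = upstream("GGATCCAAAA", 6, 10, "+", 6)
    print(region, best_hit(toy, region))
```

In the refinement step, the sites found this way would be realigned and used to rebuild the matrices before the scan is repeated.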