6.3 Distance between Non-Overlapping Hits
Table 6.1 The values of the upper bound in Theorem 6.6 for different w and p, after rounding to
the nearest integer.

p \ w    10    11    12    13    14
0.6      49    76   121   196   315
0.7      17    21    26    34    44
0.8      11    12    14    15    17
0.9      10    11    12    13    14
Using the above theorem, the following explicit upper bound on $\mu_\pi$ can be proved
for non-uniformly spaced seeds $\pi$. Its proof is quite involved and so is omitted.
Theorem 6.5. For any non-uniformly spaced seed $\pi$,
\[
\mu_\pi \le \sum_{i=1}^{w_\pi} \left(\frac{1}{p}\right)^{i} + (|\pi| - w_\pi) - \frac{q}{p}\left[\left(\frac{1}{p}\right)^{w_\pi - 2} - 1\right].
\]
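As a rough numerical illustration, the right-hand side of Theorem 6.5 can be evaluated directly. The sketch below reads the bound as $\sum_{i=1}^{w_\pi}(1/p)^i + (|\pi| - w_\pi) - (q/p)[(1/p)^{w_\pi-2} - 1]$ and assumes $q = 1 - p$. The function names `consecutive_seed_mu` and `mu_upper_bound` are hypothetical, introduced only for this example.

```python
def consecutive_seed_mu(w: int, p: float) -> float:
    """Sum of (1/p)^i for i = 1..w, the first term of the bound."""
    return sum((1.0 / p) ** i for i in range(1, w + 1))


def mu_upper_bound(length: int, w: int, p: float) -> float:
    """Right-hand side of Theorem 6.5 for a seed of the given length and weight.

    Assumes q = 1 - p (the mismatch probability in the Bernoulli model).
    """
    q = 1.0 - p
    correction = (q / p) * ((1.0 / p) ** (w - 2) - 1.0)
    return consecutive_seed_mu(w, p) + (length - w) - correction
```

For instance, with the illustrative values `length=5`, `w=3`, `p=0.5`, the three terms are 14, 2, and 1, so the bound evaluates to 15.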
6.3.3 Why Do Spaced Seeds Have More Hits?
Recall that, for the consecutive seed $\theta$ of weight $w$, $\mu_\theta = \sum_{i=1}^{w} (1/p)^i$. By Theorem 6.5,
we have
Theorem 6.6. Let $\pi$ be a non-uniformly spaced seed and $\theta$ the consecutive seed of
the same weight. If $|\pi| < w_\pi + \frac{q}{p}\left[\left(\frac{1}{p}\right)^{w_\pi - 2} - 1\right]$, then $\mu_\pi < \mu_\theta$.
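The step from Theorem 6.5 to Theorem 6.6 is a subtraction: the bound in Theorem 6.5 is $\mu_\theta$ plus $(|\pi| - w_\pi)$ minus $(q/p)[(1/p)^{w_\pi-2} - 1]$, so it falls below $\mu_\theta$ exactly when $|\pi|$ is below the stated threshold. A minimal sketch of that threshold follows, assuming $q = 1 - p$. The helper names `length_bound` and `satisfies_theorem_6_6` are hypothetical.

```python
def length_bound(w: int, p: float) -> float:
    """Threshold on the seed length in Theorem 6.6:
    w + (q/p) * ((1/p)^(w-2) - 1), with q = 1 - p assumed.
    """
    q = 1.0 - p
    return w + (q / p) * ((1.0 / p) ** (w - 2) - 1.0)


def satisfies_theorem_6_6(length: int, w: int, p: float) -> bool:
    """True when the seed length is strictly below the threshold,
    i.e. when the theorem's sufficient condition holds."""
    return length < length_bound(w, p)
```

With $w = 10$ and $p = 0.6$ the threshold is about 49.02. A seed of weight 11 and length 18 (illustrative values) meets the condition at $p = 0.7$ but not at $p = 0.9$, where the threshold is only about 11.18.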
A non-overlapping hit of a spaced seed $\pi$ is a recurrent event under the following
convention: if a hit at position $i$ is selected as a non-overlapping hit, then the
next non-overlapping hit is the first hit at or after position $i + |\pi|$. By (B.45) in
Section B.8, the expected number of non-overlapping hits of a spaced seed $\pi$
in a random sequence of length $N$ is approximately $N/\mu_\pi$. Therefore, if
$|\pi| < w_\pi + \frac{q}{p}\left[\left(\frac{1}{p}\right)^{w_\pi - 2} - 1\right]$
(see Table 6.1 for the values of this bound for $p = 0.6, 0.7, 0.8, 0.9$
and $10 \le w \le 14$), Theorem 6.6 implies that $\pi$ has on average more non-overlapping
hits than $\theta$ in a long homologous region with sequence similarity $p$ in the Bernoulli
sequence model. Because overlapping hits can only be extended into one local alignment,
this fact indicates that a homology search program with a good spaced
seed is usually more sensitive than one with the consecutive seed of the same weight,
especially for genome-genome comparison.
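The selection convention for non-overlapping hits can be sketched as a greedy scan. The example below rests on several assumptions that are not taken from the text: a homologous region is modeled as a 0/1 list (1 for a match), a seed is written as a string of `1` (required match) and `0` (don't care), and a hit is indexed by its start position. All function names are hypothetical.

```python
import random


def hit_positions(matches, seed):
    """Start positions where every '1' of the seed lies on a match."""
    required = [k for k, c in enumerate(seed) if c == "1"]
    span = len(seed)
    return [
        i
        for i in range(len(matches) - span + 1)
        if all(matches[i + k] for k in required)
    ]


def non_overlapping_hits(matches, seed):
    """Greedy selection: after selecting a hit at i, the next selected
    hit is the first hit at or after i + len(seed)."""
    selected = []
    next_allowed = 0
    for i in hit_positions(matches, seed):
        if i >= next_allowed:
            selected.append(i)
            next_allowed = i + len(seed)
    return selected


def bernoulli_matches(n, p, rng):
    """Random 0/1 region of length n with match probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]
```

On a region of ten consecutive matches, the seed `1101` hits at every start position from 0 to 6, and the greedy rule keeps the hits at 0 and 4. Running `non_overlapping_hits` on long `bernoulli_matches` regions for a spaced seed and for the consecutive seed of the same weight gives one way to compare the observed counts with the approximation $N/\mu_\pi$.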