01.04.2015 Views

Sequence Comparison.pdf


6.2 Basic Formulas on Hit Probability

$$s[i]\,s[k+i]\,s[2k+i] \cdots s[(l-1)k+i], \quad i = r+1, r+2, \ldots, k-1,$$

where $l = \lfloor n/k \rfloor$ and $r = n - kl - 1$. Because $\pi$ hits the first $r+1$ sequences with probability $\Pi_{l+1}$ and the last $k-1-r$ sequences with probability $\Pi_l$, we have that

$$\bar{\Pi}'_n = \left(\bar{\Pi}_{l+1}\right)^{r+1} \left(\bar{\Pi}_l\right)^{k-1-r}. \tag{6.4}$$

For any $i, j \ge |\pi|$, the events $\bigcap_{t=0}^{i-1} \bar{A}_t$ and $\bigcap_{t=i+|\pi|-1}^{i+j-1} \bar{A}_t$ are independent, and $\bigcap_{t=0}^{i+j-1} \bar{A}_t$ is a subevent of $\left(\bigcap_{t=0}^{i-1} \bar{A}_t\right) \cap \left(\bigcap_{t=i+|\pi|-1}^{i+j-1} \bar{A}_t\right)$. Hence,

$$\bar{\Pi}_i \bar{\Pi}_j = \Pr\left[\left(\bigcap_{t=0}^{i-1} \bar{A}_t\right) \cap \left(\bigcap_{t=i+|\pi|-1}^{i+j-1} \bar{A}_t\right)\right] > \Pr\left[\bigcap_{t=0}^{i+j-1} \bar{A}_t\right] = \bar{\Pi}_{i+j}.$$

Hence, formula (6.4) implies that $\bar{\Pi}'_n > \bar{\Pi}_n$, or equivalently $\Pi'_n < \Pi_n$, for any $n \ge |\pi'|$.
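The inequality between the product of two non-hit probabilities and the non-hit probability of the combined length can be checked numerically for small cases. The sketch below is a brute-force illustration, not a method from the text. It assumes a random 0/1 sequence in which each position is 1 independently with probability `p`, and it assumes that a hit is indexed by the position of the seed's last symbol. The function names `hits_at` and `non_hit_prob`, the seed, and the value of `p` are chosen here only for illustration.

```python
from itertools import product


def hits_at(seed, seq, end):
    """True if `seed` hits `seq` with its last symbol aligned at index `end`."""
    start = end - len(seed) + 1
    if start < 0:
        return False  # the seed does not fit before this position
    # Only the '1' positions of the seed must see a 1 in the sequence.
    return all(seq[start + k] == 1 for k, c in enumerate(seed) if c == "1")


def non_hit_prob(seed, n, p):
    """Non-hit probability for length n, by exhaustive enumeration (small n only)."""
    total = 0.0
    for seq in product((0, 1), repeat=n):
        if any(hits_at(seed, seq, t) for t in range(n)):
            continue  # this sequence is hit somewhere
        ones = sum(seq)
        total += p ** ones * (1 - p) ** (n - ones)
    return total


seed, p = "1*11*1", 0.7   # illustrative values, not taken from the text
i, j = 6, 7               # both at least |seed| = 6
lhs = non_hit_prob(seed, i, p) * non_hit_prob(seed, j, p)
rhs = non_hit_prob(seed, i + j, p)
print(lhs, rhs, lhs > rhs)
```

The enumeration visits all 2^n sequences, so it is only practical for short lengths. Its purpose is to make the subevent argument concrete, not to compute hit probabilities efficiently.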

6.2.1 A Recurrence System for Hit Probability<br />

We have shown that the non-hit probability of a consecutive seed $\theta$ satisfies equation (6.3). Given a consecutive seed $\theta$ and $n > |\theta|$, it takes linear time to compute the hit probability $\Theta_n$. However, calculating the hit probability for an arbitrary seed is rather complicated. In this section, we generalize the recurrence relation (6.3) to a recurrence system in the general case.

For a spaced seed $\pi$, we set $m = 2^{|\pi| - w_\pi}$. Let $W_\pi$ be the set of all $m$ distinct strings obtained from $\pi$ by filling 0 or 1 in the $*$ positions. For example, for $\pi = 1{*}11{*}1$,

$$W_\pi = \{101101, 101111, 111101, 111111\}.$$
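The construction of the set of filled-in strings can be sketched in a few lines of Python. This is a minimal illustration that assumes the seed is written as a string of `1` and `*` characters. The function name `expand_seed` is hypothetical and not from the text.

```python
from itertools import product


def expand_seed(seed):
    """Return every string obtained by filling each '*' of `seed` with 0 or 1."""
    stars = [k for k, c in enumerate(seed) if c == "*"]
    strings = []
    for fill in product("01", repeat=len(stars)):
        chars = list(seed)
        for pos, bit in zip(stars, fill):
            chars[pos] = bit
        strings.append("".join(chars))
    return strings


print(expand_seed("1*11*1"))
# ['101101', '101111', '111101', '111111']
```

The number of strings returned is 2 raised to the number of `*` positions, which matches m = 2^(|π| − w_π) when w_π counts the 1 positions of the seed.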

The seed $\pi$ hits the random sequence $R$ at position $n-1$ if and only if a unique $W_j \in W_\pi$ occurs at that position. For each $j$, let $B_n^{(j)}$ denote the event that $W_j$ occurs at position $n-1$. Because $A_n$ denotes the event that $\pi$ hits the sequence $R$ at position $n-1$, we have that $A_n = \bigcup_{1 \le j \le m} B_n^{(j)}$ and the $B_n^{(j)}$'s are disjoint. Setting

$$\pi_n^{(j)} = \Pr\left[\bar{A}_0 \bar{A}_1 \cdots \bar{A}_{n-2} B_{n-1}^{(j)}\right], \quad j = 1, 2, \ldots, m,$$

we have

$$\pi_n = \sum_{1 \le j \le m} \pi_n^{(j)},$$

and hence formula (6.2) becomes

$$\bar{\Pi}_n = \bar{\Pi}_{n-1} - \pi_n^{(1)} - \pi_n^{(2)} - \cdots - \pi_n^{(m)}. \tag{6.5}$$
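Formula (6.5) can be sanity-checked by brute force on short sequences. The sketch below is an illustration under stated assumptions, not the recurrence system developed in this section. It assumes each position of the random sequence is 1 independently with probability `p`, that hits are indexed by the position of the seed's last symbol, and that the j-th first-hit term is the probability that a sequence of length n has no hit before its last position and that W_j occupies its last |π| positions. All function names, the seed, `p`, and `n` are chosen here for illustration.

```python
from itertools import product


def expand_seed(seed):
    """All strings obtained by filling each '*' of `seed` with 0 or 1."""
    stars = [k for k, c in enumerate(seed) if c == "*"]
    out = []
    for fill in product("01", repeat=len(stars)):
        chars = list(seed)
        for pos, bit in zip(stars, fill):
            chars[pos] = bit
        out.append("".join(chars))
    return out


def hits_at(seed, seq, end):
    """True if `seed` hits `seq` with its last symbol aligned at index `end`."""
    start = end - len(seed) + 1
    if start < 0:
        return False
    return all(seq[start + k] == 1 for k, c in enumerate(seed) if c == "1")


def weight(seq, p):
    """Probability of one particular 0/1 sequence."""
    ones = sum(seq)
    return p ** ones * (1 - p) ** (len(seq) - ones)


def non_hit_prob(seed, n, p):
    """Probability that `seed` hits nowhere in a random sequence of length n."""
    return sum(weight(seq, p) for seq in product((0, 1), repeat=n)
               if not any(hits_at(seed, seq, t) for t in range(n)))


def first_hit_probs(seed, n, p):
    """For each W_j: no hit ending at 0..n-2, and W_j fills the last |seed| positions."""
    probs = dict.fromkeys(expand_seed(seed), 0.0)
    if n < len(seed):
        return probs
    for seq in product((0, 1), repeat=n):
        if any(hits_at(seed, seq, t) for t in range(n - 1)):
            continue  # an earlier hit exists, so this is not a first hit
        tail = "".join(map(str, seq[n - len(seed):]))
        if tail in probs:
            probs[tail] += weight(seq, p)
    return probs


seed, p, n = "1*11*1", 0.7, 10   # illustrative values, not taken from the text
parts = first_hit_probs(seed, n, p)
lhs = non_hit_prob(seed, n, p)
rhs = non_hit_prob(seed, n - 1, p) - sum(parts.values())
print(parts)
print(lhs, rhs)
```

The two printed values should agree up to floating-point rounding. The dictionary `parts` shows how the first-hit probability splits across the disjoint strings in W_π.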

Recall that, for any $W_j \in W_\pi$ and $a, b$ such that $0 \le a < b \le |\pi| - 1$, $W_j[a,b]$ denotes the substring of $W_j$ from position $a$ to position $b$ inclusive. For a string $s$, we use
