01.04.2015 Views

Sequence Comparison.pdf


130 7 Local Alignment Statistics

$$A = E(K_1), \tag{7.30}$$

which is the mean distance between two successive ladder points in the walk. Then the mean number of ladder points is approximately $n/A$ when $n$ is large (see Section B.8). Ignoring edge effects, we derive the following asymptotic bounds from Lemma 7.2 by setting $y = y' + \frac{\ln A}{\lambda}$ and $m = \frac{n}{A}$:

$$\exp\left\{-\frac{C' e^{\lambda}}{A}\, e^{-\lambda y'}\right\} \le \Pr\left[M(n) \le \frac{\ln n}{\lambda} + y'\right] \le \exp\left\{-\frac{C'}{A}\, e^{-\lambda y'}\right\}. \tag{7.31}$$

Set

$$K = \frac{C'}{A}. \tag{7.32}$$

Replacing $y'$ by $(\ln K + s)/\lambda$, inequality (7.31) becomes

$$\exp\left\{-e^{\lambda - s}\right\} \le \Pr\left[M(n) \le \frac{\ln(Kn) + s}{\lambda}\right] \le \exp\left\{-e^{-s}\right\},$$

or equivalently

$$\exp\left\{-e^{\lambda - s}\right\} \le \Pr\left[\lambda M(n) - \ln(Kn) \le s\right] \le \exp\left\{-e^{-s}\right\}. \tag{7.33}$$
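The substitution leading to (7.33) can be checked term by term. The following is only a worked sketch, using the definition $K = C'/A$ from (7.32) and $y' = (\ln K + s)/\lambda$:

$$
\begin{aligned}
\frac{C'}{A}\, e^{-\lambda y'} &= K\, e^{-(\ln K + s)} = e^{-s}, \\
\frac{C' e^{\lambda}}{A}\, e^{-\lambda y'} &= e^{\lambda}\, e^{-s} = e^{\lambda - s}, \\
\frac{\ln n}{\lambda} + y' &= \frac{\ln n + \ln K + s}{\lambda} = \frac{\ln(Kn) + s}{\lambda}.
\end{aligned}
$$

The first two lines give the exponents in the upper and lower bounds, and the third gives the threshold inside the probability.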

In the BLAST theory, the expression

$$Y(n) = \lambda M(n) - \ln(Kn)$$

is called the normalized score of the alignment. Hence, the P-value corresponding to an observed value $s$ of the normalized score is

$$\text{P-value} \approx 1 - \exp\left\{-e^{-s}\right\}. \tag{7.34}$$
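As a minimal sketch of how (7.34) might be evaluated numerically, the following Python computes the normalized score and the approximate P-value. The function names and the parameter values in the usage lines are illustrative assumptions and do not come from the text; in practice $\lambda$ and $K$ depend on the scoring scheme and the letter frequencies.

```python
import math


def normalized_score(raw_score: float, lam: float, K: float, n: int) -> float:
    """Return Y(n) = lam * M(n) - ln(K * n) for an observed score M(n) = raw_score."""
    if lam <= 0 or K <= 0 or n <= 0:
        raise ValueError("lam, K and n must be positive")
    return lam * raw_score - math.log(K * n)


def p_value(s: float) -> float:
    """Approximate P-value 1 - exp(-exp(-s)) of a normalized score s, as in (7.34)."""
    if s < -700.0:
        # exp(-s) would overflow a float; the P-value is 1 to machine precision.
        return 1.0
    # -expm1(x) equals 1 - exp(x); this keeps precision when exp(-s) is tiny.
    return -math.expm1(-math.exp(-s))


# Hypothetical parameter values, chosen only for illustration.
lam, K, n = 0.3, 0.1, 10_000
s = normalized_score(raw_score=50.0, lam=lam, K=K, n=n)
print(f"normalized score = {s:.3f}, P-value = {p_value(s):.3e}")
```

For large $s$ the P-value is close to $e^{-s}$, because $1 - e^{-x} \approx x$ when $x$ is small. Using `math.expm1` avoids the loss of precision that a direct `1 - math.exp(...)` would suffer in that regime.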

7.2.3 The Number of High-Scoring Segments

By Theorem 7.1 and (7.28), the probability that any maximal-scoring segment has score $s$ or more is approximately $C' e^{-\lambda s}$. By (7.30), to a close approximation there are $N/A$ maximal-scoring segments in a fixed alignment of $N$ columns, as discussed in Section 7.2.2. Hence, the expected number of the maximal-scoring segments with score $s$ or more is approximately

$$\frac{N C'}{A}\, e^{-\lambda s} = N K e^{-\lambda s}, \tag{7.35}$$
