01.04.2015 Views

Sequence Comparison.pdf


7.2 Ungapped Local Alignment Scores

where K is defined in (7.32).

7.2.4 Karlin-Altschul Sum Statistic

When two sequences are aligned, insertions and deletions can break a long alignment into several parts. If this is the case, focusing on the single highest-scoring segment could lose useful information. As an option, one may consider the scores of the multiple highest-scoring segments.

Denote the $r$ disjoint highest segment scores as

$$M^{(1)} = M(n),\; M^{(2)},\; \ldots,\; M^{(r)}$$

in an alignment with $n$ columns. We consider the normalized scores

$$S^{(i)} = \lambda M^{(i)} - \ln(Kn), \qquad i = 1, 2, \ldots, r. \tag{7.36}$$
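As a minimal sketch of the normalization in (7.36), the snippet below converts raw segment scores into normalized scores. The function name `normalized_scores` and the numeric values of `lam`, `K`, and `n` are assumptions chosen only for illustration; in practice λ and K come from the scoring scheme and letter frequencies.

```python
import math

def normalized_scores(raw_scores, lam, K, n):
    """Return S(i) = lam * M(i) - ln(K * n) for each raw score, highest first."""
    log_kn = math.log(K * n)
    ordered = sorted(raw_scores, reverse=True)  # M(1) >= M(2) >= ... >= M(r)
    return [lam * m - log_kn for m in ordered]

# Hypothetical parameters (not taken from the text).
scores = normalized_scores([88, 108], lam=0.3, K=0.1, n=1000)
print(scores)
```

Every score is shifted by the same term ln(Kn), so the normalization preserves the ordering of the raw scores.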

Karlin and Altschul (1993) showed that the limiting joint density $f_S(x_1, x_2, \ldots, x_r)$ of $S = (S^{(1)}, S^{(2)}, \ldots, S^{(r)})$ is

$$f_S(x_1, x_2, \ldots, x_r) = \exp\left(-e^{-x_r} - \sum_{k=1}^{r} x_k\right) \tag{7.37}$$

in the domain $x_1 \ge x_2 \ge \cdots \ge x_r$.
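A quick sanity check on (7.37): for r = 1 the formula reduces to exp(−x − e^{−x}), the standard Gumbel (extreme value) density, which should integrate to 1. The sketch below evaluates the joint density and checks the r = 1 case with a simple Riemann sum. The function name `joint_density` and the grid bounds are illustrative assumptions.

```python
import math

def joint_density(xs):
    """Evaluate (7.37) at xs = (x_1, ..., x_r); zero outside x_1 >= ... >= x_r."""
    if any(a < b for a, b in zip(xs, xs[1:])):
        return 0.0
    return math.exp(-math.exp(-xs[-1]) - sum(xs))

# r = 1: sum the density over a grid covering [-10, 30], where
# essentially all of the probability mass lies.
step = 0.01
grid = [-10 + step * i for i in range(4001)]
total = sum(joint_density([x]) for x in grid) * step
print(total)
```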

Assessing multiple highest-scoring segments is more involved than it might first appear. Suppose, for example, comparison X reports two highest scores 108 and 88, whereas comparison Y reports 99 and 90. One can say that Y is not better than X, because its highest score is lower than that of X. But neither is X considered better, because the second-highest score of X is lower than that of Y. The natural way to rank all the possible results is to consider the sum of the normalized scores of the $r$ highest-scoring segments

$$S_{n,r} = S^{(1)} + S^{(2)} + \cdots + S^{(r)} \tag{7.38}$$

as suggested by Karlin and Altschul. This sum is now called the Karlin-Altschul sum statistic.
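The X versus Y example can be worked through with the sum statistic (7.38). The sketch below assumes both comparisons share the same λ, K, and n. The values used are hypothetical, as is the function name `karlin_altschul_sum`.

```python
import math

def karlin_altschul_sum(raw_scores, lam, K, n):
    """Sum of the normalized scores lam * M(i) - ln(K * n), as in (7.38)."""
    return sum(lam * m - math.log(K * n) for m in raw_scores)

# Hypothetical parameters shared by both comparisons (not from the text).
LAM, K, N = 0.3, 0.1, 1000

sum_x = karlin_altschul_sum([108, 88], LAM, K, N)
sum_y = karlin_altschul_sum([99, 90], LAM, K, N)
print("X ranks higher" if sum_x > sum_y else "Y ranks higher")
```

When the parameters are shared and r is the same, the ln(Kn) terms cancel in the difference, so the ranking depends only on the raw score totals (196 for X, 189 for Y) and X ranks higher.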

Theorem 7.2. The limiting density function of the Karlin-Altschul sum $S_{n,r}$ is

$$f_{n,r}(x) = \frac{e^{-x}}{r!\,(r-2)!} \int_0^{\infty} y^{r-2} \exp\left(-e^{(y-x)/r}\right) dy. \tag{7.39}$$
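The density (7.39) has no simple closed form, but it is straightforward to evaluate numerically. The following is a rough sketch using a composite Simpson rule from the standard library only. It also integrates the density over x ≥ t to approximate a tail probability. The function names, the truncation points (`x + 5 * r` for the inner integral, `width=50.0` for the outer one), and the grid sizes are all assumptions made for illustration, not part of the theorem. The formula requires r ≥ 2.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def sum_density(x, r):
    """Approximate f_{n,r}(x) from (7.39) for r >= 2."""
    if r < 2:
        raise ValueError("(7.39) requires r >= 2")
    # Beyond y = x + 5r the factor exp(-e^{(y-x)/r}) is below exp(-148),
    # so the inner integral is truncated there (an assumption of this sketch).
    upper = x + 5 * r
    if upper <= 0:
        return 0.0
    inner = simpson(
        lambda y: y ** (r - 2) * math.exp(-math.exp((y - x) / r)), 0.0, upper
    )
    return math.exp(-x) * inner / (math.factorial(r) * math.factorial(r - 2))

def tail_probability(t, r, width=50.0, n=400):
    """Approximate P(S_{n,r} >= t) by integrating the density over [t, t + width]."""
    return simpson(lambda x: sum_density(x, r), t, t + width, n)

p = tail_probability(5.0, r=2)
print(p)
```

A production implementation would more likely use an adaptive quadrature routine. The fixed grids here are only meant to keep the example dependency-free.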

Integrating $f_{n,r}(x)$ from $t$ to $\infty$ gives the tail probability that $S_{n,r} \ge t$. The resulting double integral can be easily calculated numerically. Asymptotically, the tail
