7.2 Ungapped Local Alignment Scores 131<br />
where K is defined in (7.32).<br />
7.2.4 Karlin-Altschul Sum Statistic<br />
When two sequences are aligned, insertions and deletions can break a long alignment<br />
into several parts. If this is the case, focusing on the single highest-scoring<br />
segment could lose useful information. As an option, one may consider the scores<br />
of the multiple highest-scoring segments.<br />
Denote the r disjoint highest segment scores as<br />
\[ M^{(1)} = M(n),\; M^{(2)},\; \ldots,\; M^{(r)} \]<br />
in an alignment with n columns. We consider the normalized scores<br />
\[ S^{(i)} = \lambda M^{(i)} - \ln(Kn), \quad i = 1, 2, \ldots, r. \tag{7.36} \]<br />
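The normalization in (7.36) can be sketched in a few lines of Python. The function name `normalized_scores` and the numeric values of `lam`, `K` and `n` in the usage lines are hypothetical, chosen only for illustration; in practice λ and K come from the scoring scheme as described earlier in this section.

```python
import math

def normalized_scores(raw_scores, lam, K, n):
    """Return S^(i) = lam * M^(i) - ln(K * n), ordered from highest to lowest.

    raw_scores: segment scores M^(1), ..., M^(r) in any order
    lam, K:     the statistical parameters lambda and K (assumed known)
    n:          number of columns in the alignment
    """
    shift = math.log(K * n)
    # Sort descending so that the result follows the order S^(1) >= ... >= S^(r).
    return [lam * m - shift for m in sorted(raw_scores, reverse=True)]

# Illustrative values only (not taken from the text).
scores = normalized_scores([88, 108], lam=0.5, K=0.1, n=1000)
print(scores)
```

Sorting inside the function is a convenience so that the output matches the ordering assumed for the joint density of the normalized scores.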
Karlin and Altschul (1993) showed that the limiting joint density $f_S(x_1, x_2, \ldots, x_r)$<br />
of $S = (S^{(1)}, S^{(2)}, \ldots, S^{(r)})$ is<br />
\[ f_S(x_1, x_2, \ldots, x_r) = \exp\left( -e^{-x_r} - \sum_{k=1}^{r} x_k \right) \tag{7.37} \]<br />
in the domain $x_1 \ge x_2 \ge \cdots \ge x_r$.<br />
Assessing multiple highest-scoring segments is more involved than it might first<br />
appear. Suppose, for example, that comparison X reports 108 and 88 as its two highest scores,<br />
whereas comparison Y reports 99 and 90. Y cannot be called better than X,<br />
because its highest score is lower than that of X. But neither can X be called better,<br />
because its second-highest score is lower than that of Y. The natural way to<br />
rank all the possible results is to consider the sum of the normalized scores of the r<br />
highest-scoring segments<br />
\[ S_{n,r} = S^{(1)} + S^{(2)} + \cdots + S^{(r)} \tag{7.38} \]<br />
as suggested by Karlin and Altschul. This sum is now called the Karlin-Altschul<br />
sum statistic.<br />
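To make the ranking concrete, the sketch below computes the sum statistic for the two comparisons X and Y discussed above. Combining (7.36) and (7.38) gives $S_{n,r} = \lambda \sum_i M^{(i)} - r \ln(Kn)$, which is what the hypothetical helper `sum_statistic` evaluates. The values of `lam`, `K` and `n` are assumptions made purely for illustration, and both comparisons are assumed to share them.

```python
import math

def sum_statistic(raw_scores, lam, K, n):
    """Karlin-Altschul sum S_{n,r}: the sum of the r normalized scores.

    Equivalent to lam * sum(M^(i)) - r * ln(K * n), with r = len(raw_scores).
    """
    r = len(raw_scores)
    return lam * sum(raw_scores) - r * math.log(K * n)

# Illustrative parameters (assumed, not from the text).
lam, K, n = 0.3, 0.1, 1000

s_x = sum_statistic([108, 88], lam, K, n)  # comparison X
s_y = sum_statistic([99, 90], lam, K, n)   # comparison Y
print("X ranks above Y:", s_x > s_y)
```

When λ, K, n and r are the same for both comparisons, the ordering depends only on the raw sums (196 for X against 189 for Y), so X ranks higher under this statistic.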
Theorem 7.2. The limiting density function of the Karlin-Altschul sum $S_{n,r}$ is<br />
\[ f_{n,r}(x) = \frac{e^{-x}}{r!\,(r-2)!} \int_0^{\infty} y^{r-2} \exp\left( -e^{(y-x)/r} \right) dy. \tag{7.39} \]<br />
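As a rough numerical illustration of Theorem 7.2, the following sketch evaluates the density (7.39) with a composite Simpson rule and then integrates it once more to approximate a tail probability. Everything here is an assumption made for illustration: the function names, the choice of Simpson's rule, the truncation of the inner integral at $y = x + 8r$ (beyond which the integrand underflows to zero in double precision), and the truncation of the outer integral at a finite `width`. The formula contains $(r-2)!$, so the sketch requires $r \ge 2$.

```python
import math

def simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b] using an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def sum_density(x, r):
    """Approximate the limiting density (7.39) at x, for r >= 2."""
    if r < 2:
        raise ValueError("formula (7.39) requires r >= 2")
    # For y > x + 8r the factor exp(-e^{(y-x)/r}) is below exp(-e^8),
    # which underflows to 0.0, so the integral is truncated there.
    y_max = x + 8.0 * r
    if y_max <= 0.0:
        return 0.0
    def integrand(y):
        return y ** (r - 2) * math.exp(-math.exp((y - x) / r))
    const = math.exp(-x) / (math.factorial(r) * math.factorial(r - 2))
    return const * simpson(integrand, 0.0, y_max)

def sum_tail(t, r, width=40.0):
    """Approximate P(S_{n,r} >= t) by integrating the density over [t, t + width]."""
    return simpson(lambda x: sum_density(x, r), t, t + width)

print(sum_tail(5.0, 2))
```

A simple sanity check for such a sketch is that the density integrates to approximately 1 over a wide interval, for example `sum_tail(-10.0, 2)`. A production implementation would more likely use an adaptive quadrature routine than a fixed-grid rule.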
Integrating $f_{n,r}(x)$ from $t$ to $\infty$ gives the tail probability that $S_{n,r} \ge t$. The resulting<br />
double integral can be easily calculated numerically. Asymptotically, the tail