7.2 Ungapped Local Alignment Scores
A position j in the alignment is called a ladder position if the accumulative score
s_j is lower than s_i for every 1 ≤ i < j. In Figure 7.1, the ladder positions are indicated
by circles. Consider two consecutive ladder positions a and b, where a < b. For a
position x between a and b, we define the relative accumulative score at x as

    R_x = s_x − s_a,

which is the difference of the accumulative scores at the positions x and a.
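The definitions above can be illustrated with a minimal Python sketch. It assumes the alignment is given as a list of per-column scores and adopts the convention s_0 = 0 as the starting reference, so that a position counts as a ladder position when its accumulative score is strictly lower than every earlier one, s_0 included. The function names are hypothetical and chosen only for this illustration.

```python
from itertools import accumulate

def accumulative_scores(column_scores):
    """Return [s_0, s_1, ..., s_n], with s_0 = 0 (an assumed convention)."""
    return [0] + list(accumulate(column_scores))

def ladder_positions(s):
    """Return the positions j >= 1 whose score s[j] is strictly lower
    than every earlier accumulative score s[0], ..., s[j-1]."""
    ladders = []
    lowest = s[0]
    for j in range(1, len(s)):
        if s[j] < lowest:
            ladders.append(j)
            lowest = s[j]
    return ladders

def relative_scores(s, a, b):
    """Return {x: R_x} with R_x = s[x] - s[a] for a <= x < b,
    where a and b are consecutive ladder positions."""
    return {x: s[x] - s[a] for x in range(a, b)}

# Toy alignment: +1 for a match, -1 for a mismatch.
scores = accumulative_scores([1, -1, -1, 1, 1, -1, -1, -1])
ladders = ladder_positions(scores)
```

For this toy input, the relative scores between the two ladder positions found can be obtained with `relative_scores(scores, ladders[0], ladders[1])`.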
In a random ungapped alignment, the relative accumulative score is a random
variable that can be considered as a random walk process (see Section B.7 for background
knowledge). For example, if a match scores s and occurs with probability
p, then R_x can be considered as the distance from 0 in a random walk that moves
right with probability p or left with probability 1 − p and stops at −1 on the state
set {−1, 0, 1, ...}.
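As a rough illustration of this walk, the following Python sketch simulates a single run from 0 until absorption at −1 and records the highest value reached and the number of steps taken. It assumes unit steps (+1 to the right, −1 to the left) and restricts p to 0 ≤ p < 0.5 so that the walk drifts left and the expected run length is finite. The function names, the default trial count, and the seed are all choices made for this example.

```python
import random

def simulate_excursion(p, rng):
    """Run one walk from 0 until it is absorbed at -1.

    Returns (highest value reached, number of steps taken).
    Assumes 0 <= p < 0.5, i.e. the walk drifts to the left.
    """
    if not 0.0 <= p < 0.5:
        raise ValueError("this sketch assumes 0 <= p < 0.5")
    position = 0
    highest = 0
    steps = 0
    while position != -1:
        # Move right with probability p, left with probability 1 - p.
        position += 1 if rng.random() < p else -1
        highest = max(highest, position)
        steps += 1
    return highest, steps

def mean_steps(p, trials=20000, seed=0):
    """Average the number of steps to absorption over many simulated runs."""
    rng = random.Random(seed)
    total = sum(simulate_excursion(p, rng)[1] for _ in range(trials))
    return total / trials
```

For p = 0.25, the sample mean returned by `mean_steps(0.25)` should come out close to 1/(1 − 2p) = 2, which is the expected absorption time for this simple walk by Wald's identity.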
The local alignment from a ladder position to the position where the relative
accumulative score attains its highest value before the next ladder position gives a
maximal-scoring segment (MSS) in the alignment. Accordingly, alignment statistics are based
on the estimation of the following two quantities:
(i) The probability distribution of the maximum value that the corresponding random
walk ever achieves before stopping at the absorbing state −1, and
(ii) The mean number of steps before the corresponding random walk first reaches
the absorbing state −1.
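The construction of maximal-scoring segments described before the list can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a definitive procedure: position 0 with accumulative score 0 is treated as the initial ladder position, the end of the alignment closes the last interval, ties are broken in favor of the earliest position, and intervals whose highest relative score is 0 yield no segment. The function name is hypothetical.

```python
def maximal_scoring_segments(column_scores):
    """Return a list of (start, end, score) triples, one per ladder interval.

    Columns are numbered from 1; each segment covers columns start..end.
    """
    # Accumulative scores, with s[0] = 0 as the assumed starting reference.
    s = [0]
    for score in column_scores:
        s.append(s[-1] + score)
    n = len(column_scores)

    # Ladder positions: strictly lower than every earlier accumulative score.
    ladders = [0]
    for j in range(1, n + 1):
        if s[j] < s[ladders[-1]]:
            ladders.append(j)

    segments = []
    # Pair each ladder position a with the next one b (or the alignment end).
    for a, b in zip(ladders, ladders[1:] + [n + 1]):
        # Earliest position in [a, b) with the highest relative score.
        x = max(range(a, b), key=lambda k: s[k] - s[a])
        if x > a:
            segments.append((a + 1, x, s[x] - s[a]))
    return segments

# Toy alignment: +1 for a match, -1 for a mismatch.
segments = maximal_scoring_segments([1, -1, -1, 1, 1, -1, -1, -1])
```

Each returned score is the relative accumulative score at the end of the segment, so it equals the sum of the column scores inside that segment.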
7.2.1 Maximum Segment Scores
When two protein sequences are aligned, scores other than the simple scores 1 and
−1 are used for matches and mismatches. These scores are taken from a substitution
matrix such as the BLOSUM62 matrix. Because matches and mismatches take
a range of integer values, the accumulative score performs a more complicated random
walk. In this section, we apply more advanced random walk theory to study the distribution
of local alignment scores for protein sequences.
Consider a random walk that starts at 0 and whose possible step sizes are

    −d, −d + 1, ..., −1, 0, 1, ..., c − 1, c

with respective probabilities

    p_{−d}, p_{−d+1}, ..., p_{−1}, p_0, p_1, ..., p_{c−1}, p_c

such that
(i) p_{−d} > 0, p_c > 0, and p_i ≥ 0 for −d < i < c, and
(ii) the mean step size ∑_{j=−d}^{c} j p_j < 0.
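The two conditions can be checked mechanically for a given step distribution. The Python sketch below assumes the distribution is supplied as a dictionary mapping integer step sizes to probabilities, with d and c read off as the smallest and largest keys. It additionally checks that the probabilities sum to 1, which the conditions leave implicit. The function names and the example distribution are hypothetical.

```python
import math

def mean_step_size(p):
    """Return the mean step size, the sum of j * p_j over all step sizes j."""
    return sum(j * pj for j, pj in p.items())

def satisfies_conditions(p):
    """Check conditions (i) and (ii) for a step distribution p.

    p maps integer step sizes to probabilities; step sizes missing
    from the dictionary are treated as having probability 0.
    """
    d, c = -min(p), max(p)
    if d < 1 or c < 1:
        return False  # need at least one negative and one positive step
    if not math.isclose(sum(p.values()), 1.0, abs_tol=1e-9):
        return False  # implicit assumption: probabilities sum to 1
    if any(pj < 0 for pj in p.values()):
        return False  # condition (i): no negative probabilities
    if p[-d] <= 0 or p[c] <= 0:
        return False  # condition (i): extreme steps must be possible
    return mean_step_size(p) < 0  # condition (ii): negative drift

# A made-up distribution with d = 2 and c = 2.
example = {-2: 0.3, -1: 0.3, 0: 0.1, 1: 0.2, 2: 0.1}
```

Here `satisfies_conditions(example)` holds because both extreme step sizes have positive probability and the mean step size is negative.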