01.04.2015 Views

Sequence Comparison.pdf


7.2 Ungapped Local Alignment Scores

A position $j$ in the alignment is called a ladder position if the accumulative score $S_j$ is lower than $S_i$ for every $1 \le i < j$. In Figure 7.1, the ladder positions are indicated by circles. Consider two consecutive ladder positions $a$ and $b$, where $a < b$. For a position $x$ between $a$ and $b$, we define the relative accumulative score at $x$ as

$$R_x = S_x - S_a,$$

which is the difference of the accumulative scores at the positions $x$ and $a$.
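The definitions above can be illustrated with a minimal Python sketch. The column scores below are made-up values (+1 for a match, −1 for a mismatch), and the function names are hypothetical. The sketch follows the definition literally, so position 1 counts as a ladder position because there is no earlier position to compare it with.

```python
from itertools import accumulate

def accumulative_scores(column_scores):
    """S_j = sum of the scores of alignment columns 1..j."""
    return list(accumulate(column_scores))

def ladder_positions(S):
    """1-based positions j whose S_j is strictly lower than every earlier S_i."""
    ladders = []
    running_min = float("inf")
    for j, s_j in enumerate(S, start=1):
        if s_j < running_min:
            ladders.append(j)
            running_min = s_j
    return ladders

def relative_scores(S, a, b):
    """R_x = S_x - S_a for every position x strictly between a and b (1-based)."""
    return {x: S[x - 1] - S[a - 1] for x in range(a + 1, b)}

# Hypothetical ungapped alignment: +1 per match column, -1 per mismatch column.
column_scores = [1, -1, -1, 1, 1, -1, -1, -1, 1]
S = accumulative_scores(column_scores)
ladders = ladder_positions(S)

print("accumulative scores:", S)
print("ladder positions:   ", ladders)
# Relative scores between the consecutive ladder positions 3 and 8.
print("relative scores:    ", relative_scores(S, 3, 8))
```

In this toy input, the relative score between ladder positions 3 and 8 peaks at position 5, so columns 4 to 5 form the highest-scoring stretch that starts right after ladder position 3.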

In a random ungapped alignment, the relative accumulative score is a random variable that can be considered as a random walk process (see Section B.7 for background knowledge). For example, if a match scores $s$ and occurs with probability $p$, then $R_x$ can be considered as the distance from 0 in a random walk that moves right with probability $p$ or left with probability $1 - p$ and stops at $-1$ on the state set $\{-1, 0, 1, \ldots\}$.
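To make the random walk model concrete, the following sketch simulates one such walk. It assumes unit steps (a match moves the walk one state to the right, a mismatch one state to the left); the function name `simulate_walk`, the value p = 0.25, and the fixed seed are illustrative choices, not taken from the text.

```python
import random

def simulate_walk(p, rng, max_steps=100_000):
    """Run one walk from 0 with +1 steps (probability p) and -1 steps
    (probability 1 - p) until it is absorbed at -1.

    Returns the list of visited states, starting with 0. max_steps is only a
    safety cap; for p < 1/2 the walk is absorbed with probability 1.
    """
    position = 0
    path = [position]
    for _ in range(max_steps):
        position += 1 if rng.random() < p else -1
        path.append(position)
        if position == -1:
            break
    return path

rng = random.Random(0)  # fixed seed, only so the run is reproducible
path = simulate_walk(p=0.25, rng=rng)
print("visited states:", path)
print("maximum reached:", max(path), "steps taken:", len(path) - 1)
```

Every state visited before absorption is nonnegative, which mirrors the fact that the relative accumulative score cannot drop below 0 before the next ladder position.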

The local alignment from a ladder position to the position at which the relative accumulative score attains its maximum before the next ladder position is a maximal-scoring segment (MSS) of the alignment. Accordingly, alignment statistics are based on the estimation of the following two quantities:

(i) The probability distribution of the maximum value that the corresponding random walk ever achieves before stopping at the absorbing state $-1$, and

(ii) The mean number of steps before the corresponding random walk first reaches the absorbing state $-1$.
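For the simple walk with steps +1 and −1, both quantities can be estimated by simulation and compared with standard closed-form results. The sketch below is an illustration under that assumption only: the closed forms are the classical gambler's-ruin probability and the mean hitting time for a walk with negative drift (valid for 0 < p < 1/2), not formulas quoted from this section, and all function names are hypothetical.

```python
import random

def max_and_steps(p, rng, max_steps=1_000_000):
    """Run one +1/-1 walk from 0 until absorption at -1.
    Returns (maximum state reached, number of steps taken)."""
    position, highest, steps = 0, 0, 0
    while position != -1 and steps < max_steps:
        position += 1 if rng.random() < p else -1
        highest = max(highest, position)
        steps += 1
    return highest, steps

def estimate(p, trials, seed=0):
    """Monte Carlo estimates of quantity (i), as a tail function k -> P(max >= k),
    and of quantity (ii), the mean number of steps before absorption."""
    rng = random.Random(seed)
    results = [max_and_steps(p, rng) for _ in range(trials)]
    mean_steps = sum(t for _, t in results) / trials

    def tail(k):
        return sum(1 for m, _ in results if m >= k) / trials

    return tail, mean_steps

def exact_tail(p, k):
    """P(max >= k): the walk reaches k before -1 (gambler's ruin, 0 < p < 1/2)."""
    r = (1 - p) / p
    return (r - 1) / (r ** (k + 1) - 1)

def exact_mean_steps(p):
    """Mean number of steps to reach -1 from 0 when the drift 2p - 1 is negative."""
    return 1 / (1 - 2 * p)

p = 0.3
tail, mean_steps = estimate(p, trials=20_000)
for k in range(4):
    print(f"P(max >= {k}): simulated {tail(k):.3f}, closed form {exact_tail(p, k):.3f}")
print(f"mean steps: simulated {mean_steps:.3f}, closed form {exact_mean_steps(p):.3f}")
```

The geometric decay of P(max ≥ k) in k that this toy case exhibits is the behaviour that the more general theory makes precise for walks with arbitrary integer step sizes.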

7.2.1 Maximum Segment Scores<br />

When two protein sequences are aligned, scores other than the simple scores 1 and −1 are used for matches and mismatches. These scores are taken from a substitution matrix such as the BLOSUM62 matrix. Because matches and mismatches take a range of integer scores, the accumulative score performs a more complicated random walk. In this section, we apply more advanced random walk theory to study the distribution of local alignment scores for protein sequences.

Consider a random walk that starts at 0 and whose possible step sizes are

$$-d, -d+1, \ldots, -1, 0, 1, \ldots, c-1, c$$

with respective probabilities

$$p_{-d}, p_{-d+1}, \ldots, p_{-1}, p_0, p_1, \ldots, p_{c-1}, p_c$$

such that

(i) $p_{-d} > 0$, $p_c > 0$, and $p_i \ge 0$ for $-d < i < c$, and

(ii) the mean step size $\sum_{j=-d}^{c} j p_j < 0$.
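As a small illustration, conditions (i) and (ii) can be checked mechanically for a given step distribution. The sketch below uses a made-up distribution with d = c = 2 (not derived from BLOSUM62 or any real substitution matrix) and a hypothetical helper name; it assumes d ≥ 1 and c ≥ 1 and uses exact fractions so that the sum-to-one test is not affected by floating-point rounding.

```python
from fractions import Fraction

def check_step_distribution(probs):
    """probs maps each integer step size to its probability
    (step sizes that are absent are treated as having probability 0).

    Returns (d, c, mean_step) if conditions (i) and (ii) hold,
    otherwise raises ValueError.
    """
    d, c = -min(probs), max(probs)
    if d < 1 or c < 1:
        raise ValueError("need at least one negative and one positive step size")
    if any(p < 0 for p in probs.values()) or sum(probs.values()) != 1:
        raise ValueError("probabilities must be nonnegative and sum to 1")
    if probs[-d] <= 0 or probs[c] <= 0:
        raise ValueError("condition (i) fails: p_{-d} and p_c must be positive")
    mean_step = sum(j * p for j, p in probs.items())
    if mean_step >= 0:
        raise ValueError("condition (ii) fails: mean step size must be negative")
    return d, c, mean_step

# Hypothetical step distribution on {-2, ..., 2}.
toy_steps = {
    -2: Fraction(3, 10),
    -1: Fraction(3, 10),
    0: Fraction(1, 10),
    1: Fraction(2, 10),
    2: Fraction(1, 10),
}
d, c, mean_step = check_step_distribution(toy_steps)
print("d =", d, "c =", c, "mean step size =", mean_step)
```

Condition (ii) is what makes the walk drift downward, so that it keeps producing new ladder positions, while condition (i) ensures that both the largest downward step and the largest upward step actually occur.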
