Sequence Comparison.pdf

7 Local Alignment Statistics

\[ A = E(K_1), \tag{7.30} \]

which is the mean distance between two successive ladder points in the walk. Then the mean number of ladder points is approximately $n/A$ when $n$ is large (see Section B.8). Ignoring edge effects, we derive the following asymptotic bounds from Lemma 7.2 by setting $y = y' + \frac{\ln A}{\lambda}$ and $m = \frac{n}{A}$:

\[ \exp\left\{-\frac{C' e^{\lambda}}{A}\, e^{-\lambda y'}\right\} \le \Pr\left[M(n) \le \frac{\ln n}{\lambda} + y'\right] \le \exp\left\{-\frac{C'}{A}\, e^{-\lambda y'}\right\}. \tag{7.31} \]

Set

\[ K = \frac{C'}{A}. \tag{7.32} \]

Replacing $y'$ by $(\ln(K)+s)/\lambda$, inequality (7.31) becomes

\[ \exp\left\{-e^{\lambda - s}\right\} \le \Pr\left[M(n) \le \frac{\ln(Kn) + s}{\lambda}\right] \le \exp\left\{-e^{-s}\right\}, \]

or equivalently

\[ \exp\left\{-e^{\lambda - s}\right\} \le \Pr\left[\lambda M(n) - \ln(Kn) \le s\right] \le \exp\left\{-e^{-s}\right\}. \tag{7.33} \]

In the BLAST theory, the expression $Y(n) = \lambda M(n) - \ln(Kn)$ is called the normalized score of the alignment. Hence, the P-value corresponding to an observed value $s$ of the normalized score is

\[ \text{P-value} \approx 1 - \exp\left\{-e^{-s}\right\}. \tag{7.34} \]

7.2.3 The Number of High-Scoring Segments

By Theorem 7.1 and (7.28), the probability that any maximal-scoring segment has score $s$ or more is approximately $C' e^{-\lambda s}$. By (7.30), to a close approximation there are $N/A$ maximal-scoring segments in a fixed alignment of $N$ columns as discussed in Section 7.2.2. Hence, the expected number of the maximal-scoring segments with score $s$ or more is approximately

\[ \frac{N C'}{A}\, e^{-\lambda s} = N K e^{-\lambda s}, \tag{7.35} \]
where $K$ is defined in (7.32).

7.2.4 Karlin-Altschul Sum Statistic

When two sequences are aligned, insertions and deletions can break a long alignment into several parts. If this is the case, focusing on the single highest-scoring segment could lose useful information. As an option, one may consider the scores of the multiple highest-scoring segments.

Denote the $r$ disjoint highest segment scores as

\[ M^{(1)} = M(n),\ M^{(2)},\ \ldots,\ M^{(r)} \]

in an alignment with $n$ columns. We consider the normalized scores

\[ S^{(i)} = \lambda M^{(i)} - \ln(Kn), \quad i = 1, 2, \ldots, r. \tag{7.36} \]

Karlin and Altschul (1993) showed that the limiting joint density $f_S(x_1, x_2, \cdots, x_r)$ of $S = (S^{(1)}, S^{(2)}, \cdots, S^{(r)})$ is

\[ f_S(x_1, x_2, \cdots, x_r) = \exp\left(-e^{-x_r} - \sum_{k=1}^{r} x_k\right) \tag{7.37} \]

in the domain $x_1 \ge x_2 \ge \ldots \ge x_r$.

Assessing multiple highest-scoring segments is more involved than it might first appear. Suppose, for example, comparison X reports two highest scores 108 and 88, whereas comparison Y reports 99 and 90. One can say that Y is not better than X, because its high score is lower than that of X. But neither is X considered better, because the second high score of X is lower than that of Y. The natural way to rank all the possible results is to consider the sum of the normalized scores of the $r$ highest-scoring segments

\[ S_{n,r} = S^{(1)} + S^{(2)} + \cdots + S^{(r)} \tag{7.38} \]

as suggested by Karlin and Altschul. This sum is now called the Karlin-Altschul sum statistic.

Theorem 7.2. The limiting density function of Karlin-Altschul sum $S_{n,r}$ is

\[ f_{n,r}(x) = \frac{e^{-x}}{r!\,(r-2)!} \int_{0}^{\infty} y^{r-2} \exp\left(-e^{(y-x)/r}\right) dy. \tag{7.39} \]

Integrating $f_{n,r}(x)$ from $t$ to $\infty$ gives the tail probability that $S_{n,r} \ge t$. This resulting double integral can be easily calculated numerically. Asymptotically, the tail
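The formulas (7.33) to (7.35) and the numerical integration suggested after Theorem 7.2 can be sketched in a few lines of Python. This is only an illustrative sketch, not the book's or BLAST's implementation: the function names, the Simpson-rule integrator, the integration cutoffs, and the parameter values `lam`, `K`, and `n` in the usage lines are all assumptions chosen for the example.

```python
import math


def normalized_score(raw_score, lam, K, n):
    """Normalized score lam * M - ln(K * n), as in (7.33) and (7.36)."""
    return lam * raw_score - math.log(K * n)


def p_value(s):
    """P-value approximation 1 - exp(-e^{-s}) of (7.34) for a normalized score s."""
    return -math.expm1(-math.exp(-s))


def expected_segments(raw_score, lam, K, N):
    """Expected count N * K * e^{-lam * s} of (7.35) for a raw score s."""
    return N * K * math.exp(-lam * raw_score)


def simpson(f, a, b, steps):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def sum_density(x, r, steps=400):
    """Density (7.39) of the sum statistic, by numerical integration over y."""
    if r < 2:
        raise ValueError("(7.39) contains (r - 2)!, so r must be at least 2")
    # Assumption: beyond this cutoff exp(-e^{(y-x)/r}) < e^{-50}, treated as zero.
    upper = max(x, 0.0) + r * math.log(50.0)

    def integrand(y):
        # The exponent is capped to avoid overflow; exp(-huge) underflows to 0.0.
        return y ** (r - 2) * math.exp(-math.exp(min((y - x) / r, 700.0)))

    inner = simpson(integrand, 0.0, upper, steps)
    if inner == 0.0:
        return 0.0
    return math.exp(-x) * inner / (math.factorial(r) * math.factorial(r - 2))


def sum_tail(t, r, steps=600):
    """Tail probability Pr[S_{n,r} >= t], integrating (7.39) from t upward."""
    # Assumption: the density is negligible 50 units past max(t, 0) for small r.
    return simpson(lambda x: sum_density(x, r), t, max(t, 0.0) + 50.0, steps)


# Hypothetical parameters, used only to rank the two comparisons from the text.
lam, K, n = 0.3, 0.1, 10_000
for name, scores in {"X": (108, 88), "Y": (99, 90)}.items():
    s_sum = sum(normalized_score(m, lam, K, n) for m in scores)
    print(name, s_sum, sum_tail(s_sum, r=2))
```

With a common `lam`, `K`, and `n`, the sum of normalized scores differs from `lam` times the sum of raw scores only by the constant `r * ln(K * n)`, so in this sketch comparison X (raw sum 196) ranks above comparison Y (raw sum 189) whatever positive values are assumed. Note also that when `N` equals `n`, the expected count in (7.35) equals `exp(-s)` for the normalized score `s`, so the P-value in (7.34) is `1 - exp(-E)`.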