Sequence Comparison.pdf

6.3 Distance between Non-Overlapping Hits

\[ C_{11}(x) = \sum_{k=1}^{|\pi|} p^{|\pi|-k} x^{|\pi|-k}, \qquad A_\theta = \Bigl[\, \sum_{i=0}^{w-1} p^i \,\Bigr], \]

and

\[ M_\theta = \begin{bmatrix} 0 & 1 \\ -p^{w} & \sum_{i=0}^{w-1} p^i \end{bmatrix}. \]

By Theorem 6.3,

\[ \mu_\theta = \sum_{i=1}^{w} (1/p)^i. \]

Example 6.5. Continue Example 6.3. For the spaced seed $\pi = 1^a * 1^b$, $a \ge b \ge 1$, we have

\[ A_\pi = \begin{bmatrix} \sum_{i=0}^{b-1} p^{a+i} q + 1 & \sum_{i=0}^{a-1} p^{b+i} q \\ \sum_{i=0}^{b-1} p^{a+1+i} & \sum_{i=0}^{a+b} p^i \end{bmatrix}. \]

Therefore,

\[ \mu_\pi = \frac{\sum_{i=0}^{a+b} p^i + \sum_{i=0}^{b} \sum_{j=0}^{b-1} p^{a+i+j} q}{p^{a+b} \bigl(1 + p(1 - p^b)\bigr)}. \]

6.3.2 An Upper Bound for $\mu_\pi$

A spaced seed is uniform if its matching positions form an arithmetic sequence. For example, 1**1**1 is uniform with matching position set {0, 3, 6}, in which the difference between two successive positions is 3. The unique spaced seed of weight 2 and length $m$ is $1 *^{m-2} 1$. Therefore, all the spaced seeds of weight 2 are uniform. In general, a uniform seed is of the form $(1 *^k)^l 1$, $l \ge 1$ and $k \ge 0$.

In Example 6.2, we have shown that $\Pi_i \le \Theta_i$ for any uniformly spaced seed $\pi$ and the consecutive seed $\theta$ of the same weight. By (6.9), $\mu_\pi \ge \mu_\theta$.

Now we consider non-uniformly spaced seeds. We have proved that $\mu_\theta = \sum_{i=1}^{w_\theta} (1/p)^i$ for the consecutive seed $\theta$. For any spaced seed $\pi$, by definition, $\mu_\pi \ge |\pi|$. Thus, for any fixed probability $p$ and the consecutive seed $\theta$ of the same weight, $\mu_\pi$ can be larger than $\mu_\theta$ when the length $|\pi|$ of $\pi$ is large. In this subsection, we shall show that when $|\pi|$ is not too big, $\mu_\pi$ is smaller than $\mu_\theta$ for any non-uniformly spaced seed $\pi$.

For any $0 \le j \le |\pi| - 1$, define

\[ RP(\pi) + j = \{ i_1 + j, i_2 + j, \ldots, i_{w_\pi} + j \} \]

and let $o_\pi(j) = |RP(\pi) \cap (RP(\pi) + j)|$.
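The overlap count $o_\pi(j)$ defined above can be illustrated with a minimal Python sketch. It assumes a seed is written as a string of `1` (matching position) and `*` (don't-care position), and that $RP(\pi)$ is the set of 0-based matching positions, as in the {0, 3, 6} example for 1**1**1. The function names `matching_positions` and `overlap` are hypothetical and chosen only for this illustration.

```python
def matching_positions(seed):
    """Return the 0-based positions of the 1's in a seed such as '1**1**1'."""
    return [i for i, c in enumerate(seed) if c == "1"]


def overlap(seed, j):
    """Return o(j) = |RP ∩ (RP + j)|, the number of 1's that coincide
    between the seed and a copy of it shifted by j positions."""
    rp = set(matching_positions(seed))
    shifted = {i + j for i in rp}
    return len(rp & shifted)


seed = "1**1**1"
overlaps = [overlap(seed, j) for j in range(len(seed))]
print(overlaps)  # [3, 0, 0, 2, 0, 0, 1]
```

For this uniform seed, the first entry equals the weight 3 and the last entry equals 1. Nonzero overlaps occur only at shifts that are multiples of the spacing 3.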
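Here, $o_\pi(j)$ is the number of 1's that coincide between the seed and its $j$th shifted version. Trivially, $o_\pi(0) = w_\pi$ and $o_\pi(|\pi| - 1) = 1$ hold for any spaced seed $\pi$.

Theorem 6.4. For any spaced seed $\pi$,

\[ \mu_\pi \le \sum_{i=0}^{|\pi|-1} (1/p)^{o_\pi(i)}. \]

Proof. Notice that equality holds for the consecutive seeds. Recall that $A_j$ denotes the event that the seed $\pi$ hits the random sequence at position $j$, and $\bar{A}_j$ the complement of $A_j$. Let $m_\pi(j) = w_\pi - o_\pi(j)$. We have

\[ \Pr[A_{n-1} \mid A_{n-j-1}] = p^{m_\pi(j)} \]

for any $n$ and $j \le |\pi|$. Because $A_{n-1}$ is negatively correlated with the joint event $\bar{A}_0 \bar{A}_1 \cdots \bar{A}_{n-j-2}$,

\[ \Pr[\bar{A}_0 \bar{A}_1 \cdots \bar{A}_{n-j-2} \mid A_{n-j-1} A_{n-1}] \le \Pr[\bar{A}_0 \bar{A}_1 \cdots \bar{A}_{n-j-2} \mid A_{n-j-1}] \]

for any $n$ and $j \le |\pi|$. Combining these two formulas, we have

\begin{align*}
\Pr[\bar{A}_0 \bar{A}_1 \cdots \bar{A}_{n-j-2} A_{n-j-1} A_{n-1}]
&= \Pr[A_{n-j-1} A_{n-1}] \, \Pr[\bar{A}_0 \bar{A}_1 \cdots \bar{A}_{n-j-2} \mid A_{n-j-1} A_{n-1}] \\
&\le \Pr[A_{n-1} \mid A_{n-j-1}] \, \bigl\{ \Pr[A_{n-j-1}] \, \Pr[\bar{A}_0 \bar{A}_1 \cdots \bar{A}_{n-j-2} \mid A_{n-j-1}] \bigr\} \\
&= p^{m_\pi(j)} \pi_{n-j}.
\end{align*}

Therefore, for any $n \ge |\pi|$, applying this bound with $j = |\pi| - i$,

\begin{align*}
\bar{\Pi}_{n-|\pi|} \, p^{w_\pi}
&= \Pr[\bar{A}_0 \bar{A}_1 \cdots \bar{A}_{n-|\pi|-1} A_{n-1}] \\
&= \Pr[\bar{A}_0 \bar{A}_1 \cdots \bar{A}_{n-2} A_{n-1}] + \sum_{i=1}^{|\pi|-1} \Pr[\bar{A}_0 \bar{A}_1 \cdots \bar{A}_{n-|\pi|+i-2} A_{n-|\pi|+i-1} A_{n-1}] \\
&\le \pi_n + \sum_{i=1}^{|\pi|-1} p^{m_\pi(|\pi|-i)} \pi_{n-|\pi|+i},
\end{align*}

where we assume $\bar{\Pi}_j = 1$ for $j \le |\pi| - 1$. Summing the above inequality over $n$, noting that $\sum_{i=|\pi|}^{\infty} \pi_i = \Pr[\cup_i A_i] = 1$, and reindexing the sum over $i$ (with $m_\pi(0) = 0$), we obtain

\[ \mu_\pi \, p^{w_\pi} = \sum_{n=0}^{\infty} \bar{\Pi}_n \, p^{w_\pi} \le 1 + \sum_{i=1}^{|\pi|-1} p^{m_\pi(i)} = \sum_{i=0}^{|\pi|-1} p^{m_\pi(i)}, \]

or

\[ \mu_\pi \le \sum_{i=0}^{|\pi|-1} p^{m_\pi(i) - w_\pi} = \sum_{i=0}^{|\pi|-1} (1/p)^{o_\pi(i)}. \qquad \square \]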
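As a numerical sanity check of Theorem 6.4, the following Python sketch compares the bound with a direct evaluation of $\mu_\pi = \sum_{n \ge 0} \bar{\Pi}_n$, the identity used in the proof. It is only an illustration under stated assumptions: the sequence is taken to be i.i.d. with match probability $p$, seeds are strings of `1` and `*` that begin and end with `1`, and the names `overlap`, `upper_bound`, and `mean_distance` are hypothetical. The series is truncated once the surviving probability falls below `tol`, so `mean_distance` is an approximation rather than a closed form.

```python
def overlap(seed, j):
    """o(j): number of 1's shared by the seed and its copy shifted by j."""
    rp = {i for i, c in enumerate(seed) if c == "1"}
    return len(rp & {i + j for i in rp})


def upper_bound(seed, p):
    """Right-hand side of Theorem 6.4: sum over j of (1/p)^o(j)."""
    return sum((1.0 / p) ** overlap(seed, j) for j in range(len(seed)))


def mean_distance(seed, p, tol=1e-12):
    """Approximate mu = sum over n >= 0 of Pr[no hit among the first n symbols].

    Assumes the seed starts and ends with '1', so an all-zero initial
    history can never contribute to a hit.
    """
    length = len(seed)
    # Bit (length - 1 - i) of a window holds the symbol under seed position i.
    mask = sum(1 << (length - 1 - i) for i, c in enumerate(seed) if c == "1")
    keep = (1 << (length - 1)) - 1  # the last length-1 symbols form the state
    states = {0: 1.0}  # history -> probability of reaching it with no hit yet
    mean = 0.0
    while True:
        surviving = sum(states.values())
        mean += surviving
        if surviving < tol:
            return mean
        nxt = {}
        for hist, prob in states.items():
            for bit, pr in ((1, p), (0, 1.0 - p)):
                window = (hist << 1) | bit
                if window & mask == mask:
                    continue  # the seed hits here, so this path stops
                key = window & keep
                nxt[key] = nxt.get(key, 0.0) + prob * pr
        states = nxt


for seed in ("111", "1*1", "11*1"):
    print(seed, mean_distance(seed, 0.5), upper_bound(seed, 0.5))
```

With $p = 0.5$, the consecutive seed 111 attains the bound (both values are 14, up to truncation error), the uniform seed 1*1 gives about 6.8 against a bound of 7, and the non-uniform seed 11*1 gives about 13.2 against a bound of 14. The last two values agree with the closed form of Example 6.5 for $(a, b) = (1, 1)$ and $(2, 1)$ when $q = 1 - p$.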