Sequence Comparison.pdf

More documents

Recommendations

Info

26 2 Basic Algorithmic Techniques Fig. 2.7 Computing F 10 by a tabular computation. Now let us describe a more efficient dynamic-programming algorithm for this problem. Define S[i] to be the maximum sum of segments ending at position i of A. The value S[i] can be computed by the following recurrence: { ai + max{S[i − 1],0} if i > 1, S[i]= a 1 if i = 1. If S[i − 1] < 0, concatenating a i with its previous elements will give a smaller sum than a i itself. In this case, the maximum-sum segment ending at position i is a i itself. By a tabular computation, each S[i] can be computed in constant time for i from 1 to n, therefore all S values can be computed in O(n) time. During the computation, we record the largest S entry computed so far in order to report where the maximumsum segment ends. We also record the traceback information for each position i so that we can trace back from the end position of the maximum-sum segment to its start position. If S[i − 1] > 0, we need to concatenate with previous elements for a larger sum, therefore the traceback symbol for position i is “←.” Otherwise, “↑” is recorded. Once we have computed all S values, the traceback information is used to construct the maximum-sum segment by starting from the largest S entry and following the arrows until a “↑” is reached. For example, in Figure 2.8, A = 〈3, 2, -6,5,2,-3,6,-4,2〉. By computing from i = 1toi = n, wehaveS = 〈3, 5, -1, 5, 7, 4, 10,6, 8〉. The maximum S entry is S[7] whose value is 10. By backtracking from S[7], we conclude that the maximum-sum segment of A is 〈5, 2, -3, 6〉, whose sum is 10. Fig. 2.8 Finding a maximum-sum segment.
2.4 Dynamic Programming 27 Let prefix sum P[i]=∑ i j=1 a j be the sum of the first i elements. It can be easily seen that ∑ j k=i a k = P[ j] − P[i − 1]. Therefore, if we wish to compute for a given position the maximum-sum segment ending at it, we could just look for a minimum prefix sum ahead of this position. This yields another linear-time algorithm for the maximum-sum segment problem. 2.4.3 Longest Increasing Subsequences Given a sequence of numbers A = 〈a 1 ,a 2 ,...,a n 〉, the longest increasing subsequence problem is to find an increasing subsequence in A whose length is maximum. Without loss of generality, we assume that these numbers are distinct. Formally speaking, given a sequence of distinct real numbers A = 〈a 1 ,a 2 ,...,a n 〉, sequence B = 〈b 1 ,b 2 ,...,b k 〉 is said to be a subsequence of A if there exists a strictly increasing sequence 〈i 1 ,i 2 ,...,i k 〉 of indices of A such that for all j = 1,2,...,k, wehave a i j = b j . In other words, B is obtained by deleting zero or more elements from A. We say that the subsequence B is increasing if b 1 < b 2 < ... < b k . The longest increasing subsequence problem is to find a maximum-length increasing subsequence of A. For example, suppose A = 〈4, 8, 2, 7, 3, 6, 9, 1, 10, 5〉, both 〈2,3,6〉 and 〈2,7,9,10〉 are increasing subsequences of A, whereas 〈8,7,9〉 (not increasing) and 〈2,3,5,7〉 (not a subsequence) are not. Note that we may have more than one longest increasing subsequence, so we use “a longest increasing subsequence” instead of “the longest increasing subsequence.” Let L[i] be the length of a longest increasing subsequence ending at position i.They can be computed by the following recurrence: { 1 + max L[i]= j=0,...,i−1 {L[ j] | a j < a i } if i > 0, 0 if i = 0. Here we assume that a 0 is a dummy element and smaller than any element in A, and L[0] is equal to 0. By tabular computation for every i from 1 to n, each L[i] can be computed in O(i) steps. Therefore, they require in total ∑ n i=1 O(i)=O(n2 ) steps. For each position i, we use an array P to record the index of the best previous element for the current element to concatenate with. By tracing back from the element with the largest L value, we derive a longest increasing subsequence. Figure 2.9 illustrates the process of finding a longest increasing subsequence of A = 〈4, 8, 2, 7, 3, 6, 9, 1, 10, 5〉. Takei = 4 for instance, where a 4 = 7. Its previous smaller elements are a 1 and a 3 , both with L value equaling 1. Therefore, we have L[4] =L[1]+1 = 2, meaning that the length of a longest increasing subsequence ending at position 4 is of length 2. Indeed, both 〈a 1 ,a 4 〉 and 〈a 3 ,a 4 〉 are an increasing subsequence ending at position 4. In order to trace back the solution, we use array P to record which entry contributes the maximum to the current L value. Thus, P[4] can be 1 (standing for a 1 ) or 3 (standing for a 3 ). Once we have computed all L and
Page 2 and 3: Computational Biology Editors-in-Ch
Page 4 and 5: Kun-Mao Chao·Louxin Zhang Sequence
Page 6 and 7: KMC: To Daddy, Mommy, Pei-Pei and L
Page 8 and 9: viii Foreword I invite you to study
Page 10 and 11: x Preface Chapters 2 to 5 form the
Page 12 and 13: Acknowledgments We are extremely gr
Page 14 and 15: Contents Foreword .................
Page 16 and 17: Contents xix Part II. Theory ......
Page 18 and 19: Chapter 1 Introduction 1.1 Biologic
Page 20 and 21: 1.2 Alignment: A Model for Sequence
Page 22 and 23: 1.2 Alignment: A Model for Sequence
Page 24 and 25: 1.3 Scoring Alignment 7 ( ) k m a k
Page 26 and 27: 1.4 Computing Sequence Alignment 9
Page 28 and 29: 1.5 Multiple Alignment 11 1.5 Multi
Page 30 and 31: 1.8 Bibliographic Notes and Further
Page 32 and 33: PART I. ALGORITHMS AND TECHNIQUES 1
Page 34 and 35: 18 2 Basic Algorithmic Techniques 2
Page 36 and 37: 20 2 Basic Algorithmic Techniques F
Page 38 and 39: 22 2 Basic Algorithmic Techniques s
Page 40 and 41: 24 2 Basic Algorithmic Techniques F
Page 44 and 45: 28 2 Basic Algorithmic Techniques P
Page 46 and 47: 30 2 Basic Algorithmic Techniques a
Page 48 and 49: 32 2 Basic Algorithmic Techniques O
Page 50 and 51: Chapter 3 Pairwise Sequence Alignme
Page 52 and 53: 3.3 Global Alignment 37 3.2 Dot Mat
Page 54 and 55: 3.3 Global Alignment 39 ⎧ ⎨ S[i
Page 56 and 57: 3.3 Global Alignment 41 ( ai b j )
Page 58 and 59: 3.4 Local Alignment 43 ⎧ 0, ⎪
Page 60 and 61: 3.4 Local Alignment 45 Algorithm LO
Page 62 and 63: 3.5 Various Scoring Schemes 47 Fig.
Page 64 and 65: 3.6 Space-Saving Strategies 49 Fig.
Page 66 and 67: 3.6 Space-Saving Strategies 51 Algo
Page 68 and 69: 3.6 Space-Saving Strategies 53 scor
Page 70 and 71: 3.7 Other Advanced Topics 55 ning,
Page 72 and 73: 3.7 Other Advanced Topics 57 (0,0).
Page 74 and 75: 3.7 Other Advanced Topics 59 3.7.4
Page 76 and 77: 3.8 Bibliographic Notes and Further
Page 78 and 79: Chapter 4 Homology Search Tools The
Page 80 and 81: 4.1 Finding Exact Word Matches 65 F
Page 82 and 83: 4.1 Finding Exact Word Matches 67 F
Page 84 and 85: 4.3 BLAST 69 SALSDLHAHKLRVDPVNFKLLS
Page 86 and 87: 4.3 BLAST 71 length w, whereas for
Page 88 and 89: 4.3 BLAST 73 Fig. 4.9 A scenario of
Page 90 and 91: 4.5 PatternHunter 75 BLAT identifie
Page 92 and 93:
4.5 PatternHunter 77 can develop an
Page 94 and 95:
4.6 Bibliographic Notes and Further
Page 96 and 97:
82 5 Multiple Sequence Alignment S
Page 98 and 99:
84 5 Multiple Sequence Alignment S
Page 100 and 101:
86 5 Multiple Sequence Alignment Fi
Page 102 and 103:
88 5 Multiple Sequence Alignment al
Page 104 and 105:
Chapter 6 Anatomy of Spaced Seeds B
Page 106 and 107:
6.2 Basic Formulas on Hit Probabili
Page 108 and 109:
Page 110 and 111:
Page 112 and 113:
6.3 Distance between Non-Overlappin
Page 114 and 115:
Page 116 and 117:
Page 118 and 119:
6.4 Asymptotic Analysis of Hit Prob
Page 120 and 121:
Page 122 and 123:
Page 124 and 125:
6.5 Spaced Seed Selection 111 Count
Page 126 and 127:
6.6 Generalizations of Spaced Seeds
Page 128 and 129:
Page 130 and 131:
Page 132 and 133:
120 7 Local Alignment Statistics tr
Page 134 and 135:
122 7 Local Alignment Statistics Fi
Page 136 and 137:
124 7 Local Alignment Statistics Le
Page 138 and 139:
126 7 Local Alignment Statistics Be
Page 140 and 141:
128 7 Local Alignment Statistics we
Page 142 and 143:
130 7 Local Alignment Statistics A
Page 144 and 145:
132 7 Local Alignment Statistics pr
Page 146 and 147:
134 7 Local Alignment Statistics wh
Page 148 and 149:
136 7 Local Alignment Statistics Ta
Page 150 and 151:
138 7 Local Alignment Statistics Be
Page 152 and 153:
140 7 Local Alignment Statistics 7.
Page 154 and 155:
Page 156 and 157:
144 7 Local Alignment Statistics al
Page 158 and 159:
Page 160 and 161:
Chapter 8 Scoring Matrices With the
Page 162 and 163:
8.1 The PAM Scoring Matrices 151 AB
Page 164 and 165:
8.2 The BLOSUM Scoring Matrices 153
Page 166 and 167:
8.3 General Form of the Scoring Mat
Page 168 and 169:
8.4 How to Select a Scoring Matrix?
Page 170 and 171:
8.5 Compositional Adjustment of Sco
Page 172 and 173:
8.6 DNA Scoring Matrices 161 This i
Page 174 and 175:
8.7 Gap Cost in Gapped Alignments 1
Page 176 and 177:
Page 178 and 179:
Page 180 and 181:
Page 182 and 183:
Page 184 and 185:
Appendix A Basic Concepts in Molecu
Page 186 and 187:
A.4 The Genomes 175 ondary structur
Page 188 and 189:
Appendix B Elementary Probability T
Page 190 and 191:
B.3 Major Discrete Distributions 17
Page 192 and 193:
B.3 Major Discrete Distributions 18
Page 194 and 195:
B.5 Mean, Variance, and Moments 183
Page 196 and 197:
B.5 Mean, Variance, and Moments 185
Page 198 and 199:
B.6 Relative Entropy of Probability
Page 200 and 201:
B.7 Discrete-time Finite Markov Cha
Page 202 and 203:
B.8 Recurrent Events and the Renewa
Page 204 and 205:
B.8 Recurrent Events and the Renewa
Page 206 and 207:
196 C Software Packages for Sequenc
Page 208 and 209:
198 References 19. Bafna, V. and Pe
Page 210 and 211:
200 References 71. Fitch, W.M. and
Page 212 and 213:
202 References 122. Letunic, I., Co
Page 214 and 215:
204 References 173. Robinson, A.B.
Page 216 and 217:
Index O-notation, 18 P-value, 139 a
Page 218:
Index 209 heuristic, 85 progressive
show all

Sequence Comparison.pdf

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?