26 2 Basic Algorithmic Techniques Fig. 2.7 Computing F 10 by a tabular computation. Now let us describe a more efficient dynamic-programming algorithm for this problem. Define S[i] to be the maximum sum of segments ending at position i of A. The value S[i] can be computed by the following recurrence: { ai + max{S[i − 1],0} if i > 1, S[i]= a 1 if i = 1. If S[i − 1] < 0, concatenating a i with its previous elements will give a smaller sum than a i itself. In this case, the maximum-sum segment ending at position i is a i itself. By a tabular computation, each S[i] can be computed in constant time for i from 1 to n, therefore all S values can be computed in O(n) time. During the computation, we record the largest S entry computed so far in order to report where the maximumsum segment ends. We also record the traceback information for each position i so that we can trace back from the end position of the maximum-sum segment to its start position. If S[i − 1] > 0, we need to concatenate with previous elements for a larger sum, therefore the traceback symbol for position i is “←.” Otherwise, “↑” is recorded. Once we have computed all S values, the traceback information is used to construct the maximum-sum segment by starting from the largest S entry and following the arrows until a “↑” is reached. For example, in Figure 2.8, A = 〈3, 2, -6,5,2,-3,6,-4,2〉. By computing from i = 1toi = n, wehaveS = 〈3, 5, -1, 5, 7, 4, 10,6, 8〉. The maximum S entry is S[7] whose value is 10. By backtracking from S[7], we conclude that the maximum-sum segment of A is 〈5, 2, -3, 6〉, whose sum is 10. Fig. 2.8 Finding a maximum-sum segment.
2.4 Dynamic Programming 27 Let prefix sum P[i]=∑ i j=1 a j be the sum of the first i elements. It can be easily seen that ∑ j k=i a k = P[ j] − P[i − 1]. Therefore, if we wish to compute for a given position the maximum-sum segment ending at it, we could just look for a minimum prefix sum ahead of this position. This yields another linear-time algorithm for the maximum-sum segment problem. 2.4.3 Longest Increasing Subsequences Given a sequence of numbers A = 〈a 1 ,a 2 ,...,a n 〉, the longest increasing subsequence problem is to find an increasing subsequence in A whose length is maximum. Without loss of generality, we assume that these numbers are distinct. Formally speaking, given a sequence of distinct real numbers A = 〈a 1 ,a 2 ,...,a n 〉, sequence B = 〈b 1 ,b 2 ,...,b k 〉 is said to be a subsequence of A if there exists a strictly increasing sequence 〈i 1 ,i 2 ,...,i k 〉 of indices of A such that for all j = 1,2,...,k, wehave a i j = b j . In other words, B is obtained by deleting zero or more elements from A. We say that the subsequence B is increasing if b 1 < b 2 < ... < b k . The longest increasing subsequence problem is to find a maximum-length increasing subsequence of A. For example, suppose A = 〈4, 8, 2, 7, 3, 6, 9, 1, 10, 5〉, both 〈2,3,6〉 and 〈2,7,9,10〉 are increasing subsequences of A, whereas 〈8,7,9〉 (not increasing) and 〈2,3,5,7〉 (not a subsequence) are not. Note that we may have more than one longest increasing subsequence, so we use “a longest increasing subsequence” instead of “the longest increasing subsequence.” Let L[i] be the length of a longest increasing subsequence ending at position i.They can be computed by the following recurrence: { 1 + max L[i]= j=0,...,i−1 {L[ j] | a j < a i } if i > 0, 0 if i = 0. Here we assume that a 0 is a dummy element and smaller than any element in A, and L[0] is equal to 0. By tabular computation for every i from 1 to n, each L[i] can be computed in O(i) steps. Therefore, they require in total ∑ n i=1 O(i)=O(n2 ) steps. For each position i, we use an array P to record the index of the best previous element for the current element to concatenate with. By tracing back from the element with the largest L value, we derive a longest increasing subsequence. Figure 2.9 illustrates the process of finding a longest increasing subsequence of A = 〈4, 8, 2, 7, 3, 6, 9, 1, 10, 5〉. Takei = 4 for instance, where a 4 = 7. Its previous smaller elements are a 1 and a 3 , both with L value equaling 1. Therefore, we have L[4] =L[1]+1 = 2, meaning that the length of a longest increasing subsequence ending at position 4 is of length 2. Indeed, both 〈a 1 ,a 4 〉 and 〈a 3 ,a 4 〉 are an increasing subsequence ending at position 4. In order to trace back the solution, we use array P to record which entry contributes the maximum to the current L value. Thus, P[4] can be 1 (standing for a 1 ) or 3 (standing for a 3 ). Once we have computed all L and
- Page 2 and 3: Computational Biology Editors-in-Ch
- Page 4 and 5: Kun-Mao Chao·Louxin Zhang Sequence
- Page 6 and 7: KMC: To Daddy, Mommy, Pei-Pei and L
- Page 8 and 9: viii Foreword I invite you to study
- Page 10 and 11: x Preface Chapters 2 to 5 form the
- Page 12 and 13: Acknowledgments We are extremely gr
- Page 14 and 15: Contents Foreword .................
- Page 16 and 17: Contents xix Part II. Theory ......
- Page 18 and 19: Chapter 1 Introduction 1.1 Biologic
- Page 20 and 21: 1.2 Alignment: A Model for Sequence
- Page 22 and 23: 1.2 Alignment: A Model for Sequence
- Page 24 and 25: 1.3 Scoring Alignment 7 ( ) k m a k
- Page 26 and 27: 1.4 Computing Sequence Alignment 9
- Page 28 and 29: 1.5 Multiple Alignment 11 1.5 Multi
- Page 30 and 31: 1.8 Bibliographic Notes and Further
- Page 32 and 33: PART I. ALGORITHMS AND TECHNIQUES 1
- Page 34 and 35: 18 2 Basic Algorithmic Techniques 2
- Page 36 and 37: 20 2 Basic Algorithmic Techniques F
- Page 38 and 39: 22 2 Basic Algorithmic Techniques s
- Page 40 and 41: 24 2 Basic Algorithmic Techniques F
- Page 44 and 45: 28 2 Basic Algorithmic Techniques P
- Page 46 and 47: 30 2 Basic Algorithmic Techniques a
- Page 48 and 49: 32 2 Basic Algorithmic Techniques O
- Page 50 and 51: Chapter 3 Pairwise Sequence Alignme
- Page 52 and 53: 3.3 Global Alignment 37 3.2 Dot Mat
- Page 54 and 55: 3.3 Global Alignment 39 ⎧ ⎨ S[i
- Page 56 and 57: 3.3 Global Alignment 41 ( ai b j )
- Page 58 and 59: 3.4 Local Alignment 43 ⎧ 0, ⎪
- Page 60 and 61: 3.4 Local Alignment 45 Algorithm LO
- Page 62 and 63: 3.5 Various Scoring Schemes 47 Fig.
- Page 64 and 65: 3.6 Space-Saving Strategies 49 Fig.
- Page 66 and 67: 3.6 Space-Saving Strategies 51 Algo
- Page 68 and 69: 3.6 Space-Saving Strategies 53 scor
- Page 70 and 71: 3.7 Other Advanced Topics 55 ning,
- Page 72 and 73: 3.7 Other Advanced Topics 57 (0,0).
- Page 74 and 75: 3.7 Other Advanced Topics 59 3.7.4
- Page 76 and 77: 3.8 Bibliographic Notes and Further
- Page 78 and 79: Chapter 4 Homology Search Tools The
- Page 80 and 81: 4.1 Finding Exact Word Matches 65 F
- Page 82 and 83: 4.1 Finding Exact Word Matches 67 F
- Page 84 and 85: 4.3 BLAST 69 SALSDLHAHKLRVDPVNFKLLS
- Page 86 and 87: 4.3 BLAST 71 length w, whereas for
- Page 88 and 89: 4.3 BLAST 73 Fig. 4.9 A scenario of
- Page 90 and 91: 4.5 PatternHunter 75 BLAT identifie
- Page 92 and 93:
4.5 PatternHunter 77 can develop an
- Page 94 and 95:
4.6 Bibliographic Notes and Further
- Page 96 and 97:
82 5 Multiple Sequence Alignment S
- Page 98 and 99:
84 5 Multiple Sequence Alignment S
- Page 100 and 101:
86 5 Multiple Sequence Alignment Fi
- Page 102 and 103:
88 5 Multiple Sequence Alignment al
- Page 104 and 105:
Chapter 6 Anatomy of Spaced Seeds B
- Page 106 and 107:
6.2 Basic Formulas on Hit Probabili
- Page 108 and 109:
6.2 Basic Formulas on Hit Probabili
- Page 110 and 111:
6.2 Basic Formulas on Hit Probabili
- Page 112 and 113:
6.3 Distance between Non-Overlappin
- Page 114 and 115:
6.3 Distance between Non-Overlappin
- Page 116 and 117:
6.3 Distance between Non-Overlappin
- Page 118 and 119:
6.4 Asymptotic Analysis of Hit Prob
- Page 120 and 121:
6.4 Asymptotic Analysis of Hit Prob
- Page 122 and 123:
6.4 Asymptotic Analysis of Hit Prob
- Page 124 and 125:
6.5 Spaced Seed Selection 111 Count
- Page 126 and 127:
6.6 Generalizations of Spaced Seeds
- Page 128 and 129:
6.7 Bibliographic Notes and Further
- Page 130 and 131:
6.7 Bibliographic Notes and Further
- Page 132 and 133:
120 7 Local Alignment Statistics tr
- Page 134 and 135:
122 7 Local Alignment Statistics Fi
- Page 136 and 137:
124 7 Local Alignment Statistics Le
- Page 138 and 139:
126 7 Local Alignment Statistics Be
- Page 140 and 141:
128 7 Local Alignment Statistics we
- Page 142 and 143:
130 7 Local Alignment Statistics A
- Page 144 and 145:
132 7 Local Alignment Statistics pr
- Page 146 and 147:
134 7 Local Alignment Statistics wh
- Page 148 and 149:
136 7 Local Alignment Statistics Ta
- Page 150 and 151:
138 7 Local Alignment Statistics Be
- Page 152 and 153:
140 7 Local Alignment Statistics 7.
- Page 154 and 155:
142 7 Local Alignment Statistics 7.
- Page 156 and 157:
144 7 Local Alignment Statistics al
- Page 158 and 159:
146 7 Local Alignment Statistics 7.
- Page 160 and 161:
Chapter 8 Scoring Matrices With the
- Page 162 and 163:
8.1 The PAM Scoring Matrices 151 AB
- Page 164 and 165:
8.2 The BLOSUM Scoring Matrices 153
- Page 166 and 167:
8.3 General Form of the Scoring Mat
- Page 168 and 169:
8.4 How to Select a Scoring Matrix?
- Page 170 and 171:
8.5 Compositional Adjustment of Sco
- Page 172 and 173:
8.6 DNA Scoring Matrices 161 This i
- Page 174 and 175:
8.7 Gap Cost in Gapped Alignments 1
- Page 176 and 177:
8.8 Bibliographic Notes and Further
- Page 178 and 179:
8.8 Bibliographic Notes and Further
- Page 180 and 181:
8.8 Bibliographic Notes and Further
- Page 182 and 183:
8.8 Bibliographic Notes and Further
- Page 184 and 185:
Appendix A Basic Concepts in Molecu
- Page 186 and 187:
A.4 The Genomes 175 ondary structur
- Page 188 and 189:
Appendix B Elementary Probability T
- Page 190 and 191:
B.3 Major Discrete Distributions 17
- Page 192 and 193:
B.3 Major Discrete Distributions 18
- Page 194 and 195:
B.5 Mean, Variance, and Moments 183
- Page 196 and 197:
B.5 Mean, Variance, and Moments 185
- Page 198 and 199:
B.6 Relative Entropy of Probability
- Page 200 and 201:
B.7 Discrete-time Finite Markov Cha
- Page 202 and 203:
B.8 Recurrent Events and the Renewa
- Page 204 and 205:
B.8 Recurrent Events and the Renewa
- Page 206 and 207:
196 C Software Packages for Sequenc
- Page 208 and 209:
198 References 19. Bafna, V. and Pe
- Page 210 and 211:
200 References 71. Fitch, W.M. and
- Page 212 and 213:
202 References 122. Letunic, I., Co
- Page 214 and 215:
204 References 173. Robinson, A.B.
- Page 216 and 217:
Index O-notation, 18 P-value, 139 a
- Page 218:
Index 209 heuristic, 85 progressive