
Chapter 4 Greedy Algorithms

4.8 Huffman Codes and Data Compression

The Problem

Encoding Symbols Using Bits Since computers ultimately operate on sequences of bits (i.e., sequences consisting only of the symbols 0 and 1), one needs encoding schemes that take text written in richer alphabets (such as the alphabets underpinning human languages) and convert this text into long strings of bits.

The simplest way to do this would be to use a fixed number of bits for each symbol in the alphabet, and then just concatenate the bit strings for each symbol to form the text. To take a basic example, suppose we wanted to encode the 26 letters of English, plus the space (to separate words) and five punctuation characters: comma, period, question mark, exclamation point, and apostrophe. This would give us 32 symbols in total to be encoded. Now, you can form 2^b different sequences out of b bits, and so if we use 5 bits per symbol, then we can encode 2^5 = 32 symbols--just enough for our purposes. So, for example, we could let the bit string 00000 represent a, the bit string 00001 represent b, and so forth up to 11111, which could represent the apostrophe. Note that the mapping of bit strings to symbols is arbitrary; the point is simply that five bits per symbol is sufficient. In fact, encoding schemes like ASCII work precisely this way, except that they use a larger number of bits per symbol so as to handle larger character sets, including capital letters, parentheses, and all those other special symbols you see on a typewriter or computer keyboard.
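To make the fixed-length scheme concrete, here is a minimal sketch in Python; the particular assignment of bit strings to symbols is an arbitrary illustrative choice, as noted above.

```python
# A fixed-length encoding of 32 symbols using 5 bits each.
# The symbol ordering below is one arbitrary choice among many.
SYMBOLS = "abcdefghijklmnopqrstuvwxyz ,.?!'"  # 26 letters, space, 5 punctuation marks

ENCODE = {s: format(i, "05b") for i, s in enumerate(SYMBOLS)}  # a -> 00000, b -> 00001, ...
DECODE = {bits: s for s, bits in ENCODE.items()}

def encode(text):
    """Concatenate the 5-bit codeword of each symbol."""
    return "".join(ENCODE[s] for s in text)

def decode(bits):
    """Decoding is unambiguous: read exactly 5 bits at a time."""
    return "".join(DECODE[bits[i:i + 5]] for i in range(0, len(bits), 5))

assert decode(encode("greedy algorithms")) == "greedy algorithms"
```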

Let’s think about our bare-bones example with just 32 symbols. Is there anything more we could ask for from an encoding scheme? We couldn’t ask to encode each symbol using just four bits, since 2^4 is only 16--not enough for the number of symbols we have. Nevertheless, it’s not clear that over large stretches of text, we really need to be spending an average of five bits per symbol. If we think about it, the letters in most human alphabets do not get used equally frequently. In English, for example, the letters e, t, a, o, i, and n get used much more frequently than q, j, x, and z (by more than an order of magnitude). So it’s really a tremendous waste to translate them all into the same number of bits; instead we could use a small number of bits for the frequent letters, and a larger number of bits for the less frequent ones, and hope to end up using fewer than five bits per letter when we average over a long string of typical text.
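As a rough, hypothetical calculation (the frequencies and codeword lengths below are invented for illustration, not real English statistics), suppose four very frequent letters together make up 40 percent of typical text and receive 3-bit codewords, while the remaining 28 symbols receive 6-bit codewords:

```python
# Back-of-the-envelope average with invented numbers: four frequent
# letters cover 40% of text and get 3-bit codes; the other 28 symbols
# get 6-bit codes. (Whether a given set of lengths can be realized by
# an unambiguous code is a separate question, discussed below.)
freq = {"frequent": 0.40, "rare": 0.60}
bits = {"frequent": 3, "rare": 6}

# Average bits per letter = sum of (frequency of class) * (code length)
avg = sum(freq[c] * bits[c] for c in freq)
print(avg)  # 4.8 -- already below the 5 bits per letter of the fixed-length scheme
```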

This issue of reducing the average number of bits per letter is a fundamental problem in the area of data compression. When large files need to be shipped across communication networks, or stored on hard disks, it’s important to represent them as compactly as possible, subject to the requirement that a subsequent reader of the file should be able to correctly reconstruct it. A huge amount of research is devoted to the design of compression algorithms that can take files as input and reduce their space through efficient encoding schemes.

We now describe one of the fundamental ways of formulating this issue, building up to the question of how we might construct the optimal way to take advantage of the nonuniform frequencies of the letters. In one sense, such an optimal solution is a very appealing answer to the problem of compressing data: it squeezes all the available gains out of nonuniformities in the frequencies. At the end of the section, we will discuss how one can make further progress in compression, taking advantage of features other than nonuniform frequencies.

Variable-Length Encoding Schemes Before the Internet, before the digital computer, before the radio and telephone, there was the telegraph. Communicating by telegraph was a lot faster than the contemporary alternatives of hand-delivering messages by railroad or on horseback. But telegraphs were only capable of transmitting pulses down a wire, and so if you wanted to send a message, you needed a way to encode the text of your message as a sequence of pulses.

To deal with this issue, the pioneer of telegraphic communication, Samuel Morse, developed Morse code, translating each letter into a sequence of dots (short pulses) and dashes (long pulses). For our purposes, we can think of dots and dashes as zeros and ones, and so this is simply a mapping of symbols into bit strings, just as in ASCII. Morse understood the point that one could communicate more efficiently by encoding frequent letters with short strings, and so this is the approach he took. (He consulted local printing presses to get frequency estimates for the letters in English.) Thus, Morse code maps e to 0 (a single dot), t to 1 (a single dash), a to 01 (dot-dash), and in general maps more frequent letters to shorter bit strings.

In fact, Morse code uses such short strings for the letters that the encoding of words becomes ambiguous. For example, just using what we know about the encoding of e, t, and a, we see that the string 0101 could correspond to any of the sequences of letters eta, aa, etet, or aet. (There are other possibilities as well, involving other letters.) To deal with this ambiguity, Morse code transmissions involve short pauses between letters (so the encoding of aa would actually be dot-dash-pause-dot-dash-pause). This is a reasonable solution--using very short bit strings and then introducing pauses--but it means that we haven’t actually encoded the letters using just 0 and 1; we’ve actually encoded them using a three-letter alphabet of 0, 1, and "pause." Thus, if we really needed to encode everything using only the bits 0 and 1, there would need to be some further encoding in which the pause got mapped to bits.
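To see this ambiguity concretely, here is a small Python sketch (an illustration, not part of Morse’s scheme) that enumerates every way to parse a bit string using just the codewords for e, t, and a:

```python
# Enumerate all parses of a bit string under the codes e -> 0, t -> 1,
# a -> 01 (dots and dashes written as bits, with no pauses available).
CODE = {"e": "0", "t": "1", "a": "01"}

def parses(bits):
    """Return every letter sequence whose encoding is exactly `bits`."""
    if bits == "":
        return [""]
    results = []
    for letter, word in CODE.items():
        if bits.startswith(word):  # try each codeword that is a prefix
            results += [letter + rest for rest in parses(bits[len(word):])]
    return results

print(sorted(parses("0101")))  # ['aa', 'aet', 'eta', 'etet']
```

Restricted to these three letters, 0101 already has four readings; adding the rest of the alphabet only multiplies the possibilities, which is exactly why Morse code needs the pauses.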
