
length exactly 1,000, if we pad them out with blanks), and we're maintaining a set S ⊆ U consisting of strings (i.e., lead paragraphs) that we've seen before.

One solution would be to keep a linked list of all paragraphs, and scan this list each time a new one arrives. But a Lookup operation in this case takes time proportional to |S|. How can we get back to something that looks like an array-based solution?

Hash Functions The basic idea of hashing is to work with an array of size |S|, rather than one comparable to the (astronomical) size of U.

Suppose we want to be able to store a set S of size up to n. We will set up an array H of size n to store the information, and use a function h : U → {0, 1, ..., n − 1} that maps elements of U to array positions. We call such a function h a hash function, and the array H a hash table. Now, if we want to add an element u to the set S, we simply place u in position h(u) of the array H. In the case of storing paragraphs of text, we can think of h(·) as computing some kind of numerical signature or "check-sum" of the paragraph u, and this tells us the array position at which to store u.
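As a toy illustration (ours, not the book's), such a signature could be as simple as summing the paragraph's byte values and reducing modulo n; no claim is made that this is a good hash function.

    # A toy "check-sum" hash: sum the paragraph's byte values modulo n.
    # This is only a sketch of the idea of a numerical signature.
    def checksum_hash(paragraph, n):
        return sum(paragraph.encode("utf-8")) % n

    n = 1000
    H = [None] * n                      # the array H of size n
    u = "It was a dark and stormy night."
    H[checksum_hash(u, n)] = u          # place u in position h(u)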

This would work extremely well if, for all distinct u and v in our set S, it happened to be the case that h(u) ≠ h(v). In such a case, we could look up u in constant time: when we check array position H[h(u)], it would either be empty or would contain just u.

In general, though, we cannot expect to be this lucky: there can be distinct elements u, v ∈ S for which h(u) = h(v). We will say that these two elements collide, since they are mapped to the same place in H. There are a number of ways to deal with collisions. Here we will assume that each position H[i] of the hash table stores a linked list of all elements u ∈ S with h(u) = i. The operation Lookup(u) would now work as follows.

- Compute the hash function h(u).
- Scan the linked list at position H[h(u)] to see if u is present in this list.

Hence the time required for Lookup(u) is proportional to the time to compute h(u), plus the length of the linked list at H[h(u)]. And this latter quantity, in turn, is just the number of elements in S that collide with u. The Insert and Delete operations work similarly: Insert adds u to the linked list at position H[h(u)], and Delete scans this list and removes u if it is present.
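As a minimal sketch of this chaining scheme (our code, assuming the caller supplies some hash function h; the text itself prescribes no implementation):

    class ChainedHashTable:
        # Collisions are resolved by keeping a list at each position H[i].
        def __init__(self, n, h):
            self.h = h                          # h : U -> {0, 1, ..., n-1}
            self.table = [[] for _ in range(n)] # H[i] is a (Python) list

        def insert(self, u):
            bucket = self.table[self.h(u)]
            if u not in bucket:                 # keep S a set: no duplicates
                bucket.append(u)

        def lookup(self, u):
            # Cost: computing h(u) plus scanning the list at H[h(u)].
            return u in self.table[self.h(u)]

        def delete(self, u):
            bucket = self.table[self.h(u)]
            if u in bucket:
                bucket.remove(u)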

So now the goal is clear: We'd like to find a hash function that "spreads out" the elements being added, so that no one entry of the hash table H contains too many elements. This is not a problem for which worst-case analysis is very informative. Indeed, suppose that |U| ≥ n² (we're imagining applications where it's much larger than this). Then, for any hash function h that we choose, there will be some set S of n elements that all map to the same position: by the pigeonhole principle, some position must receive at least |U|/n ≥ n elements of U. In the worst case, we will insert all the elements of this set, and then our Lookup operations will consist of scanning a linked list of length n.

Our main goal here is to show that randomization can help significantly for this problem. As usual, we won't make any assumptions about the set of elements S being random; we will simply exploit randomization in the design of the hash function. In doing this, we won't be able to completely avoid collisions, but we can make them rare enough that the lists will be quite short.

Choosing a Good Hash Function We've seen that the efficiency of the dictionary is based on the choice of the hash function h. Typically, we will think of U as a large set of numbers, and then use an easily computable function h that maps each number u ∈ U to some value in the smaller range of integers {0, 1, ..., n − 1}. There are many simple ways to do this: we could use the first or last few digits of u, or simply take u modulo n. While these simple choices may work well in many situations, it is also possible to get large numbers of collisions. Indeed, a fixed choice of hash function may run into problems because of the types of elements u encountered in the application: Maybe the particular digits we use to define the hash function encode some property of u, and hence maybe only a few options are possible. Taking u modulo n can have the same problem, especially if n is a power of 2. To take a concrete example, suppose we used a hash function that took an English paragraph, used a standard character encoding scheme like ASCII to map it to a sequence of bits, and then kept only the first few bits in this sequence. We'd expect a huge number of collisions at the array entries corresponding to the bit strings that encoded common English words like The, while vast portions of the array can be occupied only by paragraphs that begin with strings like qxf, and hence will be empty.
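To see this concretely, here is a small hypothetical experiment of ours: hashing each paragraph by its first character code alone sends every paragraph beginning with The to the same position.

    # Keep only the leading bits of the encoding -- here, just the first byte.
    def first_bits_hash(paragraph, n):
        return paragraph.encode("ascii")[0] % n

    paras = ["The cat sat.", "The dog ran.", "The end.", "qxf is rare."]
    print([first_bits_hash(p, 64) for p in paras])
    # -> [20, 20, 20, 49]: the three "The ..." paragraphs all collide.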

A slightly better choice in practice is to take (u mod p) for a prime number p that is approximately equal to n. While in some applications this may yield a good hash function, it may not work well in all applications, and some primes may work much better than others (for example, primes very close to powers of 2 may not work so well).
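A small numerical example (ours) of why the modulus matters: keys that agree in their low-order bits all collide under u mod 2^k, while a prime modulus of about the same size spreads them out.

    keys = [1024 * i + 7 for i in range(8)]  # all share the same low 10 bits
    print([k % 1024 for k in keys])  # -> [7, 7, 7, 7, 7, 7, 7, 7]: all collide
    print([k % 1021 for k in keys])  # 1021 is prime -> [7, 10, 13, ..., 28]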

Since hashing has been widely used in practice for a long time, there is a lot of experience with what makes for a good hash function, and many hash functions have been proposed that tend to work well empirically. Here we would like to develop a hashing scheme where we can prove that it results in efficient dictionary operations with high probability.

The basic idea, as suggested earlier, is to use randomization in the construction of h. First let's consider an extreme version of this: for every element u ∈ U, when we go to insert u into S, we select a value h(u) uniformly at random from the set {0, 1, ..., n − 1}, independently of all previous choices.
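This extreme scheme could be simulated as follows (a deliberately naive sketch of ours); note that we must remember each random value h(u) in order to look u up again later, and here a Python dict plays that role.

    import random

    # Each new element gets an independent, uniformly random position.
    n = 1000
    assigned = {}                       # remembers h(u) for reuse

    def random_h(u):
        if u not in assigned:
            assigned[u] = random.randrange(n)   # uniform in {0, ..., n-1}
        return assigned[u]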

