04.09.2013 Views

Algorithm Design



Chapter 13 Randomized Algorithms

This now completes our random implementation of dictionaries. We define the family of hash functions to be $\mathcal{H} = \{h_a : a \in A\}$. To execute MakeDictionary, we choose a random hash function from $\mathcal{H}$; in other words, we choose a random vector from $A$ (by choosing each coordinate uniformly at random), and form the function $h_a$. Note that in order to define $A$, we need to find a prime number $p \geq n$. There are methods for generating prime numbers quickly, which we will not go into here. (In practice, this can also be accomplished using a table of known prime numbers, even for relatively large $n$.) We then use this as the hash function with which to implement Insert, Delete, and Lookup. The family $\mathcal{H} = \{h_a : a \in A\}$ satisfies a formal version of the second property we were seeking: It has a compact representation, since by simply choosing and remembering a random $a \in A$, we can compute $h_a(u)$ for all elements $u \in U$. Thus, to show that $\mathcal{H}$ leads to an efficient, hashing-based implementation of dictionaries, we just need to establish that $\mathcal{H}$ is a universal family of hash functions.
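The construction just described can be sketched in Python. This is only an illustration under stated assumptions: it takes $h_a(x) = \left(\sum_i a_i x_i\right) \bmod p$ from the definition given earlier in the chapter (not reproduced in this excerpt), represents each integer key as a vector of $r$ base-$p$ digits, and resolves collisions by chaining. The names `next_prime`, `to_vector`, and `Dictionary` are hypothetical, and trial division stands in for the prime-generation methods the text leaves out.

```python
import random


def next_prime(n):
    """Return the smallest prime p >= n (trial division; fine for small n)."""
    def is_prime(k):
        if k < 2:
            return False
        d = 2
        while d * d <= k:
            if k % d == 0:
                return False
            d += 1
        return True

    p = max(n, 2)
    while not is_prime(p):
        p += 1
    return p


def to_vector(u, p, r):
    """Write the integer key u as r base-p digits, each in 0..p-1."""
    digits = []
    for _ in range(r):
        digits.append(u % p)
        u //= p
    return digits


class Dictionary:
    """Hash table with chaining, using one random linear hash function h_a."""

    def __init__(self, n, universe_size, rng=random):
        self.p = next_prime(n)
        # Use the smallest r with p**r >= universe_size, so keys fit in r digits.
        self.r = 1
        while self.p ** self.r < universe_size:
            self.r += 1
        # MakeDictionary: choose each coordinate of a uniformly from 0..p-1.
        self.a = [rng.randrange(self.p) for _ in range(self.r)]
        self.table = [[] for _ in range(self.p)]

    def _hash(self, u):
        x = to_vector(u, self.p, self.r)
        return sum(ai * xi for ai, xi in zip(self.a, x)) % self.p

    def insert(self, u):
        bucket = self.table[self._hash(u)]
        if u not in bucket:
            bucket.append(u)

    def delete(self, u):
        bucket = self.table[self._hash(u)]
        if u in bucket:
            bucket.remove(u)

    def lookup(self, u):
        return u in self.table[self._hash(u)]
```

Only the vector `a` and the prime `p` need to be stored to evaluate the hash function on any key, which is the compact-representation property in concrete form.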

Analyzing the Data Structure

If we are using a hash function $h_a$ from the class $\mathcal{H}$ that we've defined, then a collision $h_a(x) = h_a(y)$ defines a linear equation modulo the prime number $p$. In order to analyze such equations, it's useful to have the following "cancellation law."

(13.24) For any prime $p$ and any integer $z \neq 0 \bmod p$, and any two integers $\alpha, \beta$, if $\alpha z = \beta z \bmod p$, then $\alpha = \beta \bmod p$.

Proof. Suppose $\alpha z = \beta z \bmod p$. Then, by rearranging terms, we get $z(\alpha - \beta) = 0 \bmod p$, and hence $z(\alpha - \beta)$ is divisible by $p$. But $z \neq 0 \bmod p$, so $z$ is not divisible by $p$. Since $p$ is prime, it follows that $\alpha - \beta$ must be divisible by $p$; that is, $\alpha = \beta \bmod p$ as claimed. ∎
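As a sanity check on (13.24), the following brute-force sketch tests the cancellation law over all residues for a given modulus. The function name `cancellation_holds` is hypothetical and not from the text. It also illustrates why primality matters: for the composite modulus 6, we have $2 \cdot 3 = 2 \cdot 0 \bmod 6$ although $3 \neq 0 \bmod 6$.

```python
def cancellation_holds(modulus):
    """Return True if, for every z != 0 (mod modulus) and all residues
    alpha, beta, alpha*z == beta*z (mod modulus) forces alpha == beta."""
    for z in range(1, modulus):
        for alpha in range(modulus):
            for beta in range(modulus):
                same_product = (alpha * z - beta * z) % modulus == 0
                if same_product and alpha != beta:
                    return False  # found a counterexample
    return True


print(cancellation_holds(7))  # prime modulus
print(cancellation_holds(6))  # composite modulus
```

Checking residues in the range 0 to modulus - 1 is sufficient here, since both sides of each congruence depend only on the residues of $\alpha$, $\beta$, and $z$.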

We now use this to prove the main result in our analysis.

(13.25) The class of linear functions $\mathcal{H}$ defined above is universal.

Proof. Let $x = (x_1, x_2, \ldots, x_r)$ and $y = (y_1, y_2, \ldots, y_r)$ be two distinct elements of $U$. We need to show that the probability of $h_a(x) = h_a(y)$, for a randomly chosen $a \in A$, is at most $1/p$.

Since $x \neq y$, there must be an index $j$ such that $x_j \neq y_j$. We now consider the following way of choosing the random vector $a \in A$. We first choose all the coordinates $a_i$ where $i \neq j$. Then, finally, we choose coordinate $a_j$. We will show that regardless of how all the other coordinates $a_i$ were chosen, the probability of $h_a(x) = h_a(y)$, taken over the final choice of $a_j$, is exactly $1/p$. It will follow that the probability of $h_a(x) = h_a(y)$ over the random choice of the full vector $a$ must be $1/p$ as well.

This conclusion is intuitively clear: If the probability is $1/p$ regardless of how we choose all other $a_i$, then it is $1/p$ overall. There is also a direct proof of this using conditional probabilities. Let $\mathcal{E}$ be the event that $h_a(x) = h_a(y)$, and let $\mathcal{F}_b$ be the event that all coordinates $a_i$ (for $i \neq j$) receive a sequence of values $b$. We will show, below, that $\Pr[\mathcal{E} \mid \mathcal{F}_b] = 1/p$ for all $b$. It then follows that $\Pr[\mathcal{E}] = \sum_b \Pr[\mathcal{E} \mid \mathcal{F}_b] \cdot \Pr[\mathcal{F}_b] = (1/p) \sum_b \Pr[\mathcal{F}_b] = 1/p$.

So, to conclude the proof, we assume that values have been chosen arbitrarily for all other coordinates $a_i$, and we consider the probability of selecting $a_j$ so that $h_a(x) = h_a(y)$. By rearranging terms, we see that $h_a(x) = h_a(y)$ if and only if

$$a_j (y_j - x_j) = \sum_{i \neq j} a_i (x_i - y_i) \bmod p.$$

Since the choices for all $a_i$ ($i \neq j$) have been fixed, we can view the right-hand side as some fixed quantity $m$. Also, let us define $z = y_j - x_j$, and note that $z \neq 0 \bmod p$ since $x_j \neq y_j$.

Now it is enough to show that there is exactly one value $0 \leq a_j < p$ that satisfies $a_j z = m \bmod p$; indeed, if this is the case, then there is a probability of exactly $1/p$ of choosing this value for $a_j$. So suppose there were two such values, $a_j$ and $a_j'$. Then we would have $a_j z = a_j' z \bmod p$, and so by (13.24) we would have $a_j = a_j' \bmod p$. But we assumed that $a_j, a_j' < p$, and so in fact $a_j$ and $a_j'$ would be the same. It follows that at most one $a_j$ in this range satisfies $a_j z = m \bmod p$. Moreover, the same argument shows that the $p$ candidates $0, 1, \ldots, p-1$ yield $p$ distinct values of $a_j z \bmod p$, so exactly one of them equals $m$.
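To make this step concrete, the sketch below computes the single value of $a_j$ that causes a collision once the other coordinates are fixed, by multiplying $m$ by the inverse of $z$ modulo $p$. It assumes the linear form $h_a(x) = \left(\sum_i a_i x_i\right) \bmod p$ from earlier in the chapter; the names `h` and `colliding_coordinate` are hypothetical, and `pow(z, -1, p)` requires Python 3.8 or later.

```python
def h(a, x, p):
    """Linear hash: (sum of a_i * x_i) mod p."""
    return sum(ai * xi for ai, xi in zip(a, x)) % p


def colliding_coordinate(a, x, y, j, p):
    """With a[i] fixed for i != j (a[j] is ignored), return the unique
    value in 0..p-1 for coordinate j that makes h_a(x) == h_a(y)."""
    z = (y[j] - x[j]) % p
    if z == 0:
        raise ValueError("x and y must differ at index j modulo p")
    # m is the right-hand side of the rearranged collision equation.
    m = sum(a[i] * (x[i] - y[i]) for i in range(len(a)) if i != j) % p
    # Solve a_j * z = m (mod p) using the modular inverse of z.
    return (m * pow(z, -1, p)) % p


p = 7
x, y = [1, 2, 3], [1, 5, 4]
a = [3, 0, 6]                       # a[1] is a placeholder
aj = colliding_coordinate(a, x, y, 1, p)
print(aj)                           # 5
print(h([3, aj, 6], x, p) == h([3, aj, 6], y, p))  # True
```

In this instance $z = 3$ and $m = 1$, and $3 \cdot 5 = 15 = 1 \bmod 7$, so $a_j = 5$ is the only choice among the seven candidates that produces a collision.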

Tracing back through the implications, this means that the probability of choosing $a_j$ so that $h_a(x) = h_a(y)$ is $1/p$, however we set the other coordinates $a_i$ in $a$; thus the probability that $x$ and $y$ collide is $1/p$. Thus we have shown that $\mathcal{H}$ is a universal class of hash functions. ∎
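For small parameters, (13.25) can be checked exhaustively rather than by sampling. The sketch below enumerates every vector $a \in A$ and counts how many cause a given pair to collide. It assumes, as in the earlier part of the chapter, that $h_a(x) = \left(\sum_i a_i x_i\right) \bmod p$ and that all coordinates lie in $0, \ldots, p-1$; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def h(a, x, p):
    """Linear hash: (sum of a_i * x_i) mod p."""
    return sum(ai * xi for ai, xi in zip(a, x)) % p


def collision_probability(x, y, p):
    """Exact Pr[h_a(x) == h_a(y)] over a uniformly random a in {0..p-1}^r."""
    r = len(x)
    collisions = sum(
        1
        for a in product(range(p), repeat=r)
        if h(a, x, p) == h(a, y, p)
    )
    return Fraction(collisions, p ** r)


print(collision_probability((1, 2, 3), (1, 5, 4), 7))  # 1/7
```

The enumeration costs $p^r$ hash evaluations per pair, so it is only practical as a check on tiny instances, not as part of the data structure.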

13.7 Finding the Closest Pair of Points: A Randomized Approach

In Chapter 5, we used the divide-and-conquer technique to develop an $O(n \log n)$ time algorithm for the problem of finding the closest pair of points in the plane. Here we will show how to use randomization to develop a different algorithm for this problem, using an underlying dictionary data structure. We will show that this algorithm runs in $O(n)$ expected time, plus $O(n)$ expected dictionary operations.

There are several related reasons why it is useful to express the running time of our algorithm in this way, accounting for the dictionary operations
