Algorithm Design
Chapter 13 Randomized Algorithms<br />
This now completes our random implementation of dictionaries. We define<br />
the family of hash functions to be $\mathcal{H} = \{h_a : a \in \mathcal{A}\}$. To execute MakeDictionary,<br />
we choose a random hash function from $\mathcal{H}$; in other words, we<br />
choose a random vector from $\mathcal{A}$ (by choosing each coordinate uniformly at<br />
random), and form the function $h_a$. Note that in order to define $\mathcal{A}$, we need<br />
to find a prime number $p \geq n$. There are methods for generating prime numbers<br />
quickly, which we will not go into here. (In practice, this can also be<br />
accomplished using a table of known prime numbers, even for relatively large<br />
n.) We then use this as the hash function with which to implement Insert,<br />
Delete, and Lookup. The family $\mathcal{H} = \{h_a : a \in \mathcal{A}\}$ satisfies a formal version of<br />
the second property we were seeking: It has a compact representation, since<br />
by simply choosing and remembering a random $a \in \mathcal{A}$, we can compute $h_a(u)$<br />
for all elements $u \in U$. Thus, to show that $\mathcal{H}$ leads to an efficient, hashing-based<br />
implementation of dictionaries, we just need to establish that $\mathcal{H}$ is a<br />
universal family of hash functions.<br />
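The construction just described can be sketched in a few lines of Python. This is a minimal illustration, not the book's code: it assumes the linear form $h_a(x) = (\sum_i a_i x_i) \bmod p$, which is the form the collision equation in the analysis relies on, and the name `make_hash` and its parameters are hypothetical.

```python
import random

def make_hash(p, r, rng=random):
    """Draw a random vector a from A = {0, ..., p-1}^r and return (a, h_a).

    Assumption: h_a(x) = (sum_i a_i * x_i) mod p, with p prime and each
    element x given as a vector of r coordinates in {0, ..., p-1}.
    """
    # Choose each coordinate of a uniformly at random.
    a = [rng.randrange(p) for _ in range(r)]

    def h(x):
        # Remembering a is enough to evaluate h_a on any element x.
        return sum(ai * xi for ai, xi in zip(a, x)) % p

    return a, h
```

Only the vector `a` has to be stored, which is the compact representation the text refers to.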
Analyzing the Data Structure<br />
If we are using a hash function $h_a$ from the class $\mathcal{H}$ that we’ve defined, then a<br />
collision $h_a(x) = h_a(y)$ defines a linear equation modulo the prime number $p$. In<br />
order to analyze such equations, it’s useful to have the following "cancellation<br />
law."<br />
(13.24) For any prime $p$ and any integer $z \neq 0 \bmod p$, and any two integers<br />
$\alpha, \beta$, if $\alpha z = \beta z \bmod p$, then $\alpha = \beta \bmod p$.<br />
Proof. Suppose $\alpha z = \beta z \bmod p$. Then, by rearranging terms, we get $z(\alpha - \beta) =<br />
0 \bmod p$, and hence $z(\alpha - \beta)$ is divisible by $p$. But $z \neq 0 \bmod p$, so $z$ is not<br />
divisible by $p$. Since $p$ is prime, it follows that $\alpha - \beta$ must be divisible by $p$;<br />
that is, $\alpha = \beta \bmod p$ as claimed. ∎<br />
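A brute-force check makes the role of primality in (13.24) concrete. The sketch below is only an illustration; the function name `cancellation_holds` is hypothetical. It tests the law over all residues for a given modulus.

```python
def cancellation_holds(p):
    """Return True if, for every z != 0 mod p and all residues alpha, beta,
    alpha*z = beta*z (mod p) implies alpha = beta (mod p)."""
    for z in range(1, p):
        for alpha in range(p):
            for beta in range(p):
                same_product = (alpha * z) % p == (beta * z) % p
                if same_product and alpha != beta:
                    # Found a counterexample to cancellation.
                    return False
    return True
```

For a prime such as 7 the check succeeds. For a composite modulus such as 6 it fails: with $z = 2$, both $\alpha = 0$ and $\beta = 3$ give $\alpha z = \beta z = 0 \bmod 6$.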
We now use this to prove the main result in our analysis.<br />
(13.25) The class of linear functions $\mathcal{H}$ defined above is universal.<br />
Proof. Let $x = (x_1, x_2, \ldots, x_r)$ and $y = (y_1, y_2, \ldots, y_r)$ be two distinct elements<br />
of $U$. We need to show that the probability of $h_a(x) = h_a(y)$, for a randomly<br />
chosen $a \in \mathcal{A}$, is at most $1/p$.<br />
Since $x \neq y$, there must be an index $j$ such that $x_j \neq y_j$. We now<br />
consider the following way of choosing the random vector $a \in \mathcal{A}$. We first<br />
choose all the coordinates $a_i$ where $i \neq j$. Then, finally, we choose coordinate<br />
$a_j$. We will show that regardless of how all the other coordinates $a_i$ were<br />
chosen, the probability of $h_a(x) = h_a(y)$, taken over the final choice of $a_j$, is<br />
exactly $1/p$. It will follow that the probability of $h_a(x) = h_a(y)$ over the random<br />
choice of the full vector $a$ must be $1/p$ as well.<br />
This conclusion is intuitively clear: If the probability is $1/p$ regardless of<br />
how we choose all other $a_i$, then it is $1/p$ overall. There is also a direct proof<br />
of this using conditional probabilities. Let $\mathcal{E}$ be the event that $h_a(x) = h_a(y)$,<br />
and let $\mathcal{F}_b$ be the event that all coordinates $a_i$ (for $i \neq j$) receive a sequence of<br />
values $b$. We will show, below, that $\Pr[\mathcal{E} \mid \mathcal{F}_b] = 1/p$ for all $b$. It then follows<br />
that $\Pr[\mathcal{E}] = \sum_b \Pr[\mathcal{E} \mid \mathcal{F}_b] \cdot \Pr[\mathcal{F}_b] = (1/p) \sum_b \Pr[\mathcal{F}_b] = 1/p$.<br />
So, to conclude the proof, we assume that values have been chosen<br />
arbitrarily for all other coordinates $a_i$, and we consider the probability of<br />
selecting $a_j$ so that $h_a(x) = h_a(y)$. By rearranging terms, we see that $h_a(x) =<br />
h_a(y)$ if and only if<br />
$$a_j(y_j - x_j) = \sum_{i \neq j} a_i(x_i - y_i) \bmod p.$$<br />
Since the choices for all $a_i$ ($i \neq j$) have been fixed, we can view the right-hand<br />
side as some fixed quantity $m$. Also, let us define $z = y_j - x_j$.<br />
Now it is enough to show that there is exactly one value $0 \leq a_j < p$ that<br />
satisfies $a_j z = m \bmod p$; indeed, if this is the case, then there is a probability<br />
of exactly $1/p$ of choosing this value for $a_j$. So suppose there were two such<br />
values, $a_j$ and $a_j'$. Then we would have $a_j z = a_j' z \bmod p$, and so by (13.24) we<br />
would have $a_j = a_j' \bmod p$. But we assumed that $a_j, a_j' < p$, and so in fact $a_j$<br />
and $a_j'$ would be the same. It follows that there is only one $a_j$ in this range that<br />
satisfies $a_j z = m \bmod p$.<br />
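The uniqueness claim can be checked numerically. The sketch below is an illustration under stated assumptions: the helper names `solutions` and `solve` are hypothetical, and `solve` relies on Python's built-in three-argument `pow` with exponent `-1` (available from Python 3.8) to compute the inverse of $z$ modulo the prime $p$.

```python
def solutions(z, m, p):
    """List every a_j in {0, ..., p-1} with a_j * z = m (mod p), by brute force."""
    return [aj for aj in range(p) if (aj * z) % p == m % p]

def solve(z, m, p):
    """Return the single solution directly, assuming p is prime and z != 0 mod p.

    pow(z, -1, p) is the multiplicative inverse of z modulo p (Python 3.8+).
    """
    return (m * pow(z, -1, p)) % p
```

For instance, with $p = 7$, $z = 3$, and $m = 5$, the inverse of 3 modulo 7 is 5, so the solution is $5 \cdot 5 \bmod 7 = 4$, and indeed $4 \cdot 3 = 12 = 5 \bmod 7$.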
Tracing back through the implications, this means that the probability of<br />
choosing $a_j$ so that $h_a(x) = h_a(y)$ is $1/p$, however we set the other coordinates<br />
$a_i$ in $a$; thus the probability that $x$ and $y$ collide is $1/p$. Thus we have shown<br />
that $\mathcal{H}$ is a universal class of hash functions. ∎<br />
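For small parameters, (13.25) can be confirmed by exhaustive enumeration rather than sampling. The following sketch assumes the linear form $h_a(x) = (\sum_i a_i x_i) \bmod p$; the function name `collision_fraction` is hypothetical. It counts, over every vector $a \in \{0, \ldots, p-1\}^r$, how often two given elements collide.

```python
from fractions import Fraction
from itertools import product

def collision_fraction(x, y, p):
    """Exact fraction of vectors a in {0, ..., p-1}^r with h_a(x) == h_a(y).

    Assumption: h_a(v) = (sum_i a_i * v_i) mod p, with p prime.
    """
    r = len(x)
    collisions = 0
    total = 0
    # Enumerate the whole family: one hash function per vector a.
    for a in product(range(p), repeat=r):
        hx = sum(ai * xi for ai, xi in zip(a, x)) % p
        hy = sum(ai * yi for ai, yi in zip(a, y)) % p
        if hx == hy:
            collisions += 1
        total += 1
    return Fraction(collisions, total)
```

With $p = 5$, $x = (1, 2)$, and $y = (1, 3)$, exactly 5 of the 25 vectors collide, giving a fraction of $1/5 = 1/p$, as the proof predicts for any pair of distinct elements.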
13.7 Finding the Closest Pair of Points:<br />
A Randomized Approach<br />
In Chapter 5, we used the divide-and-conquer technique to develop an<br />
O(n log n) time algorithm for the problem of finding the closest pair of points in<br />
the plane. Here we will show how to use randomization to develop a different<br />
algorithm for this problem, using an underlying dictionary data structure. We<br />
will show that this algorithm runs in O(n) expected time, plus O(n) expected<br />
dictionary operations.<br />
There are several related reasons why it is useful to express the running<br />
time of our algorithm in this way, accounting for the dictionary operations