08.10.2016 Views

Foundations of Data Science


We now give an example of a 2-universal family of hash functions. Let $M$ be a prime greater than $m$. For each pair of integers $a$ and $b$ in the range $[0, M-1]$, define a hash function

$$h_{ab}(x) = ax + b \pmod{M}.$$

To store the hash function $h_{ab}$, store the two integers $a$ and $b$. This requires only $O(\log M)$ space. To see that the family is 2-universal, note that $h(x) = w$ and $h(y) = z$ if and only if

$$\begin{pmatrix} x & 1 \\ y & 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} w \\ z \end{pmatrix} \pmod{M}.$$

If $x \neq y$, the matrix $\begin{pmatrix} x & 1 \\ y & 1 \end{pmatrix}$ is invertible modulo $M$.²⁸ Thus

$$\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} x & 1 \\ y & 1 \end{pmatrix}^{-1} \begin{pmatrix} w \\ z \end{pmatrix} \pmod{M}$$

and for each $\begin{pmatrix} w \\ z \end{pmatrix}$ there is a unique $\begin{pmatrix} a \\ b \end{pmatrix}$. Hence

$$\text{Prob}\big(h(x) = w \text{ and } h(y) = z\big) = \frac{1}{M^2}$$

and $H$ is 2-universal.
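The uniqueness argument can be checked by brute force for small parameters. The following is a minimal sketch, not part of the original text: the helper names `make_hash` and `is_two_universal` are hypothetical, and the check assumes inputs are drawn from $\{0, \ldots, m-1\}$. For every pair of distinct inputs it enumerates all $M^2$ choices of $(a, b)$ and confirms that each output pair $(w, z)$ is produced exactly once, which is what gives the probability $1/M^2$.

```python
from itertools import product


def make_hash(a, b, M):
    """Return the function h_ab(x) = (a*x + b) mod M."""
    return lambda x: (a * x + b) % M


def is_two_universal(M, m):
    """Exhaustively test the family {h_ab : 0 <= a, b < M} on inputs 0..m-1.

    Returns True if, for every pair of distinct inputs x != y, each output
    pair (w, z) is produced by exactly one (a, b).
    """
    for x in range(m):
        for y in range(m):
            if x == y:
                continue
            counts = {}
            for a, b in product(range(M), repeat=2):
                h = make_hash(a, b, M)
                key = (h(x), h(y))
                counts[key] = counts.get(key, 0) + 1
            # M*M hash functions must map onto M*M distinct output pairs.
            if len(counts) != M * M or any(c != 1 for c in counts.values()):
                return False
    return True


if __name__ == "__main__":
    print(is_two_universal(7, 5))  # prime M greater than m
    print(is_two_universal(6, 5))  # composite M, for comparison
```

With a composite modulus such as $M = 6$ the matrix can fail to be invertible (for example $x = 0$, $y = 2$ gives determinant $-2$, which shares a factor with 6), so the check fails there; this illustrates why the text takes $M$ to be prime.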

Analysis of distinct element counting algorithm

Let $b_1, b_2, \ldots, b_d$ be the distinct values that appear in the input. Then the set $S = \{h(b_1), h(b_2), \ldots, h(b_d)\}$ is a set of $d$ random and pairwise independent values from the set $\{0, 1, 2, \ldots, M-1\}$. We now show that $\frac{M}{\min}$ is a good estimate for $d$, the number of distinct elements in the input, where $\min = \min(S)$.
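As an illustration of the estimate $M/\min$, here is a minimal one-pass sketch. It is an assumption-laden example rather than the book's own code: the function name `estimate_distinct` is hypothetical, the hash is the $h_{ab}$ family with $a$ and $b$ passed in explicitly, and the handling of an empty stream (return 0) and of $\min = 0$ (return infinity) are choices made here, not taken from the text.

```python
import random


def estimate_distinct(stream, a, b, M):
    """Estimate the number of distinct items in `stream` as M / min h_ab(x).

    Only the smallest hash value seen so far is stored, so the memory used
    does not grow with the length of the stream.
    """
    smallest = None
    for x in stream:
        hx = (a * x + b) % M
        if smallest is None or hx < smallest:
            smallest = hx
    if smallest is None:
        return 0.0           # empty stream (assumption made for this sketch)
    if smallest == 0:
        return float("inf")  # M / 0 is undefined; treated as a failed estimate
    return M / smallest


if __name__ == "__main__":
    rng = random.Random(0)
    M = 10007  # a prime larger than every value in the stream
    a, b = rng.randrange(M), rng.randrange(M)
    stream = [rng.randrange(1000) for _ in range(5000)]
    print("true distinct count:", len(set(stream)))
    print("estimate M / min:   ", estimate_distinct(stream, a, b, M))
```

Because the estimate depends only on the set of values seen, repeated items do not change it. A single run can be off by a constant factor, which is the kind of guarantee the analysis aims for.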

Lemma 7.1 With probability at least $\frac{2}{3} - \frac{d}{M}$, we have $\frac{d}{6} \le \frac{M}{\min} \le 6d$, where $\min$ is the smallest element of $S$.

Proof: First, we show that $\text{Prob}\left(\frac{M}{\min} > 6d\right) < \frac{1}{6} + \frac{d}{M}$. This part does not require pairwise independence.

$$\begin{aligned}
\text{Prob}\left(\frac{M}{\min} > 6d\right) &= \text{Prob}\left(\min < \frac{M}{6d}\right) = \text{Prob}\left(\exists k,\ h(b_k) < \frac{M}{6d}\right) \\
&\le \sum_{i=1}^{d} \text{Prob}\left(h(b_i) < \frac{M}{6d}\right) \le d\left(\frac{\left\lceil \frac{M}{6d} \right\rceil}{M}\right) \le d\left(\frac{1}{6d} + \frac{1}{M}\right) \le \frac{1}{6} + \frac{d}{M}.
\end{aligned}$$
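The chain of inequalities above can be checked exactly for a small prime by enumerating every hash function in the family. The sketch below is illustrative only: `upper_tail_check` is a hypothetical helper, and it assumes the distinct input values are integers in $[0, M-1]$. It returns the exact probability of the event $M/\min > 6d$, the union bound $d \lceil M/(6d) \rceil / M$, and the final bound $\frac{1}{6} + \frac{d}{M}$, all as exact fractions.

```python
from fractions import Fraction


def upper_tail_check(values, M):
    """Compare Prob(M/min > 6d) with the bounds used in the proof.

    `values` are the d distinct inputs b_1..b_d. The probability is taken
    over a uniformly random h_ab with 0 <= a, b < M.
    """
    d = len(values)
    # Number of hash values h with 0 <= h < M/(6d), i.e. ceil(M / (6d)).
    below = -(-M // (6 * d))
    bad = 0
    for a in range(M):
        for b in range(M):
            smallest = min((a * x + b) % M for x in values)
            # M/min > 6d  <=>  6*d*min < M (min = 0 is counted as bad).
            if 6 * d * smallest < M:
                bad += 1
    exact = Fraction(bad, M * M)
    union = Fraction(d * below, M)
    final = Fraction(1, 6) + Fraction(d, M)
    return exact, union, final


if __name__ == "__main__":
    exact, union, final = upper_tail_check([2, 17, 40, 63, 88], 101)
    print(float(exact), float(union), float(final))
```

Working with integer comparisons and `Fraction` avoids floating-point rounding, so the ordering exact $\le$ union $\le$ final can be asserted directly.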

²⁸ The primality of $M$ ensures that inverses of elements exist in $\mathbb{Z}_M^*$, and $M > m$ ensures that if $x \neq y$, then $x$ and $y$ are not equal mod $M$.
