12.02.2013 Views

Lecture Notes: Flajolet-Martin Sketch 1 Distinct element counting ...


Our proof is based on [1]. We say that our algorithm is correct if $\frac{1}{c} \le \frac{\tilde{F}}{F} \le c$ (i.e., our estimate $\tilde{F}$ is off by at most a factor of $c$, from either above or below). The above proposition indicates that our algorithm is correct with at least a constant probability $1 - \frac{3}{c} > 0$.
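As a small illustration, the correctness criterion can be written as a predicate. This is only a sketch: the function name `is_correct`, its parameter names, and the sample numbers are assumptions made here for illustration and do not come from the notes. It also assumes that $c$ is an integer.

```python
from fractions import Fraction


def is_correct(estimate, true_count, c):
    """Return True if 1/c <= estimate/true_count <= c.

    Hypothetical helper: `estimate` plays the role of F~, `true_count` of F.
    Exact rational arithmetic avoids floating-point trouble at the bounds.
    """
    ratio = Fraction(estimate, true_count)
    return Fraction(1, c) <= ratio <= c


# With F = 1000 and c = 4, any estimate in [250, 4000] counts as correct.
print(is_correct(512, 1000, 4))   # True
print(is_correct(5000, 1000, 4))  # False
```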

Lemma 1. For any integer $r \in [0, w]$, $\Pr[z_k \ge r] = \frac{1}{2^r}$.

Proof. Note that $z_k \ge r$ means that the hash value $h(k)$ of $k$ is between $\underbrace{0\ldots0}_{r}\underbrace{0\ldots0}_{w-r}$ and $\underbrace{0\ldots0}_{r}\underbrace{1\ldots1}_{w-r}$, namely, between $0$ and $2^{w-r} - 1$. Remember that $h(k)$ is uniformly distributed from $0$ to $2^w - 1$. Hence:

$$\Pr[z_k \ge r] = \frac{2^{w-r}}{2^w} = \frac{1}{2^r}.$$

Let us fix an $r$. For each $k \in S$, define:

$$x_k(r) = \begin{cases} 1 & \text{if } z_k \ge r \\ 0 & \text{otherwise} \end{cases}$$
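Lemma 1 can be checked by brute force for a small word length. The sketch below assumes, as the proof suggests, that $z_k$ is the number of leading zero bits of $h(k)$ written as a $w$-bit string; the definition itself lies outside this section. The names `leading_zeros` and `prob_z_at_least` are hypothetical.

```python
from fractions import Fraction


def leading_zeros(h, w):
    """Number of leading zero bits of h written as a w-bit string."""
    return w - h.bit_length()


def prob_z_at_least(r, w):
    """Exact Pr[z >= r] when h is uniform over {0, ..., 2^w - 1}."""
    hits = sum(1 for h in range(2 ** w) if leading_zeros(h, w) >= r)
    return Fraction(hits, 2 ** w)


w = 8  # small word length so that all 2^w hash values can be enumerated
for r in range(w + 1):
    # Exactly 2^(w-r) of the 2^w values have at least r leading zeros.
    assert prob_z_at_least(r, w) == Fraction(1, 2 ** r)
```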

By Lemma 1, we know that $x_k(r)$ takes 1 with probability $1/2^r$. Hence:

$$E[x_k(r)] = \frac{1}{2^r}$$

$$\mathrm{var}[x_k(r)] = \frac{1}{2^r}\left(1 - \frac{1}{2^r}\right)$$

Also define:

$$X(r) = \sum_{\text{distinct } k \in S} x_k(r).$$
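The variance of $x_k(r)$ follows from the fact that it is a 0/1 indicator, so $x_k(r)^2 = x_k(r)$. The short derivation below spells this out. It also records the expectation of $X(r)$ obtained by linearity of expectation, assuming (as the notation suggests) that $F$ denotes the number of distinct elements of $S$.

```latex
\begin{align*}
\mathrm{var}[x_k(r)] &= E[x_k(r)^2] - \left(E[x_k(r)]\right)^2
  = \frac{1}{2^r} - \frac{1}{2^{2r}}
  = \frac{1}{2^r}\left(1 - \frac{1}{2^r}\right), \\
E[X(r)] &= \sum_{\text{distinct } k \in S} E[x_k(r)] = \frac{F}{2^r}.
\end{align*}
```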

Let:

$$r_1 = \text{the smallest } r \text{ such that } 2^r > cF$$

$$r_2 = \text{the smallest } r \text{ such that } 2^r \ge \frac{F}{c}$$
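To make the two thresholds concrete, the following sketch computes them for given $F$ and $c$. It assumes both are positive integers, and that the estimate has the form $2^Z$. That form is an assumption here, since equation (1) lies outside this section. The function name `thresholds` is hypothetical.

```python
def thresholds(F, c):
    """Return (r1, r2) for positive integers F and c.

    r1 = smallest r with 2^r > c*F; r2 = smallest r with 2^r >= F/c.
    Only integer arithmetic is used: 2^r >= F/c is tested as 2^r * c >= F.
    """
    r1 = 0
    while 2 ** r1 <= c * F:
        r1 += 1
    r2 = 0
    while 2 ** r2 * c < F:
        r2 += 1
    return r1, r2


F, c = 1000, 4
r1, r2 = thresholds(F, c)
print(r1, r2)  # 12 8

# Assumption: the estimate is 2^Z. Every Z with r2 <= Z < r1 then gives an
# estimate within a factor c of F.
for Z in range(r2, r1):
    assert F <= c * 2 ** Z and 2 ** Z <= c * F
```

For $F = 1000$ and $c = 4$, the admissible values of $Z$ are 8 through 11, i.e., estimates between 256 and 2048.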

Lemma 2. Our algorithm is correct if $X(r_1) = 0$ and $X(r_2) \ne 0$.

Proof. Our algorithm is correct if $Z$ as given in (1) satisfies $r_2 \le Z < r_1$, due to the definitions of $r_1$ and $r_2$. If $X(r_1) = 0$, it means that no $k \in S$ gives a $z_k \ge r_1$; this implies $Z < r_1$ (see again (1)). Likewise, if $X(r_2) \ne 0$, it means that at least one $k \in S$ gives a $z_k \ge r_2$; this implies $Z \ge r_2$.
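The argument in this proof rests on the equivalence "$X(r) = 0$ exactly when $Z < r$". The sketch below checks it on a concrete key set. Several parts are assumptions rather than the notes' definitions: $Z$ is taken to be the maximum of the $z_k$ (a guess at the form of (1)), $z_k$ is the number of leading zeros of $h(k)$, and the hash `h` is a stand-in built from SHA-256 with an assumed word length `W = 32`.

```python
import hashlib

W = 32  # assumed word length w


def h(key):
    """Stand-in hash: first W bits of SHA-256 (an assumption, not the notes' h)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:W // 8], "big")


def z(key):
    """Leading zero bits of h(key) written as a W-bit string."""
    return W - h(key).bit_length()


def X(r, keys):
    """Number of distinct keys k with z_k >= r."""
    return sum(1 for k in set(keys) if z(k) >= r)


S = [i % 500 for i in range(2000)]  # a stream with 500 distinct keys
Z = max(z(k) for k in set(S))       # assumed form of (1): Z = max z_k

for r in range(W + 2):
    # X(r) = 0 exactly when no key reaches r leading zeros, i.e. when Z < r.
    assert (X(r, S) == 0) == (Z < r)
```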

Next, we will prove that the probability of having “$X(r_1) = 0$ and $X(r_2) \ne 0$” is at least $1 - 3/c$. Towards this, we will consider the complements of these two events, namely: $X(r_1) \ge 1$ and $X(r_2) = 0$. We will prove that $X(r_1) \ge 1$ can happen with probability at most $1/c$, whereas $X(r_2) = 0$ can happen with probability at most $2/c$. Then it follows from the union bound that the probability of at least one of the two events happening is at most $3/c$. This is sufficient for establishing Proposition 1.
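As a numerical sanity check of the $3/c$ bound, the failure probability can be computed exactly in an idealized setting. This sketch assumes that the $F$ hash values are fully independent and uniform, which may be stronger than what the notes require. It also assumes that $Z$ is the maximum of the $z_k$ and that $w$ is large enough for $r_1 \le w$. Under these assumptions $\Pr[Z < r] = (1 - 2^{-r})^F$. The function name `failure_probability` is hypothetical.

```python
def failure_probability(F, c):
    """Exact Pr[not (r2 <= Z < r1)] for F independent uniform hash values.

    Assumes Z = max z_k and full independence, so Pr[Z < r] = (1 - 2^-r)^F.
    """
    r1 = 0
    while 2 ** r1 <= c * F:
        r1 += 1
    r2 = 0
    while 2 ** r2 * c < F:
        r2 += 1
    p_too_large = 1.0 - (1.0 - 2.0 ** -r1) ** F   # Pr[X(r1) >= 1]
    p_too_small = (1.0 - 2.0 ** -r2) ** F         # Pr[X(r2) = 0]
    # The two events are disjoint, so their probabilities add up exactly.
    return p_too_large + p_too_small


for F in (10, 1000, 100000):
    for c in (4, 8, 16):
        assert failure_probability(F, c) <= 3 / c
```

In this idealized model the actual failure probability is usually well below $3/c$; the bound in the text is a worst-case guarantee.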

Lemma 3. $\Pr[X(r_1) \ge 1] < 1/c$.

