08.10.2016 Views

Foundations of Data Science


expected value of the sum will be zero, where the expectation is over the choice of the ±1 value for the $x_s$:

$$E\left(\sum_{s=1}^{m} x_s f_s\right) = 0.$$

Although the expected value of the sum is zero, its actual value is a random variable, and the expected value of the square of the sum is given by

$$E\left(\sum_{s=1}^{m} x_s f_s\right)^2 = E\left(\sum_{s=1}^{m} x_s^2 f_s^2\right) + 2E\left(\sum_{s<t} x_s x_t f_s f_t\right) = \sum_{s=1}^{m} f_s^2.$$

The last equality follows since $x_s^2 = 1$ and $E(x_s x_t) = E(x_s)E(x_t) = 0$ for $s \neq t$, using pairwise independence of the random variables. Thus

$$a = \left(\sum_{s=1}^{m} x_s f_s\right)^2$$

is an unbiased estimator of $\sum_{s=1}^{m} f_s^2$ in that it has the correct expectation. Note that at this point we could use Markov's inequality to state that $\text{Prob}\left(a \geq 3\sum_{s=1}^{m} f_s^2\right) \leq 1/3$, but we want to get a tighter guarantee. To do so, consider the second moment of $a$:

$$E(a^2) = E\left(\sum_{s=1}^{m} x_s f_s\right)^4 = E\left(\sum_{1 \leq s,t,u,v \leq m} x_s x_t x_u x_v f_s f_t f_u f_v\right).$$

The last equality is by expansion. Assume that the random variables $x_s$ are 4-wise independent, or equivalently that they are produced by a 4-wise independent hash function. Then the $x_s$ appearing in any one term of the last sum are independent, so if any one of $s$, $t$, $u$, or $v$ is distinct from the others, the expectation of the whole term is zero. Thus, we need to deal only with terms of the form $x_s^2 x_t^2$ for $t \neq s$ and terms of the form $x_s^4$.
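As a small sanity check of this case analysis, consider $m = 2$ (an illustrative choice, not taken from the text). The expansion of $a^2$ then has $2^4 = 16$ terms. Two of them have all four indices equal, six have two indices equal to 1 and two equal to 2, and the remaining eight have one index distinct from the other three, so their expectation vanishes (for example, $E(x_1^3 x_2) = E(x_1 x_2) = 0$). Using $x_s^2 = 1$, this gives

$$E(a^2) = f_1^4 + f_2^4 + 6 f_1^2 f_2^2 = \left(f_1^2 + f_2^2\right)^2 + 4 f_1^2 f_2^2 \leq 3\left(f_1^2 + f_2^2\right)^2.$$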

Each term in the expansion of $E(a^2)$ has four indices, $s, t, u, v$, and there are $\binom{4}{2}$ ways of choosing two indices that have the same $x$ value. Thus,

$$\begin{aligned}
E(a^2) &\leq \binom{4}{2} E\left(\sum_{s=1}^{m} \sum_{t=s+1}^{m} x_s^2 x_t^2 f_s^2 f_t^2\right) + E\left(\sum_{s=1}^{m} x_s^4 f_s^4\right) \\
&= 6 \sum_{s=1}^{m} \sum_{t=s+1}^{m} f_s^2 f_t^2 + \sum_{s=1}^{m} f_s^4 \\
&\leq 3\left(\sum_{s=1}^{m} f_s^2\right)^2 = 3E^2(a).
\end{aligned}$$

Therefore, $\text{Var}(a) = E(a^2) - E^2(a) \leq 2E^2(a)$.
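The moment calculations above can be checked numerically on a small instance. The sketch below is only an illustration under stated assumptions: the frequency vector `f` is made up, the helper name `exact_moments` is hypothetical, and the signs are taken to be fully independent and uniform over $\{-1, +1\}^m$ (a special case of 4-wise independence), so that the expectations can be computed exactly by enumerating all $2^m$ sign vectors.

```python
from itertools import product


def exact_moments(f):
    """Return (E[a], E[a^2]) for a = (sum_s x_s * f_s)^2.

    Assumption: the signs x_s are fully independent and uniform on {-1, +1},
    so each of the 2^m sign vectors is equally likely and we can average
    over all of them exactly.
    """
    m = len(f)
    total_a = 0
    total_a2 = 0
    for x in product((-1, 1), repeat=m):
        a = sum(xs * fs for xs, fs in zip(x, f)) ** 2
        total_a += a
        total_a2 += a * a
    n = 2 ** m
    return total_a / n, total_a2 / n


# Hypothetical frequencies f_s of m = 5 distinct symbols.
f = [3, 1, 4, 1, 5]

mean_a, mean_a2 = exact_moments(f)
second_moment = sum(fs * fs for fs in f)   # the quantity being estimated
variance = mean_a2 - mean_a ** 2

print("sum of f_s^2:", second_moment)
print("E[a]:", mean_a)
print("E[a^2]:", mean_a2, "bound 3*E[a]^2:", 3 * mean_a ** 2)
print("Var(a):", variance, "bound 2*E[a]^2:", 2 * mean_a ** 2)
```

For this `f`, the enumeration gives $E(a) = 52 = \sum_s f_s^2$ and $E(a^2) = 6184 \leq 3 \cdot 52^2 = 8112$, consistent with the bounds derived above. Enumeration is exponential in $m$, so it serves only as a check on tiny inputs.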

