Foundations of Data Science


Also, if $x$ and $y$ are independent, then $E(xy) = E(x)E(y)$. These facts imply that if $x$ and $y$ are independent then $\mathrm{Var}(x + y) = \mathrm{Var}(x) + \mathrm{Var}(y)$, which is seen as follows:
\begin{align*}
\mathrm{Var}(x + y) &= E(x + y)^2 - E^2(x + y) \\
&= E(x^2 + 2xy + y^2) - \left( E^2(x) + 2E(x)E(y) + E^2(y) \right) \\
&= E(x^2) - E^2(x) + E(y^2) - E^2(y) = \mathrm{Var}(x) + \mathrm{Var}(y),
\end{align*}
where we used independence to replace $E(2xy)$ with $2E(x)E(y)$.
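
As a quick numerical check (added here as an illustration, not part of the original text), the Python sketch below estimates $\mathrm{Var}(x)$, $\mathrm{Var}(y)$, and $\mathrm{Var}(x+y)$ from independent samples; the specific distributions and sample size are arbitrary choices made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two independent random variables with arbitrarily chosen distributions.
x = rng.exponential(scale=2.0, size=n)  # Var(x) = 4
y = rng.uniform(-1.0, 1.0, size=n)      # Var(y) = 1/3

# For independent x and y, Var(x + y) should match Var(x) + Var(y).
print(np.var(x), np.var(y), np.var(x + y))  # roughly 4.0, 0.33, 4.33
```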

Theorem 2.4 (Law of large numbers) Let $x_1, x_2, \ldots, x_n$ be $n$ independent samples of a random variable $x$. Then
\[
\mathrm{Prob}\left( \left| \frac{x_1 + x_2 + \cdots + x_n}{n} - E(x) \right| \geq \epsilon \right) \leq \frac{\mathrm{Var}(x)}{n\epsilon^2}.
\]

Proof: By Chebychev's inequality
\begin{align*}
\mathrm{Prob}\left( \left| \frac{x_1 + x_2 + \cdots + x_n}{n} - E(x) \right| \geq \epsilon \right)
&\leq \frac{\mathrm{Var}\left( \frac{x_1 + x_2 + \cdots + x_n}{n} \right)}{\epsilon^2} \\
&= \frac{1}{n^2 \epsilon^2} \mathrm{Var}(x_1 + x_2 + \cdots + x_n) \\
&= \frac{1}{n^2 \epsilon^2} \left( \mathrm{Var}(x_1) + \mathrm{Var}(x_2) + \cdots + \mathrm{Var}(x_n) \right) \\
&= \frac{\mathrm{Var}(x)}{n\epsilon^2}.
\end{align*}
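
As an illustration (not from the text), the following Python sketch compares the empirical probability that the sample mean deviates from $E(x)$ by at least $\epsilon$ with the bound $\mathrm{Var}(x)/(n\epsilon^2)$; the choice of a Bernoulli(1/2) variable and the parameters $n$ and $\epsilon$ are assumptions made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps, trials = 1000, 0.1, 20_000

# Each row holds n independent samples of a Bernoulli(1/2) variable,
# for which E(x) = 0.5 and Var(x) = 0.25.
samples = rng.integers(0, 2, size=(trials, n))
deviations = np.abs(samples.mean(axis=1) - 0.5)

empirical = np.mean(deviations >= eps)   # fraction of trials with |mean - E(x)| >= eps
bound = 0.25 / (n * eps**2)              # Var(x) / (n * eps^2)
print(empirical, bound)                  # the empirical probability should not exceed the bound
```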

The Law of Large Numbers is quite general, applying to any random variable $x$ of finite variance. Later we will look at tighter concentration bounds for spherical Gaussians and sums of 0-1 valued random variables.

As an application of the Law of Large Numbers, let $z$ be a $d$-dimensional random point whose coordinates are each selected from a zero mean, variance $\frac{1}{2\pi}$ Gaussian. We set the variance to $\frac{1}{2\pi}$ so the Gaussian probability density equals one at the origin and is bounded below throughout the unit ball by a constant. By the Law of Large Numbers, the square of the distance of $z$ to the origin will be $\Theta(d)$ with high probability. In particular, there is vanishingly small probability that such a random point $z$ would lie in the unit ball. This implies that the integral of the probability density over the unit ball must be vanishingly small. On the other hand, the probability density in the unit ball is bounded below by a constant. We thus conclude that the unit ball must have vanishingly small volume.
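
A short Python sketch (added for illustration, with the dimension and trial count chosen arbitrarily) checks that $|z|^2$ concentrates around its expectation $d/(2\pi)$, which is $\Theta(d)$, and that such points essentially never fall in the unit ball.

```python
import numpy as np

rng = np.random.default_rng(2)
d, trials = 1000, 10_000

# Each coordinate is drawn from a zero-mean Gaussian with variance 1/(2*pi).
z = rng.normal(0.0, np.sqrt(1.0 / (2 * np.pi)), size=(trials, d))
sq_dist = np.sum(z**2, axis=1)

print(sq_dist.mean(), d / (2 * np.pi))  # squared distance concentrates near d/(2*pi)
print(np.mean(sq_dist <= 1.0))          # fraction of points inside the unit ball: essentially 0
```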

Similarly, if we draw two points $y$ and $z$ from a $d$-dimensional Gaussian with unit variance in each direction, then $|y|^2 \approx d$, $|z|^2 \approx d$, and $|y - z|^2 \approx 2d$ (since $E(y_i - z_i)^2 = E(y_i^2) + E(z_i^2) - 2E(y_i z_i) = 2$ for all $i$). Thus by the Pythagorean theorem, the random vectors $y$ and $z$ must be approximately orthogonal.
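
A final Python sketch (again an added illustration with an assumed dimension $d$) checks that $|y|^2 \approx d$, $|z|^2 \approx d$, $|y - z|^2 \approx 2d$, and that $y$ and $z$ are nearly orthogonal.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 10_000

# Two independent points whose coordinates are unit-variance Gaussians.
y = rng.normal(size=d)
z = rng.normal(size=d)

print(np.dot(y, y) / d)            # close to 1, so |y|^2 is about d
print(np.dot(z, z) / d)            # close to 1, so |z|^2 is about d
print(np.dot(y - z, y - z) / d)    # close to 2, so |y - z|^2 is about 2d
# Cosine of the angle between y and z: close to 0, i.e., nearly orthogonal.
print(np.dot(y, z) / (np.linalg.norm(y) * np.linalg.norm(z)))
```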

