08.10.2016 Views

Foundations of Data Science


random variables, but also Gaussians, exponentials, Poisson, etc.

The gold standard for tail bounds is the central limit theorem for independent, identically distributed random variables $x_1, x_2, \ldots, x_n$ with zero mean and $\mathrm{Var}(x_i) = \sigma^2$, which states that as $n \to \infty$ the distribution of $x = (x_1 + x_2 + \cdots + x_n)/\sqrt{n}$ tends to the Gaussian density with zero mean and variance $\sigma^2$. Loosely, this says that in the limit, the tails of $x = (x_1 + x_2 + \cdots + x_n)/\sqrt{n}$ are bounded by those of a Gaussian with variance $\sigma^2$. But this theorem holds only in the limit, whereas we prove a bound that applies for all $n$.
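To make the limiting statement concrete, here is a small sketch that compares an exact finite-$n$ tail with the Gaussian tail. It assumes one particular distribution not taken from the text: each $x_i$ is a fair $\pm 1$ coin flip (zero mean, variance 1). The function names are illustrative only.

```python
import math


def rademacher_normalized_tail(n: int, t: float) -> float:
    """Exact Pr(|x_1 + ... + x_n| / sqrt(n) >= t) for independent fair +/-1 flips."""
    threshold = t * math.sqrt(n)
    # With k flips equal to +1 out of n, the sum equals 2k - n.
    favorable = sum(
        math.comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= threshold
    )
    return favorable / 2 ** n


def gaussian_tail(t: float, sigma: float = 1.0) -> float:
    """Pr(|g| >= t) for a zero-mean Gaussian g with standard deviation sigma."""
    return math.erfc(t / (sigma * math.sqrt(2)))


# The exact tail approaches the Gaussian tail as n grows,
# but for any fixed n the two differ.
for n in (10, 100, 1000):
    print(n, rademacher_normalized_tail(n, 2.0), gaussian_tail(2.0))
```

The gap between the two columns at small $n$ is the reason a bound valid for every $n$ is worth proving separately.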

In the following theorem, $x$ is the sum of $n$ independent, not necessarily identically distributed, random variables $x_1, x_2, \ldots, x_n$, each of zero mean and variance at most $\sigma^2$. By the central limit theorem, in the limit the probability density of $x$ goes to that of the Gaussian with variance at most $n\sigma^2$. In a limit sense, this implies an upper bound of $c\,e^{-a^2/(2n\sigma^2)}$ for the tail probability $\Pr(|x| > a)$ for some constant $c$. The following theorem assumes bounds on higher moments, and asserts a quantitative upper bound of $3e^{-a^2/(12n\sigma^2)}$ on the tail probability, not just in the limit, but for every $n$. We will apply this theorem to get tail bounds on sums of Gaussian, binomial, and power law distributed random variables.

Theorem 12.5 Let $x = x_1 + x_2 + \cdots + x_n$, where $x_1, x_2, \ldots, x_n$ are mutually independent random variables with zero mean and variance at most $\sigma^2$. Suppose $a \in [0, \sqrt{2}\,n\sigma^2]$ and $s \le n\sigma^2/2$ is a positive even integer and $|E(x_i^r)| \le \sigma^2 r!$ for $r = 3, 4, \ldots, s$. Then,

$$\Pr\left(|x_1 + x_2 + \cdots + x_n| \ge a\right) \le \left(\frac{2sn\sigma^2}{a^2}\right)^{s/2}.$$

If further, $s \ge a^2/(4n\sigma^2)$, then we also have:

$$\Pr\left(|x_1 + x_2 + \cdots + x_n| \ge a\right) \le 3e^{-a^2/(12n\sigma^2)}.$$
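As a rough numerical illustration of the two bounds in Theorem 12.5, the sketch below evaluates them and compares them with an exact tail probability. The test distribution is an assumption made here, not part of the theorem: each $x_i$ is a fair $\pm 1$ coin flip, so $\sigma^2 = 1$ and $|E(x_i^r)| \le 1 \le \sigma^2 r!$ for every $r$. All function names are hypothetical, and the code requires $a > 0$ to avoid dividing by zero in the first bound.

```python
import math


def check_hypotheses(a: float, n: int, sigma2: float, s: int) -> None:
    """Raise ValueError unless the numeric hypotheses of the theorem hold.

    The moment condition |E(x_i^r)| <= sigma2 * r! is NOT checked here;
    the caller must know it holds for the distribution at hand.
    """
    if not 0 < a <= math.sqrt(2) * n * sigma2:
        raise ValueError("need 0 < a <= sqrt(2) * n * sigma^2")
    if s <= 0 or s % 2 != 0 or s > n * sigma2 / 2:
        raise ValueError("need s positive, even, and at most n * sigma^2 / 2")


def moment_tail_bound(a: float, n: int, sigma2: float, s: int) -> float:
    """First bound: (2 s n sigma^2 / a^2)^(s/2)."""
    check_hypotheses(a, n, sigma2, s)
    return (2 * s * n * sigma2 / a ** 2) ** (s / 2)


def exponential_tail_bound(a: float, n: int, sigma2: float, s: int) -> float:
    """Second bound: 3 exp(-a^2 / (12 n sigma^2)), valid when s >= a^2 / (4 n sigma^2)."""
    check_hypotheses(a, n, sigma2, s)
    if s < a ** 2 / (4 * n * sigma2):
        raise ValueError("need s >= a^2 / (4 n sigma^2)")
    return 3 * math.exp(-a ** 2 / (12 * n * sigma2))


def exact_rademacher_tail(a: float, n: int) -> float:
    """Exact Pr(|x_1 + ... + x_n| >= a) for independent fair +/-1 flips."""
    favorable = sum(math.comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= a)
    return favorable / 2 ** n


n, sigma2, a, s = 1000, 1.0, 200.0, 10
print("exact tail       :", exact_rademacher_tail(a, n))
print("moment bound     :", moment_tail_bound(a, n, sigma2, s))
print("exponential bound:", exponential_tail_bound(a, n, sigma2, s))
```

For this choice of parameters both bounds hold with a lot of room to spare, which is expected: the theorem trades tightness for validity at every $n$ under weak moment assumptions.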

Proof: We first prove an upper bound on $E(x^r)$ for any even positive integer $r$ and then use Markov's inequality as discussed earlier. Expand $(x_1 + x_2 + \cdots + x_n)^r$:

$$(x_1 + x_2 + \cdots + x_n)^r = \sum \binom{r}{r_1, r_2, \ldots, r_n} x_1^{r_1} x_2^{r_2} \cdots x_n^{r_n} = \sum \frac{r!}{r_1!\, r_2! \cdots r_n!}\, x_1^{r_1} x_2^{r_2} \cdots x_n^{r_n},$$

where the $r_i$ range over all nonnegative integers summing to $r$. By independence,

$$E(x^r) = \sum \frac{r!}{r_1!\, r_2! \cdots r_n!}\, E(x_1^{r_1})\, E(x_2^{r_2}) \cdots E(x_n^{r_n}).$$
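The multinomial formula for $E(x^r)$ can be sanity-checked on a tiny case. The sketch below assumes fair $\pm 1$ coin flips for the $x_i$ (an illustrative choice, so $E(x_i^k)$ is 1 for even $k$ and 0 for odd $k$) and compares the expansion with a brute-force average over all $2^n$ outcomes. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod


def compositions(total: int, parts: int):
    """Yield every tuple of `parts` nonnegative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def rademacher_moment(k: int) -> int:
    """E(x_i^k) for a fair +/-1 flip: 1 if k is even, 0 if k is odd."""
    return 1 if k % 2 == 0 else 0


def moment_by_expansion(n: int, r: int) -> int:
    """E(x^r) as the sum of r!/(r_1!...r_n!) * E(x_1^r_1)...E(x_n^r_n)."""
    total = 0
    for rs in compositions(r, n):
        coeff = factorial(r) // prod(factorial(ri) for ri in rs)
        total += coeff * prod(rademacher_moment(ri) for ri in rs)
    return total


def moment_by_enumeration(n: int, r: int) -> Fraction:
    """E(x^r) computed directly by averaging over all 2^n sign patterns."""
    total = sum(sum(signs) ** r for signs in product((-1, 1), repeat=n))
    return Fraction(total, 2 ** n)


print(moment_by_expansion(4, 6), moment_by_enumeration(4, 6))
```

The two computations agree because independence lets the expectation of each product factor into a product of expectations.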

If in a term any $r_i = 1$, the term is zero since $E(x_i) = 0$. Assume henceforth that $(r_1, r_2, \ldots, r_n)$ runs over sets of nonzero $r_i$ summing to $r$ where each nonzero $r_i$ is at least two. There are at most $r/2$ nonzero $r_i$ in each set. Since $|E(x_i^{r_i})| \le \sigma^2 r_i!$, the factorials $r_i!$ cancel against the multinomial denominator, and

$$E(x^r) \le r! \sum_{(r_1, r_2, \ldots, r_n)} \sigma^{2(\text{number of nonzero } r_i \text{ in set})}.$$
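The counting step can also be checked by brute force on a small case. This sketch enumerates the tuples that survive once every term with some $r_i = 1$ is dropped, confirms that none has more than $r/2$ nonzero entries, and evaluates the right-hand side of the inequality. As an assumption for illustration only, the $x_i$ are fair $\pm 1$ coin flips with $\sigma^2 = 1$; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def surviving_tuples(r: int, n: int):
    """Yield tuples of n nonnegative integers summing to r with no entry equal to 1."""
    if n == 0:
        if r == 0:
            yield ()
        return
    for first in range(r + 1):
        if first == 1:
            continue  # a factor E(x_i) = 0 kills the whole term
        for rest in surviving_tuples(r - first, n - 1):
            yield (first,) + rest


def nonzero_count(rs) -> int:
    """Number of nonzero entries in the tuple rs."""
    return sum(1 for ri in rs if ri > 0)


def crude_moment_bound(r: int, n: int, sigma2):
    """r! times the sum of sigma2^(number of nonzero r_i) over surviving tuples."""
    return factorial(r) * sum(
        sigma2 ** nonzero_count(rs) for rs in surviving_tuples(r, n)
    )


def exact_rademacher_moment(r: int, n: int) -> Fraction:
    """E(x^r) for a sum of n fair +/-1 flips, by enumerating all sign patterns."""
    total = sum(sum(signs) ** r for signs in product((-1, 1), repeat=n))
    return Fraction(total, 2 ** n)


r, n = 6, 4
print(max(nonzero_count(rs) for rs in surviving_tuples(r, n)))  # at most r/2
print(exact_rademacher_moment(r, n), crude_moment_bound(r, n, 1))
```

The bound is loose for this example, which is consistent with its role in the proof: it only needs to be small enough for Markov's inequality to give a useful tail estimate.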
