Foundations of Data Science

12.5 Bounds on Tail Probability

After an introduction to tail inequalities, the main purpose of this section is to state the Master Tail Bounds Theorem of Chapter 2 (with more detail), give a proof of it, and derive the other tail inequalities mentioned in the table in that chapter.

Markov's inequality bounds the tail probability of a nonnegative random variable $x$ based only on its expectation. For $a > 0$,
$$\Pr(x > a) \le \frac{E(x)}{a}.$$
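To make the inequality concrete, here is a minimal Monte Carlo sketch; the exponential distribution, the sample size, and the values of $a$ are illustrative choices, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
# Nonnegative random variable: exponential with E(x) = 1 (illustrative choice).
x = rng.exponential(scale=1.0, size=1_000_000)

for a in [2.0, 4.0, 8.0]:
    empirical = np.mean(x > a)   # estimate of Pr(x > a) from samples
    bound = x.mean() / a         # Markov bound E(x)/a
    print(f"a={a}: Pr(x > a) ~ {empirical:.4f} <= Markov bound {bound:.4f}")
```

The empirical tail always sits below the bound, though often far below it; Markov's inequality trades sharpness for requiring only the first moment.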

As $a$ grows, the bound drops off as $1/a$. Given the second moment of $x$, Chebyshev's inequality, which does not assume $x$ is a nonnegative random variable, gives a tail bound falling off as $1/a^2$:
$$\Pr\big(|x - E(x)| \ge a\big) \le \frac{E\big((x - E(x))^2\big)}{a^2}.$$
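A similar sketch checks Chebyshev's bound; the standard Gaussian here is again an illustrative choice, picked because it takes negative values, so Markov's inequality does not apply to it directly:

```python
import numpy as np

rng = np.random.default_rng(0)
# A variable taking negative values: standard Gaussian, E(x) = 0, Var(x) = 1.
x = rng.normal(loc=0.0, scale=1.0, size=1_000_000)

for a in [1.0, 2.0, 3.0]:
    empirical = np.mean(np.abs(x - x.mean()) >= a)  # Pr(|x - E(x)| >= a)
    bound = x.var() / a**2                          # E((x - E(x))^2) / a^2
    print(f"a={a}: Pr(|x - E(x)| >= a) ~ {empirical:.4f} <= {bound:.4f}")
```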

Higher moments yield bounds by applying either of these two theorems. For example, if $r$ is a nonnegative even integer, then $x^r$ is a nonnegative random variable even if $x$ takes on negative values. Applying Markov's inequality to $x^r$,
$$\Pr(|x| \ge a) = \Pr(x^r \ge a^r) \le \frac{E(x^r)}{a^r},$$
a bound that falls off as $1/a^r$. The larger $r$ is, the faster the bound falls off, but a bound on $E(x^r)$ is needed to apply this technique.
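For instance, taking $r = 4$ for a standard Gaussian, whose fourth moment is $E(x^4) = 3$, gives $\Pr(|x| \ge a) \le 3/a^4$, sharper than Chebyshev's $1/a^2$ for large $a$. A sketch under the same illustrative Monte Carlo setup as above:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)  # standard Gaussian: E(x^2) = 1, E(x^4) = 3

a = 3.0
empirical = np.mean(np.abs(x) >= a)  # Pr(|x| >= a), roughly 0.0027 here
r2_bound = np.mean(x**2) / a**2      # r = 2 bound: E(x^2)/a^2 (Chebyshev)
r4_bound = np.mean(x**4) / a**4      # r = 4 bound: E(x^4)/a^4, sharper for large a
print(f"Pr(|x| >= {a}) ~ {empirical:.4f}; "
      f"r=2 bound {r2_bound:.4f}; r=4 bound {r4_bound:.4f}")
```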

For a random variable $x$ that is the sum of a large number of independent random variables $x_1, x_2, \ldots, x_n$, one can derive bounds on $E(x^r)$ for high even $r$. There are many situations where the sum of a large number of independent random variables arises. For example, $x_i$ may be the amount of a good that the $i$th consumer buys, the length of the $i$th message sent over a network, or the indicator random variable of whether the $i$th record in a large database has a certain property. Each $x_i$ is modeled by a simple probability distribution; Gaussian, exponential (with probability density $e^{-t}$ at any $t > 0$), and binomial distributions, respectively, are typically used in the three examples here. If the $x_i$ have 0-1 distributions, there are a number of theorems called Chernoff bounds, bounding the tails of $x = x_1 + x_2 + \cdots + x_n$, typically proved by the so-called moment-generating function method (see Section 12.4.11 of the appendix). But exponential and Gaussian random variables are not bounded, and these methods do not apply. However, good bounds on the moments of these two distributions are known. Indeed, for any integer $s > 0$, the $s$th moments of the unit-variance Gaussian and of the exponential are both at most $s!$.
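As a quick check of this claim, a one-line computation using the density $e^{-t}$ stated above gives the exponential's $s$th moment exactly:
$$E(x^s) = \int_0^\infty t^s e^{-t}\, dt = \Gamma(s+1) = s!,$$
and for the unit-variance Gaussian the odd moments vanish while $E(x^{2k}) = 1 \cdot 3 \cdot 5 \cdots (2k-1)$, which is at most $(2k)!$.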

Given bounds on the moments of the individual $x_i$, the following theorem proves moment bounds on their sum. We use this theorem to derive tail bounds not only for sums of 0-1 random variables but also for sums of unbounded random variables such as Gaussians and exponentials.
