
Example: Consider flipping a coin 100 times. Suppose 62 heads and 38 tails occur. What is the most likely value of the probability that the coin comes down heads when it is flipped? In this case, it is r = 0.62. The probability that we get 62 heads if the unknown probability of heads in one trial is r is
$$\text{Prob}(62 \text{ heads} \mid r) = \binom{100}{62} r^{62} (1-r)^{38}.$$
This quantity is maximized when r = 0.62. To see this, take the logarithm, which as a function of r is $\ln \binom{100}{62} + 62 \ln r + 38 \ln(1-r)$. Its derivative with respect to r, $\frac{62}{r} - \frac{38}{1-r}$, is zero at r = 0.62, and the second derivative is negative there, indicating a maximum. Thus, r = 0.62 is the maximum likelihood estimator of the probability of heads in a trial.
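
As a quick numerical check (a minimal sketch, not part of the text; the grid of candidate values for r is an arbitrary choice), one can evaluate the log-likelihood over a grid and confirm that the maximum sits at r = 0.62:

```python
import numpy as np

# Log-likelihood of 62 heads and 38 tails, up to the constant ln C(100, 62)
r = np.linspace(0.001, 0.999, 999)
log_lik = 62 * np.log(r) + 38 * np.log(1 - r)

# The maximizing r should be 62/100
print(r[np.argmax(log_lik)])  # approximately 0.62
```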

12.4.11 Tail Bounds and Chernoff inequalities

Markov's inequality bounds the probability that a nonnegative random variable exceeds a value $a$:
$$p(x \geq a) \leq \frac{E(x)}{a},$$
or
$$p\bigl(x \geq a\,E(x)\bigr) \leq \frac{1}{a}.$$
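
As an illustration (a minimal sketch, not from the text; the exponential distribution and the threshold a = 3 are arbitrary choices), a Monte Carlo estimate shows the first form of the bound holding, though loosely:

```python
import numpy as np

rng = np.random.default_rng(0)

# A nonnegative random variable: exponential with mean E(x) = 1
x = rng.exponential(scale=1.0, size=1_000_000)
a = 3.0

empirical = np.mean(x >= a)      # roughly e^{-3}, about 0.05
markov_bound = x.mean() / a      # roughly 1/3
print(empirical, markov_bound)   # the bound holds but is not tight
```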

If one also knows the variance, $\sigma^2$, then using Chebyshev's inequality one can bound the probability that a random variable differs from its expected value $m$ by more than $a$ standard deviations:
$$p(|x - m| \geq a\sigma) \leq \frac{1}{a^2}.$$
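
In the same spirit (again a sketch, not from the text; the standard normal and a = 2 are arbitrary choices), one can compare the bound $1/a^2$ with the actual deviation probability of a particular distribution:

```python
import numpy as np

rng = np.random.default_rng(1)

# Standard normal: mean m = 0, standard deviation sigma = 1
x = rng.standard_normal(1_000_000)
a = 2.0

empirical = np.mean(np.abs(x) >= a)   # about 0.046 for a Gaussian
chebyshev_bound = 1 / a**2            # 0.25, valid for any distribution
print(empirical, chebyshev_bound)
```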

If a random variable s is the sum of n independent random variables $x_1, x_2, \ldots, x_n$ of finite variance, then better bounds are possible. Let $m$ denote the mean of s. For any $\delta > 0$,
$$\text{Prob}\bigl(s > (1+\delta)m\bigr) < \left[\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right]^{m}$$
and for $0 < \gamma \leq 1$,
$$\text{Prob}\bigl(s < (1-\gamma)m\bigr) < \left[\frac{e^{-\gamma}}{(1-\gamma)^{(1-\gamma)}}\right]^{m} < e^{-\gamma^{2}m/2}.$$
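
A small experiment (a sketch, not from the text; the choice of Bernoulli(0.5) summands, n = 100, and delta = 0.2 is arbitrary) compares the upper-tail bound with an empirical estimate for a sum of independent Bernoulli variables:

```python
import numpy as np

rng = np.random.default_rng(2)

n, p, delta = 100, 0.5, 0.2
m = n * p                                  # mean of the sum s

# s = x_1 + ... + x_n with x_i independent Bernoulli(p), i.e. Binomial(n, p)
s = rng.binomial(n, p, size=1_000_000)

empirical = np.mean(s > (1 + delta) * m)                            # about 0.018
chernoff_bound = (np.exp(delta) / (1 + delta) ** (1 + delta)) ** m  # about 0.39
print(empirical, chernoff_bound)
```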

Chernoff inequalities

Chebyshev's inequality bounds the probability that a random variable will deviate from its mean by more than a given amount. Chebyshev's inequality holds for any probability distribution. For some distributions we can get much tighter bounds. For example, the probability that a Gaussian random variable deviates from its mean falls off exponentially with the distance from the mean. Here we shall be concerned with the situation where we have a random variable that is the sum of n independent random variables. This

