08.10.2016 Views

Foundations of Data Science

2dLYwbK


Poisson distribution

The Poisson distribution describes the probability of k events happening in a unit of time when the average rate per unit of time is λ. Divide the unit of time into n segments. When n is large enough, each segment is sufficiently small so that the probability of two events happening in the same segment is negligible. The Poisson distribution gives the probability of k events happening in a unit of time and can be derived from the binomial distribution by taking the limit as n → ∞.

Let $p = \frac{\lambda}{n}$. Then

$$
\begin{aligned}
\text{Prob}(k \text{ successes in a unit of time}) &= \lim_{n\to\infty} \binom{n}{k} \left(\frac{\lambda}{n}\right)^{k} \left(1 - \frac{\lambda}{n}\right)^{n-k} \\
&= \lim_{n\to\infty} \frac{n(n-1)\cdots(n-k+1)}{k!} \left(\frac{\lambda}{n}\right)^{k} \left(1 - \frac{\lambda}{n}\right)^{n} \left(1 - \frac{\lambda}{n}\right)^{-k} \\
&= \lim_{n\to\infty} \frac{\lambda^{k}}{k!} e^{-\lambda}
\end{aligned}
$$
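The limit can also be observed numerically. The following is a minimal sketch, not part of the original text: the helper names `binomial_pmf` and `poisson_pmf` and the values λ = 3, k = 2 are illustrative assumptions. It evaluates the binomial probability with p = λ/n for growing n and prints its distance from the Poisson value.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average rate is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Illustrative values (assumptions, not taken from the text).
lam, k = 3.0, 2
target = poisson_pmf(k, lam)

# Split the unit of time into n segments, each with success probability lam / n.
errors = []
for n in (10, 100, 1000, 10000):
    approx = binomial_pmf(k, n, lam / n)
    errors.append(abs(approx - target))
    print(f"n={n:6d}  binomial={approx:.6f}  |difference|={errors[-1]:.2e}")
```

For these values the difference should shrink roughly in proportion to 1/n, which matches the factors in the derivation that tend to 1 only as n grows.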

In the limit as $n$ goes to infinity, the binomial distribution $p(k) = \binom{n}{k} p^{k} (1-p)^{n-k}$ becomes the Poisson distribution $p(k) = e^{-\lambda} \frac{\lambda^{k}}{k!}$. The mean and the variance of the Poisson distribution both have value $\lambda$. If $x$ and $y$ are independent Poisson random variables with means $\lambda_1$ and $\lambda_2$ respectively, then $x + y$ is Poisson with mean $\lambda_1 + \lambda_2$. For large $n$ and small $p$ the binomial distribution can be approximated with the Poisson distribution with $\lambda = np$.
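These properties are easy to check numerically. The sketch below is an illustration under stated assumptions: the function names, the truncation point `k_max`, and the rates 4.0, 1.5 and 2.5 are all hypothetical choices. It computes the mean and variance from a truncated sum and obtains the distribution of a sum of two independent Poisson variables by convolving their probability functions.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average rate is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def truncated_moments(lam, k_max=100):
    """Mean and variance from the first k_max terms (the tail is negligible for small lam)."""
    mean = sum(k * poisson_pmf(k, lam) for k in range(k_max))
    second = sum(k * k * poisson_pmf(k, lam) for k in range(k_max))
    return mean, second - mean**2

def sum_pmf(k, lam1, lam2):
    """P(x + y = k) for independent Poisson x and y, by direct convolution."""
    return sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2) for j in range(k + 1))

mean, var = truncated_moments(4.0)
print(mean, var)                     # both should be close to 4.0

# The convolution should agree with a single Poisson of rate lam1 + lam2.
print(sum_pmf(5, 1.5, 2.5), poisson_pmf(5, 4.0))
```

The convolution step relies on x and y being independent; without independence the sum need not be Poisson.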

The binomial distribution with mean np and variance np(1 − p) can be approximated<br />

by the normal distribution with mean np and variance np(1−p). The central limit theorem<br />

tells us that there is such an approximation in the limit. The approximation is good if<br />

both np and n(1 − p) are greater than 10 provided k is not extreme. Thus,<br />

$$
\binom{n}{k} \left(\frac{1}{2}\right)^{k} \left(\frac{1}{2}\right)^{n-k} \cong \frac{1}{\sqrt{\pi n/2}} \, e^{-\frac{(n/2-k)^{2}}{\frac{1}{2} n}} .
$$
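As a quick check of this formula, the sketch below compares the exact binomial probability for p = 1/2 with the right-hand side, which is the normal density with mean n/2 and variance n/4 evaluated at k. The function names and the choice n = 100 are assumptions made for illustration.

```python
import math

def binomial_half_pmf(k, n):
    """Exact probability of k heads in n fair coin flips."""
    return math.comb(n, k) / 2**n

def normal_approx_half(k, n):
    """Normal density with mean n/2 and variance n/4, evaluated at k."""
    return math.exp(-((n / 2 - k) ** 2) / (n / 2)) / math.sqrt(math.pi * n / 2)

n = 100  # illustrative; here np = n(1 - p) = 50, well above 10
rel_errors = {}
for k in range(45, 56):
    exact = binomial_half_pmf(k, n)
    approx = normal_approx_half(k, n)
    rel_errors[k] = abs(approx - exact) / exact
    print(f"k={k}  exact={exact:.5f}  approx={approx:.5f}  rel.err={rel_errors[k]:.4f}")
```

Near the center the relative error stays small; it grows as k moves far into the tails, consistent with the condition that k not be extreme.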

This approximation is excellent provided $k$ is $\Theta(n)$. The Poisson approximation

$$
\binom{n}{k} p^{k} (1-p)^{n-k} \cong e^{-np} \frac{(np)^{k}}{k!}
$$

is off for central values and tail values even for $p = 1/2$. The approximation

$$
\binom{n}{k} p^{k} (1-p)^{n-k} \cong \frac{1}{\sqrt{\pi p n}} \, e^{-\frac{(pn-k)^{2}}{pn}}
$$

is good for $p = 1/2$ but is off for other values of $p$.
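The behavior of these two approximations can be seen in a small numerical experiment. The sketch below is only an illustration: the function names, n = 100, and the test points are assumptions. It measures the relative error of each approximation at the central value k = np.

```python
import math

def binomial_pmf(k, n, p):
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_approx(k, n, p):
    """Poisson approximation with rate np."""
    lam = n * p
    return math.exp(-lam) * lam**k / math.factorial(k)

def gaussian_pn_approx(k, n, p):
    """The approximation with pn in place of the variance-based terms."""
    return math.exp(-((p * n - k) ** 2) / (p * n)) / math.sqrt(math.pi * p * n)

def rel_err(approx, exact):
    return abs(approx - exact) / exact

n = 100  # illustrative choice
for p in (0.5, 0.1):
    k = round(n * p)  # central value
    exact = binomial_pmf(k, n, p)
    print(f"p={p}  k={k}  exact={exact:.5f}  "
          f"Poisson rel.err={rel_err(poisson_approx(k, n, p), exact):.3f}  "
          f"pn-form rel.err={rel_err(gaussian_pn_approx(k, n, p), exact):.3f}")
```

The second form coincides with the normal approximation only when pn = 2np(1 − p), that is, when p = 1/2, which is one way to see why it degrades for other values of p.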
