25.03.2013 Views

Chapter 12: Hypergeometric Distribution - School of Computing

Chapter 12: Hypergeometric Distribution - School of Computing

Chapter 12: Hypergeometric Distribution - School of Computing

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>Chapter</strong> <strong>12</strong>:<br />

<strong>Hypergeometric</strong> <strong>Distribution</strong><br />

CA266<br />

<strong>School</strong> <strong>of</strong> <strong>Computing</strong>, DCU<br />

Marija Bezbradica<br />

mbezbradica@computing.dcu.ie<br />

http://www.computing.dcu.ie/~mbezbradica/<br />

(based on the book by Pr<strong>of</strong>. Jane M. Horgan) 1


Example 1. Five cards are chosen from a well shuffled deck.<br />

Lets denote:<br />

X = number <strong>of</strong> diamonds selected<br />

p – probability <strong>of</strong> selecting a diamond, i.e. the probability <strong>of</strong> “success”<br />

n – number <strong>of</strong> selected cards<br />

N – larger group, total size<br />

p = 0.25, n = 5, N = 52<br />

P(X=0) = P(0 diamond in the selected 5)<br />

P(X=1) = P(1 diamond in the selected 5)<br />

P(X=2) = P(2 diamonds in the selected 5)<br />

Example 2. An audio amplifier contains six transistors. It has been ascertained that three <strong>of</strong><br />

the transistors are faulty but it is not known which three. Amy removes three transistors<br />

at random, and inspects them.<br />

X = number <strong>of</strong> defective transistors that Amy finds<br />

p – probability <strong>of</strong> selecting a default, i.e. the probability <strong>of</strong> “success”<br />

n – number <strong>of</strong> selected transistors<br />

N – total size, total size<br />

p = 0.05, n = 3, N = 6<br />

P(X=0) = P(the sample will catch none <strong>of</strong> the defective transistors)<br />

P(X=1) = P(the sample will catch one <strong>of</strong> the defective transistors)<br />

P(X=2) = P(the sample will catch two <strong>of</strong> the defective transistors)<br />

P(X=3) = P(the sample will catch three <strong>of</strong> the defective transistors)<br />

2


Example 3. A batch <strong>of</strong> 20 integrated circuit chips contains 20% defective chips. A sample<br />

<strong>of</strong> 10 is drawn at random.<br />

X = number <strong>of</strong> defective chips in the sample<br />

p = 0.1, n = 10, N = 20<br />

P(X=0) = P(0 defectives in the sample <strong>of</strong> size 10)<br />

P(X=1) = P(1 defective in the sample <strong>of</strong> size 10)<br />

P(X=2) = P(2 defectives in the sample <strong>of</strong> size 10)<br />

Example 4. A batch <strong>of</strong> 100 printed circuit cards is populated with semiconductor chips.<br />

20 <strong>of</strong> these are selected without replacement for function testing. If the original<br />

batch contains 30 defective cards, how will these show up in the sample?<br />

X = number <strong>of</strong> defective cards in the sample<br />

p = 0.3, n = 20, N = 100<br />

P(X=0) = P(0 defectives in the sample <strong>of</strong> size 20)<br />

P(X=1) = P(1 defective in the sample <strong>of</strong> size 20)<br />

P(X=2) = P(2 defectives in the sample <strong>of</strong> size 20)<br />

3


<strong>Hypergeometric</strong> <strong>Distribution</strong><br />

Recall:<br />

• In Geometric – we investigated the number <strong>of</strong> trials necessary to arrive at the first<br />

success<br />

• In Binomial distribution – we investigated the number <strong>of</strong> successes in a fixed<br />

number <strong>of</strong> n draws with replacement<br />

In <strong>Hypergeometric</strong> distribution – probability <strong>of</strong> successes in n draws from a finite<br />

population without replacement<br />

Parameters are:<br />

This gives:<br />

n – number <strong>of</strong> chosen<br />

N – total number<br />

p – proportion <strong>of</strong> successes in the group to start with<br />

M <br />

Np<br />

where M is number <strong>of</strong> successes in the group to start with<br />

4


Example 1.<br />

p = 0.25, n = 5, N = 52<br />

The deck can be visualised as:<br />

Five cards are selected at random and without replacement from a deck.<br />

P(X=0) = P(probability <strong>of</strong> getting 0 diamonds in the set)<br />

P(X=1) = P(1 diamond in the selected 5) →<br />

13<br />

Diamonds<br />

P(X=2) = P(2 diamonds in the selected 5) →<br />

similar for P(X=3), P(X=4), P(X=5) ...<br />

39<br />

Nondiamonds<br />

5


<strong>Hypergeometric</strong> <strong>Distribution</strong> - PDF<br />

Def<br />

Conditions:<br />

1. An experiment consists <strong>of</strong> n repeated trials<br />

2. A finite population <strong>of</strong> size N consists <strong>of</strong>:<br />

a) M elements called successes: M = Np<br />

b) L elements called failures: L = N – M<br />

X = number <strong>of</strong> successes<br />

X is said to have a hypergeometric distribution<br />

Np = M<br />

Successes<br />

N(1-p ) = N - M<br />

Failures<br />

A sample <strong>of</strong> n elements are selected at random without replacement<br />

6


Example 3. – cont.<br />

What is probability that Amy finds three defective transistors when she selects three<br />

<strong>of</strong> the six?<br />

Solution 3:<br />

Example 5.<br />

Draw 6 cards from a deck without replacement. What is the probability <strong>of</strong> getting<br />

two hearts?<br />

Solution 5:<br />

M = 13 number <strong>of</strong> hearts<br />

L = 39 number <strong>of</strong> non-hearts<br />

N = 52 total<br />

Check in R:<br />

> dhyper(2, 13, 39, 6)<br />

[1] 0.315<strong>12</strong>99<br />

> round(dhyper(2, 13, 39, 6), 5)<br />

[1] 0.31513<br />

7


Calculating hypergeometric pdfs in R:<br />

Use prefix “hyper” with “d” for pdf. Function is dhyper with arguments M and N - M.<br />

Example 1: Five cards from a deck<br />

x


<strong>Hypergeometric</strong> pdfs<br />

9


Example 6. Lotto a) with N = 36<br />

The format <strong>of</strong> the game is such that you have to correctly chose n different numbers from N<br />

Classic example <strong>of</strong> hypergeometric distribution:<br />

n = 6<br />

N = 36 (was in Ireland, now is 45, we will se why)<br />

The population consists <strong>of</strong>:<br />

for x = 0, 1, 2, 3, 4, 5, 6<br />

Calculate in R:<br />

x


Example 6. Lotto b) with N = 42<br />

42 balls are numbered 1 - 42.<br />

You select six numbers between 1 and 42. (The ones you write on your lotto card)<br />

Number <strong>of</strong> possible ways to draw six numbers in the range [ 1, 42 ] =<br />

What is the probability that they contain:<br />

(i) match 6?<br />

(ii) match 5?<br />

(iii) match 4?<br />

(iv) match 3?<br />

Solution 6b:<br />

Total, N = 42; Favourable, M = 6; Non-Favourable, L = 36<br />

Sample size n = 6.<br />

(iii) match 4:<br />

x = 4<br />

11


Solution 6. Lotto – cont.<br />

6 correct numbers, just one way <strong>of</strong> winning:<br />

With 36 numbers<br />

choose(36, 6)<br />

Winning the Jackpot<br />

[1] 1,947,792 # a chance <strong>of</strong> about 1 in 2 million!<br />

With 39 numbers<br />

choose(39, 6)<br />

[1] 3,262,623 # a chance <strong>of</strong> less than 1 in 3 million!<br />

With 42 numbers<br />

choose(42, 6)<br />

[1] 5,245,786 # less than 1 in 5 million chance!<br />

The current situation <strong>of</strong> 45 numbers:<br />

choose(45, 6)<br />

[1] 8,145,060 # a chance <strong>of</strong> less than 1 in 8 million!<br />

<strong>12</strong>


<strong>Hypergeometric</strong> CDF<br />

Cumulative <strong>Distribution</strong> Function (CDF) computes probabilities that the selected item will<br />

contain up to and include x successes: F(x) = P(X ≤ x)<br />

Example 1:<br />

What is the probability <strong>of</strong> getting at most 2 diamonds in the 5 selected without<br />

replacement from a well shuffled deck?<br />

x = 2<br />

n = 5<br />

N = 52<br />

M = 13<br />

L = N – M = 39<br />

Check in R:<br />

> phyper(2, 13, 39, 5)<br />

[1]0.9072329<br />

Example 6:<br />

What is the probability <strong>of</strong> getting something in lotto?<br />

P(X ≥ 3) = 1 – P(X ≤ 2) = 1 – (P(match 0) + P(match 1) + P(match 2))<br />

13


Calculating hypergeometric cdfs in R:<br />

par(mfrow = c(2,2))<br />

x


<strong>Hypergeometric</strong> cdfs<br />

15


Binomial or <strong>Hypergeometric</strong>?<br />

Boxes contain 20 items <strong>of</strong> which 10% are defective.<br />

Find the probability that no more than 2 defectives will be obtained in a sample <strong>of</strong> size 10.<br />

X = no <strong>of</strong> defectives<br />

With Replacement Sampling<br />

Without Replacement Sampling<br />

P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)<br />

→ probabilities<br />

are different for N = 20!<br />

16


Binomial or <strong>Hypergeometric</strong>? – cont.<br />

Boxes contain 200 items <strong>of</strong> which 10% are defective.<br />

Find the probability that no more than 2 defectives will be obtained in a sample <strong>of</strong> size 10.<br />

X = no <strong>of</strong> defectives<br />

With Replacement Sampling<br />

Without Replacement Sampling<br />

P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)<br />

→ near to probabilities<br />

from binomial for N = 200!<br />

17


Binomial or <strong>Hypergeometric</strong>? – cont.<br />

Boxes contain 2000 items <strong>of</strong> which 10% are defective.<br />

Find the probability that no more than 2 defectives will be obtained in a sample <strong>of</strong> size 10.<br />

X = no <strong>of</strong> defectives<br />

With Replacement Sampling<br />

Without Replacement Sampling<br />

P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)<br />

→ very close to probabilities<br />

from binomial for N = 2000!<br />

18


Binomial or <strong>Hypergeometric</strong>? – cont.<br />

What is the probability <strong>of</strong> getting no more than 2 defectives in a random sample drawn<br />

without replacement from a batch which has 10% defectives?<br />

- we illustrated that when the batch size N is large the hypergeometric may be<br />

approximated with binomial!<br />

Batch size 20 200 2000 Binomial approx<br />

P(X = 0) 0.2368 0.3398 0.3476 0.3487<br />

P(X = 1) 0.5263 0.3974 0.3884 0.3874<br />

P(X = 2) 0.2368 0.1975 0.1941 0.1937<br />

P(X ≤ 2) 0.999 0.9347 0.9296 0.9298<br />

19


<strong>Hypergeometric</strong> and binomial pdfs with n = 10, p = 0.1<br />

20


Plotting hypergeometric and binomial<br />

x


<strong>Hypergeometric</strong> vs. binomial pdfs with n = 10, p = 0.1 using matplot<br />

22


Matplot<br />

Matplot - Plot the columns <strong>of</strong> one matrix against the<br />

columns <strong>of</strong> another.<br />

x


Let:<br />

As N →∞, the hypergeometric distribution converges to the binomial!<br />

Population Size = N<br />

Proportion <strong>of</strong> successes = p<br />

Number <strong>of</strong> successes in N = Np<br />

Number <strong>of</strong> failures = N(1 − p)<br />

X = number <strong>of</strong> successes in s sample <strong>of</strong> size n drawn without replacement<br />

from N<br />

Np = M<br />

Successes<br />

N(1-p ) = N - M<br />

Failures<br />

→ for N → ∞<br />

24

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!