Chapter 12: Hypergeometric Distribution - School of Computing
Chapter 12: Hypergeometric Distribution - School of Computing
Chapter 12: Hypergeometric Distribution - School of Computing
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
<strong>Chapter</strong> <strong>12</strong>:<br />
<strong>Hypergeometric</strong> <strong>Distribution</strong><br />
CA266<br />
<strong>School</strong> <strong>of</strong> <strong>Computing</strong>, DCU<br />
Marija Bezbradica<br />
mbezbradica@computing.dcu.ie<br />
http://www.computing.dcu.ie/~mbezbradica/<br />
(based on the book by Pr<strong>of</strong>. Jane M. Horgan) 1
Example 1. Five cards are chosen from a well shuffled deck.<br />
Lets denote:<br />
X = number <strong>of</strong> diamonds selected<br />
p – probability <strong>of</strong> selecting a diamond, i.e. the probability <strong>of</strong> “success”<br />
n – number <strong>of</strong> selected cards<br />
N – larger group, total size<br />
p = 0.25, n = 5, N = 52<br />
P(X=0) = P(0 diamond in the selected 5)<br />
P(X=1) = P(1 diamond in the selected 5)<br />
P(X=2) = P(2 diamonds in the selected 5)<br />
Example 2. An audio amplifier contains six transistors. It has been ascertained that three <strong>of</strong><br />
the transistors are faulty but it is not known which three. Amy removes three transistors<br />
at random, and inspects them.<br />
X = number <strong>of</strong> defective transistors that Amy finds<br />
p – probability <strong>of</strong> selecting a default, i.e. the probability <strong>of</strong> “success”<br />
n – number <strong>of</strong> selected transistors<br />
N – total size, total size<br />
p = 0.05, n = 3, N = 6<br />
P(X=0) = P(the sample will catch none <strong>of</strong> the defective transistors)<br />
P(X=1) = P(the sample will catch one <strong>of</strong> the defective transistors)<br />
P(X=2) = P(the sample will catch two <strong>of</strong> the defective transistors)<br />
P(X=3) = P(the sample will catch three <strong>of</strong> the defective transistors)<br />
2
Example 3. A batch <strong>of</strong> 20 integrated circuit chips contains 20% defective chips. A sample<br />
<strong>of</strong> 10 is drawn at random.<br />
X = number <strong>of</strong> defective chips in the sample<br />
p = 0.1, n = 10, N = 20<br />
P(X=0) = P(0 defectives in the sample <strong>of</strong> size 10)<br />
P(X=1) = P(1 defective in the sample <strong>of</strong> size 10)<br />
P(X=2) = P(2 defectives in the sample <strong>of</strong> size 10)<br />
Example 4. A batch <strong>of</strong> 100 printed circuit cards is populated with semiconductor chips.<br />
20 <strong>of</strong> these are selected without replacement for function testing. If the original<br />
batch contains 30 defective cards, how will these show up in the sample?<br />
X = number <strong>of</strong> defective cards in the sample<br />
p = 0.3, n = 20, N = 100<br />
P(X=0) = P(0 defectives in the sample <strong>of</strong> size 20)<br />
P(X=1) = P(1 defective in the sample <strong>of</strong> size 20)<br />
P(X=2) = P(2 defectives in the sample <strong>of</strong> size 20)<br />
3
<strong>Hypergeometric</strong> <strong>Distribution</strong><br />
Recall:<br />
• In Geometric – we investigated the number <strong>of</strong> trials necessary to arrive at the first<br />
success<br />
• In Binomial distribution – we investigated the number <strong>of</strong> successes in a fixed<br />
number <strong>of</strong> n draws with replacement<br />
In <strong>Hypergeometric</strong> distribution – probability <strong>of</strong> successes in n draws from a finite<br />
population without replacement<br />
Parameters are:<br />
This gives:<br />
n – number <strong>of</strong> chosen<br />
N – total number<br />
p – proportion <strong>of</strong> successes in the group to start with<br />
M <br />
Np<br />
where M is number <strong>of</strong> successes in the group to start with<br />
4
Example 1.<br />
p = 0.25, n = 5, N = 52<br />
The deck can be visualised as:<br />
Five cards are selected at random and without replacement from a deck.<br />
P(X=0) = P(probability <strong>of</strong> getting 0 diamonds in the set)<br />
P(X=1) = P(1 diamond in the selected 5) →<br />
13<br />
Diamonds<br />
P(X=2) = P(2 diamonds in the selected 5) →<br />
similar for P(X=3), P(X=4), P(X=5) ...<br />
39<br />
Nondiamonds<br />
5
<strong>Hypergeometric</strong> <strong>Distribution</strong> - PDF<br />
Def<br />
Conditions:<br />
1. An experiment consists <strong>of</strong> n repeated trials<br />
2. A finite population <strong>of</strong> size N consists <strong>of</strong>:<br />
a) M elements called successes: M = Np<br />
b) L elements called failures: L = N – M<br />
X = number <strong>of</strong> successes<br />
X is said to have a hypergeometric distribution<br />
Np = M<br />
Successes<br />
N(1-p ) = N - M<br />
Failures<br />
A sample <strong>of</strong> n elements are selected at random without replacement<br />
6
Example 3. – cont.<br />
What is probability that Amy finds three defective transistors when she selects three<br />
<strong>of</strong> the six?<br />
Solution 3:<br />
Example 5.<br />
Draw 6 cards from a deck without replacement. What is the probability <strong>of</strong> getting<br />
two hearts?<br />
Solution 5:<br />
M = 13 number <strong>of</strong> hearts<br />
L = 39 number <strong>of</strong> non-hearts<br />
N = 52 total<br />
Check in R:<br />
> dhyper(2, 13, 39, 6)<br />
[1] 0.315<strong>12</strong>99<br />
> round(dhyper(2, 13, 39, 6), 5)<br />
[1] 0.31513<br />
7
Calculating hypergeometric pdfs in R:<br />
Use prefix “hyper” with “d” for pdf. Function is dhyper with arguments M and N - M.<br />
Example 1: Five cards from a deck<br />
x
<strong>Hypergeometric</strong> pdfs<br />
9
Example 6. Lotto a) with N = 36<br />
The format <strong>of</strong> the game is such that you have to correctly chose n different numbers from N<br />
Classic example <strong>of</strong> hypergeometric distribution:<br />
n = 6<br />
N = 36 (was in Ireland, now is 45, we will se why)<br />
The population consists <strong>of</strong>:<br />
for x = 0, 1, 2, 3, 4, 5, 6<br />
Calculate in R:<br />
x
Example 6. Lotto b) with N = 42<br />
42 balls are numbered 1 - 42.<br />
You select six numbers between 1 and 42. (The ones you write on your lotto card)<br />
Number <strong>of</strong> possible ways to draw six numbers in the range [ 1, 42 ] =<br />
What is the probability that they contain:<br />
(i) match 6?<br />
(ii) match 5?<br />
(iii) match 4?<br />
(iv) match 3?<br />
Solution 6b:<br />
Total, N = 42; Favourable, M = 6; Non-Favourable, L = 36<br />
Sample size n = 6.<br />
(iii) match 4:<br />
x = 4<br />
11
Solution 6. Lotto – cont.<br />
6 correct numbers, just one way <strong>of</strong> winning:<br />
With 36 numbers<br />
choose(36, 6)<br />
Winning the Jackpot<br />
[1] 1,947,792 # a chance <strong>of</strong> about 1 in 2 million!<br />
With 39 numbers<br />
choose(39, 6)<br />
[1] 3,262,623 # a chance <strong>of</strong> less than 1 in 3 million!<br />
With 42 numbers<br />
choose(42, 6)<br />
[1] 5,245,786 # less than 1 in 5 million chance!<br />
The current situation <strong>of</strong> 45 numbers:<br />
choose(45, 6)<br />
[1] 8,145,060 # a chance <strong>of</strong> less than 1 in 8 million!<br />
<strong>12</strong>
<strong>Hypergeometric</strong> CDF<br />
Cumulative <strong>Distribution</strong> Function (CDF) computes probabilities that the selected item will<br />
contain up to and include x successes: F(x) = P(X ≤ x)<br />
Example 1:<br />
What is the probability <strong>of</strong> getting at most 2 diamonds in the 5 selected without<br />
replacement from a well shuffled deck?<br />
x = 2<br />
n = 5<br />
N = 52<br />
M = 13<br />
L = N – M = 39<br />
Check in R:<br />
> phyper(2, 13, 39, 5)<br />
[1]0.9072329<br />
Example 6:<br />
What is the probability <strong>of</strong> getting something in lotto?<br />
P(X ≥ 3) = 1 – P(X ≤ 2) = 1 – (P(match 0) + P(match 1) + P(match 2))<br />
13
Calculating hypergeometric cdfs in R:<br />
par(mfrow = c(2,2))<br />
x
<strong>Hypergeometric</strong> cdfs<br />
15
Binomial or <strong>Hypergeometric</strong>?<br />
Boxes contain 20 items <strong>of</strong> which 10% are defective.<br />
Find the probability that no more than 2 defectives will be obtained in a sample <strong>of</strong> size 10.<br />
X = no <strong>of</strong> defectives<br />
With Replacement Sampling<br />
Without Replacement Sampling<br />
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)<br />
→ probabilities<br />
are different for N = 20!<br />
16
Binomial or <strong>Hypergeometric</strong>? – cont.<br />
Boxes contain 200 items <strong>of</strong> which 10% are defective.<br />
Find the probability that no more than 2 defectives will be obtained in a sample <strong>of</strong> size 10.<br />
X = no <strong>of</strong> defectives<br />
With Replacement Sampling<br />
Without Replacement Sampling<br />
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)<br />
→ near to probabilities<br />
from binomial for N = 200!<br />
17
Binomial or <strong>Hypergeometric</strong>? – cont.<br />
Boxes contain 2000 items <strong>of</strong> which 10% are defective.<br />
Find the probability that no more than 2 defectives will be obtained in a sample <strong>of</strong> size 10.<br />
X = no <strong>of</strong> defectives<br />
With Replacement Sampling<br />
Without Replacement Sampling<br />
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)<br />
→ very close to probabilities<br />
from binomial for N = 2000!<br />
18
Binomial or <strong>Hypergeometric</strong>? – cont.<br />
What is the probability <strong>of</strong> getting no more than 2 defectives in a random sample drawn<br />
without replacement from a batch which has 10% defectives?<br />
- we illustrated that when the batch size N is large the hypergeometric may be<br />
approximated with binomial!<br />
Batch size 20 200 2000 Binomial approx<br />
P(X = 0) 0.2368 0.3398 0.3476 0.3487<br />
P(X = 1) 0.5263 0.3974 0.3884 0.3874<br />
P(X = 2) 0.2368 0.1975 0.1941 0.1937<br />
P(X ≤ 2) 0.999 0.9347 0.9296 0.9298<br />
19
<strong>Hypergeometric</strong> and binomial pdfs with n = 10, p = 0.1<br />
20
Plotting hypergeometric and binomial<br />
x
<strong>Hypergeometric</strong> vs. binomial pdfs with n = 10, p = 0.1 using matplot<br />
22
Matplot<br />
Matplot - Plot the columns <strong>of</strong> one matrix against the<br />
columns <strong>of</strong> another.<br />
x
Let:<br />
As N →∞, the hypergeometric distribution converges to the binomial!<br />
Population Size = N<br />
Proportion <strong>of</strong> successes = p<br />
Number <strong>of</strong> successes in N = Np<br />
Number <strong>of</strong> failures = N(1 − p)<br />
X = number <strong>of</strong> successes in s sample <strong>of</strong> size n drawn without replacement<br />
from N<br />
Np = M<br />
Successes<br />
N(1-p ) = N - M<br />
Failures<br />
→ for N → ∞<br />
24