Chapter 7: SAMPLING DISTRIBUTIONS & POINT ESTIMATION OF PARAMETERS

Part 1: Introduction
Sampling Distributions & the Central Limit Theorem
Point Estimation & Estimators
Sections 7-1 to 7-2
Sample data is collected on a population to draw conclusions, or make statistical inferences, about the population.

Types of statistical inference:
1) parameter estimation (e.g. estimating µ)
2) hypothesis testing (e.g. H₀: µ = 50)
Example of parameter estimation (or point estimation):

We're interested in the value of µ.
The observed x̄ is a point estimate for µ.
µ is the parameter being estimated.

NOTATION: µ̂ = X̄ is the estimator.
{We often show an estimator as a 'hat' over its respective parameter.}

The estimate is a single value, or a point estimate.
X̄ is the statistic of interest from the data.
Sample-to-sample variability

The value we get for X̄ (the sample mean) depends on the sample chosen.

X̄ is a random variable!

The distribution of X̄ is called the sampling distribution of X̄.

We expect X̄ to be close to µ (we ARE using it to estimate µ), but there is variability in X̄ before it is observed because we use random sampling to choose our sample of size n.
The sampling distribution of X̄ tells us what kind of values are likely to occur for X̄.

The sampling distribution of X̄ puts a probability distribution on the possible values for X̄.

In a simple random sample of n observations from a population,

E(X̄) = µ

⇒ X̄ is an unbiased estimator of µ.

This gives us a measure of center for the sampling distribution of X̄, but what about the variability of the X̄ random variable?
Sampling distribution of X̄

Case 1: Original population is normally distributed.

[Figure: density curve f(x) of a normal population]

The x̄ I observe depends on the sample (the particular n observations) I chose from this normal distribution.

Let's look at the distribution of x̄ values if I choose a sample of size n and compute x̄ for that sample, and I repeat this process 1000 times...
[Figure: density curve f(x) of the normal population]

1) Choose a sample of size n from a normal distribution
2) Compute x̄
3) Plot the x̄ on our frequency histogram
4) Do steps 1-3 1000 times

See applet at:
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
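The four steps above can also be sketched in code. A minimal simulation (the population parameters µ = 50, σ = 10 and the sample size are illustrative choices, not values from the notes):

```python
import random
import statistics

# Illustrative population parameters (my own choices, not from the notes)
mu, sigma = 50.0, 10.0
n = 25            # sample size
reps = 1000       # number of repeated samples

random.seed(1)
xbars = []
for _ in range(reps):
    # 1) choose a sample of size n from a normal distribution
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    # 2) compute x-bar for that sample
    xbars.append(statistics.mean(sample))

# 3)-4) the 1000 x-bar values form an empirical sampling distribution;
# plotting them as a histogram shows its shape
print(statistics.mean(xbars))    # should land close to mu
print(statistics.stdev(xbars))   # noticeably smaller than sigma
```

The collection `xbars` plays the role of the applet's histogram: one draw of X̄ per repetition.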
SKETCH THE PLOTS:

Distribution of X̄ for n = 2 when the original population is normal.

Distribution of X̄ for n = 25 when the original population is normal.
Turns out, in this case, the random variable X̄ is normally distributed.

This normal distribution is centered at µ (the mean of the original population we were sampling from).

The variability of X̄ depends on the sample size n, and on the variability in the original population.

SPECIFICALLY: When X ∼ N(µ, σ²),

X̄ ∼ N(µ, σ²/n)

NOTE: the distribution for X̄ is less variable than the distribution for X.
X̄ ∼ N(µ, σ²/n)

NOTE: X̄ from n = 25 is less variable than X̄ from n = 2.

More data (larger n) gives us a better estimate of µ from X̄.

The distribution of our estimator X̄ is squished closer, or is tighter, around the thing we're trying to estimate, which is beneficial when estimating something.
Sampling distribution of X̄

Case 2: Original population is NOT normally distributed.

[Figures: density curves f(x) of several non-normal populations]

Or anything else...
What does the distribution of X̄ look like?

1) Choose a sample of size n from the distribution
2) Compute x̄
3) Plot the x̄ on our frequency histogram
4) Do steps 1-3 1000 times

————————————————————

Right-skewed with n = 10.
Really non-normal (mass out at the ends) with n = 2.

Really non-normal (mass out at the ends) with n = 25.
Turns out the random variable X̄ is normally distributed no matter what your original distribution was IF n is large enough...

What's large enough? Rule of thumb is n ≥ 30.

So, what have we learned...

If X is normally distributed, then X̄ ∼ N(µ, σ²/n) for any n.

If X is NOT normally distributed, then X̄ ∼ N(µ, σ²/n) for n ≥ 30.

If X is not severely non-normal, then X̄ ∼ N(µ, σ²/n) is close to true for n < 30.
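One way to see the n ≥ 30 rule of thumb in action: sample from a strongly right-skewed population and check that X̄ behaves like N(µ, σ²/n). The exponential population below is my own choice of skewed example (for it, µ = σ = 1/λ), not a distribution from the notes:

```python
import random
import statistics

random.seed(3)
lam = 1.0              # exponential population: right-skewed
mu = sigma = 1.0 / lam # for an exponential, mean and sd are both 1/lam

def simulate_xbars(n, reps=5000):
    """Draw `reps` samples of size n and return the sample means."""
    return [statistics.mean(random.expovariate(lam) for _ in range(n))
            for _ in range(reps)]

# For n = 30, X-bar should look roughly N(mu, sigma^2 / n):
xbars = simulate_xbars(30)
print(statistics.mean(xbars))    # close to mu = 1
print(statistics.stdev(xbars))   # close to sigma / sqrt(30), about 0.183
```

A histogram of `xbars` comes out roughly bell-shaped even though the population itself is far from normal.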
Sampling Distributions and the Central Limit Theorem
Section 7-2

Sample data is collected on a population to draw conclusions, or make statistical inferences, about the population.

NOTATION:
− A large letter like X̄ represents the random variable X̄, and X̄ can take on many values.
− A small letter like x̄ represents an actual observed x̄ from a sample, and it is a fixed quantity once observed.
• Random Sample

The random variables X₁, X₂, . . . , Xₙ are a random sample of size n if...
a) the Xᵢ's are independent random variables, and
b) every Xᵢ has the same probability distribution (i.e. they are drawn from the same population).

NOTE: the observed data x₁, x₂, . . . , xₙ is also referred to as a random sample.
• Statistic

– A statistic is any function of the observations in a random sample.

∗ Example: The mean X̄ is a function of the observations (specifically, a linear combination of the observations).

X̄ = (Σᵢ₌₁ⁿ Xᵢ)/n = (1/n)X₁ + (1/n)X₂ + · · · + (1/n)Xₙ

– A statistic is a random variable, and it has a probability distribution.

– The distribution of a statistic is called the sampling distribution of the statistic because it depends on the sample chosen.
– The sampling distribution of the mean is very important.

What is the expected value of the sample mean X̄ in a random sample?

E(X̄) = E((1/n)X₁ + (1/n)X₂ + · · · + (1/n)Xₙ)
     = (1/n) Σ E(Xᵢ)
     = (1/n) Σ µ = nµ/n = µ = µ_X̄

Notation: E(X̄) = µ_X̄ = µ
where µ is the population mean.
(µ is also the expected value of a single Xᵢ)
What is the variance of the sample mean X̄ in a random sample?

V(X̄) = V((1/n)X₁ + (1/n)X₂ + · · · + (1/n)Xₙ)
     = (1/n)² Σ V(Xᵢ)
     = (1/n)² Σ σ²
     = (1/n)² nσ² = σ²/n

Notation: V(X̄) = σ²_X̄ = σ²/n
where σ² is the population variance.
(σ² is also the variance of a single Xᵢ)
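The algebra above can be verified exactly on a small discrete case. Here I enumerate every equally likely sample of size n = 2 from a fair six-sided die (a toy population of my own choosing) and confirm E(X̄) = µ and V(X̄) = σ²/n with exact rational arithmetic:

```python
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4, 5, 6]   # toy population: one fair die
mu = Fraction(sum(faces), 6)                              # population mean, 7/2
var = sum((Fraction(x) - mu) ** 2 for x in faces) / 6     # population variance, 35/12

n = 2
# Enumerate every equally likely ordered sample of size n and average it
xbars = [Fraction(sum(s), n) for s in product(faces, repeat=n)]
e_xbar = sum(xbars) / len(xbars)
v_xbar = sum((xb - e_xbar) ** 2 for xb in xbars) / len(xbars)

print(e_xbar == mu)        # E(X-bar) = mu, exactly
print(v_xbar == var / n)   # V(X-bar) = sigma^2 / n, exactly
```

Both checks hold exactly, with no simulation noise, because every possible sample is listed.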
As we have described earlier, for n ≥ 30

X̄ ∼ N(µ, σ²/n)

(and this is also true for n < 30 if each Xᵢ comes from a normal population).

Using this fact, and what we know about standardizing variables, leads to...

• The Central Limit Theorem

If X₁, X₂, . . . , Xₙ is a random sample of size n taken from a population with mean µ and variance σ², the limiting form of the distribution of

Z = (X̄ − µ) / (σ/√n)

as n → ∞ is the standard normal distribution, or N(0, 1).
The quality of the approximation

(X̄ − µ) / (σ/√n) ∼ N(0, 1)

depends on the size of n.

Satisfactory approximation for n ≥ 30 for any population.

Satisfactory approximation for n < 30 for near-normal populations.

————————————————————

The next graphic shows 3 different original populations (one nearly normal, two that are not), and the sampling distribution for X̄ based on a sample of size n = 5 and size n = 30.
The three original distributions are on the far left (one that is nearly symmetric and bell-shaped, one that is right-skewed, and one that is highly right-skewed).

As shown in: Navidi, W., 'Statistics for Engineers and Scientists', McGraw-Hill, 2006.
Things to notice from the previous graphic:

• The variability of X̄ decreases as n increases. Recall: V(X̄) = σ²/n.

• If the original population has a shape that's closer to normal, a smaller n is sufficient for X̄ to be normal.

• The normal approximation gets better with larger n when you're starting with a non-normal population.

• Even when X has a very non-normal distribution, X̄ still has a normal distribution with a large enough n.
• Example: Flaws in a copper wire.

Let X denote the number of flaws in a 1-inch length of copper wire. The probability mass function of X is presented in the following table:

x   P(X = x)
0   0.48
1   0.39
2   0.12
3   0.01

Suppose n = 100 wires are sampled from this population. What is the probability that the average number of flaws per wire in the sample is less than 0.5?
ANS:

P(X̄ < 0.5) =
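A sketch of the normal-approximation calculation: the mean and variance come straight from the pmf above, and the CLT (n = 100 ≥ 30) justifies treating X̄ as approximately N(µ, σ²/n). The final number is only as good as that approximation:

```python
import math

pmf = {0: 0.48, 1: 0.39, 2: 0.12, 3: 0.01}   # table from the example
n = 100

mu = sum(x * p for x, p in pmf.items())                 # population mean, 0.66
var = sum((x - mu) ** 2 * p for x, p in pmf.items())    # population variance
sd_xbar = math.sqrt(var / n)                            # sigma / sqrt(n)

# CLT: X-bar is approximately N(mu, var/n), so standardize and use
# the standard normal CDF, Phi(z) = (1 + erf(z / sqrt(2))) / 2
z = (0.5 - mu) / sd_xbar
p = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))
print(round(p, 4))   # P(X-bar < 0.5), a little over 1%
```

The z-value lands near −2.2, so an average below 0.5 flaws per wire would be fairly unusual for this population.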
Differences in sample means X̄₁ and X̄₂

What if we're interested in estimating the difference in means between two populations?

Value of interest: µ₁ − µ₂
Estimator: X̄₁ − X̄₂

[Figure: density curves of Pop'n 1 and Pop'n 2 with different centers]

The above picture shows two populations with different means, µ₁ − µ₂ ≠ 0.
[Figure: density curves of Pop'n 1 and Pop'n 2]

If the populations had the same mean, then the two distributions would be on top of each other (no distinction), and µ₁ − µ₂ = 0.

We want to know the behavior of our estimator X̄₁ − X̄₂.

So far, we've only discussed the behavior of X̄.
The sampling distribution of X̄₁ − X̄₂:

We will assume the sample from each group was taken independently of the other (two independent samples).

E(X̄₁ − X̄₂) = E(X̄₁) − E(X̄₂) = µ₁ − µ₂

where µ₁ is the population mean of pop'n 1
and µ₂ is the population mean of pop'n 2.

V(X̄₁ − X̄₂) = V(X̄₁) + V(X̄₂)   {since independent}
            = σ₁²/n₁ + σ₂²/n₂

where σ₁² is the population variance of pop'n 1
and σ₂² is the population variance of pop'n 2.
⇒ X̄₁ − X̄₂ is a random variable
with E(X̄₁ − X̄₂) = µ₁ − µ₂
and V(X̄₁ − X̄₂) = σ₁²/n₁ + σ₂²/n₂

So, we have the expected value and the variance of this random variable of interest. But we'd like to know the full distribution of the r.v.
IF both original populations were normal, then X̄₁ and X̄₂ are linear combinations of normal random variables, and X̄₁ − X̄₂ is also a linear combination of normals... so

X̄₁ − X̄₂ ∼ N(µ₁ − µ₂, σ₁²/n₁ + σ₂²/n₂)

Again, we have a random variable of interest X̄₁ − X̄₂ that has a normal distribution with known 'predictable' behavior.

————————————————————

What if both original populations were NOT normal?

If n₁ and n₂ are both greater than 30, then we can apply the central limit theorem to show that X̄₁ − X̄₂ is, again, normally distributed.
• Approximate Sampling Distribution for X̄₁ − X̄₂

If we have two independent populations with means µ₁ and µ₂ and variances σ₁² and σ₂², and if X̄₁ and X̄₂ are sample means of two independent random samples of sizes n₁ and n₂ from the two populations, then the sampling distribution of

Z = ((X̄₁ − X̄₂) − (µ₁ − µ₂)) / √(σ₁²/n₁ + σ₂²/n₂)

is approximately standard normal (if the conditions of the central limit theorem apply).

If the original populations were normal to begin with, then Z is exactly a standard normal.
• Example: Difference in means

A random sample of n₁ = 20 observations is taken from a normal population with mean 30. A random sample of n₂ = 25 observations is taken from a different normal population with mean 27. Both populations have σ² = 8.

What is the probability that X̄₁ − X̄₂ exceeds 5?
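A sketch of this calculation using the sampling distribution of X̄₁ − X̄₂ derived above (exact here, since both populations are stated to be normal):

```python
import math

mu1, mu2 = 30.0, 27.0
var1 = var2 = 8.0
n1, n2 = 20, 25

# X1-bar - X2-bar ~ N(mu1 - mu2, var1/n1 + var2/n2), exactly,
# because both populations are normal
mean_d = mu1 - mu2                        # 3.0
sd_d = math.sqrt(var1 / n1 + var2 / n2)   # sqrt(0.72)

# Standardize and use the standard normal CDF via erf
z = (5.0 - mean_d) / sd_d
p = 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2)))   # P(X1bar - X2bar > 5)
print(round(p, 4))
```

The z-value is about 2.36, so exceeding 5 is a tail event with probability near 1%.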
• Example: Picture tube brightness (problem 7-14, p. 231)

A consumer electronics company is comparing the brightness of two different types of picture tubes. Type A is the present model, and is thought to have a population mean brightness of 100 and a known standard deviation of 16. Type B has an unknown mean brightness and a standard deviation equal to type A's.

If µ_B exceeds µ_A, the manufacturer would like to adopt type B for use.

A random sample of 25 is taken from each type...
The observed difference in sample means is

x̄_B − x̄_A = 6.75

(so, the sample mean brightness for type B was higher than the sample mean for type A, but is it high enough?).

What decision should they make?
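One hedged way to frame the decision, using only the tools in this chapter: if the two means were actually equal (µ_B − µ_A = 0), how unusual would an observed difference of 6.75 or more be? A sketch with the numbers from the problem:

```python
import math

sigma = 16.0          # known standard deviation, same for both types
nA = nB = 25
observed_diff = 6.75  # x_bar_B - x_bar_A

# If mu_B = mu_A, then XbarB - XbarA ~ N(0, sigma^2/nB + sigma^2/nA)
sd_d = math.sqrt(sigma**2 / nB + sigma**2 / nA)   # sqrt(20.48), about 4.53

z = observed_diff / sd_d
p = 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2)))  # P(difference >= 6.75)
print(round(p, 3))
```

The probability comes out somewhere around 7%: a difference this large is somewhat unusual under equal means, but not overwhelming evidence that µ_B > µ_A. Whether that is enough to adopt type B is a judgment call; formal tools for this decision (hypothesis testing) come later.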