Ch. 9 pt 1
Chapter 9: TESTS OF HYPOTHESES<br />
FOR A SINGLE SAMPLE<br />
Part 1: Intro to Hypothesis Testing<br />
Sections 9-1, 9-2, 9-3<br />
Statistical Inference<br />
We infer something about the population as a<br />
whole from the information in a sample.<br />
[Figure: a sample drawn from a population]<br />
- Point estimation ✓<br />
- Confidence intervals ✓<br />
- Hypothesis testing (introduced in chapter 9)<br />
1
Hypothesis Testing<br />
Sections 9-1, 9-2, 9-3<br />
We’ll start with an illustration...<br />
• Example: Reduction of car emissions<br />
A certain automobile engine emits 100 mg<br />
of nitrogen oxides per second on average. A<br />
modification to the engine has been proposed<br />
that may reduce the emissions.<br />
The new design will be put into production<br />
IF it can be demonstrated that its mean emission<br />
rate is less than 100 mg/s.<br />
To make a decision, a random sample of<br />
n = 50 modified engines is taken and<br />
emission measurements are recorded.<br />
2
The sample mean is $\bar{x} = 92$ mg/s and the<br />
sample standard deviation is s = 21 mg/s.<br />
A normal probability plot suggests emissions<br />
follow a normal distribution.<br />
Isn’t 92 far enough below 100 for us to say<br />
the modified engine is better?<br />
Is there enough evidence to completely change<br />
the manufacturing line and switch which<br />
engine is produced?<br />
3
STATISTICAL QUESTION:<br />
Could we have gotten a sample mean emission<br />
$\bar{x}$ this low even if the modified engine<br />
WASN’T any better than the first (i.e. its<br />
population mean was actually 100)?<br />
Could we have grabbed a sample that happened<br />
to have many low emission values even though<br />
the population mean was 100?<br />
To make a decision on the engines, we want<br />
to quantify the above question with a probability:<br />
“Given that the true population mean emission<br />
is 100 mg/s, what is the probability<br />
of observing a sample mean $\bar{x}$ this low or<br />
lower?”<br />
4
Recall from the last chapter:<br />
If we assume µ = 100 and n large, we have<br />
$\bar{X} \sim N(100, \sigma^2/n)$.<br />
This is a known behavior of the sample mean.<br />
Probability of interest:<br />
Given µ = 100 (engine not any better),<br />
$P(\bar{X} \le 92) = \;?$<br />
Since $\sigma^2$ is unknown in this case, we have<br />
$T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$<br />
where S is the sample standard deviation<br />
and T has a t distribution with n−1 degrees<br />
of freedom (and n = 50 in this example).<br />
5
$P(\bar{X} \le 92) = P\left( \dfrac{\bar{X} - \mu}{S/\sqrt{n}} \le \dfrac{92 - 100}{21/\sqrt{50}} \right)$<br />
$= P(T \le -2.69)$ because $T \sim t_{49}$<br />
$= 0.0049$<br />
[Figure: density of the t distribution with 49 df, T axis from −3 to 3]<br />
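The tail probability above can be checked numerically. The following is a minimal Python sketch, not part of the original slides. The helper names `t_pdf` and `t_cdf` are illustrative, and the CDF is approximated with Simpson's rule on the t density so that only the standard library is needed. It uses the unrounded test statistic, so the last digit may differ slightly from the value obtained with −2.69.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x) using Simpson's rule between 0 and |x|."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |x|)
    # The t density is symmetric about 0, so half the mass lies on each side.
    return 0.5 + area if x >= 0 else 0.5 - area

# Summary statistics from the emissions example
n, xbar, s, mu0 = 50, 92.0, 21.0, 100.0

t0 = (xbar - mu0) / (s / math.sqrt(n))  # observed test statistic
p_lower = t_cdf(t0, n - 1)              # P(T <= t0) with n - 1 = 49 df

print(f"t0 = {t0:.2f}, P(T <= t0) = {p_lower:.4f}")
```

If SciPy happens to be available, `scipy.stats.t.cdf(t0, 49)` should give the same probability without the hand-written integration.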
6
NOT VERY LIKELY...<br />
The probability of observing a sample mean $\bar{x}$<br />
this low or lower, given that the true population<br />
mean is 100 mg/s, is<br />
0.0049
This suggests that our initial assumption in<br />
the calculation, that the true mean was 100,<br />
is perhaps incorrect.<br />
For this reason, we reject the assumption of<br />
µ = 100 in favor of the ‘alternative’, that<br />
the true mean emissions IS LESS THAN 100<br />
mg/s.<br />
We don’t know FOR SURE, but there’s strong<br />
evidence against someone saying that the mean<br />
of the modified engine is 100 mg/s.<br />
7
If it were 100 mg/s, we would very rarely see<br />
an $\bar{x}$ this low (could happen, but not likely).<br />
What’s unlikely enough to actually reject<br />
the initial assumption (that the two engine<br />
models were equal)?<br />
There’s some opinion here, but we often use<br />
0.05 as a threshold. Anything less than this<br />
is considered rather unlikely.<br />
————————————————————<br />
We have essentially just performed a hypothesis<br />
test. Now we will formalize the procedure...<br />
8
• General set-up for testing a<br />
hypothesis for µ<br />
1. State your null $H_0$ and alternative $H_1$<br />
hypotheses.<br />
(The null is what we assume to be true.)<br />
$H_0: \mu = \mu_0$<br />
(The subscript on $\mu_0$ is used to emphasize<br />
that this value is the assumed mean under<br />
the null hypothesis being true.)<br />
There are 3 choices for the alternative,<br />
either...<br />
* $H_1: \mu \ne \mu_0$ (two-sided alternative)<br />
* $H_1: \mu < \mu_0$ (one-sided alternative)<br />
* $H_1: \mu > \mu_0$ (one-sided alternative)<br />
9
2. Calculate the test statistic (either a Z or T ).<br />
(In this example, the test statistic was a<br />
T; we’ll base our conclusion on it.)<br />
3. Compute the probability of observing a test<br />
statistic this extreme, or more extreme,<br />
under the null being true.<br />
(This probability is called a p-value.)<br />
4. State your conclusion with respect to the<br />
problem:<br />
Either...<br />
‘Reject the null’<br />
or<br />
‘Fail to reject the null’.<br />
5. Be sure to verify any assumptions that were<br />
needed.<br />
(This is usually a normal probability plot<br />
for verifying normality, which is needed to<br />
have $T \sim t_{n-1}$.)<br />
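Steps 1 to 4 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `p_value_from_z` and `conclude` are hypothetical, the threshold 0.05 is the commonly used value mentioned earlier, and the statistic is assumed to be a Z (known variance or very large sample) so that the standard library's `NormalDist` can supply the tail probability. For a T statistic the normal CDF would be swapped for the t CDF with n−1 degrees of freedom.

```python
from statistics import NormalDist

def p_value_from_z(z0, alternative):
    """Tail probability for an observed Z statistic under the null.

    alternative is one of:
      'less'       for H1: mu < mu0  (lower tail)
      'greater'    for H1: mu > mu0  (upper tail)
      'two-sided'  for H1: mu != mu0 (both tails)
    """
    phi = NormalDist().cdf  # standard normal CDF
    if alternative == "less":
        return phi(z0)
    if alternative == "greater":
        return 1.0 - phi(z0)
    if alternative == "two-sided":
        return 2.0 * phi(-abs(z0))
    raise ValueError(f"unknown alternative: {alternative!r}")

def conclude(p_value, threshold=0.05):
    """Step 4: compare the p-value with the chosen threshold."""
    return "Reject the null" if p_value < threshold else "Fail to reject the null"

# The same observed statistic gives different p-values
# depending on which alternative was stated in step 1.
for alt in ("less", "greater", "two-sided"):
    p = p_value_from_z(-2.0, alt)
    print(f"{alt:>9}: p-value = {p:.4f} -> {conclude(p)}")
```

The alternative has to be chosen before looking at the data, because it determines which tail (or tails) count as "extreme".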
10
• Example: Formalizing the emissions<br />
hypothesis test<br />
1. State your null $H_0$ and alternative $H_1$<br />
hypotheses.<br />
$H_0: \mu = 100$<br />
$H_1: \mu < 100$<br />
(this is a one-sided<br />
hypothesis test with<br />
$\mu_0 = 100$)<br />
2. Calculate the observed test statistic.<br />
$t_0 = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{92 - 100}{21/\sqrt{50}} = -2.69$<br />
(The subscript on $t_0$ is used to emphasize<br />
the fact that we’re assuming the mean to<br />
be $\mu_0$.)<br />
11
3. Compute the probability of observing a<br />
test statistic this extreme, or more extreme,<br />
under the null being true (i.e. compute the<br />
p-value).<br />
Under $H_0$ true, $T_0 = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t_{49}$, and<br />
$P(T_0 \le -2.69) = 0.0049$<br />
[Figure: density of the t distribution with 49 df, T axis from −3 to 3]<br />
Thus, because this is a one-sided hypothesis<br />
test, the p-value = 0.0049.<br />
12
p-value=0.0049...<br />
“If the true mean is really µ = 100, then<br />
the probability of observing a sample mean<br />
(from a sample of size n = 50) this far below<br />
100 (or even farther) is only 0.0049.”<br />
4. State your conclusion for the hypothesis<br />
test:<br />
Using 0.05 (or 5/100) as a threshold for ‘unlikeliness’,<br />
we have<br />
p-value = 0.0049 < 0.05<br />
and we reject the null in favor of the<br />
alternative, which is that µ < 100.<br />
13
5. Be sure to verify any assumptions that<br />
were needed.<br />
As stated earlier, we checked the normal<br />
probability plot of the emission values and<br />
it was OK, so the requirement for<br />
$T_0 \sim t_{49}$ (that the parent population is<br />
normally distributed) was fulfilled.<br />
When we reject $H_0$, we say the test was significant.<br />
For this example, we say there was significant<br />
statistical evidence that the modified engine<br />
has a mean emission rate lower than 100 mg/s.<br />
So, there was strong evidence that the modified<br />
engine is better.<br />
14
• Why do we use this test statistic $T_0$ to test<br />
$H_0: \mu = \mu_0$?<br />
$T_0 = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$<br />
Let’s pick apart this statistic...<br />
– Under $H_0$ true, $E(\bar{X}) = \mu_0$ and the expected<br />
value of the numerator in $T_0$ is 0,<br />
and the distribution of $T_0$ is unimodal, centered<br />
at zero.<br />
– If $\bar{X}$ is far from $\mu_0$ in either direction, the<br />
numerator in $T_0$ will be ‘large’ (+ or −),<br />
leading to a ‘large’ $T_0$, leading to rejection<br />
of $H_0$.<br />
A ‘large’ or ‘extreme’ $T_0$ would not be expected<br />
if $H_0$ were true (we expect $T_0$ to<br />
‘bounce around’ 0 if $H_0$ is true).<br />
15
– But what is a ‘large’ difference $\bar{X} - \mu_0$?<br />
This is where the denominator comes into<br />
play. ‘Large’ is based on our sample size<br />
and the variability in the population $\sigma^2$<br />
(which shows up in S).<br />
For one thing, scale matters. A ‘large’ difference<br />
$\bar{X} - \mu_0$ on a nanoscale will probably<br />
not be the same as a large difference<br />
in kilometers (S will make this adjustment<br />
here).<br />
We also know that the expected squared<br />
distance of $\bar{X}$ from µ goes down as n increases.<br />
This also has to be taken into<br />
account for deciding what is ‘large’.<br />
Bottom line... if we observe a realized $t_0$<br />
value that is in the far tail of the $T_0$ distribution,<br />
it suggests we should reject $H_0$.<br />
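The role of the denominator can be seen with a little arithmetic. The sketch below is illustrative only: the function name `t_statistic` is hypothetical, and the alternative sample sizes, standard deviations and units are made-up variations on the emissions example, not data from the slides.

```python
import math

def t_statistic(xbar, mu0, s, n):
    """Observed statistic t0 = (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))

# Emissions example as given: difference of -8 mg/s, s = 21, n = 50
base = t_statistic(92.0, 100.0, 21.0, 50)

# Same difference, but from a smaller (hypothetical) sample
smaller_n = t_statistic(92.0, 100.0, 21.0, 10)

# Same difference, but with twice the (hypothetical) sample variability
noisier = t_statistic(92.0, 100.0, 42.0, 50)

# Same data expressed in g/s instead of mg/s: S rescales along with the
# numerator, so the statistic does not depend on the units chosen
rescaled = t_statistic(0.092, 0.100, 0.021, 50)

print(f"base      : {base:.3f}")
print(f"n = 10    : {smaller_n:.3f}")
print(f"s doubled : {noisier:.3f}")
print(f"in g/s    : {rescaled:.3f}")
```

The same 8 mg/s gap is less extreme when it comes from fewer engines or from noisier measurements, and it is unchanged by a change of units.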
16
Some comments on terminology...<br />
• The Null Hypothesis:<br />
– It is what we assume to be true upon entering<br />
the hypothesis test.<br />
In many formal arguments, we often assume<br />
something to be true, and then see<br />
if we can contradict this assumption<br />
later.<br />
We’re not looking to prove something<br />
here, but we may find that the data were<br />
not very likely to have occurred under the<br />
null being true, which was the assumption<br />
we made (in which case we reject the null).<br />
– Often, the null is the less interesting statement<br />
to the researcher.<br />
17
– Innocent until proven guilty.<br />
We’re being cautious; we’re giving the<br />
status quo the benefit of the doubt.<br />
– The situation is assumed uninteresting<br />
until evidence can show (beyond reasonable<br />
doubt) that something interesting is<br />
going on.<br />
– Symbolized by $H_0$.<br />
– It is a statement about a population parameter,<br />
not a statistic.<br />
– Example: the modified engine data,<br />
$H_0: \mu = 100$<br />
18
• P-value:<br />
– The p-value represents the probability of<br />
obtaining a test statistic as extreme as (or<br />
more extreme than) the observed<br />
test statistic, under $H_0$ true<br />
– If you perform a two-sided hypothesis test<br />
$H_0: \mu = \mu_0$ vs. $H_1: \mu \ne \mu_0$,<br />
the p-value is the probability in both tails<br />
(example on slide p.23)<br />
– Large test statistic (in absolute value) ⇔<br />
small p-value<br />
– Small p-values are evidence against the<br />
null hypothesis (as are large test statistics)<br />
– When we make a decision to reject $H_0$ it<br />
is because the p-value is small<br />
19
– A small p-value says we would have been<br />
very unlikely to have gotten a sample with<br />
data like this if $H_0$ were true<br />
– The p-value is not the probability that $H_0$<br />
is true<br />
– We use the calculated p-value to make a<br />
conclusion or decision on the hypothesis<br />
test based on a chosen significance level α<br />
(on next slide):<br />
∗ Reject the null hypothesis<br />
∗ Fail to reject the null hypothesis<br />
(i.e. accept the null hypothesis)<br />
– We do not prove the null hypothesis true;<br />
this is not how things are set up. We<br />
assume it to be true right from the start<br />
of the procedure.<br />
20
• The significance level α:<br />
– How low must a p-value be to reject the<br />
null?<br />
– We set a threshold that will control our<br />
chance of making a particular mistake.<br />
What mistake?<br />
REJECTING $H_0$ WHEN $H_0$ IS<br />
ACTUALLY TRUE.<br />
This is called a type I error.<br />
This is often seen as a big mistake.<br />
In the emissions example, the company<br />
would completely re-do their engine manufacturing<br />
set-up if they reject. This would<br />
be a big waste if the modified engine actually<br />
wasn’t any better.<br />
21
– We set the chance of such a mistake to be<br />
α which is often set at 0.05 (though 0.01<br />
and others are also seen).<br />
We simply accept a 5% chance that we<br />
make a type I error. For most situations,<br />
this chance of a mistake is considered low<br />
enough.<br />
– By only rejecting when the p-value is less<br />
than α we control the type I error at the<br />
α level.<br />
α = P(type I error)<br />
= P(reject $H_0$ when $H_0$ is true)<br />
= P(reject $H_0$ | $H_0$ is true)<br />
= P(a false positive occurring)<br />
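A small simulation can illustrate why rejecting only when the p-value is below α holds the type I error rate near α. This is a sketch under stated assumptions, not something from the slides: the population is taken to be normal with mean exactly µ0 = 100 (so the null is true in every trial), σ = 21 is treated as known so that a Z statistic applies, and the function name, trial count and seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, fmean

def type_i_error_rate(mu0, sigma, n, alpha, trials, seed=0):
    """Fraction of samples drawn under a TRUE null that get rejected.

    Tests H0: mu = mu0 against H1: mu < mu0 with a Z statistic,
    treating sigma as known (an assumption made for this illustration).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where the null really holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z0 = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
        p_value = std_normal.cdf(z0)  # lower-tail p-value
        if p_value < alpha:
            rejections += 1  # a type I error: rejecting a true null
    return rejections / trials

rate = type_i_error_rate(mu0=100.0, sigma=21.0, n=50, alpha=0.05, trials=5000)
print(f"Observed type I error rate: {rate:.3f}")
```

The observed rate should land close to 0.05, with some run-to-run variation from the finite number of trials.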
22
• Example: An example where $\sigma^2$ is known<br />
or you have a very large sample<br />
If $\sigma^2$ is known, or you have a very large<br />
sample, the test statistic will be the<br />
Z test statistic, instead of the T.<br />
An inspector measured the fill volume of a<br />
simple random sample of n = 100 cans of<br />
juice that were labeled as containing 12 oz.<br />
The sample had a mean volume of 11.98 oz<br />
and a standard deviation of 0.19 oz.<br />
Let µ represent the mean fill volume for all<br />
cans of juice recently filled by the machine.<br />
Perform a hypothesis test of µ = 12 versus<br />
µ ≠ 12 at the α = 0.05 significance level.<br />
23
ANS:<br />
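One way to work this example is sketched below in Python. It is an illustration, not the original slide's answer. It assumes that n = 100 is large enough to treat the sample standard deviation as σ and to use the Z statistic, and it takes the tail probability from the standard library's `NormalDist`. The variable names are arbitrary.

```python
import math
from statistics import NormalDist

# Summary statistics from the juice example
n, xbar, s, mu0, alpha = 100, 11.98, 0.19, 12.0, 0.05

# Observed Z statistic under H0: mu = 12
z0 = (xbar - mu0) / (s / math.sqrt(n))

# Two-sided alternative (mu != 12): add the probability in both tails
p_value = 2.0 * NormalDist().cdf(-abs(z0))

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z0 = {z0:.2f}, p-value = {p_value:.3f}, decision: {decision}")
```

Under these assumptions the p-value comes out well above 0.05, so the sample does not give significant evidence that the mean fill volume differs from 12 oz.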
24