09.01.2015 Views

Ch. 9 pt 1


Chapter 9: TESTS OF HYPOTHESES<br />

FOR A SINGLE SAMPLE<br />

Part 1: Intro to Hypothesis Testing<br />

Sections 9-1, 9-2, 9-3<br />

Statistical Inference<br />

We infer something about the population as a<br />

whole from the information in a sample.<br />

[Figure: a sample drawn from the population]<br />

- Point estimation ✓<br />

- Confidence intervals ✓<br />

- Hypothesis testing (introduced in chapter 9)<br />

1


Hypothesis Testing<br />

Sections 9-1, 9-2, 9-3<br />

We’ll start with an illustration...<br />

• Example: Reduction of car emissions<br />

A certain automobile engine emits 100 mg<br />

of nitrogen oxides per second on average. A<br />

modification to the engine has been proposed<br />

that may reduce the emissions.<br />

The new design will be put into production<br />

IF it can be demonstrated that its mean emission<br />

rate is less than 100 mg/s.<br />

To make a decision, a random sample of<br />

n = 50 modified engines is taken and<br />

emission measurements are recorded.<br />

2


The sample mean is $\bar{x} = 92$ mg/s and the sample standard deviation is $s = 21$ mg/s.<br />

A normal probability plot suggests emissions<br />

follow a normal distribution.<br />

Isn’t 92 far enough below 100 for us to say the modified engine is better?<br />

Is there enough evidence to completely change the manufacturing line and switch which engine is produced?<br />

3


STATISTICAL QUESTION:<br />

Could we have gotten this low of a sample mean emission $\bar{x}$ even if the modified engine WASN’T any better than the first (i.e. its population mean was actually 100)?<br />

Could we have grabbed a sample that happened to have many low emission values even though the population mean was 100?<br />

To make a decision on the engines, we want to quantify the above question with a probability:<br />

“Given that the true population mean emission is 100 mg/s, what is the probability of observing a sample mean emission $\bar{x}$ this low or lower?”<br />

4


Recall from the last chapter:<br />

If we assume $\mu = 100$ and $n$ large, we have<br />

$$\bar{X} \sim N\left(100, \frac{\sigma^2}{n}\right).$$

This is a known behavior of the sample mean.<br />

Probability of interest:<br />

Given $\mu = 100$ (engine not any better),<br />

$$P(\bar{X} \le 92) = \; ?$$

Since $\sigma^2$ is unknown in this case, we have<br />

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$

where $S$ is the sample standard deviation and $T$ has a $t$ distribution with $n-1$ degrees of freedom (and $n = 50$ in this example).<br />

5


$$P(\bar{X} \le 92) = P\left(\frac{\bar{X} - \mu}{S/\sqrt{n}} \le \frac{92 - 100}{21/\sqrt{50}}\right) = P(T \le -2.69) = 0.0049$$

because $T \sim t_{49}$.<br />

[Figure: $t(49)$ density ($t$ with 49 df), horizontal axis $T$ from −3 to 3]<br />
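The tail probability above can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `t_pdf` and `t_cdf` are illustrative (not from the slides), and the CDF is approximated with Simpson's rule rather than taken from a statistics package.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x): 0.5 plus the signed Simpson's-rule integral of the density over [0, x]."""
    h = x / steps  # negative when x < 0, so the integral is signed
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

# Values from the emissions example
xbar, mu0, s, n = 92.0, 100.0, 21.0, 50
t_obs = (xbar - mu0) / (s / math.sqrt(n))   # observed value of T
p_lower = t_cdf(t_obs, n - 1)               # P(T <= t_obs) assuming mu = 100

print(f"t = {t_obs:.2f}, P(T <= t) = {p_lower:.4f}")
```

The printed probability should land near 0.005, in line with the 0.0049 quoted on the slide; any difference in the last digit would come from rounding $t$ to −2.69 before looking up the tail area.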

6


NOT VERY LIKELY...<br />

The probability of observing a sample mean emission $\bar{x}$ this low or lower, given that the true population mean is 100 mg/s, is<br />

0.0049<br />

This suggests that our initial assumption in the calculation, that the true mean was 100, is perhaps incorrect.<br />

For this reason, we reject the assumption of $\mu = 100$ in favor of the ‘alternative’, that the true mean emissions IS LESS THAN 100 mg/s.<br />

We don’t know FOR SURE, but there’s strong<br />

evidence against someone saying that the mean<br />

of the modified engine is 100 mg/s.<br />

7


If it was 100 mg/s, we would very rarely see an $\bar{x}$ this low (could happen, but not likely).<br />

What’s unlikely enough to actually reject the initial assumption (that the two engine models were equal)?<br />

There’s some opinion here, but we often use<br />

0.05 as a threshold. Anything less than this<br />

is considered rather unlikely.<br />

————————————————————<br />

We have essentially just performed a hypothesis test. Now we will formalize the procedure...<br />

8


• General set-up for testing a<br />

hypothesis for µ<br />

1. State your null $H_0$ and alternative $H_1$ hypotheses.<br />

(The null is what we assume to be true.)<br />

$H_0 : \mu = \mu_0$<br />

(The subscript on $\mu_0$ is used to emphasize that this value is the assumed mean under the null hypothesis being true.)<br />

There are 3 choices for the alternative, either...<br />

* $H_1 : \mu \ne \mu_0$ (two-sided alternative)<br />

* $H_1 : \mu < \mu_0$ (one-sided alternative)<br />

* $H_1 : \mu > \mu_0$ (one-sided alternative)<br />

9


2. Calculate the test statistic (either a $Z$ or $T$).<br />

(In this example, the test statistic was a $T$; we’ll make a conclusion based on it.)<br />

3. Compute the probability of observing a test<br />

statistic this extreme, or more extreme,<br />

under the null being true.<br />

(This probability is called a p-value.)<br />

4. State your conclusion with respect to the<br />

problem:<br />

Either...<br />

‘Reject the null’<br />

or<br />

‘Fail to reject the null’.<br />

5. Be sure to verify any assumptions that were needed.<br />

(This is usually a normal probability plot for verifying normality, which is needed to have $T \sim t_{n-1}$.)<br />
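The five steps can be sketched as a small function. This is an illustration under stated assumptions, not a reference implementation: the function name `one_sample_t_test`, the `alternative` labels, and the default threshold of 0.05 are hypothetical choices, and the $t$ CDF is approximated by Simpson's rule using only the standard library. Step 5 (checking normality) still has to be done separately, for example with a normal probability plot.

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: 0.5 plus a signed Simpson's-rule integral over [0, x]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

def one_sample_t_test(xbar, s, n, mu0, alternative, threshold=0.05):
    """Steps 2-4 for H0: mu = mu0, given summary statistics.

    alternative is 'less' (H1: mu < mu0), 'greater' (H1: mu > mu0),
    or 'two-sided' (H1: mu != mu0).
    """
    t0 = (xbar - mu0) / (s / math.sqrt(n))      # step 2: test statistic
    df = n - 1
    if alternative == "less":                   # step 3: p-value
        p_value = t_cdf(t0, df)
    elif alternative == "greater":
        p_value = 1.0 - t_cdf(t0, df)
    elif alternative == "two-sided":
        p_value = 2.0 * t_cdf(-abs(t0), df)
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    # step 4: conclusion
    decision = "reject the null" if p_value < threshold else "fail to reject the null"
    return t0, p_value, decision

# Emissions data with the one-sided alternative H1: mu < 100
print(one_sample_t_test(92, 21, 50, 100, "less"))
```

Passing summary statistics rather than raw data keeps the sketch close to how the example is stated; a version taking the raw sample would compute $\bar{x}$ and $s$ first.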

10


• Example: Formalizing the emissions<br />

hypothesis test<br />

1. State your null $H_0$ and alternative $H_1$ hypotheses.<br />

$H_0 : \mu = 100$<br />

$H_1 : \mu < 100$<br />

(this is a one-sided hypothesis test with $\mu_0 = 100$)<br />

2. Calculate the observed test statistic.<br />

$$t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{92 - 100}{21/\sqrt{50}} = -2.69$$

(The subscript on $t_0$ is used to emphasize the fact that we’re assuming the mean to be $\mu_0$.)<br />

11


3. Compute the probability of observing a test statistic this extreme, or more extreme, under the null being true (i.e. compute the p-value).<br />

Under $H_0$ true, $T_0 = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t_{49}$, and<br />

$$P(T_0 \le -2.69) = 0.0049$$

[Figure: $t(49)$ density ($t$ with 49 df), horizontal axis $T$ from −3 to 3]<br />

Thus, because this is a one-sided hypothesis test, the p-value = 0.0049.<br />

12


p-value=0.0049...<br />

“If the true mean is really µ = 100, then<br />

the probability of observing a sample mean<br />

(from a sample of size n = 50) this far below<br />

100 (or even farther) is only 0.0049.”<br />

4. State your conclusion for the hypothesis test:<br />

Using 0.05 or $\frac{5}{100}$ as a threshold for ‘unlikeliness’, we have<br />

p-value = 0.0049 < 0.05<br />

and we reject the null in favor of the alternative, which is that $\mu < 100$.<br />

13


5. Be sure to verify any assumptions that were needed.<br />

As stated earlier, we checked the normal probability plot of the emission values and it was OK, so the requirement for $T_0 \sim t_{49}$ (that the parent population was normally distributed) was fulfilled.<br />

When we reject $H_0$, we say the test was significant.<br />

For this example, we say there was significant statistical evidence that the modified engine has a mean emission rate lower than 100 mg/s.<br />

So, there was strong evidence that the modified<br />

engine is better.<br />

14


• Why do we use this test statistic $T_0$ to test $H_0 : \mu = \mu_0$?<br />

$$T_0 = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$

Let’s pick apart this statistic...<br />

– Under $H_0$ true, $E(\bar{X}) = \mu_0$ and the expected value of the numerator in $T_0$ is 0, and the distribution of $T_0$ is unimodal, centered at zero.<br />

– If $\bar{X}$ is far from $\mu_0$ in either direction, the numerator in $T_0$ will be ‘large’ (+ or −), leading to a ‘large’ $T_0$, leading to rejection of $H_0$.<br />

A ‘large’ or ‘extreme’ $T_0$ would not be expected if $H_0$ was true (we expect $T_0$ to ‘bounce around’ 0 if $H_0$ is true).<br />

15


– But what is a ‘large’ difference $\bar{X} - \mu_0$?<br />

This is where the denominator comes into play. ‘Large’ is based on our sample size and the variability in the population $\sigma^2$ (which shows up in $S$).<br />

For one thing, scale matters. A ‘large’ difference in $\bar{X} - \mu_0$ on a nanoscale will probably not be the same as a large difference in kilometers ($S$ will make this adjustment here).<br />

We also know that the expected squared distance of $\bar{X}$ from $\mu$ goes down as $n$ increases. This also has to be taken into account for deciding what is ‘large’.<br />

Bottom line... if we observe a realized $t_0$ value that is in the far tail of the $T_0$ distribution, it suggests we should reject $H_0$.<br />
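To see the denominator at work, the sketch below holds the difference $\bar{x} - \mu_0 = 92 - 100 = -8$ fixed and varies $s$ and $n$. Only the first row comes from the emissions example; the other $(s, n)$ pairs are made-up values chosen for illustration, and `t_stat` is a hypothetical helper name.

```python
import math

def t_stat(xbar, mu0, s, n):
    """Observed test statistic t0 = (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))

# Same 8 mg/s difference, different spread and sample size
scenarios = [
    (21, 50),    # emissions example
    (21, 5),     # same spread, much smaller sample (hypothetical)
    (60, 50),    # same sample size, noisier population (hypothetical)
    (21, 500),   # same spread, much larger sample (hypothetical)
]

for s, n in scenarios:
    print(f"s = {s:>2}, n = {n:>3}: t0 = {t_stat(92, 100, s, n):6.2f}")
```

The same 8 mg/s gap gives a $t_0$ closer to zero when the sample is small or the data are noisy, and a more extreme $t_0$ when $n$ is large.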

16


Some comments on terminology...<br />

• The Null Hypothesis:<br />

– It is what we assume to be true upon entering<br />

the hypothesis test<br />

In many formal arguments, we often assume something to be true, and then see if we can contradict this assumption later.<br />

We’re not looking to prove something here, but we may find that the data were not very likely to have occurred under the null being true, which was the assumption we made (in which case we reject the null).<br />

– Often, the null is the less interesting statement<br />

to the researcher.<br />

17


– Innocent until proven guilty.<br />

We’re being cautious, we’re giving the<br />

status-quo the benefit of the doubt.<br />

– The situation is assumed uninteresting<br />

until evidence can show (beyond reasonable<br />

doubt) that something interesting is<br />

going on.<br />

– Symbolized by $H_0$.<br />

– It is a statement about a population parameter, not a statistic.<br />

– Example: the modified engine data,<br />

$H_0 : \mu = 100$<br />

18


• P-value:<br />

– The p-value represents the probability of obtaining a test statistic as extreme as (or more extreme than) the observed test statistic under $H_0$ true<br />

– If you perform a two-sided hypothesis test<br />

$H_0 : \mu = \mu_0$ vs. $H_1 : \mu \ne \mu_0$,<br />

the p-value is the probability in both tails (example on slide p.23)<br />

– Large test statistic (in absolute value) ⇔ small p-value<br />

– Small p-values are evidence against the null hypothesis (as are large test statistics)<br />

– When we make a decision to reject $H_0$ it is because the p-value is small<br />

19


– A small p-value says we would have been very unlikely to have gotten a sample with data like this if $H_0$ were true<br />

– The p-value is not the probability that $H_0$ is true<br />

– We use the calculated p-value to make a conclusion or decision on the hypothesis test based on a chosen significance level $\alpha$ (on next slide):<br />

∗ Reject the null hypothesis<br />

∗ Fail to reject the null hypothesis (i.e. accept the null hypothesis)<br />

– We do not prove the null hypothesis true; this is not how things are set up. We will assume it to be true right from the start of the procedure.<br />
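The link between a large test statistic and a small p-value can be tabulated directly. This sketch assumes a standard normal reference distribution (a $Z$ statistic) purely to keep the code short; the function name `two_sided_p` is illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def two_sided_p(z):
    """Probability in both tails beyond |z| under a standard normal."""
    return 2 * STD_NORMAL.cdf(-abs(z))

for z in [0.5, 1.0, 2.0, 3.0]:
    both_tails = two_sided_p(z)
    one_tail = both_tails / 2   # area beyond |z| in a single tail
    print(f"|z| = {z:.1f}: one tail = {one_tail:.4f}, both tails = {both_tails:.4f}")
```

As $|z|$ grows, both columns shrink, and the two-sided p-value is always twice the single-tail area beyond $|z|$.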

20


• The significance level α:<br />

– How low must a p-value be to reject the null?<br />

– We set a threshold that will control our chance of making a particular mistake.<br />

What mistake?<br />

REJECTING $H_0$ WHEN $H_0$ IS ACTUALLY TRUE.<br />

This is called a type I error.<br />

This is often seen as a big mistake.<br />

In the emissions example, the company<br />

would completely re-do their engine manufacturing<br />

set-up if they reject. This would<br />

be a big waste if the modified engine actually<br />

wasn’t any better.<br />

21


– We set the chance of such a mistake to be $\alpha$, which is often set at 0.05 (though 0.01 and others are also seen).<br />

We simply accept a 5% chance that we make a type I error. For most situations, this chance of a mistake is considered low enough.<br />

– By only rejecting when the p-value is less than $\alpha$ we control the type I error at the $\alpha$ level.<br />

$$\begin{aligned}
\alpha &= P(\text{type I error}) \\
&= P(\text{reject } H_0 \text{ when } H_0 \text{ is true}) \\
&= P(\text{reject } H_0 \mid H_0 \text{ is true}) \\
&= P(\text{a false positive occurring})
\end{aligned}$$
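A quick simulation illustrates what "controlling the type I error at the $\alpha$ level" means. This is a sketch under simplifying assumptions that are not in the slides: the population is taken to be normal with mean 100 (so $H_0$ is true), $\sigma = 21$ is treated as known so that a $Z$ statistic can be used, and the function name, trial count, and seed are arbitrary.

```python
import math
import random
from statistics import NormalDist, fmean

def type_one_error_rate(trials=20000, n=50, mu0=100.0, sigma=21.0,
                        alpha=0.05, seed=1):
    """Fraction of simulated samples that reject H0: mu = mu0 when H0 is true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    std_err = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where the null really holds
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / std_err
        p_value = std_normal.cdf(z)      # lower-tail p-value for H1: mu < mu0
        if p_value < alpha:
            rejections += 1              # a type I error
    return rejections / trials

rate = type_one_error_rate()
print(f"Share of true nulls rejected: {rate:.3f}")
```

Under these assumptions the rejection rate should come out close to 0.05, since rejecting whenever the p-value falls below $\alpha$ produces false positives about $\alpha$ of the time.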

22


• Example: An example where $\sigma^2$ is known or you have a very large sample<br />

If $\sigma^2$ is known, or you have a very large sample, the test statistic will be the $Z$ test statistic, instead of the $T$.<br />

An inspector measured the fill volume of a simple random sample of $n = 100$ cans of juice that were labeled as containing 12 oz. The sample had a mean volume of 11.98 oz and a standard deviation of 0.19 oz.<br />

Let $\mu$ represent the mean fill volume for all cans of juice recently filled by the machine.<br />

Perform a hypothesis test that $\mu = 12$ versus $\mu \ne 12$ at the $\alpha = 0.05$ significance level.<br />

23


ANS:<br />
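The slide leaves the answer blank. One possible working, sketched in Python, follows the same steps as the emissions example. It assumes that with $n = 100$ the sample standard deviation 0.19 can stand in for $\sigma$, so the statistic is compared with a standard normal; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Summary statistics from the juice example
xbar, mu0, s, n, alpha = 11.98, 12.0, 0.19, 100, 0.05

# H0: mu = 12 versus H1: mu != 12 (two-sided)
z0 = (xbar - mu0) / (s / math.sqrt(n))        # observed Z statistic
p_value = 2 * NormalDist().cdf(-abs(z0))      # probability in both tails

decision = "reject the null" if p_value < alpha else "fail to reject the null"
print(f"z0 = {z0:.2f}, p-value = {p_value:.3f}: {decision}")
```

Here $z_0 \approx -1.05$ and the two-sided p-value is roughly 0.29, well above 0.05, so under these assumptions we fail to reject the null: there is no significant evidence that the mean fill volume differs from 12 oz.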

24
