06.09.2021 Views

Learning Statistics with R - A tutorial for psychology students and other beginners, 2018a


[Figure 13.4: two panels, each plotted against the value of X. Left panel: null hypothesis (μ = μ₀, σ = ??). Right panel: alternative hypothesis (μ ≠ μ₀, σ = ??).]

Figure 13.4: Graphical illustration of the null and alternative hypotheses assumed by the (two sided) one sample t-test. Note the similarity to the z-test (Figure 13.2). The null hypothesis is that the population mean μ is equal to some specified value μ₀, and the alternative hypothesis is that it is not. Like the z-test, we assume that the data are normally distributed; but we do not assume that the population standard deviation σ is known in advance.

entertaining the hypothesis that they don’t have the same mean, then why should I believe that they absolutely have the same standard deviation? In view of this, I should really stop assuming that I know the true value of σ. This violates the assumptions of my z-test, so in one sense I’m back to square one. However, it’s not like I’m completely bereft of options. After all, I’ve still got my raw data, and those raw data give me an estimate of the population standard deviation:

> sd( grades )
[1] 9.520615

In other words, while I can’t say that I know that σ = 9.5, I can say that σ̂ = 9.52.
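If you’re curious what sd() is doing when it produces that estimate, here is a minimal sketch. The vector x below is made up purely for illustration (it is not the grades data), and the name sigma_hat is my own. The only point is that the estimate divides the sum of squared deviations by n - 1 rather than n.

```r
# Made-up numbers for illustration only; NOT the grades data from the text
x <- c(61, 70, 74, 79, 83)

n <- length(x)

# Estimate of the population standard deviation:
# sum of squared deviations from the sample mean, divided by n - 1 (not n)
sigma_hat <- sqrt(sum((x - mean(x))^2) / (n - 1))

# The hand computation should agree with R's built-in sd()
all.equal(sigma_hat, sd(x))
```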

Okay, cool. The obvious thing that you might think to do is run a z-test, but using the estimated standard deviation of 9.52 instead of relying on my assumption that the true standard deviation is 9.5. So, we could just type this new number into R and out would come the answer. And you probably wouldn’t be surprised to hear that this would still give us a significant result. This approach is close, but it’s not quite correct. Because we are now relying on an estimate of the population standard deviation, we need to make some adjustment for the fact that we have some uncertainty about what the true population standard deviation actually is. Maybe our data are just a fluke . . . maybe the true population standard deviation is 11, for instance. But if that were actually true, and we ran the z-test assuming σ = 11, then the result would end up being non-significant. That’s a problem, and it’s one we’re going to have to address.
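To make the problem concrete, here is a rough sketch of how much a z-test depends on the value of σ you feed it. Everything in it is an assumption for illustration: z_test_p() is a hypothetical helper I’ve written for this example (it is not a built-in R function), the scores are simulated rather than the real grades data, and the null value mu0 and the inflation factor 1.2 are arbitrary choices.

```r
# Hypothetical helper (an assumption, not part of base R):
# two-sided p-value for a one-sample z-test with a supplied sigma
z_test_p <- function(x, mu0, sigma) {
  sem <- sigma / sqrt(length(x))          # standard error of the mean, given sigma
  z   <- (mean(x) - mu0) / sem            # z-statistic
  2 * pnorm(abs(z), lower.tail = FALSE)   # two-sided p-value
}

# Simulated scores for illustration only; NOT the grades data from the text
set.seed(1)
scores <- rnorm(20, mean = 72, sd = 10)
mu0 <- 67.5   # assumed null value, chosen for illustration

# Plug in the sample estimate as if it were the true sigma
p_est <- z_test_p(scores, mu0, sigma = sd(scores))

# Suppose instead the true sigma were 20% larger than our estimate
p_big <- z_test_p(scores, mu0, sigma = 1.2 * sd(scores))

print(c(estimated_sigma = p_est, larger_sigma = p_big))
```

With the same data and the same null value, the larger σ always gives the larger p-value, because it makes the standard error bigger and the z-statistic smaller. So the conclusion you draw can hinge on a number you don’t actually know.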

13.2.1 Introducing the t-test<br />

This ambiguity is annoying, and it was resolved in 1908 by a guy called William Sealy Gosset (Student, 1908), who was working as a chemist for the Guinness brewery at the time (see J. F. Box, 1987). Because Guinness took a dim view of its employees publishing statistical analysis (apparently they felt it was a trade secret), he published the work under the pseudonym “A Student”, and to this day, the full name of the t-test is actually Student’s t-test. The key thing that Gosset figured out is how we should
