06.09.2021 Views

Learning Statistics with R - A tutorial for psychology students and other beginners, 2018a

Learning Statistics with R - A tutorial for psychology students and other beginners, 2018a

Learning Statistics with R - A tutorial for psychology students and other beginners, 2018a

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

observations that fall <strong>with</strong>in the i-th category (where i could be 1, 2, 3 or 4). Finally, if we want to refer<br />

to the set of all observed frequencies, statisticians group all of observed values into a vector, which I’ll<br />

refer to as O.<br />

O “pO 1 ,O 2 ,O 3 ,O 4 q<br />

Again, there’s nothing new or interesting here: it’s just notation. If I say that O “ p35, 51, 64, 50q all<br />

I’m doing is describing the table of observed frequencies (i.e., observed), but I’m referring to it using<br />

mathematical notation, rather than by referring to an R variable.<br />

12.1.2 The null hypothesis <strong>and</strong> the alternative hypothesis<br />

As the last section indicated, our research hypothesis is that “people don’t choose cards r<strong>and</strong>omly”.<br />

What we’re going to want to do now is translate this into some statistical hypotheses, <strong>and</strong> construct a<br />

statistical test of those hypotheses. The test that I’m going to describe to you is Pearson’s χ 2 goodness<br />

of fit test, <strong>and</strong> as is so often the case, we have to begin by carefully constructing our null hypothesis.<br />

In this case, it’s pretty easy. First, let’s state the null hypothesis in words:<br />

H 0 :<br />

All four suits are chosen <strong>with</strong> equal probability<br />

Now, because this is statistics, we have to be able to say the same thing in a mathematical way. To do<br />

this, let’s use the notation P j to refer to the true probability that the j-th suit is chosen. If the null<br />

hypothesis is true, then each of the four suits has a 25% chance of being selected: in <strong>other</strong> words, our<br />

null hypothesis claims that P 1 “ .25, P 2 “ .25, P 3 “ .25 <strong>and</strong> finally that P 4 “ .25. However, in the same<br />

way that we can group our observed frequencies into a vector O that summarises the entire data set,<br />

we can use P to refer to the probabilities that correspond to our null hypothesis. So if I let the vector<br />

P “pP 1 ,P 2 ,P 3 ,P 4 q refer to the collection of probabilities that describe our null hypothesis, then we have<br />

H 0 :<br />

P “p.25,.25,.25,.25q<br />

In this particular instance, our null hypothesis corresponds to a vector of probabilities P in which all<br />

of the probabilities are equal to one an<strong>other</strong>. But this doesn’t have to be the case. For instance, if the<br />

experimental task was <strong>for</strong> people to imagine they were drawing from a deck that had twice as many clubs<br />

as any <strong>other</strong> suit, then the null hypothesis would correspond to something like P “p.4,.2,.2,.2q. As<br />

long as the probabilities are all positive numbers, <strong>and</strong> they all sum to 1, them it’s a perfectly legitimate<br />

choice <strong>for</strong> the null hypothesis. However, the most common use of the goodness of fit test is to test a null<br />

hypothesis that all of the categories are equally likely, so we’ll stick to that <strong>for</strong> our example.<br />

What about our alternative hypothesis, H 1 ? All we’re really interested in is demonstrating that the<br />

probabilities involved aren’t all identical (that is, people’s choices weren’t completely r<strong>and</strong>om). As a<br />

consequence, the “human friendly” versions of our hypotheses look like this:<br />

H 0 : All four suits are chosen <strong>with</strong> equal probability<br />

H 1 : At least one of the suit-choice probabilities isn’t .25<br />

<strong>and</strong> the “mathematician friendly” version is<br />

H 0 : P “p.25,.25,.25,.25q<br />

H 1 : P ‰p.25,.25,.25,.25q<br />

Conveniently, the mathematical version of the hypotheses looks quite similar to an R comm<strong>and</strong> defining<br />

avector. SomaybewhatIshoulddoisstoretheP vector in R as well, since we’re almost certainly going<br />

to need it later. And because I’m Mister Imaginative, I’ll call this R vector probabilities,<br />

> probabilities probabilities<br />

- 353 -

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!