06.09.2021 Views

Learning Statistics with R - A tutorial for psychology students and other beginners, 2018a

Learning Statistics with R - A tutorial for psychology students and other beginners, 2018a

Learning Statistics with R - A tutorial for psychology students and other beginners, 2018a

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

16.6.1 Some data<br />

To make things concrete, let’s suppose that our outcome variable is the grade that a student receives<br />

in my class, a ratio-scale variable corresponding to a mark from 0% to 100%. There are two predictor<br />

variables of interest: whether or not the student turned up to lectures (the attend variable), <strong>and</strong> whether<br />

or not the student actually read the textbook (the reading variable). We’ll say that attend = 1 if the<br />

student attended class, <strong>and</strong> attend = 0 if they did not. Similarly, we’ll say that reading = 1 if the student<br />

read the textbook, <strong>and</strong> reading = 0 if they did not.<br />

Okay, so far that’s simple enough. The next thing we need to do is to wrap some maths around this<br />

(sorry!). For the purposes of this example, let Y p denote the grade of the p-th student in the class. This<br />

is not quite the same notation that we used earlier in this chapter: previously, we’ve used the notation<br />

Y rci to refer to the i-thpersoninther-th group <strong>for</strong> predictor 1 (the row factor) <strong>and</strong> the c-th group<br />

<strong>for</strong> predictor 2 (the column factor). This extended notation was really h<strong>and</strong>y <strong>for</strong> describing how the<br />

SS values are calculated, but it’s a pain in the current context, so I’ll switch notation here. Now, the<br />

Y p notation is visually simpler than Y rci , but it has the shortcoming that it doesn’t actually keep track<br />

of the group memberships! That is, if I told you that Y 0,0,3 “ 35, you’d immediately know that we’re<br />

talking about a student (the 3rd such student, in fact) who didn’t attend the lectures (i.e., attend = 0)<br />

<strong>and</strong> didn’t read the textbook (i.e. reading = 0), <strong>and</strong> who ended up failing the class (grade = 35). But if<br />

I tell you that Y p “ 35 all you know is that the p-th student didn’t get a good grade. We’ve lost some<br />

key in<strong>for</strong>mation here. Of course, it doesn’t take a lot of thought to figure out how to fix this: what we’ll<br />

do instead is introduce two new variables X 1p <strong>and</strong> X 2p that keep track of this in<strong>for</strong>mation. In the case<br />

of our hypothetical student, we know that X 1p “ 0 (i.e., attend = 0) <strong>and</strong>X 2p “ 0 (i.e., reading = 0). So<br />

the data might look like this:<br />

person, p grade, Y p attendance, X 1p reading, X 2p<br />

1 90 1 1<br />

2 87 1 1<br />

3 75 0 1<br />

4 60 1 0<br />

5 35 0 0<br />

6 50 0 0<br />

7 65 1 0<br />

8 70 0 1<br />

This isn’t anything particularly special, of course: it’s exactly the <strong>for</strong>mat in which we expect to see our<br />

data! In <strong>other</strong> words, if your data have been stored as a data frame in R then you’re probably expecting<br />

to see something that looks like the rtfm.1 data frame:<br />

> rtfm.1<br />

grade attend reading<br />

1 90 1 1<br />

2 87 1 1<br />

3 75 0 1<br />

4 60 1 0<br />

5 35 0 0<br />

6 50 0 0<br />

7 65 1 0<br />

8 70 0 1<br />

Well, sort of. I suspect that a few readers are probably frowning a little at this point. Earlier on in the<br />

book I emphasised the importance of converting nominal scale variables such as attend <strong>and</strong> reading to<br />

factors, rather than encoding them as numeric variables. The rtfm.1 data frame doesn’t do this, but the<br />

rtfm.2 data frame does, <strong>and</strong> so you might instead be expecting to see data like this:<br />

- 522 -

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!