Learning Statistics with R - A tutorial for psychology students and other beginners, 2018a


means can be organised into the same table as the population means. For our clinical trial data, that table looks like this:

              no therapy        CBT               total
  placebo     $\bar{Y}_{11}$    $\bar{Y}_{12}$    $\bar{Y}_{1.}$
  anxifree    $\bar{Y}_{21}$    $\bar{Y}_{22}$    $\bar{Y}_{2.}$
  joyzepam    $\bar{Y}_{31}$    $\bar{Y}_{32}$    $\bar{Y}_{3.}$
  total       $\bar{Y}_{.1}$    $\bar{Y}_{.2}$    $\bar{Y}_{..}$

And if we look at the sample means that I showed earlier, we have $\bar{Y}_{11} = 0.30$, $\bar{Y}_{12} = 0.60$, etc. In our clinical trial example, the drugs factor has 3 levels and the therapy factor has 2 levels, so what we’re trying to run is a $3 \times 2$ factorial ANOVA. However, we’ll be a little more general and say that Factor A (the row factor) has $R$ levels and Factor B (the column factor) has $C$ levels, so what we’re running here is an $R \times C$ factorial ANOVA.
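To make the layout of that table concrete, here is a minimal sketch that builds the cell means, the marginal means and the grand mean for a small balanced $3 \times 2$ design. The data frame `toy`, its `score` column and every value in it are hypothetical, invented purely for illustration; they are not the clinical trial data.

```r
# Hypothetical balanced 3 x 2 design with N = 3 observations per cell
toy <- data.frame(
  drug    = factor(rep(c("placebo", "anxifree", "joyzepam"), each = 6),
                   levels = c("placebo", "anxifree", "joyzepam")),
  therapy = factor(rep(rep(c("no.therapy", "CBT"), each = 3), times = 3),
                   levels = c("no.therapy", "CBT")),
  score   = c(0.2, 0.3, 0.4,   0.5, 0.6, 0.7,
              0.3, 0.4, 0.5,   0.9, 1.0, 1.1,
              1.3, 1.5, 1.7,   1.4, 1.5, 1.6)
)

# Cell means: rows are levels of Factor A (drug), columns are levels of Factor B (therapy)
cell.means <- tapply(toy$score, list(toy$drug, toy$therapy), mean)

# Marginal means; averaging the cell means is only valid because the design is balanced
row.means  <- rowMeans(cell.means)
col.means  <- colMeans(cell.means)
grand.mean <- mean(toy$score)

# Assemble everything into one table with a "total" row and column
means.table <- rbind(cbind(cell.means, total = row.means),
                     total = c(col.means, grand.mean))
print(round(means.table, 2))
```

The bottom-right entry of `means.table` plays the role of $\bar{Y}_{..}$, the last column holds the row marginal means $\bar{Y}_{r.}$, and the last row holds the column marginal means $\bar{Y}_{.c}$.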

Now that we’ve got our notation straight, we can compute the sum of squares values for each of the two factors in a relatively familiar way. For Factor A, our between-group sum of squares is calculated by assessing the extent to which the (row) marginal means $\bar{Y}_{1.}$, $\bar{Y}_{2.}$, etc., are different from the grand mean $\bar{Y}_{..}$. We do this in the same way that we did for one-way ANOVA: calculate the sum of squared differences between the $\bar{Y}_{r.}$ values and the grand mean $\bar{Y}_{..}$. Specifically, if there are $N$ people in each group (that is, in each cell of the design), then we calculate this:

$$ SS_A = (N \times C) \sum_{r=1}^{R} \left( \bar{Y}_{r.} - \bar{Y}_{..} \right)^2 $$

As with one-way ANOVA, the most interesting part of this formula is the $\left( \bar{Y}_{r.} - \bar{Y}_{..} \right)^2$ bit, which corresponds to the squared deviation associated with level $r$. All that this formula does is calculate this squared deviation for all $R$ levels of the factor, add them up, and then multiply the result by $N \times C$. The reason for this last part is that there are multiple cells in our design that have level $r$ on Factor A: in fact, there are $C$ of them, one corresponding to each possible level of Factor B! For instance, in our toy example, there are two different cells in the design corresponding to the anxifree drug: one for people with no.therapy, and one for the CBT group. Not only that, within each of these cells there are $N$ observations. So, if we want to convert our SS value into a quantity that calculates the between-groups sum of squares on a “per observation” basis, we have to multiply by $N \times C$. The formula for Factor B is of course the same thing, just with some subscripts shuffled around:

$$ SS_B = (N \times R) \sum_{c=1}^{C} \left( \bar{Y}_{.c} - \bar{Y}_{..} \right)^2 $$
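As a rough check on these two formulas, the sketch below computes $SS_A$ and $SS_B$ by hand and compares them with the sums of squares in the ANOVA table produced by `aov()`. The data frame `toy`, its `score` column and all of its values are hypothetical. The agreement also relies on the design being balanced, with the same $N$ in every cell.

```r
# Hypothetical balanced 3 x 2 design with N = 3 observations per cell
toy <- data.frame(
  drug    = factor(rep(c("placebo", "anxifree", "joyzepam"), each = 6),
                   levels = c("placebo", "anxifree", "joyzepam")),
  therapy = factor(rep(rep(c("no.therapy", "CBT"), each = 3), times = 3),
                   levels = c("no.therapy", "CBT")),
  score   = c(0.2, 0.3, 0.4,   0.5, 0.6, 0.7,
              0.3, 0.4, 0.5,   0.9, 1.0, 1.1,
              1.3, 1.5, 1.7,   1.4, 1.5, 1.6)
)

N <- 3                      # observations per cell
R <- nlevels(toy$drug)      # levels of Factor A (rows)
C <- nlevels(toy$therapy)   # levels of Factor B (columns)

row.means  <- tapply(toy$score, toy$drug, mean)      # marginal means for Factor A
col.means  <- tapply(toy$score, toy$therapy, mean)   # marginal means for Factor B
grand.mean <- mean(toy$score)

# Direct translations of the two formulas
SS.A <- (N * C) * sum((row.means - grand.mean)^2)
SS.B <- (N * R) * sum((col.means - grand.mean)^2)

# Sums of squares reported by R for the same design
fit      <- aov(score ~ drug * therapy, data = toy)
ss.table <- anova(fit)

print(c(SS.A = SS.A, aov.drug = ss.table["drug", "Sum Sq"]))
print(c(SS.B = SS.B, aov.therapy = ss.table["therapy", "Sum Sq"]))
```

In each printed pair, the hand-computed value and the value taken from the ANOVA table should agree up to floating-point rounding.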

Now that we have these formulas, we can check them against the R output from the earlier section. First, notice that we calculated all the marginal means (i.e., row marginal means $\bar{Y}_{r.}$ and column marginal means $\bar{Y}_{.c}$) earlier using aggregate(), and we also calculated the grand mean. Let’s repeat those calculations, but this time we’ll save the results to variables so that we can use them in subsequent calculations:

> drug.means therapy.means grand.mean
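As a minimal sketch of how these three variables might be saved with `aggregate()`, consider the code below. The data frame `toy` and its `score` column are hypothetical stand-ins, and the exact calls used for the clinical trial data may well differ.

```r
# Hypothetical balanced 3 x 2 design with N = 3 observations per cell
toy <- data.frame(
  drug    = factor(rep(c("placebo", "anxifree", "joyzepam"), each = 6),
                   levels = c("placebo", "anxifree", "joyzepam")),
  therapy = factor(rep(rep(c("no.therapy", "CBT"), each = 3), times = 3),
                   levels = c("no.therapy", "CBT")),
  score   = c(0.2, 0.3, 0.4,   0.5, 0.6, 0.7,
              0.3, 0.4, 0.5,   0.9, 1.0, 1.1,
              1.3, 1.5, 1.7,   1.4, 1.5, 1.6)
)

# aggregate() returns a data frame; column 2 holds the group means,
# ordered by the levels of the grouping factor
drug.means    <- aggregate(score ~ drug, toy, mean)[, 2]
therapy.means <- aggregate(score ~ therapy, toy, mean)[, 2]
grand.mean    <- mean(toy$score)

print(drug.means)
print(therapy.means)
print(grand.mean)
```

Keeping only the second column gives plain numeric vectors, which are easier to plug into the sum of squares formulas than the data frames that `aggregate()` returns.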
