Learning Statistics with R - A tutorial for psychology students and other beginners, 2018a

More documents

Recommendations

Info

12. Categorical data analysis Now that we’ve got the basic theory behind hypothesis testing, it’s time to start looking at specific tests that are commonly used in psychology. So where should we start? Not every textbook agrees on where to start, but I’m going to start with “χ 2 tests” (this chapter) and “t-tests” (Chapter 13). Both of these tools are very frequently used in scientific practice, and while they’re not as powerful as “analysis of variance” (Chapter 14) and “regression” (Chapter 15) they’re much easier to understand. The term “categorical data” is just another name for “nominal scale data”. It’s nothing that we haven’t already discussed, it’s just that in the context of data analysis people tend to use the term “categorical data” rather than “nominal scale data”. I don’t know why. In any case, categorical data analysis refers to a collection of tools that you can use when your data are nominal scale. However, there are a lot of different tools that can be used for categorical data analysis, and this chapter only covers a few of the more common ones. 12.1 The χ 2 goodness-of-fit test The χ 2 goodness-of-fit test is one of the oldest hypothesis tests around: it was invented by Karl Pearson around the turn of the century (Pearson, 1900), with some corrections made later by Sir Ronald Fisher (Fisher, 1922a). To introduce the statistical problem that it addresses, let’s start with some psychology... 12.1.1 The cards data Over the years, there have been a lot of studies showing that humans have a lot of difficulties in simulating randomness. Try as we might to “act” random, we think in terms of patterns and structure, and so when asked to “do something at random”, what people actually do is anything but random. As a consequence, the study of human randomness (or non-randomness, as the case may be) opens up a lot of deep psychological questions about how we think about the world. With this in mind, let’s consider a very simple study. Suppose I asked people to imagine a shuffled deck of cards, and mentally pick one card from this imaginary deck “at random”. After they’ve chosen one card, I ask them to mentally select a second one. For both choices, what we’re going to look at is the suit (hearts, clubs, spades or diamonds) that people chose. After asking, say, N “ 200 people to do this, I’d like to look at the data and figure out whether or not the cards that people pretended to select were really random. The data are contained in the randomness.Rdata file, which contains a single data frame called cards. Let’s take a look: - 351 -
Page 2 and 3:
Learning statistics with R: A tutor
Page 4 and 5:
This book is published under a Crea
Page 6 and 7:
Preface Table of Contents I Backgro
Page 8 and 9:
8.6 Summary . . . . . . . . . . . .
Page 10 and 11:
VI Endings, alternatives and prospe
Page 12 and 13:
February 16, 2015 Preface to Versio
Page 14 and 15:
and density is discussed. A detaile
Page 17 and 18:
1. Why do we learn statistics? “T
Page 19 and 20:
And finally, an invalid argument wi
Page 21 and 22:
Admission rate (both genders) 0 20
Page 23 and 24:
experiment, and they don’t get an
Page 25 and 26:
2. A brief introduction to research
Page 27 and 28:
different ways: the length of time
Page 29 and 30:
that when I ask 100 people how they
Page 31 and 32:
Table 2.1: The relationship between
Page 33 and 34:
2.3 Assessing the reliability of a
Page 35 and 36:
2.5.1 Experimental research The key
Page 37 and 38:
things “inside” the study. Let
Page 39 and 40:
they lack face validity, you’ll f
Page 41 and 42:
that age is growing at an incredibl
Page 43 and 44:
then it can be a big problem. 2.7.7
Page 45 and 46:
2.7.10 Placebo effects The placebo
Page 47 and 48:
it doesn’t, which one do you thin
Page 49:
Part II. An introduction to R - 35
Page 52 and 53:
go out into the real world. It’s
Page 54 and 55:
download Rstudio. To understand why
Page 56 and 57:
Most of this text is pretty uninter
Page 58 and 59:
citation() To cite R in publication
Page 60 and 61:
Table 3.1: Basic arithmetic operati
Page 62 and 63:
Division / and Multiplication *, th
Page 64 and 65:
sales * royalty [1] 2450 As far as
Page 66 and 67:
x to the power of 0.5”. Personall
Page 68 and 69:
As you can see, specifying the argu
Page 70 and 71:
Figure 3.3: If you’ve typed the n
Page 72 and 73:
sales.by.month sales.by.month [1]
Page 74 and 75:
profit / days.per.month [1] 0.00000
Page 76 and 77:
not true of R. R is not infinitely
Page 78 and 79:
Table 3.3: Some more logical operat
Page 80 and 81:
In other words, any.sales.this.mont
Page 82 and 83:
and gotten exactly the same result.
Page 84 and 85:
Figure 3.6: The options window in R
Page 86 and 87:
- 72 -
Page 88 and 89:
print( keeper ) [1] 8.539539 So, fr
Page 90 and 91:
to import an SPSS data file into R,
Page 92 and 93:
The output here is telling you that
Page 94 and 95:
Figure 4.3: The Rstudio dialog box
Page 96 and 97:
Figure 4.5: The Rstudio “Environm
Page 98 and 99:
this should be obvious already. But
Page 100 and 101:
Figure 4.6: The “file panel” is
Page 102 and 103:
from this file into my workspace. T
Page 104 and 105:
2 February 28 100 high 3 March 31 2
Page 106 and 107:
Figure 4.9: The Rstudio window for
Page 108 and 109:
4.6.1 Special values The first thin
Page 110 and 111:
x y x * y Error in x * y : non-nu
Page 112 and 113:
4.7.1 Introducing factors Suppose,
Page 114 and 115:
factors in 7.11.2, but for now you
Page 116 and 117:
An alternative method is to use the
Page 118 and 119:
4.11 Generic functions There’s on
Page 120 and 121:
load (base) R Documentation The R D
Page 122 and 123:
Warning Saved R objects are binary
Page 124 and 125:
- 110 -
Page 127 and 128:
5. Descriptive statistics Any time
Page 129 and 130:
mean of these observations is just:
Page 131 and 132:
mean( afl.margins[1:5] ) [1] 36.6 A
Page 133 and 134:
the median rises only to $62,500. I
Page 135 and 136:
Next, let’s calculate means and m
Page 137 and 138:
5.2 Measures of variability The sta
Page 139 and 140:
English: which game value deviation
Page 141 and 142:
and you get the same... no, wait...
Page 143 and 144:
Mean − SD Mean Mean + SD Figure 5
Page 145 and 146:
Negative Skew No Skew Positive Skew
Page 147 and 148:
Platykurtic ("too flat") Mesokurtic
Page 149 and 150:
e useful, let’s try this for a ne
Page 151 and 152:
argument specifies the grouping var
Page 153 and 154:
grumpiness score. 13 If the mean is
Page 155 and 156:
Frequency 0 5 10 15 20 Frequency 0
Page 157 and 158:
My grumpiness 40 50 60 70 80 90 4 6
Page 159 and 160:
following extract illustrates the b
Page 161 and 162:
Y1 4 5 6 7 8 9 10 Y2 3 4 5 6 7 8 9
Page 163 and 164:
Okay, so let’s have a look at our
Page 165 and 166:
pay 0.760 0.720 . 0.137 . 0.196 . d
Page 167 and 168:
5 6.68 9.75 NA 5 6 5.99 5.04 72 6 B
Page 169 and 170:
• Measures of variability. In con
Page 171 and 172:
6. Drawing graphs Above all else sh
Page 173 and 174:
delete the plot and then type a new
Page 175 and 176:
Fibonacci 2 4 6 8 10 12 1 2 3 4 5 6
Page 177 and 178:
high-level functions all rely on th
Page 179 and 180:
type = ’p’ type = ’o’ type
Page 181 and 182:
Fibonacci 2 4 6 8 10 12 1 2 3 4 5 6
Page 183 and 184:
Figure 6.8: Altering the scale and
Page 185 and 186:
y the height of the leftmost bar in
Page 187 and 188:
6.3.1 Visual style of your histogra
Page 189 and 190:
0 20 40 60 80 100 120 (a) (b) Figur
Page 191 and 192:
0 20 40 60 80 100 120 Winning Margi
Page 193 and 194:
0 50 100 150 200 250 300 Winning Ma
Page 195 and 196:
6.5.3 Drawing multiple boxplots One
Page 197 and 198:
Winning Margin 0 50 100 150 1987 19
Page 199 and 200:
The second way do to it is to use a
Page 201 and 202:
cor( x = parenthood ) # calculate c
Page 203 and 204:
0 10 20 30 0 10 20 30 Adelaide Fitz
Page 205 and 206:
value for mar is c(5.1, 4.1, 4.1, 2
Page 207 and 208:
This takes the “active” figure
Page 209 and 210:
7. Pragmatic matters The garden of
Page 211 and 212:
in the command above I didn’t nam
Page 213 and 214:
speaker ee onk oo pip makka-pakka 0
Page 215 and 216:
2 7 3 3 1 3 3 -1 1 -1 BLAH BLAH BLA
Page 217 and 218:
seen it before. In any case, those
Page 219 and 220:
Table 7.2: Two more arithmetic oper
Page 221 and 222:
fact R does provide a function for
Page 223 and 224:
But suppose, on the other hand, tha
Page 225 and 226:
is select several different variabl
Page 227 and 228:
column number. So, if we want to pi
Page 229 and 230:
case.2 upsy-daisy pip 2 case.4 makk
Page 231 and 232:
Note that R will only allow you to
Page 233 and 234:
7.6.2 Sorting a factor You can also
Page 235 and 236:
Two other methods that I want to br
Page 237 and 238:
At this point you should have two q
Page 239 and 240:
(alcohol, caffeine or no drug). Bec
Page 241 and 242:
3 3 female 4.6 643 7.4 226 7.3 412
Page 243 and 244:
discuss melt() and cast() in a fair
Page 245 and 246:
paste( hw, ng, sep = ".", collapse
Page 247 and 248:
Table 7.3: The ordering of various
Page 249 and 250:
Table 7.4: Standard escape characte
Page 251 and 252:
sub( pattern = "a", replacement = "
Page 253 and 254:
Figure 7.1: The booksales2.csv data
Page 255 and 256:
note that if you don’t have the R
Page 257 and 258:
etween the two. However, there are
Page 259 and 260:
a third variable to the cross-tabs
Page 261 and 262:
today print(today) # display the d
Page 263 and 264:
encode numbers in binary, 29 0.1 is
Page 265 and 266:
environment. And so when you type i
Page 267 and 268:
8. Basic programming Machine dreams
Page 269 and 270:
for ages, you figure out some reall
Page 271 and 272:
Figure 8.2: A screenshot showing th
Page 273 and 274:
8.1.5 Differences between scripts a
Page 275 and 276:
to the second value in vector; and
Page 277 and 278:
crunching, we tell R (on line 21) t
Page 279 and 280:
What this does is create a function
Page 281 and 282:
hundreds of other things besides. H
Page 283:
Part IV. Statistical theory - 269 -
Page 286 and 287:
me: I guess I don’t see that. Sur
Page 288 and 289:
- 274 -
Page 290 and 291:
discussion of probability theory is
Page 292 and 293:
In this case 11 of these 20 coin fl
Page 294 and 295:
frequentists do this sometimes. And
Page 296 and 297:
Probability of event 0.0 0.1 0.2 0.
Page 298 and 299:
proceed to roll all 20 dice, what
Page 300 and 301:
(a) Probability 0.00 0.05 0.10 0.15
Page 302 and 303:
In other words, there is a 76.9% ch
Page 304 and 305:
Probability Density 0.0 0.1 0.2 0.3
Page 306 and 307:
Shaded Area = 68.3% Shaded Area = 9
Page 308 and 309:
Probability Density 0.0 0.1 0.2 0.3
Page 310 and 311:
• The χ 2 distribution is anothe
Page 312 and 313:
work rather than rchisq() - the obs
Page 314 and 315: - 300 -
Page 316 and 317: 10.1.1 Defining a population A samp
Page 318 and 319: Figure 10.2: Biased sampling withou
Page 320 and 321: a stratified sampling technique you
Page 322 and 323: 10.2 The law of large numbers In th
Page 324 and 325: Table 10.1: Ten replications of the
Page 326 and 327: Sample Size = 1 Sample Size = 2 Sam
Page 328 and 329: Sample Size = 1 Sample Size = 2 0.0
Page 330 and 331: efer to them. For instance, if true
Page 332 and 333: Average Sample Mean 96 98 100 102 1
Page 334 and 335: 10.5 Estimating a confidence interv
Page 336 and 337: sense idea of what it means to say
Page 338 and 339: Average Attendance 0 10000 30000 19
Page 340 and 341: Here’s how to plot the means and
Page 342 and 343: Let’s suppose that this glorious
Page 344 and 345: Dan’s research hypothesis: “ESP
Page 346 and 347: the type I error rate. As Blackston
Page 348 and 349: Critical Regions for a Two−Sided
Page 350 and 351: Critical Region for a One−Sided T
Page 352 and 353: one based on Neyman’s approach to
Page 354 and 355: alternative hypothesis is a near ce
Page 356 and 357: Sampling Distribution for X if θ=.
Page 358 and 359: Table 11.2: A crude guide to unders
Page 360 and 361: ureaucratic process. It’s not par
Page 362 and 363: (binomial test was non significant)
Page 366 and 367: library( lsr ) > load( "randomness.
Page 368 and 369: clubs diamonds hearts spades 0.25 0
Page 370 and 371: Okay, let’s suppose that the null
Page 372 and 373: The critical value is 7.81 The obse
Page 374 and 375: hearts 64 50 0.25 spades 50 50 0.25
Page 376 and 377: describe it in maths if you like, b
Page 378 and 379: - Futurama, “Fear of a Bot Planet
Page 380 and 381: Next, in much the same way that we
Page 382 and 383: Chi-square test of categorical asso
Page 384 and 385: 12.4 Effect size As we discussed ea
Page 386 and 387: detailed output, showing you the re
Page 388 and 389: chisq.test( salem.tabs ) Pearson’
Page 390 and 391: Next, let’s think about what our
Page 392 and 393: chisq.test( cardChoices ) Pearson
Page 394 and 395: 13.1.1 The inference problem that t
Page 396 and 397: null hypothesis σ=σ 0 μ=μ 0 alt
Page 398 and 399: Let’s also create a variable for
Page 400 and 401: null hypothesis σ=?? μ=μ 0 alter
Page 402 and 403: pnorm(). And so instead of going th
Page 404 and 405: 13.3.1 The data Suppose we have 33
Page 406 and 407: Grade 66 68 70 72 74 76 78 80 Anast
Page 408 and 409: So why not just use these deviation
Page 410 and 411: hypothesis and the alternative hypo
Page 412 and 413: • Homogeneity of variance (also c
Page 414 and 415:
estimated effect size (Cohen’s d)
Page 416 and 417:
Grade 54 56 58 60 Grade for Test 2
Page 418 and 419:
just gone through is that you need
Page 420 and 421:
3 student3 test1 71.7 4 student4 te
Page 422 and 423:
quite similar to the way that mixed
Page 424 and 425:
) Paired samples t-test Variables:
Page 426 and 427:
set up to handle data in long form.
Page 428 and 429:
13.8.2 Cohen’s d from a Student t
Page 430 and 431:
the practical consequences of your
Page 432 and 433:
Skewed Data Normal Q−Q Plot Frequ
Page 434 and 435:
13.10 Testing non-normal data with
Page 436 and 437:
7 29 31 2 8 56 21 -35 9 38 8 -30 10
Page 438 and 439:
- 424 -
Page 440 and 441:
With that as the study design, let
Page 442 and 443:
the ground up and showing you how y
Page 444 and 445:
Between−group variation (i.e., di
Page 446 and 447:
Table 14.1: All of the key quantiti
Page 448 and 449:
statistics: group outcome group mea
Page 450 and 451:
gp.means grand.mean dev.from.gran
Page 452 and 453:
14.3.1 Using the aov() function to
Page 454 and 455:
drug 2 3.45 1.727 18.6 8.6e-05 ***
Page 456 and 457:
By rejecting the null hypothesis, w
Page 458 and 459:
The usual solution to this problem
Page 460 and 461:
As you can see, the biggest p-value
Page 462 and 463:
where median k pY q is the median f
Page 464 and 465:
oneway.test(mood.gain ~ drug, data
Page 466 and 467:
this in mind, let’s follow the sa
Page 468 and 469:
No, really. It’s not just that th
Page 470 and 471:
- 456 -
Page 472 and 473:
My grumpiness (0−100) 40 50 60 70
Page 474 and 475:
Regression Line Close to the Data R
Page 476 and 477:
framework to be able to include mul
Page 478 and 479:
15.3.2 Formula for the general case
Page 480 and 481:
15.4.3 The adjusted R 2 value One f
Page 482 and 483:
I can’t help but notice that the
Page 484 and 485:
15.6 Testing the significance of a
Page 486 and 487:
- correction for multiple testing:
Page 488 and 489:
is that, by converting all the pred
Page 490 and 491:
The first and simplest kind of resi
Page 492 and 493:
High leverage Outcome Predictor Fig
Page 494 and 495:
As a rough guide, Cook’s distance
Page 496 and 497:
Frequency 0 2 4 6 8 10 12 −10 −
Page 498 and 499:
Observed Values 40 50 60 70 80 90 5
Page 500 and 501:
Pearson residuals −10 0 5 10 Pear
Page 502 and 503:
Standardized residuals 0.0 0.5 1.0
Page 504 and 505:
15.10 Model selection One fairly ma
Page 506 and 507:
Okay,sowhatwecanseeisthatremovingth
Page 508 and 509:
might expect from the amount of sle
Page 510 and 511:
- 496 -
Page 512 and 513:
xtabs( ~ drug + therapy, clin.trial
Page 514 and 515:
Now that we have this notation, it
Page 516 and 517:
The last part of the ANOVA table is
Page 518 and 519:
Okay, now let’s calculate the sum
Page 520 and 521:
Why does that happen? The answer li
Page 522 and 523:
Crossover interaction Effect for on
Page 524 and 525:
mood gain 0.0 0.5 1.0 1.5 2.0 CBT n
Page 526 and 527:
the model with interaction has a to
Page 528 and 529:
outcome). Moreover, the sum of all
Page 530 and 531:
Upper 95 Percent Confidence Limits
Page 532 and 533:
16.4.2 Normality of residuals As wi
Page 534 and 535:
MS R1 “ SS R1 df R1 Finally, taki
Page 536 and 537:
16.6.1 Some data To make things con
Page 538 and 539:
for all intents and purposes. 9 Let
Page 540 and 541:
eading 28.000 3.873 7.230 0.00079 *
Page 542 and 543:
value of 43.5 as the expected basel
Page 544 and 545:
druganxifree 0.267 0.148 1.80 0.094
Page 546 and 547:
way I did above. Because you’ve c
Page 548 and 549:
contrasts, in which the intercept t
Page 550 and 551:
contrasts( clin.trial$drug) [,1] [,
Page 552 and 553:
diff lwr upr p adj anxifree-placebo
Page 554 and 555:
papers in psychology without runnin
Page 556 and 557:
the strategy that is the important
Page 558 and 559:
mod anova( mod ) Analysis of Varia
Page 560 and 561:
is meaningless to talk about non-si
Page 562 and 563:
serious flaws: Type I tests are dep
Page 564 and 565:
etaSquared( mod, type=2 ) eta.sq et
Page 566 and 567:
• Understanding the linear model
Page 569 and 570:
17. Bayesian statistics In our reas
Page 571 and 572:
P pd|hq, which you can read as “t
Page 573 and 574:
Umbrella No-umbrella Rainy 0 Dry 0
Page 575 and 576:
Or, to write the same thing in term
Page 577 and 578:
theory is true or it is not, and no
Page 579 and 580:
• Let’s start with option 1. If
Page 581 and 582:
So how bad is it? The answer is sho
Page 583 and 584:
1 robot flower 2 human data 3 human
Page 585 and 586:
Null, independence, a = 1 --- Bayes
Page 587 and 588:
Poisson sampling plan (i.e., nothin
Page 589 and 590:
are, before finally getting to the
Page 591 and 592:
17.6 Bayesian regression Okay, so n
Page 593 and 594:
to know that you can use the head()
Page 595 and 596:
[2] Omit dan.sleep : 1.025401e-26
Page 597 and 598:
By “dividing” the models output
Page 599 and 600:
psychologist, you might want to che
Page 601 and 602:
18. Epilogue “Begin at the beginn
Page 603 and 604:
• Nonlinear regression. When disc
Page 605 and 606:
analysis uncovers a latent variable
Page 607 and 608:
fromthepast. Or,moregenerally,newda
Page 609 and 610:
tool: it is just as applicable to p
Page 611 and 612:
Author’s note - I’ve mentioned
Page 613:
No Starch Press. 268 McGrath, R. E.
show all

Learning Statistics with R - A tutorial for psychology students and other beginners, 2018a

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?