
Learning Statistics with R - A tutorial for psychology students and other beginners, 2018a


hardly unusual: in my experience, most practitioners express views very similar to Fisher's. In essence, the p < .05 convention is assumed to represent a fairly stringent evidentiary standard.

Well, how true is that? One way to approach this question is to try to convert p-values to Bayes factors, and see how the two compare. It's not an easy thing to do because a p-value is a fundamentally different kind of calculation to a Bayes factor, and they don't measure the same thing. However, there have been some attempts to work out the relationship between the two, and it's somewhat surprising. For example, Johnson (2013) presents a pretty compelling case that (for t-tests at least) the p < .05 threshold corresponds roughly to a Bayes factor of somewhere between 3:1 and 5:1 in favour of the alternative. If that's right, then Fisher's claim is a bit of a stretch. Let's suppose that the null hypothesis is true about half the time (i.e., the prior probability of H0 is 0.5), and we use those numbers to work out the posterior probability of the null hypothesis given that it has been rejected at p < .05. Using the data from Johnson (2013), we see that if you reject the null at p < .05, you'll be correct about 80% of the time. I don't know about you, but in my opinion an evidentiary standard that ensures you'll be wrong on 20% of your decisions isn't good enough. The fact remains that, quite contrary to Fisher's claim, if you reject at p < .05 you shall quite often go astray. It's not a very stringent evidentiary threshold at all.
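The arithmetic behind that 80% figure is worth making explicit. Here's a small R sketch (my own illustration, not part of the original text) that converts a Bayes factor to a posterior probability using Bayes' rule in odds form, assuming as above that the prior probability of the null is 0.5:

```r
# Posterior probability of the alternative, given a Bayes factor bf in
# favour of the alternative and a prior probability for the null.
# (Illustrative sketch; the function name is my own invention.)
posterior.h1 <- function( bf, prior.h0 = 0.5 ) {
  prior.odds <- (1 - prior.h0) / prior.h0  # prior odds of H1 against H0
  posterior.odds <- bf * prior.odds        # Bayes' rule in odds form
  posterior.odds / (1 + posterior.odds)    # convert odds back to a probability
}
posterior.h1( bf = 3 )  # 0.75
posterior.h1( bf = 5 )  # 0.833
```

With a Bayes factor between 3:1 and 5:1, the posterior probability that the alternative is true lies between 75% and about 83%, which is where the "correct about 80% of the time" figure comes from.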

17.3.3 The p-value is a lie.

The cake is a lie.
The cake is a lie.
The cake is a lie.
The cake is a lie.

– Portal [10]

Okay, at this point you might be thinking that the real problem is not with orthodox statistics, just the p < .05 standard. In one sense, that's true. The recommendation that Johnson (2013) gives is not that "everyone must be a Bayesian now". Instead, the suggestion is that it would be wiser to shift the conventional standard to something like a p < .01 level. That's not an unreasonable view to take, but in my view the problem is a little more severe than that. In my opinion, there's a fairly big problem built into the way most (but not all) orthodox hypothesis tests are constructed. They are grossly naive about how humans actually do research, and because of this most p-values are wrong.

Sounds like an absurd claim, right? Well, consider the following scenario. You've come up with a really exciting research hypothesis and you design a study to test it. You're very diligent, so you run a power analysis to work out what your sample size should be, and you run the study. You run your hypothesis test and out pops a p-value of 0.072. Really bloody annoying, right?

What should you do? Here are some possibilities:

1. You conclude that there is no effect, and try to publish it as a null result
2. You guess that there might be an effect, and try to publish it as a "borderline significant" result
3. You give up and try a new study
4. You collect some more data to see if the p-value goes up or (preferably!) drops below the "magic" criterion of p < .05
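Option 4 deserves special attention, because its consequences can be checked directly by simulation. The following R sketch (my own, not part of the original text) generates data under a true null hypothesis, but "peeks" every 10 observations and stops as soon as a one-sample t-test gives p < .05:

```r
# Simulate "optional stopping": keep adding data and re-testing until
# p < .05 or we run out of observations. The null is true throughout.
# (Illustrative sketch; the sample sizes and peeking schedule are my
# own arbitrary choices.)
set.seed( 1 )
n.sims <- 1000
n.max <- 200
peek.at <- seq( from = 20, to = n.max, by = 10 )  # when we check the p-value
false.positive <- logical( n.sims )
for( s in 1:n.sims ) {
  x <- rnorm( n.max )  # the population mean really is zero
  for( n in peek.at ) {
    if( t.test( x[1:n] )$p.value < .05 ) {
      false.positive[s] <- TRUE  # we'd have stopped and "found an effect"
      break
    }
  }
}
mean( false.positive )  # well above the nominal .05
```

Every individual test uses the conventional α = .05, yet the overall rate of false positives comes out substantially higher than 5%: each extra peek is an extra chance to be fooled by noise.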

Which would you choose? Before reading any further, I urge you to take some time to think about it. Be honest with yourself. But don't stress about it too much, because you're screwed no matter what you choose. Based on my own experiences as an author, reviewer and editor, as well as stories I've heard from others, here's what will happen in each case:

[10] http://knowyourmeme.com/memes/the-cake-is-a-lie
