06.09.2021 Views

Learning Statistics with R - A tutorial for psychology students and other beginners, 2018a

Learning Statistics with R - A tutorial for psychology students and other beginners, 2018a

Learning Statistics with R - A tutorial for psychology students and other beginners, 2018a

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Frequency<br />

0 5 10 15 20<br />

Normally Distributed Data<br />

Sample Quantiles<br />

−2 −1 0 1 2<br />

Normal Q−Q Plot<br />

−2 −1 0 1 2<br />

Value<br />

(a)<br />

−2 −1 0 1 2<br />

Theoretical Quantiles<br />

(b)<br />

Figure 13.11: Histogram (panel a) <strong>and</strong> normal QQ plot (panel b) of normal.data, a normally distributed<br />

sample <strong>with</strong> 100 observations. The Shapiro-Wilk statistic associated <strong>with</strong> these data is W “ .99, indicating<br />

that no significant departures from normality were detected (p “ .73).<br />

.......................................................................................................<br />

And the results are shown in Figure 13.11. As you can see, these data <strong>for</strong>m a pretty straight line; which<br />

is no surprise given that we sampled them from a normal distribution! In contrast, have a look at the<br />

two data sets shown in Figure 13.12. The top panels show the histogram <strong>and</strong> a QQ plot <strong>for</strong> a data set<br />

that is highly skewed: the QQ plot curves upwards. The lower panels show the same plots <strong>for</strong> a heavy<br />

tailed (i.e., high kurtosis) data set: in this case, the QQ plot flattens in the middle <strong>and</strong> curves sharply at<br />

either end.<br />

13.9.2 Shapiro-Wilk tests<br />

Although QQ plots provide a nice way to in<strong>for</strong>mally check the normality of your data, sometimes<br />

you’ll want to do something a bit more <strong>for</strong>mal. And when that moment comes, the Shapiro-Wilk test<br />

(Shapiro & Wilk, 1965) is probably what you’re looking <strong>for</strong>. 17 As you’d expect, the null hypothesis being<br />

tested is that a set of N observations is normally distributed. The test statistic that it calculates is<br />

conventionally denoted as W , <strong>and</strong> it’s calculated as follows. First, we sort the observations in order of<br />

increasing size, <strong>and</strong> let X 1 be the smallest value in the sample, X 2 be the second smallest <strong>and</strong> so on.<br />

Then the value of W is given by<br />

´řN<br />

i“1 a iX i¯2<br />

W “ ř N<br />

i“1 pX i ´ ¯Xq 2<br />

where ¯X is the mean of the observations, <strong>and</strong> the a i values are ... mumble, mumble ... something<br />

complicated that is a bit beyond the scope of an introductory text.<br />

17 Either that, or the Kolmogorov-Smirnov test, which is probably more traditional than the Shapiro-Wilk, though most<br />

things I’ve read seem to suggest Shapiro-Wilk is the better test of normality; although Kolomogorov-Smirnov is a general<br />

purpose test of distributional equivalence, so it can be adapted to h<strong>and</strong>le <strong>other</strong> kinds of distribution tests; in R it’s<br />

implemented via the ks.test() function.<br />

- 417 -

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!