go beyond the basics of data cleaning and testing assumptions, to show that assumptions and quality data are still relevant and important in the 21st century. They went above and beyond this challenge in many interesting (and unexpected) ways. I hope that this is the beginning, or a continuation, of an important discussion that strikes at the very heart of our quantitative disciplines; namely, whether we can trust any of the results we read in journals, and whether we can apply (or generalize) those results beyond the limited scope of the original sample.

REFERENCES

Boneau, C. A. (1960). The effects of violations of assumptions underlying the t test. Psychol. Bull. 57, 49–64. doi: 10.1037/h0041412
Box, G. (1953). Non-normality and tests on variances. Biometrika 40, 318.
Feir-Walsh, B., and Toothaker, L. (1974). An empirical comparison of the ANOVA F-test, normal scores test and Kruskal-Wallis test under violation of assumptions. Educ. Psychol. Meas. 34, 789. doi: 10.1177/001316447403400406
Havlicek, L. L., and Peterson, N. L. (1977). Effect of the violation of assumptions upon significance levels of the Pearson r. Psychol. Bull. 84, 373–377. doi: 10.1037/0033-2909.84.2.373
Keselman, H. J., Huberty, C. J., Lix, L. M., Olejnik, S., Cribbie, R. A., Donahue, B., et al. (1998). Statistical practices of educational researchers: an analysis of their ANOVA, MANOVA, and ANCOVA analyses. Rev. Educ. Res. 68, 350–386. doi: 10.3102/00346543068003350
Lix, L., Keselman, J., and Keselman, H. (1996). Consequences of assumption violations revisited: a quantitative review of alternatives to the one-way analysis of variance “F” test. Rev. Educ. Res. 66, 579–619.
Maxwell, S., and Delaney, H. (1990). Designing Experiments and Analyzing Data: a Model Comparison Perspective. Pacific Grove, CA: Brooks Cole Publishing Company.
Osborne, J. W. (2008). Sweating the small stuff in educational psychology: how effect size and power reporting failed to change from 1969 to 1999, and what that means for the future of changing practices. Educ. Psychol. 28, 1–10. doi: 10.1080/01443410701491718
Osborne, J. W. (2012). Best Practices in Data Cleaning: A Complete Guide to Everything You Need to Do Before and After Collecting Your Data. Thousand Oaks, CA: Sage Publications.
Osborne, J. W., Kocher, B., and Tillman, D. (2012). “Sweating the small stuff: do authors in APA journals clean data or test assumptions (and should anyone care if they do),” in Paper presented at the Annual meeting of the Eastern Education Research Association, (Hilton Head, SC).
Pearson, E. (1931). The analysis of variance in cases of non-normal variation. Biometrika 23, 114.
Pearson, K. (1901). Mathematical contribution to the theory of evolution. VII: On the correlation of characters not quantitatively measurable. Philos. Trans. R. Soc. Lond. B Biol. Sci. 195, 1–47.
Student. (1908). The probable error of a mean. Biometrika 6, 1–25.
Vardeman, S., and Morris, M. (2003). Statistics and Ethics. Am. Stat. 57, 21–26. doi: 10.1198/0003130031072
Wilcox, R. (1987). New designs in analysis of variance. Ann. Rev. Psychol. 38, 29–60. doi: 10.1146/annurev.ps.38.020187.000333

Received: 16 April 2013; accepted: 06 June 2013; published online: 25 June 2013.

Citation: Osborne JW (2013) Is data cleaning and the testing of assumptions relevant in the 21st century? Front. Psychol. 4:370. doi: 10.3389/fpsyg.2013.00370

This article was submitted to Frontiers in Quantitative Psychology and Measurement, a specialty of Frontiers in Psychology.

Copyright © 2013 Osborne. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.

www.frontiersin.org June 2013 | Volume 4 | Article 370 | 7

ORIGINAL RESEARCH ARTICLE
published: 14 May 2012
doi: 10.3389/fpsyg.2012.00137

Are assumptions of well-known statistical techniques checked, and why (not)?

Rink Hoekstra 1,2*, Henk A. L. Kiers 2 and Addie Johnson 2

1 GION – Institute for Educational Research, University of Groningen, Groningen, The Netherlands
2 Department of Psychology, University of Groningen, Groningen, The Netherlands

Edited by:
Jason W. Osborne, Old Dominion University, USA

Reviewed by:
Jason W. Osborne, Old Dominion University, USA
Jelte M. Wicherts, University of Amsterdam, The Netherlands

*Correspondence:
Rink Hoekstra, GION, University of Groningen, Grote Rozenstraat 3, 9712 TG Groningen, The Netherlands
e-mail: r.hoekstra@rug.nl

A valid interpretation of most statistical techniques requires that one or more assumptions be met. In published articles, however, little information tends to be reported on whether the data satisfy the assumptions underlying the statistical techniques used. This could be due to self-selection: Only manuscripts with data fulfilling the assumptions are submitted. Another explanation could be that violations of assumptions are rarely checked for in the first place. We studied whether and how 30 researchers checked fictitious data for violations of assumptions in their own working environment.
Participants were asked to analyze the data as they would their own data, for which often-used and well-known techniques such as the t-procedure, ANOVA and regression (or non-parametric alternatives) were required. It was found that the assumptions of the techniques were rarely checked, and that if they were, it was regularly by means of a statistical test. Interviews afterward revealed a general lack of knowledge about assumptions, the robustness of the techniques with regard to the assumptions, and how (or whether) assumptions should be checked. These data suggest that checking for violations of assumptions is not a well-considered choice, and that the use of statistics can be described as opportunistic.

Keywords: assumptions, robustness, analyzing data, normality, homogeneity

INTRODUCTION

Most statistical techniques require that one or more assumptions be met, or, in the case that it has been proven that a technique is robust against a violation of an assumption, that the assumption is not violated too extremely. Applying the statistical techniques when assumptions are not met is a serious problem when analyzing data (Olsen, 2003; Choi, 2005). Violations of assumptions can seriously influence Type I and Type II errors, and can result in overestimation or underestimation of the inferential measures and effect sizes (Osborne and Waters, 2002).
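
The influence of a violated assumption on the Type I error rate can be illustrated with a small simulation. The sketch below is not part of the original study. It assumes two normally distributed groups with equal population means, unequal group sizes (10 and 40), and a pooled-variance t statistic compared against the two-sided 5% critical value for 48 degrees of freedom (approximately 2.0106). All function names, variable names, and parameter values are hypothetical and chosen only for illustration.

```python
import random

N_SMALL, N_LARGE = 10, 40   # assumed group sizes (illustrative only)
T_CRIT = 2.0106             # two-sided 5% critical value of t, df = 48


def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def pooled_t(x, y):
    """Student's two-sample t statistic, which assumes equal variances."""
    n1, n2 = len(x), len(y)
    pooled = ((n1 - 1) * sample_variance(x)
              + (n2 - 1) * sample_variance(y)) / (n1 + n2 - 2)
    standard_error = (pooled * (1 / n1 + 1 / n2)) ** 0.5
    return (mean(x) - mean(y)) / standard_error


def type_i_error_rate(sd_small, sd_large, reps=4000, seed=2012):
    """Share of false rejections when both population means are zero."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, sd_small) for _ in range(N_SMALL)]
        y = [rng.gauss(0.0, sd_large) for _ in range(N_LARGE)]
        if abs(pooled_t(x, y)) > T_CRIT:
            rejections += 1
    return rejections / reps


# Assumption met: both populations have the same standard deviation.
rate_equal = type_i_error_rate(sd_small=1.0, sd_large=1.0)
# Assumption violated: the smaller group comes from the more variable population.
rate_unequal = type_i_error_rate(sd_small=3.0, sd_large=1.0)

print(f"equal variances:   {rate_equal:.3f}")
print(f"unequal variances: {rate_unequal:.3f}")
```

Under these assumed conditions, the rejection rate with equal variances should stay close to the nominal 5%, whereas the rate in the second scenario should be clearly higher. Exact figures depend on the seed, group sizes, and variance ratio chosen here.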
Keselman et al. (1998) argue that “The applied researcher who routinely adopts a traditional procedure without giving thought to its associated assumptions may unwittingly be filling the literature with nonreplicable results” (p. 351). Vardeman and Morris (2003) state “...absolutely never use any statistical method without realizing that you are implicitly making assumptions, and that the validity of your results can never be greater than that of the most questionable of these” (p. 26). According to the sixth edition of the APA Publication Manual, the methods researchers use “...must support their analytic burdens, including robustness to violations of the assumptions that underlie them...” [American Psychological Association (APA, 2009); p. 33]. The Manual does not explicitly state that researchers should check for possible violations of assumptions and report whether the assumptions were met, but it seems reasonable to assume that in the case that researchers do not check for violations of assumptions, they should be aware of the robustness of the technique.

Many articles have been written on the robustness of certain techniques with respect to violations of assumptions (e.g., Kohr and Games, 1974; Bradley, 1980; Sawilowsky and Blair, 1992; Wilcox and Keselman, 2003; Bathke, 2004), and many ways of checking to see if assumptions have been met (as well as solutions to overcoming problems associated with any violations) have been proposed (e.g., Keselman et al., 2008).
Using a statistical test is one of the frequently mentioned methods of checking for violations of assumptions (for an overview of statistical methodology textbooks that directly or indirectly advocate this method, see e.g., Hayes and Cai, 2007). However, it has also been argued that it is not appropriate to check assumptions by means of tests (such as Levene’s test) carried out before deciding on which statistical analysis technique to use because such tests compound the probability of making a Type I error (e.g., Schucany and Ng, 2006). Even if one desires to check whether or not an assumption is met, two problems stand in the way. First, assumptions are usually about the population, and in a sample the population is by definition not known. For example, it is usually not possible to determine the exact variance of the population in a sample-based study, and therefore it is also impossible to determine that two population variances are equal, as is required for the assumption of equal variances (also referred to as the assumption of homogeneity of variances) to be satisfied. Second, because assumptions are usually defined in a very strict way (e.g., all groups have equal variances in the population, or the variable is normally distributed in the population), the assumptions cannot reasonably be expected to be satisfied. Given these complications, researchers can usually only examine whether assumptions are not violated “too much” in their sample; for deciding on what is too much, information about

www.frontiersin.org May 2012 | Volume 3 | Article 137 | 8