… the robustness of the technique with regard to violations of the assumptions is necessary.

The assumptions of normality and of homogeneity of variances are required to be met for the t-test for independent group means, one of the most widely used statistical tests (Hayes and Cai, 2007), as well as for the frequently used techniques ANOVA and regression (Kashy et al., 2009). The assumption of normality is that the scores in the population, in the case of a t-test or ANOVA, and the population residuals, in the case of regression, be normally distributed. The assumption of homogeneity of variance requires equal population variances per group in the case of a t-test or ANOVA, and equal population variances for every value of the independent variable in the case of regression. Although researchers might be tempted to think that most statistical procedures are relatively robust against most violations, several studies have shown that this is often not the case, and that in the case of one-way ANOVA, unequal group sizes can have a negative impact on the technique's robustness (e.g., Havlicek and Peterson, 1977; Wilcox, 1987; Lix et al., 1996).

Many textbooks advise that the assumptions of normality and homogeneity of variance be checked graphically (Hazelton, 2003; Schucany and Ng, 2006), such as by making normal quantile plots to check for normality. Another method, which is advised in many other textbooks (Hayes and Cai, 2007), is to use a so-called preliminary test to determine whether to continue with the intended technique or to use an alternative technique instead. Preliminary tests could, for example, be used to choose between a pooled t-test and a Welch t-test, or between ANOVA and a non-parametric alternative. Following the argument that preliminary tests should not be used because, among other reasons, they can inflate the probability of making a Type I error (e.g., Gans, 1981; Wilcox et al., 1986; Best and Rayner, 1987; Zimmerman, 2004, 2011; Schoder et al., 2006; Schucany and Ng, 2006; Rochon and Kieser, 2011), it has also been argued that in many cases unconditional techniques should be the techniques of choice (Hayes and Cai, 2007). For example, the Welch t-test, which does not require homogeneity of variance, would be seen a priori as preferable to the pooled-variance t-test (Zimmerman, 1996; Hayes and Cai, 2007).
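To make the two strategies concrete, the following sketch (in Python with NumPy, SciPy, and Matplotlib; the simulated data, group sizes, and 0.05 cut-off are our own illustrative assumptions, not taken from the original study) shows a typical graphical normality check, a preliminary Levene test for homogeneity of variance, and the resulting choice between the pooled-variance and Welch t-tests.

    # Illustrative sketch only (our own example, not part of the study):
    # it contrasts the conditional, preliminary-test workflow described above
    # with the unconditional Welch t-test, using simulated data.
    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(2012)
    group1 = rng.normal(loc=100, scale=15, size=40)  # hypothetical scores, group 1
    group2 = rng.normal(loc=105, scale=25, size=25)  # group 2: larger variance, smaller n

    # Graphical check advised by many textbooks: normal quantile (Q-Q) plots.
    fig, axes = plt.subplots(1, 2)
    stats.probplot(group1, dist="norm", plot=axes[0])
    stats.probplot(group2, dist="norm", plot=axes[1])
    fig.savefig("qq_plots.png")

    # Preliminary test of homogeneity of variance (Levene's test).
    levene_stat, levene_p = stats.levene(group1, group2)

    # Conditional strategy (the practice criticized above): pool the variances
    # unless the preliminary test is significant, otherwise switch to Welch.
    if levene_p < 0.05:
        t_cond, p_cond = stats.ttest_ind(group1, group2, equal_var=False)  # Welch
    else:
        t_cond, p_cond = stats.ttest_ind(group1, group2, equal_var=True)   # pooled

    # Unconditional strategy: always use the Welch t-test.
    t_welch, p_welch = stats.ttest_ind(group1, group2, equal_var=False)

    print(f"Levene p = {levene_p:.3f}; conditional p = {p_cond:.3f}; Welch p = {p_welch:.3f}")

The conditional branch is included only to show where the preliminary test enters the decision; the references above argue that its outcome should not drive the choice of test, which is why the unconditional Welch call is reported separately.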
Although the conclusions one can draw when analyzing a data set with statistical techniques depend on whether the assumptions for that technique are met, and, if that is not the case, whether the technique is robust against violations of the assumption, no work, to our knowledge, has been published describing whether researchers check for violations of assumptions in practice. Whether possible violations of assumptions are checked, and why they sometimes are not, is a relevant question given the continuing prevalence of preliminary tests. For example, an inspection of the most recent 50 articles published in 2011 in Psychological Science that contained at least one t-test, ANOVA, or regression analysis revealed that in only three of these articles was the normality of the data or the homogeneity of variances discussed, leaving open the question of whether these assumptions were or were not checked in practice, and how. Keselman et al. (1998) showed in a review of 61 articles that used a between-subject univariate design that only a small minority of the articles mentioned anything about the assumptions of normality (11%) or homogeneity (8%), and that only 5% of the articles mentioned something about both assumptions. In the same article, Keselman et al. present the results of another study, in which 79 articles with a between-subject multivariate design were checked for references to assumptions, and again the assumptions were rarely mentioned. Osborne (2008) found similar results: in 96 articles published in high-quality journals, checking of assumptions was reported in only 8% of the cases.

In the present paper a study is presented in which the behavior of researchers while analyzing data was observed, particularly with regard to the checking for violations of assumptions. It was hypothesized that the checking of assumptions might not routinely occur when analyzing data. There can, of course, be rational reasons for not checking assumptions. Researchers might, for example, have knowledge about the robustness of the technique they are using with respect to violations of assumptions, and therefore consider the checking of possible violations unnecessary. In addition to observing whether or not the data were checked for violations of assumptions, we therefore also administered a questionnaire to assess why data were not always checked for possible violations of assumptions.
Specifically, we focused on four possible explanations for failing to check for violations of assumptions: (1) lack of knowledge of the assumption, (2) not knowing how to check whether the assumption has been violated, (3) not considering a possible violation of an assumption problematic (for example, because of the robustness of the technique), and (4) lack of knowledge of an alternative in the case that an assumption seems to be violated.

MATERIALS AND METHODS

PARTICIPANTS

Thirty Ph.D. students, 13 men and 17 women (mean age = 27, SD = 1.5), working at psychology departments (but not in the area of methodology or statistics) throughout The Netherlands, participated in the study. All had at least 2 years' experience conducting research at the university level. Ph.D. students were selected because of their active involvement in the collection and analysis of data. Moreover, they were likely to have had their statistical education relatively recently, assuring relatively up-to-date knowledge of statistics. They were required to have applied a t-procedure, a linear regression analysis, and an ANOVA at least once, although not necessarily in their own research project. Ten participants were randomly selected from each of the Universities of Tilburg, Groningen, and Amsterdam, three cities in different regions of The Netherlands. In order to get 30 participants, 41 Ph.D. students were approached for the study, of whom 11 chose not to participate. Informed consent was obtained from all participants, and anonymity was ensured.

TASK

The task consisted of two parts: data analysis and a questionnaire. For the data analysis task, participants were asked to analyze six data sets and write down their inferential conclusions for each data set. The t-test, ANOVA, and regression (or unconditional alternatives to those techniques) were intended to be used, because they are relatively simple, frequently used, and because it was expected that most participants would be familiar with those techniques. Participants could take as much time as they wanted, and no limit was given to the length of the inferential conclusions. The second …
