
Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers



Hoekstra et al. Why assumptions are seldom checked

RESULTS

Of the six data sets that the 30 participants were required to analyze, in all but three instances the expected technique was chosen. In the remaining three instances, ANOVA was used to analyze data sets that were meant to be analyzed by means of a t-test. Since ANOVA is in this case completely equivalent to an independent-samples t-test, it can be concluded that an appropriate technique was chosen for all data sets. In none of these cases, an unconditional technique was chosen.

Violations of, or conformance with, the assumptions of normality and homogeneity of variance were correctly checked in 12% (95% CI = [8%, 18%]) and 23% (95% CI = [18%, 30%]), respectively, of the analyzed data sets. Figure 2 shows for each of the three techniques how frequently possible violations of the assumptions of normality and homogeneity of variance occurred, and whether the checking was done correctly, or whether a preliminary test was used. Note that the assumption of normality was rarely checked for regression, and never correctly. In the few occasions that normality was checked the normality of the scores instead of the residuals was examined. Although this approach might be useful for studying the distribution of the scores, it is insufficient for determining whether the assumption of normality has been violated.

The percentages of participants giving each of the four reasons for not checking assumptions as measured by the questionnaire are given in Figure 3. A majority of the participants were unfamiliar with the assumptions. For each assumption, only a minority of participants mentioned at least one of the correct ways to check for a violation of the assumption. The majority of the participants failed to indicate that the alleged robustness of a technique against violations of the relevant assumption was a reason not to check these assumptions in the first place. Many participants did not know whether a violation of an assumption was important or not. Only in a minority of instances was an acceptable remedy for a violation of an assumption mentioned. No unacceptable remedies were mentioned. In general, participants indicated little knowledge of how to overcome a violation of one of the assumptions, and most participants reported never having looked for a remedy against a violation of statistical assumptions.

Participants had been told what the relevant assumptions were before they had to answer these questions. Therefore, the results for the last three explanations per assumption in Figure 3 are reported for all participants, despite the fact that many participants reported being unfamiliar with the assumption. This implies that, especially for the assumption of normality and to a lesser extent for the assumption of equal variances, the results regarding the last three explanations should be interpreted with caution.

DISCUSSION

In order to examine people’s understanding of the assumptions of statistical tests and their behavior with regard to checking these assumptions, 30 researchers were asked to analyze six data sets using the t-test, ANOVA, regression or a non-parametric

FIGURE 2 | The frequency of whether two assumptions were checked at all, whether they were checked correctly, and whether a preliminary test was used for three often used techniques in percentages of the total number of cases. Between brackets are 95% CIs for the percentages.

www.frontiersin.org May 2012 | Volume 3 | Article 137 | 12
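The Results section treats a one-way ANOVA on two groups as equivalent to an independent-samples t-test. A minimal sketch of why this holds: for two groups, the ANOVA F statistic equals the square of the pooled-variance (Student's) t statistic. The helper names `pooled_t` and `one_way_f` and the two small samples are hypothetical, made up for illustration and not taken from the study. The equivalence is assumed here only for the equal-variance t-test, not for Welch's version.

```python
from math import isclose, sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's independent-samples t statistic (equal variances assumed)."""
    na, nb = len(a), len(b)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def one_way_f(*groups):
    """One-way ANOVA F statistic for any number of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical scores for two independent groups (illustration only).
group_a = [4.1, 5.0, 5.6, 6.2, 4.8]
group_b = [6.9, 7.4, 5.9, 8.1, 7.0, 6.5]

t_stat = pooled_t(group_a, group_b)
f_stat = one_way_f(group_a, group_b)

# With two groups, F and t**2 agree up to floating-point error.
print(isclose(f_stat, t_stat ** 2, rel_tol=1e-9))
```

Because both statistics have the same reference distribution once t is squared (F with 1 and n - 2 degrees of freedom), the two analyses give the same p-value, which is consistent with counting the three ANOVA analyses as appropriate choices.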
