Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers

Hoekstra et al. Why assumptions are seldom checked

checking for the assumption, provided that the assessment was appropriate for the technique at hand. A correct check for the assumption of normality was recorded if, for the t-test and ANOVA, a graphical representation of the different groups was requested, except when the graph was used only to detect outliers. Merely looking at the numbers, without making a visual representation, was considered insufficient. For regression analysis, making a plot of the residuals was considered to be a correct check of the assumption of normality. Deciding whether this was done explicitly was based on whether the participant made any reference to normality when thinking aloud. A second option was to make a QQ- or PP-plot of the residuals. Selecting the Kolmogorov–Smirnov test or the Shapiro–Wilk test within SPSS was considered checking for the assumption of normality using a preliminary test.

Three ways of checking for the assumption of homogeneity of variance for the t-test and ANOVA were considered adequate. The first was to make a graphical representation of the data in such a way that a difference in variance between the groups was visible (e.g., boxplots or scatter plots, provided that they are given per group). A second way was to make an explicit reference to the variance of the groups. A final possibility was to compare standard deviations of the groups in the output, with or without making use of a rule of thumb to discriminate between violations and non-violations. For regression analysis, a scatter plot or a residual plot was considered necessary to check the assumption of homogeneity of variance. Although the assumption of homogeneity of variance assumes equality of the population variances, an explicit reference to the population was not required.
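The comparison of group standard deviations described above, and the Brown–Forsythe test listed among the recorded preliminary tests, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure participants ran in SPSS: the function names `sd_ratio` and `brown_forsythe_statistic` and the two sample groups are hypothetical, and only the test statistic is computed, since a p-value would require the F distribution. Any cutoff applied to the SD ratio is a rule of thumb left to the analyst; the text does not specify one.

```python
from statistics import mean, median, stdev


def sd_ratio(groups):
    """Ratio of the largest to the smallest sample standard deviation."""
    sds = [stdev(g) for g in groups]
    return max(sds) / min(sds)


def brown_forsythe_statistic(groups):
    """Brown-Forsythe W: a one-way ANOVA F statistic computed on the
    absolute deviations of each score from its group median.
    Returns the statistic only, not a p-value."""
    z = [[abs(x - median(g)) for x in g] for g in groups]
    k = len(z)
    n_total = sum(len(zi) for zi in z)
    grand_mean = mean(v for zi in z for v in zi)
    between = sum(len(zi) * (mean(zi) - grand_mean) ** 2 for zi in z)
    within = sum((v - mean(zi)) ** 2 for zi in z for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical scores for two groups (not data from the study).
group_a = [4.1, 5.0, 5.2, 4.8, 5.5, 4.9]
group_b = [2.0, 7.9, 5.1, 9.3, 1.2, 6.4]

ratio = sd_ratio([group_a, group_b])
w = brown_forsythe_statistic([group_a, group_b])
print(f"SD ratio = {ratio:.2f}, Brown-Forsythe W = {w:.2f}")
```

Both quantities are computed per group, in line with the requirement that the graphical checks be given per group. The text itself notes that preliminary tests are often considered an inappropriate way to check assumptions, so the statistic is shown only to make the named test concrete.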
The preliminary tests that were recorded included Levene’s test, the F-ratio test, Bartlett’s test, and the Brown–Forsythe test.

The frequency of using preliminary tests was reported separately from other ways of checking for assumptions. Although the use of preliminary tests is often considered an inappropriate method for checking assumptions, their use does show awareness of the existence of the assumption. Occurrences of checking for irrelevant assumptions, such as equal group sizes for the t-test, or normality of all scores for one variable (instead of checking for normality per group) for all three techniques were also counted, but scored as incorrectly checking for an assumption.

Questionnaire

The questionnaire addressed four explanations for why an assumption was not checked: (1) Unfamiliarity with the assumption, (2) Unfamiliarity with how to check the assumptions, (3) Violation of the assumption not being regarded problematic, and (4) Unfamiliarity with a remedy against a violation of the assumption. Each of these explanations was operationalized before the questionnaires were analyzed. The experimenter was present during questionnaire administration to stimulate the participants to answer more extensively, if necessary, or ask them to reformulate their answer when they seemed to have misread the question.

Unfamiliarity with the assumptions. Participants were asked to write down the assumptions they thought it was necessary to check for each of the three statistical techniques used in the study. Simply mentioning the assumption of normality or homogeneity of variance was scored as being familiar with the assumption, even if the participants did not specify what, exactly, was required to follow a normal distribution or which variances were supposed to be equal. Explaining the assumptions without explicitly mentioning them was also scored as being familiar with this assumption.

Unfamiliarity with how to check the assumptions.
Participants were asked if they could think of a way to investigate whether there was a violation of each of the two assumptions (normality and homogeneity of variance) for t-tests, ANOVA and regression, respectively. Thus, the assumptions per technique were explicitly given, whether or not they had been correctly reported in answer to the previous question. For normality, specifying how to visualize the data in such a way that a possible violation was visible was categorized as a correct way of checking for assumption violations (for example: making a QQ-plot, or making a histogram), even when no further information was given about how to make such a visualization. Mentioning a measure of or a test for normality was also considered correct. For studying homogeneity of variance, rules of thumb or tests, such as Levene’s test for testing equality of variances, were categorized as a correct way of checking this assumption, and the same holds for eyeballing visual representations from which variances could be deduced. Note that the criteria for a correct check are lenient, since they include preliminary tests that are usually considered inappropriate.

Violation of the assumption not being regarded problematic. For techniques for which it has been shown that they are robust against certain assumption violations, it can be argued that it makes sense not to check for these assumptions, because the outcome of this checking process would not influence the interpretation of the data anyway. To study this explanation, participants were asked per assumption and for the three techniques whether they considered a possible violation to be influential.
Afterward, the answers that indicated that this influence was small or absent were scored as satisfying the criteria for this explanation.

Unfamiliarity with a remedy against a violation of an assumption. One could imagine that a possible violation of assumptions is not checked because no remedy for such violations is known. Participants were thus asked to note remedies for possible violations of normality and homogeneity of variance for each of the three statistical analysis techniques. Correct remedies were defined as transforming the data (it was not required that participants specify which transformation), using a different technique (e.g., a non-parametric technique when the assumption of normality has been violated) and increasing the sample size.

DATA ANALYSIS

All results are presented as percentages of the total number of participants or of the total number of analyzed data sets, depending on the specific research question. Confidence intervals (CIs) are given, but should be interpreted cautiously because the sample cannot be regarded as being completely random. The CIs for percentages were calculated as so-called score CIs (Wilson, 1927). All CIs are 95% CIs.

Frontiers in Psychology | Quantitative Psychology and Measurement | May 2012 | Volume 3 | Article 137 | 11
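The score interval (Wilson, 1927) cited in the Data Analysis section has a closed form that is easy to sketch. The example below is an illustration under assumptions rather than the authors' computation: the function name `wilson_score_ci` is hypothetical, the denominator of 180 assumes that each of the 30 participants analyzed all six data sets, and the count of 22 is chosen only because it rounds to 12%.

```python
from math import sqrt
from statistics import NormalDist


def wilson_score_ci(successes, n, confidence=0.95):
    """Wilson (1927) score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Assumed for illustration: 180 analyzed data sets (30 participants x 6)
# and a hypothetical count of 22 correct checks (about 12%).
low, high = wilson_score_ci(22, 180)
print(f"{22 / 180:.0%} (95% CI = [{low:.0%}, {high:.0%}])")
```

Under these assumed counts the interval rounds to [8%, 18%], and an assumed count of 42 out of 180 rounds to [18%, 30%]; both agree with the intervals reported in the Results, although the underlying counts are not stated in the text.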
RESULTS

Of the six data sets that the 30 participants were required to analyze, in all but three instances the expected technique was chosen. In the remaining three instances, ANOVA was used to analyze data sets that were meant to be analyzed by means of a t-test. Since ANOVA is in this case completely equivalent to an independent-samples t-test, it can be concluded that an appropriate technique was chosen for all data sets. In none of these cases was an unconditional technique chosen.

Violations of, or conformance with, the assumptions of normality and homogeneity of variance were correctly checked in 12% (95% CI = [8%, 18%]) and 23% (95% CI = [18%, 30%]), respectively, of the analyzed data sets. Figure 2 shows for each of the three techniques how frequently possible violations of the assumptions of normality and homogeneity of variance occurred, and whether the checking was done correctly, or whether a preliminary test was used. Note that the assumption of normality was rarely checked for regression, and never correctly. On the few occasions that normality was checked, the normality of the scores instead of the residuals was examined. Although this approach might be useful for studying the distribution of the scores, it is insufficient for determining whether the assumption of normality has been violated.

The percentages of participants giving each of the four reasons for not checking assumptions as measured by the questionnaire are given in Figure 3. A majority of the participants were unfamiliar with the assumptions. For each assumption, only a minority of participants mentioned at least one of the correct ways to check for a violation of the assumption. The majority of the participants failed to indicate that the alleged robustness of a technique against violations of the relevant assumption was a reason not to check these assumptions in the first place.
Many participants did not know whether a violation of an assumption was important or not. Only in a minority of instances was an acceptable remedy for a violation of an assumption mentioned. No unacceptable remedies were mentioned. In general, participants indicated little knowledge of how to overcome a violation of one of the assumptions, and most participants reported never having looked for a remedy against a violation of statistical assumptions.

Participants had been told what the relevant assumptions were before they had to answer these questions. Therefore, the results for the last three explanations per assumption in Figure 3 are reported for all participants, despite the fact that many participants reported being unfamiliar with the assumption. This implies that, especially for the assumption of normality and to a lesser extent for the assumption of equal variances, the results regarding the last three explanations should be interpreted with caution.

DISCUSSION

In order to examine people’s understanding of the assumptions of statistical tests and their behavior with regard to checking these assumptions, 30 researchers were asked to analyze six data sets using the t-test, ANOVA, regression or a non-parametric

FIGURE 2 | The frequency of whether two assumptions were checked at all, whether they were checked correctly, and whether a preliminary test was used for three often used techniques in percentages of the total number of cases. Between brackets are 95% CIs for the percentages.
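The Results note that, for regression, participants examined the normality of the scores rather than of the residuals. A minimal sketch of the residual-based check follows, assuming a simple one-predictor least-squares fit. The function names `ols_residuals` and `qq_points` and the data are hypothetical, and the plotting positions (i + 0.5) / n are one common convention among several, not a choice taken from the text.

```python
from statistics import NormalDist, mean


def ols_residuals(x, y):
    """Residuals from a simple least-squares line y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def qq_points(values):
    """(theoretical normal quantile, ordered value) pairs for a QQ-plot."""
    n = len(values)
    std_normal = NormalDist()
    return [
        (std_normal.inv_cdf((i + 0.5) / n), v)
        for i, v in enumerate(sorted(values))
    ]


# Hypothetical predictor and outcome (not data from the study).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9]

residuals = ols_residuals(x, y)
for quantile, residual in qq_points(residuals):
    print(f"{quantile:6.2f}  {residual:6.2f}")
```

Plotting the second column against the first gives the QQ-plot of the residuals; points falling close to a straight line are consistent with normally distributed residuals. Passing `y` instead of `residuals` to `qq_points` reproduces the check on the raw scores that the text describes as insufficient.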