meta-analyses are based on representative results that make the conclusion generalizable (Elvik, 1998). But Cook and Campbell's (1979, p. 80) aim was undoubtedly broader, as they stressed that SCV "is concerned with sources of random error and with the appropriate use of statistics and statistical tests" (italics added). Moreover, Type-I and Type-II errors are an essential and inescapable consequence of the statistical decision theory underlying significance testing and, as such, the potential occurrence of one or the other of these errors cannot be prevented; nor can their actual occurrence for the data on hand be assessed. Type-I and Type-II errors will always be with us and, hence, SCV is only trivially linked to the fact that research will never unequivocally prove or reject any statistical null hypothesis or its originating research hypothesis. Cook and Campbell seemed to be well aware of this issue when they stressed that SCV refers to reasonable inferences given a specified significance level and a given power. In addition, Stevens (1950, p. 121) forcefully emphasized that "it is a statistician's duty to be wrong the stated number of times," implying that a researcher should accept the assumed risks of Type-I and Type-II errors, use statistical methods that guarantee the assumed error rates, and consider these an essential part of the research process. From this position, these errors do not affect SCV unless their probability differs meaningfully from that which was assumed. And this is where an alternative perspective on SCV enters the stage, namely, whether the data were analyzed properly so as to extract conclusions that faithfully reflect what the data have to say about the research question. A negative answer raises concerns about SCV beyond the triviality of Type-I or Type-II errors. There are actually two types of threat to SCV from this perspective. One is when the data are subjected to thoroughly inadequate statistical analyses that do not match the characteristics of the design used to collect the data or that cannot logically give an answer to the research question. The other is when a proper statistical test is used but it is applied under conditions that alter the stated risk probabilities. In the former case, the conclusion will be wrong except by accident; in the latter, the conclusion will fail to be incorrect with the declared probabilities of Type-I and Type-II errors.
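The second threat is easy to demonstrate by simulation. The following minimal sketch (not part of the original article; the sample sizes, variances, and number of replicates are arbitrary illustrative choices) applies the pooled-variance two-sample t-test, a perfectly proper test, under conditions that violate its equal-variance assumption with unequal group sizes, and estimates the empirical Type-I error rate:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, alpha = 20_000, 0.05
rejections = 0
for _ in range(n_sims):
    # Both groups share the same mean, so the null hypothesis is true.
    g1 = rng.normal(0.0, 2.0, size=10)   # small group, large variance
    g2 = rng.normal(0.0, 1.0, size=40)   # large group, small variance
    # A proper test, but its equal-variance assumption is violated here.
    _, p = stats.ttest_ind(g1, g2, equal_var=True)
    rejections += p < alpha
print(f"nominal alpha: {alpha}, empirical: {rejections / n_sims:.3f}")
# The empirical Type-I error rate lands far above the nominal 0.05
# (roughly 0.17-0.20 in this configuration), so the stated risk
# probability no longer holds.
```

Passing equal_var=False (Welch's test) brings the empirical rate back to about 0.05; reversing the variance pattern, so that the larger group has the larger variance, would instead make the pooled test conservative rather than liberal.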
The position elaborated in the foregoing paragraph is well summarized in Milligan and McFillen's (1984, p. 439) statement that "under normal conditions (...) the researcher will not know when a null effect has been declared significant or when a valid effect has gone undetected (...) Unfortunately, the statistical conclusion validity, and the ultimate value of the research, rests on the explicit control of (Type-I and Type-II) error rates." This perspective on SCV is explicitly discussed in some textbooks on research methods (e.g., Beins, 2009, pp. 139–140; Goodwin, 2010, pp. 184–185), and some literature reviews have been published that reveal a widespread failure of SCV in these respects.

For instance, Milligan and McFillen (1984, p. 438) reviewed evidence that "the business research community has succeeded in publishing a great deal of incorrect and statistically inadequate research" and dissected in detail four additional cases (among many others that reportedly could have been chosen) in which a breach of SCV resulted from gross mismatches between the research design and the statistical analysis. Similarly, García-Pérez (2005) reviewed alternative methods to compute confidence intervals for proportions and discussed three papers (among many others that reportedly could have been chosen) in which inadequate confidence intervals had been computed. More recently, Bakker and Wicherts (2011) conducted a thorough analysis of psychological papers and estimated that roughly 50% of published papers contain reporting errors, although they only checked whether the reported p value was correct, not whether the statistical test used was appropriate. A similar analysis by Nieuwenhuis et al. (2011) revealed that 50% of the papers reporting a comparison of two experimental effects in top neuroscience journals had used an incorrect statistical procedure. And Bland and Altman (2011) reported further data on the prevalence of incorrect statistical analyses of a similar nature.
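One of the failures just cited concerns interval estimation for proportions, where the textbook Wald interval is known to fall far short of its nominal confidence level at small n or extreme p. A brief sketch (illustrative only; the specific n and p are arbitrary, and the Wilson score interval is just one of the better-behaved alternatives discussed in that literature) computes coverage exactly, since a binomial experiment has only n + 1 possible outcomes:

```python
import numpy as np
from scipy import stats

def coverage(n, p, z=1.96):
    """Exact coverage of nominal 95% Wald and Wilson intervals at (n, p),
    obtained by summing binomial probabilities over the outcomes whose
    interval contains the true p."""
    k = np.arange(n + 1)
    phat = k / n
    # Wald interval: phat +/- z * sqrt(phat * (1 - phat) / n)
    half = z * np.sqrt(phat * (1 - phat) / n)
    wald = (phat - half <= p) & (p <= phat + half)
    # Wilson score interval
    denom = 1 + z**2 / n
    center = (phat + z**2 / (2 * n)) / denom
    half_w = (z / denom) * np.sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2))
    wilson = (center - half_w <= p) & (p <= center + half_w)
    pmf = stats.binom.pmf(k, n, p)
    return pmf[wald].sum(), pmf[wilson].sum()

w, ws = coverage(n=30, p=0.1)
print(f"Wald: {w:.3f}, Wilson: {ws:.3f}")
# Wald comes out near 0.81, far below the nominal 0.95, while Wilson
# stays at or above it at this (n, p).
```

A "95%" interval that actually covers the true proportion only about 81% of the time is exactly the kind of silent loss of stated error control at issue here.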
An additional indicator of the use of inadequate statistical procedures comes from published papers whose title explicitly refers to a re-analysis of data reported in some other paper. A literature search for papers including in their title the terms "a re-analysis," "a reanalysis," "re-analyses," "reanalyses," or "alternative analysis" was conducted on May 3, 2012 in the Web of Science (WoS; http://thomsonreuters.com), which rendered 99 such papers with subject area "Psychology" published in 1990 or later. Although some of these were false positives, a sizeable number of them actually discussed the inadequacy of analyses carried out by the original authors and reported the results of proper alternative analyses that typically reversed the original conclusion. This type of outcome upon re-analysis of data is more frequent than the results of this quick and simple search suggest, because the identifying information is not always included in the title of the paper or is included in some other form: for a simple example, a search for the clause "a closer look" in the title rendered 131 papers, many of which also presented re-analyses of data that reversed the conclusion of the original study.

Poor design or poor sample size planning may, unbeknownst to the researcher, lead to unacceptable Type-II error rates, which will certainly affect SCV (as long as the null is not rejected; if it is, the probability of a Type-II error is irrelevant). Although insufficient power due to lack of proper planning has consequences for statistical tests, this paper de-emphasizes this aspect of SCV (which should perhaps more reasonably fit within an alternative category labeled design validity) and emphasizes the idea that SCV holds when statistical conclusions are incorrect with the stated probabilities of Type-I and Type-II errors (whether the latter was planned or simply computed). Whether or not the actual significance level used in the research, or the power that it had, is judged acceptable is another issue, which does not affect SCV: the statistical conclusion is valid within the stated (or computed) error probabilities. A breach of SCV occurs, then, when the data are not subjected to adequate statistical analyses or when control of Type-I or Type-II errors is lost.
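The cost of poor sample size planning is easy to quantify. As an illustrative sketch (not from the original article; the "medium" effect size d = 0.5 and the group sizes are arbitrary choices), the Type-II error rate of a two-sided, two-sample t-test can be computed from the noncentral t distribution:

```python
import numpy as np
from scipy import stats

def ttest_power(d, n_per_group, alpha=0.05):
    """Two-sided power of the two-sample t-test, via the noncentral t."""
    df = 2 * n_per_group - 2
    nc = d * np.sqrt(n_per_group / 2)        # noncentrality parameter
    tcrit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(tcrit, df, nc)) + stats.nct.cdf(-tcrit, df, nc)

for n in (20, 64, 128):
    power = ttest_power(d=0.5, n_per_group=n)
    print(f"n = {n:3d}/group: power = {power:.2f}, "
          f"Type-II error = {1 - power:.2f}")
# A "medium" effect (d = 0.5) with 20 per group is missed about two
# times in three; roughly 64 per group are needed for power near 0.80.
```

A researcher who runs the 20-per-group study without such a computation has, unknowingly, accepted a Type-II error rate near 0.66 rather than any rate they might have assumed.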
It should be noted that a further component was included in the consideration of SCV in Shadish et al.'s (2002) sequel to Cook and Campbell's (1979) book, namely, effect size. Effect size relates to what has been called a Type-III error (Crawford et al., 1998), that is, a statistically significant result that has no meaningful practical implication and that arises only from the use of a huge sample. This issue is left aside in the present paper because adequate consideration and reporting of effect sizes precludes Type-III errors,
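A minimal sketch of the Type-III scenario (hypothetical numbers, not from the original article): a trivially small true difference, tested on an enormous sample, reliably yields a "significant" p value even though the standardized effect is negligible.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 1_000_000                                # a huge sample per group
g1 = rng.normal(0.00, 1.0, size=n)
g2 = rng.normal(0.01, 1.0, size=n)           # true difference: d = 0.01
t, p = stats.ttest_ind(g1, g2)
pooled_sd = np.sqrt((g1.var(ddof=1) + g2.var(ddof=1)) / 2)
d_hat = (g2.mean() - g1.mean()) / pooled_sd  # observed Cohen's d
print(f"p = {p:.2e}, Cohen's d = {d_hat:.4f}")
# p is minuscule (highly "significant"), yet d is about 0.01, negligible
# by any conventional benchmark; treating this result as practically
# meaningful would be a Type-III error.
```

Reporting the effect size alongside the p value makes the practical triviality immediately visible, which is the sense in which adequate effect size reporting precludes Type-III errors.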
