Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers

García-Pérez    Statistical conclusion validity

how SCV could be improved, a few words are in order about how Bayesian approaches fare on SCV.

THE BAYESIAN APPROACH

Advocates of Bayesian approaches to data analysis, hypothesis testing, and model selection (e.g., Jennison and Turnbull, 1990; Wagenmakers, 2007; Matthews, 2011) overemphasize the problems of the frequentist approach and praise the solutions offered by the Bayesian approach: Bayes factors (BFs) for hypothesis testing, credible intervals for interval estimation, Bayesian posterior probabilities, the Bayesian information criterion (BIC) as a tool for model selection and, above all else, strict reliance on observed data and independence of the sampling plan (i.e., fixed vs. sequential sampling). There is unquestionable merit in these alternatives, and a fair comparison with their frequentist counterparts requires a detailed analysis that is beyond the scope of this paper. Yet, I cannot resist the temptation of commenting on the presumed problems of the frequentist approach and also on the standing of the Bayesian approach with respect to SCV.

One of the preferred objections to p values is that they relate to data that were never collected and which, thus, should not affect the decision of what hypothesis the observed data support or fail to support.
Intuitively appealing as it may seem, the argument is flawed because the referent for a p value is not other data sets that could have been observed in undone replications of the same experiment. Instead, the referent is the properties of the test statistic itself, which is guaranteed to have the declared sampling distribution when data are collected as assumed in the derivation of such distribution. Statistical tests are calibrated procedures with known properties, and this calibration is what makes their results interpretable. As is the case for any other calibrated procedure or measuring instrument, the validity of the outcome only rests on adherence to the usage specifications. And, of course, the test statistic and the resultant p value on application cannot be blamed for the consequences of a failure to collect data properly or to apply the appropriate statistical test.

Consider a two-sample t test for means. Those who need a referent may want to notice that the p value for the data from a given experiment relates to the uncountable times that such test has been applied to data from any experiment in any discipline.
Calibration of the t test ensures that a proper use with a significance level of, say, 5% will reject a true null hypothesis on 5% of the occasions, no matter what the experimental hypothesis is, what the variables are, what the data are, what the experiment is about, who carries it out, or in what research field. What a p value indicates is how tenable it is that the t statistic will attain the observed value if the null were correct, with only a trivial link to the data observed in the experiment of concern. And this only places in a precise quantitative framework the logic that the man on the street uses to judge, for instance, that getting struck by lightning four times over the past 10 years is not something that could identically have happened to anybody else, or that the source of a politician's huge and untraceable earnings is not the result of allegedly winning top lottery prizes numerous times over the past couple of years. In any case, the advantage of the frequentist approach as regards SCV is that the probability of a Type-I or a Type-II error can be clearly and unequivocally stated, which is not to be mistaken for a statement that a p value is the probability of a Type-I error in the current case, or that it is a measure of the strength of evidence against the null that the current data provide.
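The calibration claim can be checked empirically. The following sketch (not from the paper; the function names, sample sizes, and constants are illustrative) simulates many replications of a two-sample experiment with a true null and counts how often a pooled-variance t test at the 5% level rejects:

```python
import math
import random

def pooled_t(x, y):
    """Classic pooled-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    return (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))

def type_i_error_rate(n=30, reps=2000, t_crit=2.002, seed=1):
    # t_crit is the two-sided 5% critical value for df = 2*n - 2 = 58.
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Both groups share the same mean: the null hypothesis is true.
        x = [rng.gauss(10.0, 2.0) for _ in range(n)]
        y = [rng.gauss(10.0, 2.0) for _ in range(n)]
        if abs(pooled_t(x, y)) > t_crit:
            rejections += 1
    return rejections / reps

print(type_i_error_rate())  # hovers around the nominal 0.05
```

The rejection rate stays near 0.05 whatever common mean and standard deviation are plugged into the simulation, which is exactly the sense in which the test is calibrated independently of the research field or the variables involved.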
The most prevalent problems of p values are their potential for misuse and their widespread misinterpretation (Nickerson, 2000). But misuse or misinterpretation do not make NHST and p values uninterpretable or worthless.

Bayesian approaches are claimed to be free of these presumed problems, yielding a conclusion that is exclusively grounded on the data. In a naive account of Bayesian hypothesis testing, Malakoff (1999) attributes to biostatistician Steven Goodman the assertion that the Bayesian approach "says there is an X% probability that your hypothesis is true–not that there is some convoluted chance that if you assume the null hypothesis is true, you will get a similar or more extreme result if you repeated your experiment thousands of times." Besides being misleading and reflecting a poor understanding of the logic of calibrated NHST methods, what goes unmentioned in this and other accounts is that the Bayesian potential to find out the probability that the hypothesis is true will not materialize without two crucial extra pieces of information. One is the a priori probability of each of the competing hypotheses, which certainly does not come from the data.
The other is the probability of the observed data under each of the competing hypotheses, which has the same origin as the frequentist p value and whose computation requires distributional assumptions that must necessarily take the sampling method into consideration. In practice, Bayesian hypothesis testing generally computes BFs and the result might be stated as "the alternative hypothesis is x times more likely than the null," although the probability that this type of statement is wrong is essentially unknown. The researcher may be content with a conclusion of this type, but how much of these odds comes from the data and how much comes from the extra assumptions needed to compute a BF is undecipherable. In many cases research aims at gathering and analyzing data to make informed decisions such as whether application of a treatment should be discontinued, whether changes should be introduced in an educational program, whether daytime headlights should be enforced, or whether in-car use of cell phones should be forbidden. Like frequentist analyses, Bayesian approaches do not guarantee that the decisions will be correct.
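To make concrete how both extra ingredients enter a Bayesian conclusion, consider a toy sketch (not from the paper): for n Bernoulli trials with k successes, comparing a point null p = 0.5 against an alternative that places a uniform prior on p yields a closed-form Bayes factor, and the posterior probability of the alternative then depends on the prior odds even though the data are fixed:

```python
from math import comb

def bf10_binomial(k, n):
    """Bayes factor for H1 (uniform prior on p) vs H0 (p = 0.5)."""
    # Marginal likelihood under H1: the binomial pmf integrated over
    # a uniform prior on p equals 1 / (n + 1) for any k.
    m1 = 1 / (n + 1)
    # Marginal likelihood under H0 is just the pmf at p = 0.5.
    m0 = comb(n, k) * 0.5 ** n
    return m1 / m0

def posterior_prob_h1(k, n, prior_h1):
    # Posterior odds = Bayes factor * prior odds; the prior does not
    # come from the data, yet it shapes the final probability.
    post_odds = bf10_binomial(k, n) * prior_h1 / (1 - prior_h1)
    return post_odds / (1 + post_odds)

# Same data (60 successes in 100 trials), different priors on H1:
for prior in (0.5, 0.1):
    print(prior, round(posterior_prob_h1(60, 100, prior), 3))
```

The two printed posterior probabilities differ substantially, which illustrates the point in the text: the "probability that the hypothesis is true" blends what the data say (through the marginal likelihoods, themselves dependent on the assumed binomial sampling model) with what the prior asserts, and the two contributions cannot be disentangled from the final number alone.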
One may argue that stating how much more likely is one hypothesis over another bypasses the decision to reject or not reject any of them and, then, that Bayesian approaches to hypothesis testing are free of Type-I and Type-II errors. Although this is technically correct, the problem remains from the perspective of SCV: Statistics is only a small part of a research process whose ultimate goal is to reach a conclusion and make a decision, and researchers are in a better position to defend their claims if they can supplement them with a statement of the probability with which those claims are wrong.

Interestingly, analyses of decisions based on Bayesian approaches have revealed that they are no better than frequentist decisions as regards Type-I and Type-II errors and that parametric assumptions (i.e., the choice of prior and the assumed distribution of the observations) crucially determine the performance of Bayesian methods. For instance, Bayesian estimation is also subject to potentially large bias and lack of precision (Alcalá-Quintana and García-Pérez, 2004; García-Pérez and Alcalá-Quintana, 2007), the coverage probability of Bayesian credible intervals can be worse

www.frontiersin.org    August 2012 | Volume 3 | Article 325 | 23
