Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers

Tressoldi | Power replication unreliability

power = 0.9 with α = 0.05, given the observed random ESs, was estimated. Statistical power was calculated using the software G*Power (Faul et al., 2007).

COMMENT
The results are quite clear. Only for unconscious semantic priming for semantic categorization is the number of participants in a typical experiment sufficient to obtain a statistical power above 0.90. For all remaining phenomena, achieving this level of power requires increasing the number of participants in a typical study, from a minimum of seven participants for unconscious semantic priming for lexical decision and naming to around 3400 for investigating NLP using the forced-choice with normal state of consciousness protocol.

GENERAL DISCUSSION
The response to the question posed in the introduction, as to whether there are elusive phenomena or an elusive power to detect them, is quite clear. If there are clear estimates of ESs from the evidence of the phenomenon, derived from a sufficient number of studies analyzed meta-analytically, and their values are moderate or low, it is mandatory to increase the number of participants to achieve a statistical power of 0.90. The inevitable consequence is investing more time and money in each study before interpreting the results as support for the reality or unreality of a phenomenon.

Are there alternatives to this obligation? Yes, and we briefly illustrate some of these, also providing references for those interested in using them.

CONFIDENCE INTERVALS
In line with the statistical reform movement (e.g., Cumming, 2012), the APA manual (American Psychological Association, APA, 2010) gives the following statistical recommendation: “Alternatively, [to the use of NHST] use calculations based on a chosen target precision (confidence interval width) to determine sample sizes. Use the resulting confidence intervals to justify conclusions concerning ESs (e.g., that some effect is negligibly small)” (p. 30).

EQUIVALENCE TESTING
Equivalence tests are inferential statistics designed to provide evidence for a null hypothesis. Like effect tests, the nil–null is eschewed in equivalence testing. However, unlike standard NHST, equivalence tests provide evidence that there is little difference or effect. A significant result in an equivalence test means that the hypothesis that the effects or differences are substantial can be rejected. Hence, equivalence tests are appropriate when researchers want to show little difference or effect (Levine et al., 2008).

EVALUATING INFORMATIVE HYPOTHESES
Evaluating specific expectations directly produces more useful results than sequentially testing traditional null hypotheses against catch-all rivals. Researchers are often interested in the evaluation of informative hypotheses and already know that the traditional null hypothesis is an unrealistic hypothesis. This presupposes that prior knowledge is available; if this is not the case, testing the traditional null hypothesis is appropriate. In most applied studies, however, prior knowledge is indeed available in the form of specific expectations about the ordering of statistical parameters (Kuiper and Hoijtink, 2010; Van de Schoot et al., 2011).

BAYESIAN APPROACH
Another alternative is to abandon the frequentist approach and use a Bayesian one (Wagenmakers et al., 2011). With a Bayesian approach, the problem of statistical power is replaced by parameter estimation and/or model comparison (Kruschke, 2011). In the first approach, assessing null values, the analyst simply sets up a range of candidate values, including the null value, and uses Bayesian inference to compute the relative credibility of all the candidate values. In the model comparison approach, the analyst sets up two competing models of what values are possible. One model posits that only the null value is possible, whereas the alternative model posits that a broad range of other values is also possible. Bayesian inference is used to compute which model is more credible, given the data.

FINAL COMMENT
Is there a chance to abandon “The Null Ritual” in the near future and to think of science as cumulative knowledge? The answer is “yes” if we approach scientific discovery thinking meta-analytically (Cumming, 2012), that is, simply reporting the observed (standardized) ES and the corresponding confidence intervals, both when the null hypothesis is rejected and when it is not (Nickerson, 2000; American Psychological Association, APA, 2010), without drawing dichotomous decisions. The statistical approaches listed above are good tools to achieve this goal.

How many editors and reviewers are committed to pursuing it?

ACKNOWLEDGMENTS
Suggestions and comments by the reviewers were greatly appreciated for improving the clarity and quality of the paper. Proof Reading Service revised the English.

REFERENCES
Alcock, J. E. (2003). Give the null hypothesis a chance: reasons to remain doubtful about the existence of PSI. J. Conscious. Stud. 10, 29–50.
American Psychological Association. (2010). Publication Manual of the American Psychological Association, 6th Edn. Washington, DC: American Psychological Association.
Bezeau, S., and Graves, R. (2001). Statistical power and effect sizes of clinical neuropsychology research. J. Clin. Exp. Neuropsychol. 23, 399–406.
Bouwmeester, D., Pan, J. W., Mattle, K., Eibl, M., Weinfurter, H., and Zeilinger, A. (1997). Experimental quantum teleportation. Nature 390, 575–579.
Busemeyer, J. R., Pothos, E. M., Franco, R., and Trueblood, J. S. (2011). A quantum theoretical explanation for probability judgment errors. Psychol. Rev. 118, 193–218.
Cohen, J. (1992). A power primer. Psychol. Bull. 112, 155–159.
Cohen, J. (1994). The earth is round (p < .05). Am. Psychol. 49, 997–1003.
Cumming, G. (2012). Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis. New York: Routledge.
Dijksterhuis, A., Bos, M. W., Nordgren, L. F., and Van Baaren, R. B. (2006). On making the right choice: the deliberation-without-attention effect. Science 311, 1005–1007.
Faul, F., Erdfelder, E., Lang, A.-G., and Buchner, A. (2007). G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav. Res. Methods 39, 175–191.
Genovese, M. (2005). Research on hidden variable theories, a review of recent progresses. Phys. Rep. 413, 319–396.
Genovese, M. (2010). Interpretations of quantum mechanics and measurement problem. Adv. Sci. Lett. 3, 249–258.
Gigerenzer, G., Krauss, S., and Vitouch, O. (2004). “The null ritual: what you always wanted to know about significance testing but were afraid to ask,” in The Sage Handbook of Quantitative Methodology for the Social Sciences, ed. D. Kaplan (Thousand Oaks, CA: Sage), 391–408.
Gutiérrez, R., Caetano, R., Woiczikowski, P. B., Kubar, T., Elstner, M., and Cuniberti, G. (2010). Structural fluctuations and quantum transport through DNA molecular wires, a combined molecular dynamics and model Hamiltonian approach. New J. Phys. 12, 023022.
Kennedy, J. E. (2001). Why is psi so elusive? A review and proposed model. J. Parapsychol. 65, 219–246.
Khrennikov, A. Y. (2010). Ubiquitous Quantum Structure from Psychology to Finance. Berlin: Springer-Verlag.
Kline, R. B. (2004). Beyond Significance Testing. Reforming Data Analysis Methods in Behavioral Research. Washington, DC: APA.
Kruschke, J. (2011). Bayesian assessment of null values via parameter estimation and model comparison. Perspect. Psychol. Sci. 6, 299–312.
Kuiper, R. M., and Hoijtink, H. (2010). Comparisons of means using exploratory and confirmatory approaches. Psychol. Methods 15, 69–86.
Levine, T. R., Weber, R., Sun Park, H., and Hullett, C. R. (2008). A communication researchers’ guide to null hypothesis significance testing and alternatives. Hum. Commun. Res. 34, 188–209.
Maxwell, S. E. (2004). The persistence of underpowered studies in psychological research: causes, consequences, and remedies. Psychol. Methods 9, 147–163.
Milton, J. (1997). Meta-analysis of free-response ESP studies without altered states of consciousness. J. Parapsychol. 61, 279–319.
Nickerson, R. S. (2000). Null hypothesis significance testing: a review of an old and continuing controversy. Psychol. Methods 5, 241–301.
Pratte, M. S., and Rouder, J. N. (2009). A task-difficulty artifact in subliminal priming. Atten. Percept. Psychophys. 71, 1276–1283.
Richard, F. D., Bond, C. F., and Stokes-Zoota, J. J. (2003). One hundred years of social psychology quantitatively described. Rev. Gen. Psychol. 7, 331–363.
Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Rev. Gen. Psychol. 13, 90–100.
Sedlmeier, P., and Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychol. Bull. 105, 309–316.
Sio, U. N., and Ormerod, T. C. (2009). Does incubation enhance problem solving? A meta-analytic review. Psychol. Bull. 135, 94–120.
Storm, L., Tressoldi, P. E., and Di Risio, L. (2010). Meta-analysis of free-response studies, 1992–2008: assessing the noise reduction model in parapsychology. Psychol. Bull. 136, 471–485.
Storm, L., Tressoldi, P. E., and Di Risio, L. (in press). Meta-analysis of ESP studies, 1987–2010, assessing the success of the forced-choice design in parapsychology.
Strick, M., Dijksterhuis, A., Bos, M. W., Sjoerdsma, A., van Baaren, R. B., and Nordgren, L. F. (2011). A meta-analysis on unconscious thought effects. Soc. Cogn. 29, 738–762.
Valentine, J. C., Pigott, T. D., and Rothstein, H. R. (2009). How many studies do you need? A primer on statistical power for meta-analysis. J. Educ. Behav. Stat. 35, 215–247.
Van de Schoot, R., Hoijtink, H., and Jan-Willem, R. (2011). Moving beyond traditional null hypothesis testing: evaluating expectations directly. Front. Psychol. 2:24. doi: 10.3389/fpsyg.2011.00024
Van den Bussche, E., den Noortgate, W., and Reynvoet, B. (2009). Mechanisms of masked priming: a meta-analysis. Psychol. Bull. 135, 452–477.
von Lucadou, W., Römer, H., and Walach, H. (2007). Synchronistic phenomena as entanglement correlations in generalized quantum theory. J. Conscious. Stud. 14, 50–74.
Wagenmakers, E. J., Wetzels, R., Borsboom, D., and Van der Maas, H. (2011). Why psychologists must change the way they analyze their data: the case of psi. J. Pers. Soc. Psychol. 100, 426–432.
Walach, H., and von Stillfried, N. (2011). Generalised quantum theory. Basic idea and general intuition: a background story and overview. Axiomathes 21, 185–209.
Waroquier, L., Marchiori, D., Klein, O., and Cleeremans, A. (2009). Methodological pitfalls of the unconscious thought paradigm. Judgm. Decis. Mak. 4, 601–610.

Conflict of Interest Statement: The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Received: 13 April 2012; accepted: 12 June 2012; published online: 04 July 2012.

Citation: Tressoldi PE (2012) Replication unreliability in psychology: elusive phenomena or “elusive” statistical power? Front. Psychology 3:218. doi: 10.3389/fpsyg.2012.00218

This article was submitted to Frontiers in Quantitative Psychology and Measurement, a specialty of Frontiers in Psychology.

Copyright © 2012 Tressoldi. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.

Frontiers in Psychology | Quantitative Psychology and Measurement | July 2012 | Volume 3 | Article 218
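
As a rough companion to the sample-size reasoning in this article (power of 0.90 at α = 0.05, and the APA recommendation to plan for a target confidence interval width), the following Python sketch shows how such numbers can be approximated. It is not the G*Power procedure used by the author. It assumes a two-sided comparison of two independent groups, an effect size expressed as Cohen's d, and a normal approximation, so it will tend to return slightly smaller samples than an exact t-based calculation. The function names are illustrative only.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def n_per_group_for_power(d, power=0.90, alpha=0.05):
    """Approximate participants per group needed to detect a standardized
    mean difference d with the given power (two-sided test).

    Assumption: normal approximation, two independent equal-sized groups.
    """
    if d == 0:
        raise ValueError("effect size must be non-zero")
    z_alpha = _Z.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = _Z.inv_cdf(power)          # quantile matching the target power
    return math.ceil(2 * ((z_alpha + z_power) / abs(d)) ** 2)


def n_per_group_for_precision(half_width, confidence=0.95):
    """Approximate participants per group so that the confidence interval
    for a standardized mean difference has the given half-width.

    Assumption: half_width is in standard-deviation units, equal groups.
    """
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    z = _Z.inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(2 * (z / half_width) ** 2)


if __name__ == "__main__":
    # Smaller effects require sharply larger samples for the same power.
    for d in (0.8, 0.5, 0.2):
        print(f"d = {d}: about {n_per_group_for_power(d)} per group")
    print("CI half-width 0.2 SD:", n_per_group_for_precision(0.2), "per group")
```

Because the required sample grows with the inverse square of the effect size, halving the expected ES roughly quadruples the number of participants, which is why the low ESs discussed in the article translate into very large studies.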