Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers

More documents

Recommendations

Info

Nimon et al.The assumption of reliable dataAPPENDIXEXAMPLE WRITE-UP FOR SAMPLE, INSTRUMENT, AND RESULT SECTIONSSampleA convenience sample of 420 students (200 fifth graders, 220 sixth graders) were from a suburban public intermediate school in thesouthwest of the United States and included 190 Whites (100 males, 90 females; 135 regular education, 55 gifted education), 105 Blacks(55 males, 50 females; 83 regular education, 22 gifted education), 95 Hispanics (48 males, 47 females; 84 regular education, 11 giftededucation), 18 Asians (9 males, 9 females; 13 regular education, 5 gifted education), and 12 Others (5 males, 7 females; 10 regulareducation, 2 gifted education). The school consisted of 45% of students in high-poverty as defined by number of students on freelunch. None of the students were in special education. Parental and/or student consent was obtained by 94% of the students, providinga high response rate.INSTRUMENTMarat (2005) included an instrument that contained several predictors of self-efficacy (see Pintrich et al., 1991; Bandura, 2006). Inthe present study, five constructs were included: Motivation Strategies (MS: 5 items); Cognitive Strategies (CS; 15 items); ResourceManagement Strategies (MS; 12 items); Self-Regulated Learning (SRL; 16 items); and Self-Assertiveness (SA; 6 items). No modificationswere made to the items or the subscales but only five of the subscales from the original instrument listed above were administered. TheEnglish version of the instrument was administered via paper to students by the researchers during regular class time and utilized afive-point Likert scale anchored from 1 (not well) to 5 (very well). Composite scores were created for each construct by averaging theitems for each subscale.RESULTSCoefficient alpha was calculated for the data in hand resulting in acceptable levels of reliability for MS (0.82, 0.84), CS (0.85, 0.84), RMS(0.91, 0.83), SRL (0.84, 86), and SA (0.87, 0.83), fall and spring, respectively (Thompson, 2003a).GIFTED AND REGULAR STUDENTSTable A1 provides the reliability coefficients, bivariate correlations, means, SD for each factor disaggregated by gifted and regularstudents.Table A1 | Bivariate correlations, means, SD, and reliability coefficient disaggregated by gifted andregular education students.α, Coefficient alpha; M, mean; SD, standard deviation. Bivariate correlations below the diagonal are for Gifted studentsand above the diagonal are for Regular education students. MS, motivation strategies; CS, cognitive strategies; RMS,resource management strategies; SRL, self-regulated learning; SA, self-assertiveness.www.frontiersin.org April 2012 | Volume 3 | Article 102 | 53
PERSPECTIVE ARTICLEpublished: 04 July 2012doi: 10.3389/fpsyg.2012.00218Replication unreliability in psychology: elusive phenomenaor “elusive” statistical power?Patrizio E. Tressoldi*Dipartimento di Psicologia Generale, Università di Padova, Padova, ItalyEdited by:Jason W. Osborne, Old DominionUniversity, USAReviewed by:Fiona Fidler, University of Melbourne,AustraliaDarrell Hull, University North Texas,USADonald Sharpe, University of Regina,Canada*Correspondence:Patrizio E. Tressoldi, Dipartimento diPsicologia Generale, Università diPadova, Padova, Italy.e-mail: patrizio.tressoldi@unipd.itThe focus of this paper is to analyze whether the unreliability of results related to certaincontroversial psychological phenomena may be a consequence of their low statisticalpower. Applying the Null Hypothesis StatisticalTesting (NHST), still the widest used statisticalapproach, unreliability derives from the failure to refute the null hypothesis, in particularwhen exact or quasi-exact replications of experiments are carried out. Taking as examplethe results of meta-analyses related to four different controversial phenomena, subliminalsemantic priming, incubation effect for problem solving, unconscious thought theory,and non-local perception, it was found that, except for semantic priming on categorization,the statistical power to detect the expected effect size (ES) of the typical study, is low orvery low. The low power in most studies undermines the use of NHST to study phenomenawith moderate or low ESs. We conclude by providing some suggestions on how toincrease the statistical power or use different statistical approaches to help discriminatewhether the results obtained may or may not be used to support or to refute the reality ofa phenomenon with small ES.Keywords: incubation effect, non-local perception, power, subliminal priming, unconscious thought theoryINTRODUCTIONARE THERE ELUSIVE PHENOMENA OR IS THERE AN “ELUSIVE” POWERTO DETECT THEM?When may a phenomenon be considered to be real or veryprobable, following the rules of current scientific methodology?Among the many requirements, there is a substantial consensusthat replication is one of the more fundamental (Schmidt,2009). In other words, a phenomenon may be considered realor very probable when it has been observed many times andpreferably by different people or research groups. Whereas afailure to replicate is quite expected in the case of conceptualreplication, or when the experimental procedure or materialsentail relevant modifications, a failure in the case of an exactor quasi-exact replication, give rise to serious concerns aboutthe reality of the phenomenon under investigation. This is thecase in the four phenomena used as examples in this paper,namely, semantic subliminal priming, incubation effects on problemsolving,unconscious thought,and non-local perception (NLP;e.g., Kennedy, 2001; Pratte and Rouder, 2009; Waroquier et al.,2009).The focus of this paper is to demonstrate that for all phenomenawith a moderate or small effect size (ES), approximately below 0.5if we refer to standardized differences such as Cohen’s d, the typicalstudy shows a power level insufficient to detect the phenomenonunder investigation.Given that the majority of statistical analyses are based on theNull Hypothesis Statistical Testing (NHST) frequentist approach,their failure is determined by the rejection of the (nil) null hypothesisH 0 , usually setting α < 0.05. Even if this procedure is consideredincorrect because the frequentist approach only supports H 0rejection and not H 0 validity 1 , it may be tolerated if there is proofof a high level of statistical power as recommended in the recentAPA statistical recommendations [American Psychological Association,APA (2010)]: “Power: When applying inferential statistics,take seriously the statistical power considerations associated withthe tests of hypotheses. Such considerations relate to the likelihoodof correctly rejecting the tested hypotheses, given a particularalpha level, ES, and sample size. In that regard, routinely provideevidence that the study has sufficient power to detect effects of substantiveinterest. Be similarly careful in discussing the role playedby sample size in cases in which not rejecting the null hypothesisis desirable (i.e., when one wishes to argue that there are nodifferences), when testing various assumptions underlying the statisticalmodel adopted (e.g., normality, homogeneity of variance,homogeneity of regression), and in model fitting. pag. 30.”HOW MUCH POWER?Statistical power depends on three classes of parameters: (1) thesignificance level (i.e., the Type I error probability) of the test, (2)the size(s) of the sample(s) used for the test, and (3) an ES parameterdefining H 1 and thus indexing the degree of deviation fromH 0 in the underlying population.Power analysis should be used prospectively to calculate theminimum sample size required so that one can reasonably detectan effect of a given size. Power analysis can also be used to calculate1 The line of reasoning from “the null hypothesis is false” to “the theory is thereforetrue” involves the logical fallacy of affirming the consequent: “If the theory is true,the null hypothesis will prove to be false. The null hypothesis proved to be false;therefore, the theory must be true” (Nickerson, 2000).www.frontiersin.org July 2012 | Volume 3 | Article 218 | 54
Page 2 and 3:
FRONTIERS COPYRIGHTSTATEMENT© Copy
Page 4 and 5: Table of Contents05 Is Data Cleanin
Page 7 and 8: OsborneAssumptions and data cleanin
Page 9 and 10: ORIGINAL RESEARCH ARTICLEpublished:
Page 12 and 13: Hoekstra et al.Why assumptions are
Page 20 and 21: García-PérezStatistical conclusio
Page 30 and 31: Sheng and ShengEffect of non-normal
Page 42 and 43: REVIEW ARTICLEpublished: 12 April 2
Page 44 and 45: Nimon et al.The assumption of relia
Page 56 and 57: TressoldiPower replication unreliab
Page 58 and 59: TressoldiPower replication unreliab
Page 62 and 63: FinchModern methods for the detecti
Page 72 and 73: MINI REVIEW ARTICLEpublished: 28 Au
Page 74 and 75: NimonStatistical assumptionsand Del
Page 76 and 77: NimonStatistical assumptionsFor exa
Page 78 and 79: Kraha et al.Interpreting multiple r
Page 94 and 95: SmithsonComparing moderation of slo
Page 102 and 103: REVIEW ARTICLEpublished: 01 March 2
Page 104 and 105:
Flora et al.Factor analysis assumpt
Page 106 and 107:
Page 108 and 109:
Page 110 and 111:
Page 112 and 113:
Page 114 and 115:
Page 116 and 117:
Page 118 and 119:
Page 120 and 121:
Page 122 and 123:
Page 124 and 125:
Kasper and ÜnlüAssumptions of fac
Page 126 and 127:
Page 128 and 129:
Page 130 and 131:
Page 132 and 133:
Page 134 and 135:
Page 136 and 137:
Page 138 and 139:
Page 140 and 141:
Page 142 and 143:
Page 144 and 145:
Lages and JaworskaHow predictable a
Page 146 and 147:
Page 148 and 149:
Page 150 and 151:
Page 152 and 153:
Cummiskey et al.Testing assumptions
Page 154 and 155:
Page 156 and 157:
Page 158:
show all

Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?