REVIEW ARTICLE
published: 12 April 2012
doi: 10.3389/fpsyg.2012.00102

The assumption of a reliable instrument and other pitfalls to avoid when considering the reliability of data

Kim Nimon 1*, Linda Reichwein Zientek 2 and Robin K. Henson 3

1 Department of Learning Technologies, University of North Texas, Denton, TX, USA
2 Department of Mathematics and Statistics, Sam Houston State University, Huntsville, TX, USA
3 Department of Educational Psychology, University of North Texas, Denton, TX, USA

Edited by:
Jason W. Osborne, Old Dominion University, USA

Reviewed by:
Matthew D. Finkelman, Tufts University, USA
Martin Lages, University of Glasgow, UK

*Correspondence:
Kim Nimon, Department of Learning Technologies, University of North Texas, 3940 North Elm Street, G150, Denton, TX 76207, USA.
e-mail: kim.nimon@unt.edu

The purpose of this article is to help researchers avoid common pitfalls associated with reliability, including incorrectly assuming that (a) measurement error always attenuates observed score correlations, (b) different sources of measurement error originate from the same source, and (c) reliability is a function of instrumentation. To accomplish our purpose, we first describe what reliability is and why researchers should care about it, with a focus on its impact on effect sizes. Second, we review how reliability is assessed, with comment on the consequences of cumulative measurement error. Third, we consider how researchers can use reliability generalization as a prescriptive method when designing their research studies to form hypotheses about whether or not reliability estimates will be acceptable given their sample and testing conditions.
Finally, we discuss options that researchers may consider when faced with analyzing unreliable data.

Keywords: reliability, measurement error, correlated error

The vast majority of commonly used parametric statistical procedures assume data are measured without error (Yetkiner and Thompson, 2010). However, research indicates that there are at least three problems concerning application of the statistical assumption of reliable data. First and foremost, researchers frequently neglect to report reliability coefficients for their data (Vacha-Haase et al., 1999, 2002; Zientek et al., 2008). Presumably, these same researchers fail to consider whether their data are reliable and thus ignore the consequences of results based on data that are confounded with measurement error. Second, researchers often reference reliability coefficients from test manuals or prior research, presuming that the same level of reliability applies to their data (Vacha-Haase et al., 2000). Such statements ignore admonitions from Henson (2001), Thompson (2003a), Wilkinson and the APA Task Force on Statistical Inference (1999), and others that reliability is a property of scores, not tests. Third, researchers who do consider the reliability of their data may attempt to correct for measurement error by applying Spearman's (1904) correction formula to sample data without considering how error in one variable relates to observed score components in another variable or to the true score component of its own variable (cf. Onwuegbuzie et al., 2004; Lorenzo-Seva et al., 2010).
These so-called nuisance correlations, however, can seriously influence the accuracy of statistics that have been corrected by Spearman's formula (Wetcher-Hendricks, 2006; Zimmerman, 2007). In fact, as readers will see, the term correction for attenuation may be considered a misnomer, as unreliable data do not always produce effects that are smaller than they would have been had the data been measured with perfect reliability.

PURPOSE

The purpose of this article is to help researchers avoid common pitfalls associated with reliability, including incorrectly assuming that (a) measurement error always attenuates observed score correlations, (b) different sources of measurement error originate from the same source, and (c) reliability is a function of instrumentation. To accomplish our purpose, the paper is organized as follows.

First, we describe what reliability is and why researchers should care about it. We focus on the bivariate correlation (r) and discuss how reliability affects its magnitude. [Although the discussion is limited to r for brevity, the implications would likely extend to many other commonly used parametric statistical procedures (e.g., t-test, analysis of variance, canonical correlation) because many are "correlational in nature" (Zientek and Thompson, 2009, p. 344) and "yield variance-accounted-for effect sizes analogous to r²" (Thompson, 2000, p. 263).] We present empirical evidence that demonstrates why measurement error does not always attenuate observed score correlations and why simple steps that attempt to correct for unreliable data may produce misleading results. Second, we review how reliability is commonly assessed.
In addition to describing several techniques, we highlight the cumulative nature of different types of measurement error. Third, we consider how researchers can use reliability generalization (RG) as a prescriptive method when designing their research studies to form hypotheses about whether or not reliability estimates will be acceptable given their sample and testing conditions. In addition to reviewing RG theory and studies that demonstrate that reliability is a function of data and not instrumentation, we review barriers to conducting RG studies and propose a set of metrics to be included in research reports. It is our hope that editors will champion the inclusion of such data and thereby broaden what is known about the reliability of educational and psychological data published in research reports. Finally, we discuss options that researchers may consider when faced with analyzing unreliable data.

www.frontiersin.org April 2012 | Volume 3 | Article 102 | 41
RELIABILITY: WHAT IS IT AND WHY DO WE CARE?

The predominant applied use of reliability is framed by classical test theory (CTT; Hogan et al., 2000), which conceptualizes observed scores as two independent additive components: (a) true scores and (b) error scores:

Observed Score (O_X) = True Score (T_X) + Error Score (E_X)   (1)

True scores reflect the construct of interest (e.g., depression, intelligence), while error scores reflect error in the measurement of the construct of interest (e.g., misunderstanding of items, chance responses due to guessing). Error scores are referred to as measurement error (Zimmerman and Williams, 1977) and stem from random and systematic occurrences that keep observed data from conveying the "truth" of a situation (Wetcher-Hendricks, 2006, p. 207). Systematic measurement errors are "those which consistently affect an individual's score because of some particular characteristic of the person or the test that has nothing to do with the construct being measured" (Crocker and Algina, 1986, p. 105). Random errors of measurement are those which "affect an individual's score because of purely chance happenings" (Crocker and Algina, 1986, p. 106).

The ratio of true score variance to observed score variance is referred to as reliability. In data measured with perfect reliability, the ratio of true score variance to observed score variance is 1 (Crocker and Algina, 1986).
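The CTT decomposition and the variance-ratio definition of reliability can be made concrete with a small simulation. This sketch is ours, not the article's; the variances chosen (true SD = 10, error SD = 5) are arbitrary and imply a theoretical reliability of 100/125 = 0.80.

```python
# Illustrative sketch (not from the article): simulate O = T + E (Eq. 1)
# and estimate reliability as true score variance / observed score variance.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

true = rng.normal(loc=50, scale=10, size=n)    # T: construct of interest
error = rng.normal(loc=0, scale=5, size=n)     # E: random measurement error
observed = true + error                        # O = T + E

reliability = true.var() / observed.var()
# Theoretical value: 10**2 / (10**2 + 5**2) = 100 / 125 = 0.80
print(round(reliability, 2))
```

Because the error component is independent of the true scores, it only adds variance to the observed scores, which is why the ratio falls below 1 whenever measurement error is present.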
However, the nature of educational and psychological research means that most, if not all, variables are difficult to measure and yield reliabilities less than 1 (Osborne and Waters, 2002).

Researchers should care about reliability because the vast majority of parametric statistical procedures assume that sample data are measured without error (cf. Yetkiner and Thompson, 2010). Poor reliability presents a problem even for descriptive statistics such as the mean, because part of the average score is actually error. It also causes problems for statistics that consider variable relationships, because poor reliability impacts the magnitude of those results. Measurement error is even a problem in structural equation model (SEM) analyses, as poor reliability affects overall fit statistics (Yetkiner and Thompson, 2010). In this article, though, we focus our discussion on analyses of observed variables because latent variable analyses are reported less frequently in educational and psychological research (cf. Kieffer et al., 2001; Zientek et al., 2008).

Contemporary literature suggests that unreliable data always attenuate observed score variable relationships (e.g., Muchinsky, 1996; Henson, 2001; Onwuegbuzie et al., 2004).
Such literature stems from Spearman's (1904) correction formula, which estimates a true score correlation (r_TXTY) by dividing an observed score correlation (r_OXOY) by the square root of the product of the reliabilities (r_XX, r_YY):

r_TXTY = r_OXOY / √(r_XX × r_YY)   (2)

Spearman's formula suggests that the observed score correlation is solely a function of the true score correlation and the reliability of the measured variables, such that the observed correlation between two variables can be no greater than the square root of the product of their reliabilities:

r_OXOY = r_TXTY × √(r_XX × r_YY)   (3)

Using derivatives of Eqs 2 and 3, Henson (2001) claimed, for example, that if one variable was measured with 70% reliability and another variable was measured with 60% reliability, the maximum possible observed score correlation would be 0.65 (i.e., √(0.70 × 0.60)). He similarly indicated that the observed correlation between two variables will only reach its theoretical maximum of 1 when (a) the reliability of each variable is perfect and (b) the correlation between the true score equivalents is equal to 1. (Readers may also consult Trafimow and Rice, 2009, for an interesting application of this correction formula to behavioral task performance via potential performance theory.)

The problem with the aforementioned claims is that they do not consider how error in one variable relates to observed score components in another variable or to the true score component of its own variable. In fact, Eqs 2 and 3, despite being written for sample data, should only be applied to population data in the case when errors do not correlate or share common variance (Zimmerman, 2007), as illustrated in Figure 1. However, in the case of correlated error in the population (see Figure 2), or in the case of sample

FIGURE 1 | Conceptual representation of observed score variance in the case of no correlated error. T, true scores; E, error scores. Note: shaded area denotes shared variance between T_X and T_Y.

FIGURE 2 | Conceptual representation of observed score variance in the case of correlated error. T, true scores; E, error scores. Note: light shaded area denotes shared variance between T_X and T_Y. Dark shaded area denotes shared variance between E_X and E_Y.

Frontiers in Psychology | Quantitative Psychology and Measurement April 2012 | Volume 3 | Article 102 | 42
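Equation 3's attenuation prediction can be checked by simulation for the uncorrelated-error case of Figure 1. This sketch is ours, not the article's: the true correlation (0.8) is arbitrary, and the reliabilities match Henson's (2001) 0.70/0.60 example, so the observed correlation should land near 0.8 × √(0.70 × 0.60) ≈ 0.52. As the article argues, this agreement holds only because the simulated errors are independent; with correlated errors (Figure 2), the observed correlation can depart from Eq. 3 in either direction.

```python
# Illustrative sketch (not from the article): with uncorrelated errors,
# a simulated observed correlation tracks Eq. 3, r_OXOY = r_TXTY * sqrt(r_XX * r_YY).
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
r_true = 0.8                                   # correlation between true scores

# Correlated true scores (unit variance) built from a shared component.
shared = rng.normal(size=n)
t_x = np.sqrt(r_true) * shared + np.sqrt(1 - r_true) * rng.normal(size=n)
t_y = np.sqrt(r_true) * shared + np.sqrt(1 - r_true) * rng.normal(size=n)

# Independent errors sized to give reliabilities of 0.70 and 0.60:
# var(E) = 1/r - 1, so var(T)/var(O) = 1/(1/r) = r.
r_xx, r_yy = 0.70, 0.60
o_x = t_x + rng.normal(scale=np.sqrt(1 / r_xx - 1), size=n)
o_y = t_y + rng.normal(scale=np.sqrt(1 / r_yy - 1), size=n)

observed_r = np.corrcoef(o_x, o_y)[0, 1]
predicted_r = r_true * np.sqrt(r_xx * r_yy)    # Eq. 3: ~0.52
print(round(observed_r, 2), round(predicted_r, 2))
```

Dividing `observed_r` by `np.sqrt(r_xx * r_yy)` recovers roughly 0.8, which is Eq. 2's correction for attenuation applied in the one setting where it is safe.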