Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers

More documents

Recommendations

Info

García-PérezStatistical conclusion validityunknown alterations of Type-I and Type-II error rates and, hence,a breach of SCV.Some authors suggest that the problem can be solved by replacingthe formal test of assumptions with a decision based on asuitable graphical display of the data that helps researchers judgeby eye whether the assumption is tenable. It should be emphasizedthat the problem still remains, because the decision on how to analyzethe data is conditioned on the results of a preliminary analysis.The problem is not brought about by a formal preliminary test,but by the conditional approach to data analysis. The use of a nonformalpreliminary test only prevents a precise investigation of theconsequences on Type-I and Type-II error rates. But the “out ofsight, out of mind” philosophy does not eliminate the problem.It thus seems that a researcher must make a choice between twoevils: either not testing assumptions (and, thus, threatening SCVas a result of the uncontrolled Type-I and Type-II error rates thatarise from a potentially undue application of the statistical test) ortesting them (and, then, also losing control of Type-I and Type-II error rates owing to the two-stage approach). Both approachesare inadequate, as applying non-robust statistical tests to data thatdo not satisfy the assumptions has generally as severe implicationson SCV as testing preliminary assumptions in a two-stageapproach. One of the solutions to the dilemma consists of switchingto statistical procedures that have been designed for use underthe two-stage approach. For instance, Albers et al. (2000) usedsecond-order asymptotics to derive the size and power of a twostagetest for independent means preceded by a test of equality ofvariances. Unfortunately, derivations of this type are hard to carryout and, hence, they are not available for most of the cases of interest.A second solution consists of using classical test statistics thathave been shown to be robust to violation of their assumptions.Indeed, dependable unconditional tests for means or for regressionparameters have been identified (see Sullivan and D’Agostino,1992; Lumley et al., 2002; Zimmerman, 2004, 2011; Hayes and Cai,2007; Ng and Wilcox, 2011). And a third solution is switching tomodern robust methods (see, e.g., Wilcox and Keselman, 2003;Keselman et al., 2004; Wilcox, 2006; Erceg-Hurn and Mirosevich,2008; Fried and Dehling, 2011).Avoidance of the two-stage approach in either of these wayswill restore SCV while observing the important requirement thatstatistical methods should be used whose assumptions are notviolated by the characteristics of the data.REGRESSION AS A MEANS TO INVESTIGATE BIVARIATERELATIONS OF ALL TYPESCorrelational methods define one of the branches of scientific psychology(Cronbach, 1957) and they are still widely used these daysin some areas of psychology. Whether in regression analyses orin latent variable analyses (Bollen, 2002), vast amounts of dataare subjected to these methods. Regression analyses rely on anassumption that is often overlooked in psychology, namely, thatthe predictor variables have fixed values and are measured withouterror. This assumption, whose validity can obviously be assessedwithout recourse to any preliminary statistical test, is listed in allstatistics textbooks.In some areas of psychology, predictors actually have thischaracteristic because they are physical variables defining themagnitude of stimuli, and any error with which these magnitudesare measured (or with which stimuli with the selected magnitudesare created) is negligible in practice. Among others, this is thecase in psychophysical studies aimed at estimating psychophysicalfunctions describing the form of the relation between physicalmagnitude and perceived magnitude (e.g., Green, 1982) or psychometricfunctions describing the form of the relation betweenphysical magnitude and performance in a detection, discrimination,or identification task (Armstrong and Marks, 1997; Saberiand Petrosyan, 2004; García-Pérez et al., 2011). Regression or analogousmethods are typically used to estimate the parameters ofthese relations, with stimulus magnitude as the independent variableand perceived magnitude (or performance) as the dependentvariable. The use of regression in these cases is appropriate becausethe independent variable has fixed values measured without error(or with a negligible error). Another area in which the use of regressionis permissible is in simulation studies on parameter recovery(García-Pérez et al., 2010), where the true parameters generatingthe data are free of measurement error by definition.But very few other predictor variables used in psychology meetthis requirement, as they are often test scores or performance measuresthat are typically affected by non-negligible and sometimeslarge measurement error. This is the case of the proportion of hitsand the proportion of false alarms in psychophysical tasks, whosetheoretical relation is linear under some signal detection models(DeCarlo, 1998) and, thus, suggests the use of simple linearregression to estimate its parameters. Simple linear regression isalso sometimes used as a complement to statistical tests of equalityof means in studies in which equivalence or agreement is assessed(e.g., Maylor and Rabbitt, 1993; Baddeley and Wilson, 2002), andin these cases equivalence implies that the slope should not differsignificantly from unity and that the intercept should not differsignificantly from zero. The use of simple linear regression is alsowidespread in priming studies after Greenwald et al. (1995; see alsoDraine and Greenwald, 1998), where the intercept (and sometimesthe slope) of the linear regression of priming effect on detectabilityof the prime are routinely subjected to NHST.In all the cases just discussed and in many others where theX variable in the regression of Y on X is measured with error,a study of the relation between X and Y through regression isinadequate and has serious consequences on SCV. The least ofthese problems is that there is no basis for assigning the roles ofindependent and dependent variable in the regression equation(as a non-directional relation exists between the variables, oftenwithout even a temporal precedence relation), but regression parameterswill differ according to how these roles are assigned. Ininfluential papers of which most researchers in psychology seemto be unaware, Wald (1940) and Mandansky (1959) distinguishedregression relations from structural relations, the latter reflectingthe case in which both variables are measured with error. Bothauthors illustrated the consequences of fitting a regression linewhen a structural relation is involved and derived suitable estimatorsand significance tests for the slope and intercept parametersof a structural relation. This topic was brought to the attentionof psychologists by Isaac (1970) in a criticism of Treisman andWatts’ (1966) use of simple linear regression to assess the equivalenceof two alternative estimates of psychophysical sensitivity (d ′www.frontiersin.org August 2012 | Volume 3 | Article 325 | 21
García-PérezStatistical conclusion validitymeasures from signal detection theory analyses). The differencebetween regression and structural relations is briefly mentioned inpassing in many elementary books on regression,the issue of fittingstructural relations (sometimes referred to as Deming’s regressionor the errors-in-variables regression model) is addressed in detail inmost intermediate and advance books on regression (e.g., Fuller,1987; Draper and Smith, 1998) and hands-on tutorials have beenpublished (e.g., Cheng and Van Ness, 1994; Dunn and Roberts,1999; Dunn, 2007). But this type of analysis is not in the toolboxof the average researcher in psychology 1 . In contrast, recourse tothis type analysis is quite common in the biomedical sciences.Use of this commendable method may generalize whenresearchers realize that estimates of the slope β and the intercept αof a structural relation can be easily computed throughˆβ =√ ( 2Sy 2 − λS2 x + Sy x) 2 − λS2 + 4λS 2 xy2S xy, (1)ˆα = Ȳ − ˆβ ¯X, (2)where ¯X, Ȳ , Sx 2, S2 y , and S xy are the sample means, variances, andcovariance of X and Y, and λ = σε 2 y/σε 2 xis the ratio of the variancesof measurement errors in Y and in X. When X and Y arethe same variable measured at different times or under differentconditions (as in Maylor and Rabbitt, 1993; Baddeley and Wilson,2002), λ = 1 can safely be assumed (for an actual application, seeSmith et al., 2004). In other cases, a rough estimate can be used,as the estimates of α and β have been shown to be robust exceptunder extreme departures of the guesstimated λ from its true value(Ketellapper, 1983).For illustration, consider Yeshurun et al. (2008) comparison ofsignal detection theory estimates of d ′ in each of the intervals of atwo alternative forced-choice task, which they pronounced differentas revealed by a regression analysis through the origin. Notethat this is the context in which Isaac (1970) had illustrated theinappropriateness of regression. The data are shown in Figure 1,and Yeshurun et al. rejected equality of d 1 ′ and d′ 2 because theregression slope through the origin (red line, whose slope is 0.908)differed significantly from unity: The 95% confidence interval forthe slope ranged between 0.844 and 0.973. Using Eqs 1 and 2,the estimated structural relation is instead given by the blue linein Figure 1. The difference seems minor by eye, but the slope ofthe structural relation is 0.963, which is not significantly differentfrom unity (p = 0.738, two-tailed; see Isaac, 1970, p. 215). Thisoutcome, which reverses a conclusion raised upon inadequate dataanalyses, is representative of other cases in which the null hypothesisH 0 : β = 1 was rejected. The reason is dual: (1) the slope ofa structural relation is estimated with severe bias through regression(Riggs et al., 1978; Kalantar et al., 1995; Hawkins, 2002) and1 SPSS includes a regression procedure called “two-stage least squares” which onlyimplements the method described by Mandansky (1959) as “use of instrumentalvariables” to estimate the slope of the relation between X and Y. Use of this methodrequires extra variables with specific characteristics (variables which may simply notbe available for the problem at hand) and differs meaningfully from the simpler andmore generally applicable method to be discussed nextdˆ’ 25432100 1 2 3 4 5dˆ’ 1FIGURE 1 | Replot of data from Yeshurun et al. (2008, their Figure 8)with their fitted regression line through the origin (red line) and afitted structural relation (blue line). The identity line is shown withdashed trace for comparison. For additional analyses bearing on the SCV ofthe original study, see García-Pérez and Alcalá-Quintana (2011).(2) regression-based statistical tests of H 0 : β = 1 render empiricalType-I error rates that are much higher than the nominal ratewhen both variables are measured with error (García-Pérez andAlcalá-Quintana, 2011).In sum, SCV will improve if structural relations instead ofregression equations were fitted when both variables are measuredwith error.CONCLUSIONType-I and Type-II errors are essential components of the statisticaldecision theory underlying NHST and, therefore, data cannever be expected to answer a research question unequivocally.This paper has promoted a view of SCV that de-emphasizes considerationof these unavoidable errors and considers instead twoalternative issues: (1) whether statistical tests are used that matchthe research design, goals of the study, and formal characteristicsof the data and (2) whether they are applied in conditions underwhich the resultant Type-I and Type-II error rates match thosethat are declared as limiting the validity of the conclusion. Someexamples of common threats to SCV in these respects have beendiscussed and simple and feasible solutions have been proposed.For reasons of space, another threat to SCV has not been coveredin this paper, namely, the problems arising from multiple testing(i.e., in concurrent tests of more than one hypothesis). Multipletesting is commonplace in brain mapping studies and some implicationson SCV have been discussed, e.g., by Bennett et al. (2009),Vul et al. (2009a,b), and Vecchiato et al. (2010).All the discussion in this paper has assumed the frequentistapproach to data analysis. In closing, and before commenting onFrontiers in Psychology | Quantitative Psychology and Measurement August 2012 | Volume 3 | Article 325 | 22
Page 2 and 3: FRONTIERS COPYRIGHTSTATEMENT© Copy
Page 4 and 5: Table of Contents05 Is Data Cleanin
Page 7 and 8: OsborneAssumptions and data cleanin
Page 9 and 10: ORIGINAL RESEARCH ARTICLEpublished:
Page 12 and 13: Hoekstra et al.Why assumptions are
Page 20 and 21: García-PérezStatistical conclusio
Page 30 and 31: Sheng and ShengEffect of non-normal
Page 42 and 43: REVIEW ARTICLEpublished: 12 April 2
Page 44 and 45: Nimon et al.The assumption of relia
Page 56 and 57: TressoldiPower replication unreliab
Page 58 and 59: TressoldiPower replication unreliab
Page 62 and 63: FinchModern methods for the detecti
Page 72 and 73:
MINI REVIEW ARTICLEpublished: 28 Au
Page 74 and 75:
NimonStatistical assumptionsand Del
Page 76 and 77:
NimonStatistical assumptionsFor exa
Page 78 and 79:
Kraha et al.Interpreting multiple r
Page 80 and 81:
Page 82 and 83:
Page 84 and 85:
Page 86 and 87:
Page 88 and 89:
Page 90 and 91:
Page 92 and 93:
Page 94 and 95:
SmithsonComparing moderation of slo
Page 96 and 97:
Page 98 and 99:
Page 100 and 101:
Page 102 and 103:
REVIEW ARTICLEpublished: 01 March 2
Page 104 and 105:
Flora et al.Factor analysis assumpt
Page 106 and 107:
Page 108 and 109:
Page 110 and 111:
Page 112 and 113:
Page 114 and 115:
Page 116 and 117:
Page 118 and 119:
Page 120 and 121:
Page 122 and 123:
Page 124 and 125:
Kasper and ÜnlüAssumptions of fac
Page 126 and 127:
Page 128 and 129:
Page 130 and 131:
Page 132 and 133:
Page 134 and 135:
Page 136 and 137:
Page 138 and 139:
Page 140 and 141:
Page 142 and 143:
Page 144 and 145:
Lages and JaworskaHow predictable a
Page 146 and 147:
Page 148 and 149:
Page 150 and 151:
Page 152 and 153:
Cummiskey et al.Testing assumptions
Page 154 and 155:
Page 156 and 157:
Page 158:
show all

Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers

Create successful ePaper yourself

Delete template?

Save as template?