Nimon | Statistical assumptions

and Delaney, 2004). Others (e.g., Howell, 1992; Harris, 2001) indicate that the validity of statistical conclusions depends only on whether data meet distributional assumptions, not on the scaling procedures used to obtain data (Warner, 2008). Because measurement level plays a pivotal role in statistical analysis decision trees (e.g., Tabachnick and Fidell, 2001, pp. 27–29), Table 1 relates measurement level to statistical analyses from a pragmatic perspective. It is important to note that lowering the measurement level of data (e.g., dichotomizing intervally scaled data) is ill-advised unless data meet certain characteristics (e.g., multiple modes, serious skewness; Kerlinger, 1986). Although such a transformation makes data congruent with statistics that assume only the nominal measurement level, it discards important information and may produce misleading or erroneous information (see Thompson, 1988).

NORMALITY

For many inferential statistics reported in educational and psychological research (cf. Kieffer et al., 2001; Zientek et al., 2008), there is an assumption that population data are normally distributed. The requirement for, type of, and loci of the normality assumption depend on the analysis conducted. Univariate group comparison tests (t tests, ANOVA, ANCOVA) assume univariate normality (Warner, 2008). Simple linear regression assumes bivariate normality (Warner, 2008). Multivariate analyses (repeated measures ANOVA, MANOVA, MANCOVA, multiple linear regression, and canonical correlation) assume multivariate normality (cf. Tabachnick and Fidell, 2001; Stevens, 2002).
For analysis-of-variance type tests (OVA-type tests) involving multiple samples, the normality assumption applies to each level of the IV.

UNIVARIATE

The assumption of univariate normality is met when a distribution of scores is symmetrical and when there is an appropriate proportion of distributional height to width (Thompson, 2006). To assess univariate normality, researchers may conduct graphical or non-graphical tests (Stevens, 2002). Non-graphical tests include the chi-square goodness-of-fit test, the Kolmogorov–Smirnov test, the Shapiro–Wilk test, and the evaluation of kurtosis and skewness values. Graphical tests include the normal probability plot and the histogram (or stem-and-leaf plot).

Non-graphical tests are preferred for small to moderate sample sizes, with the Shapiro–Wilk test and the evaluation of kurtosis and skewness values being the preferred methods for sample sizes of less than 20 (Stevens, 2002). The normal probability plot, in which observations are ordered in increasing order of magnitude and then plotted against expected normal distribution values, is preferred over histograms (or stem-and-leaf plots). Evaluating normality by examining the shape of a histogram can be problematic (Thompson, 2006), because there are infinitely many distribution shapes that may be normal (Bump, 1991).
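These non-graphical checks are straightforward to run in, for example, SciPy; the following is a minimal sketch (the data and variable names are illustrative, not from the article):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
scores = rng.normal(loc=50, scale=10, size=200)  # illustrative sample

# Shapiro-Wilk test: a non-significant p gives no evidence against normality
w_stat, p_value = stats.shapiro(scores)

# Skewness and (excess) kurtosis: values near 0 are consistent with normality
skewness = stats.skew(scores)
kurt = stats.kurtosis(scores)  # Fisher definition: normal distribution -> 0

# Kolmogorov-Smirnov test against a normal with estimated parameters
ks_stat, ks_p = stats.kstest(scores, 'norm',
                             args=(scores.mean(), scores.std(ddof=1)))

print(f"Shapiro-Wilk W={w_stat:.3f}, p={p_value:.3f}")
print(f"skew={skewness:.3f}, excess kurtosis={kurt:.3f}")
```

As with any automated check, such results should be paired with the graphical diagnostics discussed here, and the tests chosen and their outcomes clearly reported.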
The bell-shaped distribution that many educational professionals are familiar with is not the only normal distribution (Henson, 1999).

BIVARIATE

The assumption of bivariate normality is met when the linear relationship between two variables follows a normal distribution (Burdenski, 2000). A necessary but insufficient condition for bivariate normality is univariate normality for each variable. Bivariate normality can be evaluated graphically (e.g., with scatterplots). However, in practice, even large datasets (n > 200) have insufficient data points to evaluate bivariate normality (Warner, 2008), which may explain why this assumption often goes unchecked and unreported.

MULTIVARIATE

The assumption of multivariate normality is met when each variable in a set is normally distributed around fixed values on all other variables in the set (Henson, 1999). Necessary but insufficient conditions for multivariate normality include univariate normality for each variable along with bivariate normality for each variable pair. Multivariate normality can be assessed graphically or with statistical tests.

To assess multivariate normality graphically, a scatterplot of Mahalanobis distances and paired χ²-values may be examined, where the Mahalanobis distance indicates how far each “set of scores is from the group means adjusting for correlation of the variables” (Burdenski, 2000, p. 20). If the plot approximates a straight line, data are considered multivariate normal. Software to produce the Mahalanobis distance by χ² scatterplot can be found in Thompson (1990), Henson (1999), and Fan (1996).

Researchers may also assess multivariate normality by testing Mardia’s (1985) coefficient of multivariate kurtosis and examining its critical ratio.
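Both diagnostics can also be computed directly. The sketch below assumes the standard formulation of Mardia's multivariate kurtosis (the average of the squared Mahalanobis distances raised to the second power, with expected value p(p + 2) and asymptotic variance 8p(p + 2)/n under multivariate normality); the data and variable names are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
X = rng.multivariate_normal(mean=[0, 0, 0], cov=np.eye(3), size=500)
n, p = X.shape

# Squared Mahalanobis distance of each case from the centroid
centered = X - X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False, bias=True))  # MLE covariance
d2 = np.einsum('ij,jk,ik->i', centered, S_inv, centered)

# Graphical check: sorted d2 plotted against chi-square(p) quantiles;
# an approximately straight line suggests multivariate normality
chi2_quantiles = stats.chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=p)

# Mardia's multivariate kurtosis and its critical ratio
b2p = np.mean(d2 ** 2)
expected = p * (p + 2)  # value under multivariate normality
z = (b2p - expected) / np.sqrt(8 * p * (p + 2) / n)
print(f"b2p={b2p:.2f} (expected {expected}), critical ratio={z:.2f}")
```

Plotting the sorted d2 values against chi2_quantiles produces the straight-line diagnostic described above; the critical ratio z is the quantity evaluated against 1.96 in the text that follows.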
If the critical ratio of Mardia’s coefficient of multivariate kurtosis is less than 1.96, a sample can be considered multivariate normal at the 0.05 significance level, indicating that the multivariate kurtosis is not statistically significantly different from zero. Mardia’s coefficient of multivariate kurtosis is available in statistical software packages including AMOS, EQS, LISREL, and PASW (see DeCarlo, 1997).

VIOLATIONS

The effect of violating the assumption of normality depends on the level of non-normality and the statistical test examined. As long as the assumption of normality is not severely violated, the actual Type I error rates approximate nominal rates for t tests and OVA-type tests (cf. Boneau, 1960; Glass et al., 1972; Stevens, 2002). However, in the case of data that are severely platykurtic, power is reduced in t tests and OVA-type tests (cf. Boneau, 1960; Glass et al., 1972; Stevens, 2002). Non-normal variables that are highly skewed or kurtotic distort relationships and significance tests in linear regression (Osborne and Waters, 2002). Similarly, proper inferences regarding statistical significance tests in canonical correlation depend on multivariate normality (Tabachnick and Fidell, 2001).
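To illustrate the kind of severe skew at issue, and the effect of one common remedy, the sketch below applies a log transformation to simulated right-skewed data (the data are illustrative only; choices about transformation should follow the cited sources):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # strongly right-skewed

# Log-normal data become exactly normal under a log transformation;
# real data usually improve rather than become perfectly normal
log_scores = np.log(raw)

print(f"skew before: {stats.skew(raw):.2f}, after: {stats.skew(log_scores):.2f}")
```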
If the normality assumption is violated, researchers may delete outlying cases, transform data, or conduct non-parametric tests (see Conover, 1999; Osborne, 2012), as long as the process is clearly reported.

LINEARITY

For parametric statistics involving two or more continuous variables (ANCOVA, repeated measures ANOVA, MANOVA, MANCOVA, linear regression, and canonical correlation), linearity between pairs of continuous variables is assumed (cf. Tabachnick and Fidell, 2001; Warner, 2008). The assumption of linearity is that there is a straight-line relationship between two variables.

www.frontiersin.org | August 2012 | Volume 3 | Article 322 | 73
Linearity is important in a practical sense because Pearson’s r, which is fundamental to the vast majority of parametric statistical procedures (Graham, 2008), captures only the linear relationship among variables (Tabachnick and Fidell, 2001). Pearson’s r underestimates the true relationship between two variables that is non-linear (i.e., curvilinear; Warner, 2008).

Unless there is strong theory specifying non-linear relationships, researchers may assume linear relationships in their data (Cohen et al., 2003). However, linearity is not guaranteed and should be validated with graphical methods (see Tabachnick and Fidell, 2001). Non-linearity reduces the power of statistical tests such as ANCOVA, MANOVA, MANCOVA, linear regression, and canonical correlation (Tabachnick and Fidell, 2001). In the case of ANCOVA and MANCOVA, non-linearity results in improper adjusted means (Stevens, 2002). If non-linearity is detected, researchers may transform data, incorporate curvilinear components, eliminate the variable producing non-linearity, or conduct a non-linear analysis (cf. Tabachnick and Fidell, 2001; Osborne and Waters, 2002; Stevens, 2002; Osborne, 2012), as long as the process is clearly reported.

VARIANCE

Across parametric statistical procedures commonly used in quantitative research, at least five assumptions relate to variance.
These are: homogeneity of variance, homogeneity of regression, sphericity, homoscedasticity, and homogeneity of the variance-covariance matrix.

Homogeneity of variance applies to univariate group analyses (independent samples t test, ANOVA, ANCOVA) and assumes that the variance of the DV is roughly the same at all levels of the IV (Warner, 2008). Levene’s test is used to evaluate this assumption, where smaller statistics indicate greater homogeneity. Research (Boneau, 1960; Glass et al., 1972) indicates that univariate group analyses are generally robust to moderate violations of homogeneity of variance as long as the sample sizes in each group are approximately equal. However, with unequal sample sizes, heterogeneity may compromise the validity of null hypothesis decisions: large sample variances from small group sizes increase the risk of Type I error, whereas large sample variances from large group sizes increase the risk of Type II error. When the assumption of homogeneity of variance is violated, researchers may conduct and report non-parametric tests such as the Kruskal–Wallis test.
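Levene's test and the Kruskal–Wallis test are both available in SciPy; the following is a minimal sketch with illustrative group data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(50, 10, size=40)
group_b = rng.normal(55, 10, size=40)
group_c = rng.normal(60, 10, size=40)

# Levene's test: a non-significant p gives no evidence against
# homogeneity of variance across groups
lev_stat, lev_p = stats.levene(group_a, group_b, group_c)

# Kruskal-Wallis: a rank-based alternative to one-way ANOVA
kw_stat, kw_p = stats.kruskal(group_a, group_b, group_c)

print(f"Levene W={lev_stat:.2f}, p={lev_p:.3f}")
print(f"Kruskal-Wallis H={kw_stat:.2f}, p={kw_p:.4f}")
```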
However, Maxwell and Delaney (2004) noted that the Kruskal–Wallis test also assumes equal variances and suggested that data either be transformed to meet the assumption of homogeneity of variance or be analyzed with tests such as the Brown–Forsythe F* or Welch’s W.

Homogeneity of regression applies to group analyses with covariates, including ANCOVA and MANCOVA, and assumes that the regression between covariate(s) and DV(s) in one group is the same as the regression in other groups (Tabachnick and Fidell, 2001). This assumption can be examined graphically or by conducting a statistical test on the interaction between the COV(s) and the IV(s). Violation of this assumption can lead to very misleading results if analysis of covariance is used (Stevens, 2002). For example, in the case of heterogeneous slopes, group means that have been adjusted by a covariate could indicate no difference when, in fact, group differences might exist at different values of the covariate. If heterogeneity of regression exists, ANCOVA and MANCOVA are inappropriate analytic strategies (Tabachnick and Fidell, 2001).

Sphericity applies to repeated measures analyses that involve three or more measurement occasions (repeated measures ANOVA) and assumes that the variances of the differences for all pairs of repeated measures are equal (Stevens, 2002). Presuming that data are multivariate normal, the Mauchly test can be used to test this assumption, where smaller statistics indicate greater levels of sphericity (Tabachnick and Fidell, 2001).
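The quantity Mauchly's test evaluates can also be inspected descriptively by computing the variance of the difference scores for every pair of occasions; a sketch with illustrative data (the function name is my own, not from the article):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
# Illustrative repeated measures data: 30 subjects x 4 occasions
scores = rng.normal(50, 10, size=(30, 4))

def pairwise_difference_variances(data):
    """Variance of the difference scores for each pair of occasions.

    Under sphericity, these variances are (roughly) equal."""
    n_occasions = data.shape[1]
    return {
        (i, j): float(np.var(data[:, i] - data[:, j], ddof=1))
        for i, j in combinations(range(n_occasions), 2)
    }

variances = pairwise_difference_variances(scores)
for pair, v in variances.items():
    print(pair, round(v, 1))
```

Markedly unequal values flag a likely sphericity violation, which the corrections discussed next are designed to address; the formal Mauchly test itself is provided by standard statistical packages rather than base SciPy.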
Violating the sphericity assumption increases the risk of Type I error (Box, 1954). To adjust for this risk and provide better control of the Type I error rate, the degrees of freedom for the repeated measures F test may be corrected using (and reporting) one of three adjustments: (a) Greenhouse–Geisser, (b) Huynh–Feldt, and (c) lower-bound (see Nimon and Williams, 2009). Alternatively, researchers may conduct and report analyses that do not assume sphericity (e.g., MANOVA).

Homoscedasticity applies to multiple linear regression and canonical correlation and assumes that the variability in scores for one continuous variable is roughly the same at all values of another continuous variable (Tabachnick and Fidell, 2001). Scatterplots are typically used to test homoscedasticity. Linear regression is generally robust to slight violations of homoscedasticity; however, marked heteroscedasticity increases the risk of Type I error (Osborne and Waters, 2002). Canonical correlation performs best when relationships among pairs of variables are homoscedastic (Tabachnick and Fidell, 2001). If the homoscedasticity assumption is violated, researchers may delete outlying cases, transform data, or conduct non-parametric tests (see Conover, 1999; Osborne, 2012), as long as the process is clearly reported.

Homogeneity of the variance-covariance matrix is a multivariate generalization of homogeneity of variance. It applies to multivariate group analyses (MANOVA and MANCOVA) and assumes that the variance-covariance matrix is roughly the same at all levels of the IV (Stevens, 2002).
The Box M test evaluates this assumption, where smaller statistics indicate greater homogeneity. Tabachnick and Fidell (2001) provided the following guidelines for interpreting violations of this assumption: if sample sizes are equal, heterogeneity is not an issue. However, with unequal sample sizes, heterogeneity may compromise the validity of null hypothesis decisions. Large sample variances from small group sizes increase the risk of Type I error, whereas large sample variances from large group sizes increase the risk of Type II error. If sample sizes are unequal and the Box M test is significant at p < 0.001, researchers should conduct Pillai’s test or equalize sample sizes by random deletion of cases if power can be retained.

DISCUSSION

With the advances in statistical software, it is easy for researchers to use point-and-click methods to conduct a wide variety of statistical analyses on their datasets. However, the output from statistical software packages typically does not fully indicate whether the necessary statistical assumptions have been met. I invite editors and reviewers to use the information presented in this article as a basic checklist of the statistical assumptions to be reported in scholarly reports. The references cited in this article should also be helpful to researchers who are unfamiliar with a particular assumption or how to test it.

Frontiers in Psychology | Quantitative Psychology and Measurement | August 2012 | Volume 3 | Article 322 | 74