13.07.2015 Views

Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers

Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers

Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers

SHOW MORE
SHOW LESS
  • No tags were found...

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Flora et al.Factor analysis assumptionsFIGURE 6 | Histograms of st<strong>and</strong>ardized residuals for each observed variable from three-factor model fitted to perturbed sample <strong>data</strong> (N = 100).a warning message) 7 . Eigenvalues extremely close to zero may alsobe suggestive of near-collinearity 8 . A commonly used statistic forevaluating near-collinearity in ordinary regression is <strong>the</strong> conditionindex, which equals <strong>the</strong> square root of <strong>the</strong> ratio of <strong>the</strong> largest tosmallest eigenvalue, with larger values more strongly indicativeof near-collinearity. For <strong>the</strong> current example, <strong>the</strong> condition indexequals 6.34, which is well below <strong>the</strong> value of 30 suggested by Belsleyet al. (1980) as indicative of problematic near-collinearity.In <strong>the</strong> context of factor analysis, collinearity often implies anill-conceived selection of variables for <strong>the</strong> analysis. For example,if both total scores <strong>and</strong> sub-scale scores (or just one sub-scale)from <strong>the</strong> same instrument are included in a factor analysis, <strong>the</strong>total score is linearly dependent on, or collinear with, <strong>the</strong> subscalescore. A more subtle example may be <strong>the</strong> inclusion of both a“positive mood”scale <strong>and</strong> a“negative mood”scale; although scoresfrom <strong>the</strong>se two scales may not be perfectly related, <strong>the</strong>y will likely7 Polychoric correlation matrices (defined below) are often non-positive definite.Unlike a product-moment R, a non-positive definite polychoric correlation matrixis not necessarily problematic.8 Most EFA software includes eigenvalues of R as default output. However, it isimportant not to confuse <strong>the</strong> eigenvalues of R with <strong>the</strong> eigenvalues of <strong>the</strong> “reduced”correlation matrix, R − ˆΘ. The reduced correlation matrix often has negativeeigenvalues when R does not.have a very strong (negative) correlation,which can be problematicfor factor analysis. In <strong>the</strong>se situations, researchers should carefullyreconsider <strong>the</strong>ir choice of variables for <strong>the</strong> analysis <strong>and</strong> remove anythat is collinear with one or more of <strong>the</strong> o<strong>the</strong>r observed variables,which in turn redefines <strong>the</strong> overall research question addressed by<strong>the</strong> analysis (see Fox, 2008, p. 342).NON-NORMALITYIn terms of obtaining accurate parameter estimates, <strong>the</strong> ULS estimationmethod mentioned above makes no assumption regardingobserved variable distributions whereas ML estimation is based on<strong>the</strong> multivariate normal distribution (MacCallum, 2009). Specifically,for both estimators, parameter estimates are trustworthy aslong as <strong>the</strong> sample size is large <strong>and</strong> <strong>the</strong> model is properly specified(i.e., <strong>the</strong> estimator is consistent), even when <strong>the</strong> normality assumptionfor ML is violated (Bollen, 1989). However, SE estimates <strong>and</strong>certain model fit statistics (i.e., <strong>the</strong> χ 2 fit statistic <strong>and</strong> statisticsbased on χ 2 such as CFI <strong>and</strong> RMSEA) are adversely affected bynon-normality, particularly excess multivariate kurtosis. Although<strong>the</strong>y are less commonly used with EFA, SEs <strong>and</strong> model fit statisticsare equally applicable with EFA as with CFA <strong>and</strong> are easily obtainedwith modern software (but see Cudeck <strong>and</strong> O’Dell, 1994, for cautions<strong>and</strong> recommendations regarding SEs with EFA). With bothapproaches,SEs convey information about <strong>the</strong> sampling variability<strong>Frontiers</strong> in Psychology | Quantitative Psychology <strong>and</strong> Measurement March 2012 | Volume 3 | Article 55 | 110

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!