13.07.2015 Views

Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers


Flora et al. Factor analysis assumptions

FIGURE 5 | Scatterplot of “Remainders” by “Mixed Arithmetic” for perturbed sample with influential case indicated.

Table 4 | Factor loading matrix obtained with perturbed sample data.

Variable   Factor
           η1       η2       η3
WrdMean    0.97    −0.13     0.09
SntComp    0.70     0.29    −0.23
OddWrds    0.74    −0.01     0.21
MxdArit   −0.07     1.01     0.03
Remndrs    0.17     0.49     0.32
MissNum    0.08     0.81     0.06
Gloves     0.01     0.09     0.68
Boots      0.05     0.12     0.44
Hatchts    0.02    −0.08     0.89

N = 100. WrdMean, word meaning; SntComp, sentence completion; OddWrds, odd words; MxdArit, mixed arithmetic; Remndrs, remainders; MissNum, missing numbers; Hatchts, hatchets. Primary loadings for each observed variable are in bold.

effect of an individual case on model fit with ML estimation can be formally measured with an influence statistic known as likelihood distance, which measures the difference in the likelihood of the model when a potentially influential case is deleted (Pek and MacCallum, 2011).

Upon discovering unusual cases, it is important to determine their likely source. Often, outliers and influential cases arise from either researcher error (e.g., data entry error or faulty administration of study procedures) or participant error (e.g., misunderstanding of study instructions or non-compliance with random responding) or they may be observations from a population other than the population of interest (e.g., a participant with no history of depression included in a study of depressed individuals). In these situations, it is best to remove such cases from the data set.
Conversely, if unusual cases are simply extreme cases with otherwise legitimate values, most methodologists recommend that they not be deleted from the data set prior to model fitting (e.g., Bollen and Arminger, 1991; Yuan and Zhong, 2008; Pek and MacCallum, 2011). Instead, robust procedures that minimize the excessive influence of extreme cases are recommended; in particular, case-robust methods developed by Yuan and Bentler (1998) are implemented in the EQS software package (Bentler, 2004) or one can factor analyze a minimum covariance determinant (MCD) estimated covariance matrix (Pison et al., 2003), which can be calculated with SAS or the R package “MASS.”

COLLINEARITY

Another potential concern for both multiple regression analysis and factor analysis is collinearity, which refers to perfect or near-perfect linear relationships among observed variables. With multiple regression, the focus is on collinearity among explanatory variables, but with factor analysis, the concern is collinearity among dependent variables, that is, the set of variables being factor analyzed. When collinear variables are included, the product-moment correlation matrix R will be singular, or non-positive definite. ML estimation cannot be used with a singular R, and although ULS is possible, collinearity is still indicative of conceptual issues with variable selection. Collinearity in factor analysis is relatively simple to diagnose: if any eigenvalues of a product-moment R equal zero or are negative, then R is non-positive definite and collinearity is present (and software will likely produce

www.frontiersin.org March 2012 | Volume 3 | Article 55 | 109
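The eigenvalue diagnostic for collinearity described above can be illustrated with a minimal Python sketch using NumPy. This is not code from the article: the function name `diagnose_collinearity`, the tolerance `tol`, and the simulated data are all assumptions made for illustration. The tolerance is needed because, in floating-point arithmetic, an eigenvalue that is zero in theory usually comes out as a tiny positive or negative number rather than exactly zero.

```python
import numpy as np


def diagnose_collinearity(data, tol=1e-8):
    """Check the product-moment correlation matrix R for collinearity.

    `data` is an (n_cases, n_variables) array. Returns the eigenvalues of R
    in ascending order and a flag that is True when the smallest eigenvalue
    is zero or negative (within the assumed tolerance `tol`), i.e., when R
    is non-positive definite.
    """
    R = np.corrcoef(data, rowvar=False)   # product-moment correlation matrix
    eigenvalues = np.linalg.eigvalsh(R)   # R is symmetric; ascending order
    non_positive_definite = bool(eigenvalues[0] <= tol)
    return eigenvalues, non_positive_definite


# Hypothetical data: three independent variables for N = 100 cases.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))

# Add a fourth variable that is an exact linear combination of the others.
total = x.sum(axis=1, keepdims=True)
collinear = np.hstack([x, total])

eig_ok, flag_ok = diagnose_collinearity(x)
eig_bad, flag_bad = diagnose_collinearity(collinear)

print("independent variables:", eig_ok, "collinear:", flag_ok)
print("with sum score added: ", eig_bad, "collinear:", flag_bad)
```

In this sketch the flag is raised only for the data set that includes the sum of the other variables, a typical way for perfect collinearity to enter a factor analysis (for example, a total score analyzed alongside its subscales).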
