Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers


Flora et al. Factor analysis assumptions

But because the factor scores (i.e., scores on the latent variables in $\eta$) are unobserved and cannot be calculated precisely, the residuals cannot be calculated precisely either, even with known population factor loadings, $\Lambda$. Thus, to obtain estimates of the residuals, $\hat{\varepsilon}$, it is necessary first to estimate the factor scores, $\hat{\eta}$, which themselves must be based on the sample estimates $\hat{\Lambda}$ and $\hat{\Theta}$. Bollen and Arminger (1991) show how two well-known approaches to estimating factor scores, the least-squares regression method and Bartlett's method, can be applied to obtain $\hat{\varepsilon}$. As implied by Eq. 4, the estimated residuals $\hat{\varepsilon}$ are unstandardized in that they are in the metric of the observed variables, $y$. Bollen and Arminger thus present formulas to convert unstandardized residuals into standardized residuals. Simulation results by Bollen and Arminger show good performance of their standardized residuals for revealing outliers, with little difference according to whether they are estimated from regression-based factor scores or Bartlett-based factor scores.

Returning to our example sample data, we estimated the standardized residuals from the three-factor EFA model for each of the $N = 100$ cases; Figure 2 illustrates these residuals for all nine observed variables (recall that in this example, $y$ consists of nine variables and thus there are nine unique factors in $\hat{\varepsilon}$).
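As a rough illustration of the two factor-score approaches named above, the following sketch computes residuals as the observed scores minus their factor-predicted part and rescales each residual by its model-implied standard deviation. It is a minimal sketch under stated assumptions, not a reproduction of Bollen and Arminger's formulas or of the analysis behind Figure 2: the function names, the simple-structure loadings of 0.7, the factor correlations of 0.3, and the use of population parameters in place of the sample estimates are all hypothetical choices made for the example.

```python
import numpy as np

def factor_score_weights(Lam, Psi, Theta, method="regression"):
    """Return W such that eta_hat = W @ y for a mean-centered case y."""
    if method == "regression":
        Sigma = Lam @ Psi @ Lam.T + Theta          # model-implied covariance
        return Psi @ Lam.T @ np.linalg.inv(Sigma)
    if method == "bartlett":
        Theta_inv = np.linalg.inv(Theta)
        return np.linalg.inv(Lam.T @ Theta_inv @ Lam) @ Lam.T @ Theta_inv
    raise ValueError("method must be 'regression' or 'bartlett'")

def standardized_residuals(Y, Lam, Psi, Theta, method="regression"):
    """Residuals y - Lam @ eta_hat, each divided by its model-implied SD."""
    Yc = Y - Y.mean(axis=0)                        # center the observed data
    W = factor_score_weights(Lam, Psi, Theta, method)
    A = np.eye(Lam.shape[0]) - Lam @ W             # eps_hat = A @ y
    resid = Yc @ A.T                               # unstandardized residuals
    Sigma = Lam @ Psi @ Lam.T + Theta
    resid_cov = A @ Sigma @ A.T                    # covariance of eps_hat
    return resid / np.sqrt(np.diag(resid_cov))

# Hypothetical population model: 9 variables, 3 correlated factors.
Lam = np.kron(np.eye(3), np.full((3, 1), 0.7))     # 9 x 3 simple structure
Psi = np.full((3, 3), 0.3) + 0.7 * np.eye(3)       # factor correlations of 0.3
Theta = np.diag(np.full(9, 1.0 - 0.7 ** 2))        # unique variances

# Simulate N = 100 cases from that model (assumed values, not the paper's data).
rng = np.random.default_rng(0)
eta = rng.multivariate_normal(np.zeros(3), Psi, size=100)
eps = rng.standard_normal((100, 9)) * np.sqrt(np.diag(Theta))
Y = eta @ Lam.T + eps

Z_reg = standardized_residuals(Y, Lam, Psi, Theta, method="regression")
Z_bart = standardized_residuals(Y, Lam, Psi, Theta, method="bartlett")
```

In practice the weights would be built from estimates obtained by fitting the model to the sample, and cases with unusually large absolute values in `Z_reg` or `Z_bart` would be the candidates for closer inspection.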
Because the data were drawn from a standard normal distribution conforming to a known population model, these residuals themselves should be approximately normally distributed with no extreme outliers. Any deviation from normality in Figure 2 is thus due only to sampling error and to error from the approximation of factor scores. Later, we introduce an extreme outlier in the data set to illustrate its effects.

LEVERAGE

As mentioned above, in regression it is important to consider leverage in addition to outlying residuals. The same concept applies in factor analysis (see Yuan and Zhong, 2008, for an extensive discussion). Leverage is most commonly quantified using "hat values" in multiple regression analysis, but a related statistic, Mahalanobis distance (MD), also can be used to measure leverage (e.g., Fox, 2008). MD helps measure the extent to which an observation is a multivariate outlier with respect to the set of explanatory variables (see footnote 6). Here, our use of MD draws from Pek and MacCallum (2011), who recommend its use for uncovering multivariate outliers in the context of SEM. In a factor analysis, the MD for a given

Footnote 6: Weisberg (1985) showed that $(n - 1)h^*$ is the MD for a given case from the centroid of the explanatory variables, where $h^*$ is the regression hat value calculated using mean-deviated independent variables.

FIGURE 2 | Histograms of standardized residuals for each observed variable from three-factor model fitted to random sample data (N = 100; no unusual cases).

Frontiers in Psychology | Quantitative Psychology and Measurement March 2012 | Volume 3 | Article 55 | 106
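The link between hat values and MD noted in the leverage discussion and in footnote 6 can be checked numerically. The sketch below is only an illustration under assumptions: the data matrix `X`, the planted outlier, and the function names are hypothetical, and the identity is verified for the squared distance computed with the sample covariance matrix (denominator n − 1).

```python
import numpy as np

def squared_mahalanobis(X):
    """Squared Mahalanobis distance of each row from the sample centroid."""
    Xc = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))   # covariance uses n - 1
    return np.einsum("ij,jk,ik->i", Xc, S_inv, Xc)

def centered_hat_values(X):
    """Hat values h* computed from mean-deviated variables."""
    Xc = X - X.mean(axis=0)
    H = Xc @ np.linalg.inv(Xc.T @ Xc) @ Xc.T
    return np.diag(H)

# Hypothetical data: 100 cases on 9 standard normal variables.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 9))
n = X.shape[0]

d2 = squared_mahalanobis(X)
h_star = centered_hat_values(X)
identity_holds = np.allclose(d2, (n - 1) * h_star)

# Plant one extreme case and see whether the distance singles it out.
X_out = X.copy()
X_out[0] = 4.0 * np.array([1, -1, 1, -1, 1, -1, 1, -1, 1])
d2_out = squared_mahalanobis(X_out)
most_extreme_case = int(np.argmax(d2_out))
```

Because the centered hat values sum to the number of variables, the squared distances have a fixed total of (n − 1) times that number, so a single case can look extreme only at the expense of the others; this is one reason to compare distances within a sample rather than against an absolute cutoff alone.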
