
Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers


Flora et al. Factor analysis assumptions

their variation and covariation; that is, covariation between any two observed variables is due to them being influenced by the same factor. This idea was introduced by Spearman (1904) and, largely due to Thurstone (1947), evolved into the common factor model, which remains the dominant paradigm for factor analysis today. Factor analysis is traditionally a method for fitting models to the bivariate associations among a set of variables, with EFA most commonly using Pearson product-moment correlations and CFA most commonly using covariances. Use of product-moment correlations or covariances follows from the fact that the common factor model specifies a linear relationship between the factors and the observed variables.

Lawley and Maxwell (1963) showed that the common factor model can be formally expressed as a linear model with observed variables as dependent variables and factors as explanatory or independent variables:

$$y_j = \lambda_1 \eta_1 + \lambda_2 \eta_2 + \cdots + \lambda_k \eta_k + \varepsilon_j$$

where $y_j$ is the $j$th observed variable from a battery of $p$ observed variables, $\eta_k$ is the $k$th of $m$ common factors, $\lambda_k$ is the regression coefficient, or factor loading, relating each factor to $y_j$, and $\varepsilon_j$ is the residual, or unique factor, for $y_j$. (Often there are only one or two factors, in which case the right-hand side of the equation includes only $\lambda_1 \eta_1 + \varepsilon_j$ or only $\lambda_1 \eta_1 + \lambda_2 \eta_2 + \varepsilon_j$.)
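To make the one-factor case of the linear model above concrete, the following is a minimal simulation sketch, not part of the original article. It assumes normally distributed factor scores and residuals, a standardized common factor, and hypothetical loadings of 0.8, 0.6, and 0.5; the function name `simulate_one_factor` and the per-variable indexing of the loadings are illustrative choices. Under these assumptions, the correlation between any two observed variables should be close to the product of their loadings, because the shared factor is their only source of covariation.

```python
import numpy as np


def simulate_one_factor(loadings, unique_sd, n, seed=0):
    """Draw n observations of y_j = lambda_j * eta + eps_j for each variable j.

    Assumptions (not from the article): eta and eps_j are independent
    normal variables, and eta has variance one.
    """
    rng = np.random.default_rng(seed)
    loadings = np.asarray(loadings, dtype=float)
    unique_sd = np.asarray(unique_sd, dtype=float)
    eta = rng.standard_normal(n)  # common factor scores
    eps = rng.standard_normal((n, loadings.size)) * unique_sd  # unique factors
    return eta[:, None] * loadings + eps


# Hypothetical loadings; unique SDs chosen so each y_j has unit variance.
loadings = np.array([0.8, 0.6, 0.5])
unique_sd = np.sqrt(1.0 - loadings**2)

y = simulate_one_factor(loadings, unique_sd, n=200_000)
sample_corr = np.corrcoef(y, rowvar=False)

# Correlations implied by the model: lambda_j * lambda_k off the diagonal.
implied_corr = np.outer(loadings, loadings)
np.fill_diagonal(implied_corr, 1.0)

print(np.round(sample_corr, 3))
print(np.round(implied_corr, 3))
```

The two printed matrices should agree up to sampling error, which shrinks as `n` grows.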
It is convenient to work with the model in matrix form:

$$y = \Lambda \eta + \varepsilon, \tag{1}$$

where $y$ is a vector of the $p$ observed variables, $\Lambda$ is a $p \times m$ matrix of factor loadings, $\eta$ is a vector of $m$ common factors, and $\varepsilon$ is a vector of $p$ unique factors¹. Thus, each common factor may influence more than one observed variable while each unique factor (i.e., residual) influences only one observed variable. As with the standard regression model, the residuals are assumed to be independent of the explanatory variables; that is, all unique factors are uncorrelated with the common factors. Additionally, the unique factors are usually assumed uncorrelated with each other (although this assumption may be tested and relaxed in CFA).

Given Eq. 1, it is straightforward to show that the covariances among the observed variables can be written as a function of model parameters (factor loadings, common factor variances and covariances, and unique factor variances). Thus, in CFA, the parameters are typically estimated from their covariance structure:

$$\Sigma = \Lambda \Psi \Lambda' + \Theta, \tag{2}$$

where $\Sigma$ is the $p \times p$ population covariance matrix for the observed variables, $\Psi$ is the $m \times m$ interfactor covariance matrix, and $\Theta$ is the $p \times p$ unique factor covariance matrix that often contains only diagonal elements, i.e., the unique factor variances. The covariance structure model shows that the observed covariances are a function of the parameters but not the unobservable scores on the common or unique factors; hence, it is not necessary to observe scores on the latent variables to estimate the model parameters. In EFA, the parameters are most commonly estimated from the correlation structure

$$P = \Lambda^{*} \Psi^{*} \Lambda^{*\prime} + \Theta^{*}, \tag{3}$$

where $P$ is the population correlation matrix and, in that $P$ is simply a re-scaled version of $\Sigma$, we can view $\Lambda^{*}$, $\Psi^{*}$, and $\Theta^{*}$ as re-scaled versions of $\Lambda$, $\Psi$, and $\Theta$, respectively. This tendency to conduct EFA using correlations is mainly a result of historical tradition, and it is possible to conduct EFA using covariances or CFA using correlations. For simplicity, we focus on the analysis of correlations from this point forward, noting that the principles we discuss apply equivalently to the analysis of both correlation and covariance matrices (MacCallum, 2009; but see Cudeck, 1989; Bentler, 2007; Bentler and Savalei, 2010 for discussions of the analysis of correlations vs. covariances). We also drop the asterisks when referring to the parameter matrices in Eq. 3.

Jöreskog (1969) showed how the traditional EFA model, or an “unrestricted solution” for the general factor model described above, can be constrained to produce the “restricted solution” that is commonly understood as today’s CFA model and is well integrated in the structural equation modeling (SEM) literature. Specifically, in the EFA model, the elements of $\Lambda$ are all freely estimated; that is, each of the $m$ factors has an estimated relationship (i.e., factor loading) with every observed variable; factor rotation is then used to aid interpretation by making some values in $\Lambda$ large and others small. But in the CFA model, depending on the researcher’s hypothesized model, many of the elements of $\Lambda$ are restricted, or constrained, to equal zero, often so that each observed variable is determined by one and only one factor (i.e., so that there are no “cross-loadings”). Because the common factors are unobserved variables and thus have an arbitrary scale, it is conventional to define them as standardized (i.e., with variance equal to one); thus $\Psi$ is the interfactor correlation matrix². This convention is not a testable assumption of the model, but rather imposes necessary identification restrictions that allow the model parameters to be estimated (although alternative identification constraints are possible, such as the marker variable approach often used with CFA). In addition to constraining the factor variances, EFA requires a diagonal matrix for $\Theta$, with the unique factor variances along the diagonal.

Exploratory factor analysis and CFA therefore share the goal of using the common factor model to represent the relationships among a set of observed variables using a small number of factors. Hence, EFA and CFA should not be viewed as disparate methods, even though their implementation with conventional software might seem quite different. Instead, they are two approaches to

¹ For traditional factor analysis models, the means of the observed variables are arbitrary and unstructured by the model, which allows omission of an intercept term in Eq. 1 by assuming the observed variables are mean-deviated, or centered (MacCallum, 2009).

² In EFA, the model is typically estimated by first setting $\Psi$ to be an identity matrix, which implies that the factors are uncorrelated, or orthogonal, leading to the initial unrotated factor loadings in $\Lambda$. Applying an oblique factor rotation obtains a new set of factor loadings along with non-zero interfactor correlations. Although rotation is not a focus of the current paper, we recommend that researchers always use an oblique rotation.

Frontiers in Psychology | Quantitative Psychology and Measurement March 2012 | Volume 3 | Article 55 | 102
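The covariance structure in Eq. 2 and its re-scaling to the correlation structure in Eq. 3 can be checked numerically. The sketch below is an illustration only, not the authors' code. The parameter values are hypothetical, the factors are assumed to be standardized (so the interfactor matrix is unchanged by the re-scaling), and the function names `implied_covariance` and `rescale_to_correlation` are introduced here for the example. The loading matrix uses a CFA-style pattern in which each observed variable loads on exactly one factor.

```python
import numpy as np


def implied_covariance(Lambda, Psi, Theta):
    """Model-implied covariance matrix: Sigma = Lambda Psi Lambda' + Theta (Eq. 2)."""
    return Lambda @ Psi @ Lambda.T + Theta


def rescale_to_correlation(Lambda, Psi, Theta):
    """Re-scale covariance-metric parameters to the correlation metric (Eq. 3).

    Assumes Psi already has a unit diagonal (standardized factors), so only
    Lambda and Theta need to be divided by the observed standard deviations.
    """
    Sigma = implied_covariance(Lambda, Psi, Theta)
    inv_sd = 1.0 / np.sqrt(np.diag(Sigma))  # 1 / SD of each observed variable
    P = inv_sd[:, None] * Sigma * inv_sd[None, :]
    Lambda_star = inv_sd[:, None] * Lambda
    Theta_star = inv_sd[:, None] * Theta * inv_sd[None, :]
    return P, Lambda_star, Psi, Theta_star


# Hypothetical values: p = 4 observed variables, m = 2 factors, no cross-loadings.
Lambda = np.array([[1.8, 0.0],
                   [0.7, 0.0],
                   [0.0, 1.2],
                   [0.0, 0.6]])
Psi = np.array([[1.0, 0.3],
                [0.3, 1.0]])  # interfactor correlation of 0.3
Theta = np.diag([0.76, 0.51, 0.56, 0.64])  # unique factor variances

P, Lambda_star, Psi_star, Theta_star = rescale_to_correlation(Lambda, Psi, Theta)

# The re-scaled parameters reproduce the correlation matrix exactly.
P_from_params = Lambda_star @ Psi_star @ Lambda_star.T + Theta_star
print(np.allclose(P, P_from_params))
print(np.round(P, 3))
```

Note that no factor scores appear anywhere in the computation: the implied covariances and correlations depend only on the parameter matrices, which is why the parameters can be estimated without observing the latent variables.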
