Flora et al.Factor analysis assumptionsproduct-moment correlations were attenuated which led to negativelybiased factor loading estimates (especially with fewerresponse categories), whereas polychoric correlations remainedaccurate <strong>and</strong> produced accurate factor loadings. When <strong>the</strong>observed variable set contained a mix of positively <strong>and</strong> negativelyskewed items (Cases 3 <strong>and</strong> 4), product-moment correlationswere strongly affected by <strong>the</strong> direction of skewness (especiallywith fewer response categories), which can lead to dramaticallymisleading factor analysis results. Unfortunately, strongly skeweditems can also be problematic for factor analyses of polychoric R inpart because <strong>the</strong>y produce sparse observed frequency tables, whichleads to higher rates of improper solutions <strong>and</strong> inaccurate results(e.g., Flora <strong>and</strong> Curran, 2004; Forero et al., 2009). Yet this difficultyis alleviated with larger sample size <strong>and</strong> more item-responsecategories; if a proper solution is obtained, <strong>the</strong>n <strong>the</strong> results of afactor analysis of polychoric R are more trustworthy than thoseobtained with product-moment R.GENERAL DISCUSSION AND CONCLUSIONFactor analysis is traditionally <strong>and</strong> most commonly an analysis of<strong>the</strong> correlations (or covariances) among a set of observed variables.A unifying <strong>the</strong>me of this paper is that if <strong>the</strong> correlationsbeing analyzed are misrepresentative or inappropriate summariesof <strong>the</strong> relationships among <strong>the</strong> variables, <strong>the</strong>n <strong>the</strong> factor analysisis compromised. Thus, <strong>the</strong> process of <strong>data</strong> screening <strong>and</strong> assumption<strong>testing</strong> for factor analysis should begin with a focus on <strong>the</strong>adequacy of <strong>the</strong> correlation matrix among <strong>the</strong> observed variables.In particular, analysis of product-moment correlations or covariancesimplies that <strong>the</strong> bivariate association between two observedvariables can be adequately modeled with a straight line, whichin turn leads to <strong>the</strong> expression of <strong>the</strong> common factor model asa linear regression model. This crucial linearity assumption takesprecedence over any concerns about having normally distributedvariables, although normality is important for certain model fitstatistics <strong>and</strong> estimating parameter SEs. O<strong>the</strong>r concerns about<strong>the</strong> appropriateness of <strong>the</strong> correlation matrix involve collinearity<strong>and</strong> potential influence of unusual cases. Once one accepts<strong>the</strong> sufficiency of a given correlation matrix as a representationof <strong>the</strong> observed <strong>data</strong>, <strong>the</strong> actual formal assumptions of <strong>the</strong> commonfactor model are relatively mild. These assumptions are that<strong>the</strong> unique factors (i.e., <strong>the</strong> residuals, ε,inEq. 1) are uncorrelatedwith each o<strong>the</strong>r <strong>and</strong> are uncorrelated with <strong>the</strong> common factors(i.e., η in Eq. 1). Substantial violation of <strong>the</strong>se assumptions typicallymanifests as poor model-<strong>data</strong> fit, <strong>and</strong> is o<strong>the</strong>rwise difficult toassess with a priori <strong>data</strong>-screening procedures based on descriptivestatistics or graphs.After reviewing <strong>the</strong> common factor model, we gave separatepresentations of issues concerning factor analysis of continuousobserved variables <strong>and</strong> issues concerning factor analysis of categorical,item-level observed variables. For <strong>the</strong> former, we showedhow concepts from regression diagnostics apply to factor analysis,given that <strong>the</strong> common factor model is itself a multivariate multipleregression model with unobserved explanatory variables. Animportant point was that cases that appear as outliers in univariateor bivariate plots are not necessarily influential <strong>and</strong> conversely thatinfluential cases may not appear as outliers in univariate or bivariateplots (though <strong>the</strong>y often do). If one can determine that unusualobservations are not a result of researcher or participant error,<strong>the</strong>nwe recommend <strong>the</strong> use of robust estimation procedures instead ofdeleting <strong>the</strong> unusual observation. Likewise, we also recommend<strong>the</strong> use of robust procedures for calculating model fit statistics<strong>and</strong> SEs when observed, continuous variables are non-normal.Next, a crucial message was that <strong>the</strong> linearity assumption isnecessarily violated when <strong>the</strong> common factor model is fitted toproduct-moment correlations among categorical, ordinally scaleditems, including <strong>the</strong> ubiquitous Likert-type items. At worst (e.g.,with a mix of positively <strong>and</strong> negatively skewed dichotomousitems), this assumption violation has severe consequences forfactor analysis results. At best (e.g., with symmetric items withfive response categories), this assumption violation still producesbiased factor-loading estimates. Alternatively, factor analysis ofpolychoric R among item-level variables explicitly specifies a nonlinearlink between <strong>the</strong> common factors <strong>and</strong> <strong>the</strong> observed variables,<strong>and</strong> as such is <strong>the</strong>oretically well-suited to <strong>the</strong> analysis ofitem-level variables. However, this method is also vulnerable tocertain <strong>data</strong> characteristics, particularly sparseness in <strong>the</strong> bivariatefrequency tables for item pairs, which occurs when stronglyskewed items are analyzed with a relatively small sample. Yet, factoranalysis of polychoric R among items generally produces superiorresults compared to those obtained with product-moment R,especially if <strong>the</strong>re are five or fewer item-response categories.We have not yet directly addressed <strong>the</strong> role of sample size. Inshort,no simple rule-of-thumb regarding sample size is reasonablygeneralizable across factor analysis applications. Instead, adequatesample size depends on many features of <strong>the</strong> research, such as <strong>the</strong>major substantive goals of <strong>the</strong> analysis, <strong>the</strong> number of observedvariables per factor, closeness to simple structure, <strong>and</strong> <strong>the</strong> strengthof <strong>the</strong> factor loadings (MacCallum et al., 1999, 2001). Beyond <strong>the</strong>seconsiderations, having a larger sample size can guard against someof <strong>the</strong> harmful consequences of unusual cases <strong>and</strong> assumptionviolation. For example, unusual cases are less likely to exert stronginfluence on model estimates as overall sample size increases. Conversely,removing unusual cases decreases <strong>the</strong> sample size, whichreduces <strong>the</strong> precision of parameter estimation <strong>and</strong> statistical powerfor hypo<strong>the</strong>sis tests about model fit or parameter estimates. Yet,having a larger sample size does not protect against <strong>the</strong> negativeconsequences of treating categorical item-level variables as continuousby factor analyzing product-moment R. But we did illustratethat larger sample size produces better results for factor analysis ofpolychoric R among strongly skewed items, in part because largersample size reduces <strong>the</strong> occurrence of sparseness.In closing, we emphasize that factor analysis, whe<strong>the</strong>r EFA orCFA, is a method for modeling relationships among observedvariables. It is important for researchers to recognize that itis impossible for a statistical model to be perfect; assumptionswill always be violated to some extent in that no model canever exactly capture <strong>the</strong> intricacies of nature. Instead, researchersshould strive to find models that have an approximate fit to<strong>data</strong> such that <strong>the</strong> inevitable assumption violations are trivial,but <strong>the</strong> models can still provide useful results that help answerimportant substantive research questions (see MacCallum, 2003,<strong>and</strong> Rodgers, 2010, for discussions of this principle). We recommendextensive use of sensitivity analyses <strong>and</strong> cross-validation toaid in this endeavor. For example, researchers should comparewww.frontiersin.org March 2012 | Volume 3 | Article 55 | 119
Flora et al.Factor analysis assumptionsresults obtained from <strong>the</strong> same <strong>data</strong> using different estimationprocedures, such as comparing traditional ULS or ML estimationwith robust procedures with continuous variables or comparingfull-information factor analysis results with limited-informationresults with item-level variables. Additionally, as even CFA analysesmay become exploratory through model modification, it isimportant to cross-validate models across independent <strong>data</strong> sets.Because different modeling procedures place different dem<strong>and</strong>s on<strong>data</strong>, comparing results obtained with different methods <strong>and</strong> samplescan help researchers gain a fuller, richer underst<strong>and</strong>ing of <strong>the</strong>usefulness of <strong>the</strong>ir statistical models given <strong>the</strong> natural complexityof real <strong>data</strong>.REFERENCESAsparouhov, T., <strong>and</strong> Muthén, B.(2009). Exploratory structural equationmodeling. Struct. Equ. Modeling16, 397–438.Babakus, E., Ferguson, C. E., <strong>and</strong>Joreskög, K. G. (1987). The sensitivityof confirmatory maximum likelihoodfactor analysis to violationsof measurement scale <strong>and</strong> distributionalassumptions. J. Mark. Res. 37,72–141.Bartholomew, D. J. (2007). “Three facesof factor analysis,” in Factor Analysisat 100: Historical Developments<strong>and</strong> Future Directions, eds R. Cudeck<strong>and</strong> R. C. MacCallum (Mahwah,NJ: Lawrence Erlbaum Associates),9–21.Belsley, D. A., Kuh, E., <strong>and</strong> Welsch, R. E.(1980). Regression Diagnostics: IdentifyingInfluential Data <strong>and</strong> Sourcesof Collinearity. New York: Wiley.Bentler, P. M. (2004). EQS 6 StructuralEquations Program Manual. Encino,CA: Multivariate Software, Inc.Bentler, P. M. (2007). Can scientificallyuseful hypo<strong>the</strong>ses be tested with correlations?Am. Psychol. 62, 772–782.Bentler, P. M., <strong>and</strong> Savalei, V. (2010).“Analysis of correlation structures:current status <strong>and</strong> open problems,”in Statistics in <strong>the</strong> Social Sciences:Current Methodological Developments,eds S. Kolenikov, D. Steinley,<strong>and</strong> L. Thombs, (Hoboken, NJ:Wiley), 1–36.Bernstein, I. H., <strong>and</strong> Teng, G. (1989).Factoring items <strong>and</strong> factoring scalesare different: spurious evidence formultidimensionality due to itemcategorization. Psychol. Bull. 105,467–477.Bock, R. D., Gibbons, R., <strong>and</strong> Muraki,E. (1988). Full-information item factoranalysis. Appl. Psychol. Meas. 12,261–280.Bollen, K. A. (1987). Outliers <strong>and</strong>improper solutions: a confirmatoryfactor analysis example. Sociol.Methods Res. 15, 375–384.Bollen, K. A. (1989). Structural Equationswith Latent Variables. NewYork: Wiley.Bollen, K. A., <strong>and</strong> Arminger, G.(1991). Observational residuals infactor analysis <strong>and</strong> structural equationmodels. Sociol. Methodol. 21,235–262.Briggs, N. E., <strong>and</strong> MacCallum, R. C.(2003). Recovery of weak commonfactors by maximum likelihood <strong>and</strong>ordinary least squares estimation.Multivariate Behav. Res. 38, 25–56.Browne, M. W., Cudeck, R., Tateneni,K., <strong>and</strong> Mels, G. (2010). CEFA:Comprehensive Exploratory FactorAnalysis, Version 3. 03 [ComputerSoftware, <strong>and</strong> Manual]. Availableat: http://faculty.psy.ohio-state.edu/browne/Cadigan, N. G. (1995). Local influencein structural equation models.Struct. Equ. Modeling 2, 13–30.Cudeck, R. (1989). Analysis of correlationmatrices using covariancestructure models. Psychol. Bull. 105,317–327.Cudeck, R., <strong>and</strong> O’Dell, L. L. (1994).Applications of st<strong>and</strong>ard error estimatesin unrestricted factor analysis:significance tests for factor loadings<strong>and</strong> correlations. Psychol. Bull. 115,475–487.Dolan, C. V. (1994). Factor analysis ofvariables with 2, 3, 5 <strong>and</strong> 7 responsecategories: a comparison of categoricalvariable estimators using simulated<strong>data</strong>. Br. J. Math. Stat. Psychol.47, 309–326.Ferguson, G. A. (1941). The factorialinterpretation of test difficulty.Psychometrika 6, 323–329.Finney, S. J., <strong>and</strong> DiStefano, C. (2006).“Non-normal <strong>and</strong> categorical <strong>data</strong>in structural equation modeling,”in Structural Equation Modeling: ASecond Course, eds G. R. Hancock<strong>and</strong> R. O. Mueller (Greenwich,CT: Information Age Publishing),269–314.Flora, D. B., <strong>and</strong> Curran, P. J. (2004).An empirical evaluation of alternativemethods of estimation forconfirmatory factor analysis withordinal <strong>data</strong>. Psychol. Methods 9,466–491.Forero, C. G., <strong>and</strong> Maydeu-Olivares, A.(2009). Estimation of IRT gradedmodels for rating <strong>data</strong>: limited vs.full information methods. Psychol.Methods 14, 275–299.Forero, C. G., Maydeu-Olivares, A.,<strong>and</strong> Gallardo-Pujol, D. (2009). Factoranalysis with ordinal indicators:a Monte Carlo study comparingDWLS <strong>and</strong> ULS estimation. Struct.Equ. Modeling 16, 625–641.Fox, J. (1991). Regression Diagnostics: AnIntroduction. Thous<strong>and</strong> Oaks, CA:SAGE Publications.Fox, J. (2008). Applied Regression Analysis<strong>and</strong> Generalized Linear Models,2nd Edn. Thous<strong>and</strong> Oaks, CA: SAGEPublications.Gorsuch, R. L. (1983). Factor Analysis,2nd Edn. Hillsdale, NJ: LawrenceErlbaum Associates.Green, S. B., Akey, T. M., Fleming, K. K.,Hershberger, S. L., <strong>and</strong> Marquis, J. G.(1997). Effect of <strong>the</strong> number of scalepoints on chi-square fit indices inconfirmatory factor analysis. Struct.Equ. Modeling 4, 108–120.Harman, H. H. (1960). Modern FactorAnalysis. Chicago, IL: University ofChicago Press.Jöreskog, K. G. (1969). A generalapproach to confirmatory maximumlikelihood factor analysis. Psychometrika34, 183–202.Lawley, D. N., <strong>and</strong> Maxwell, A. E.(1963). Factor Analysis as a StatisticalMethod. London: Butterworth.Lee, S.-Y., <strong>and</strong> Wang, S. J. (1996). Sensitivityanalysis of structural equationmodels. Psychometrika 61, 93–108.MacCallum, R. C. (2003). Workingwith imperfect models. MultivariateBehav. Res. 38, 113–139.MacCallum,R. C. (2009).“Factor analysis,”inThe SAGE H<strong>and</strong>book of QuantitativeMethods in Psychology, eds R.E. Millsap <strong>and</strong> A. Maydeu-Olivares(Thous<strong>and</strong> Oaks, CA: SAGE Publications),123–147.MacCallum, R. C., Widaman, K. F.,Preacher, K., <strong>and</strong> Hong, S. (2001).Sample size in factor analysis: <strong>the</strong>role of model error. MultivariateBehav. Res. 36, 611–637.MacCallum, R. C., Widaman, K. F.,Zhang, S., <strong>and</strong> Hong, S. (1999). Samplesize in factor analysis. Psychol.Methods 4, 84–99.Mavridis, D., <strong>and</strong> Moustaki, I. (2008).Detecting outliers in factor analysisusing <strong>the</strong> forward search algorithm.Multivariate Behav. Res. 43,453–475.Muthén, B. (1978). Contributions tofactor analysis of dichotomous variables.Psychometrika 43, 551–560.Muthén, B., <strong>and</strong> Kaplan, D. (1985).A comparison of some methodologiesfor <strong>the</strong> factor-analysis of nonnormalLikert variables. Br. J. Math.Stat. Psychol. 38, 171–180.Muthén, B., <strong>and</strong> Kaplan, D. (1992).A comparison of some methodologiesfor <strong>the</strong> factor-analysis of nonnormalLikert variables: a note on<strong>the</strong> size of <strong>the</strong> model. Br. J. Math.Stat. Psychol. 45, 19–30.Muthén, L., <strong>and</strong> Muthén, B. (2010).Mplus User’s Guide, 6th Edn. LosAngeles, CA: Muthén & Muthén.Nunnally, J.C., <strong>and</strong> Bernstein, I. H.(1994). Psychometric Theory, 3rdEdn. New York: McGraw-Hill.Olsson, U. (1979). Maximum likelihoodestimation of <strong>the</strong> polychoric correlationcoefficient. Psychometrika 44,443–460.Pek, J., <strong>and</strong> MacCallum, R. C. (2011).Sensitivity analysis in structuralequation models: cases <strong>and</strong> <strong>the</strong>irinfluence. Multivariate Behav. Res.46, 202–228.Pison, G., Rousseeuw, P. J., Filzmoser, P.,<strong>and</strong> Croux, C. (2003). Robust factoranalysis. J. Multivar. Anal. 84,145–172.Poon, W.-Y., <strong>and</strong> Wong, Y.-K. (2004). Aforward search procedure for identifyinginfluential observations in <strong>the</strong>estimation of a covariance matrix.Struct. Equ. Modeling 11, 357–374.Ramsey, J. B. (1969). Tests for specificationerrors in classical linear leastsquares regression analysis. J. R. Stat.Soc. Series B 31, 350–371.Rodgers, J. L. (2010). The epistemologyof ma<strong>the</strong>matical <strong>and</strong> statisticalmodeling: a quiet methodologicalrevolution. Am. Psychol. 65, 1–12.Satorra, A., <strong>and</strong> Bentler, P. M. (1994).“Corrections to test statistics <strong>and</strong>st<strong>and</strong>ard errors in covariance structureanalysis,” in Latent VariablesAnalysis: Applications for DevelopmentalResearch, eds A. von Eye <strong>and</strong>C. C. Clogg (Thous<strong>and</strong> Oaks, CA:SAGE Publications), 399–419.Savalei, V. (2011). What to do aboutzero frequency cells when estimatingpolychoric correlations. Struct. Equ.Modeling 18, 253–273.Spearman, C. (1904). General intelligenceobjectively determined <strong>and</strong>measured. Am.J.Psychol.5,201–293.Thurstone, L. L. (1947). Multiple FactorAnalysis. Chicago: University ofChicago Press.Weisberg, S. (1985). Applied LinearRegression, 2nd Edn. New York:Wiley.<strong>Frontiers</strong> in Psychology | Quantitative Psychology <strong>and</strong> Measurement March 2012 | Volume 3 | Article 55 |120