13.07.2015 Views

Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers

Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers

Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers

SHOW MORE
SHOW LESS
  • No tags were found...

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Flora et al.Factor analysis assumptionsTable1|Population correlation matrix.Variable 1 2 3 4 5 6 7 8 9WrdMean 1SntComp 0.75 1OddWrds 0.78 0.72 1MxdArit 0.44 0.52 0.47 1Remndrs 0.45 0.53 0.48 0.82 1MissNum 0.51 0.58 0.54 0.82 0.74 1Gloves 0.21 0.23 0.28 0.33 0.37 0.35 1Boots 0.30 0.32 0.37 0.33 0.36 0.38 0.45 1Hatchts 0.31 0.30 0.37 0.31 0.36 0.38 0.52 0.67 1WrdMean, word meaning; SntComp, sentence completion; OddWrds, odd words;MxdArit, mixed arithmetic; Remndrs, remainders; MissNum, missing numbers,Hatchts, hatchets.Table2|Population factor loading matrix.VariableFactorη 1 η 2 η 3WrdMean 0.94 −0.05 −0.03SntComp 0.77 0.14 −0.03OddWrds 0.83 0.00 0.08MxdArit −0.05 1.01 −0.05Remndrs 0.04 0.80 0.06MissNum 0.14 0.75 0.06Gloves −0.06 0.13 0.56Boots 0.05 −0.01 0.74Hatchts 0.03 −0.09 0.91WrdMean, word meaning; SntComp, sentence completion; OddWrds, odd words;MxdArit, mixed arithmetic; Remndrs, remainders; MissNum, missing numbers;Hatchts, hatchets. Primary loadings for each observed variable are in bold.model using ULS <strong>and</strong> applied a quartimin rotation. The estimatedrotated factor loading matrix, ˆΛ, isinTable 3. Not surprisingly,<strong>the</strong>se factor loading estimates are similar, but not identical, to <strong>the</strong>population factor loadings.REGRESSION DIAGNOSTICS FOR FACTOR ANALYSIS OFCONTINUOUS VARIABLESBecause <strong>the</strong> common factor model is just a linear regression model(Eq. 1), many of <strong>the</strong> well-known concepts about regression diagnosticsgeneralize to factor analysis. Regression diagnostics are aset of methods that can be used to reveal aspects of <strong>the</strong> <strong>data</strong> thatare problematic for a model that has been fitted to that <strong>data</strong> (seeBelsley et al., 1980; Fox, 1991, 2008, for thorough treatments).Many <strong>data</strong> characteristics that are problematic for ordinary multipleregression are also problematic for factor analysis; <strong>the</strong> trickis that in <strong>the</strong> common factor model, <strong>the</strong> explanatory variables(<strong>the</strong> factors) are unobserved variables whose values cannot bedetermined precisely. In this section, we illustrate how regressiondiagnostic principles can be applied to factor analysis using <strong>the</strong>example <strong>data</strong> presented above.Table 3 | Sample factor loading matrix (multivariate normal <strong>data</strong>, nooutlying cases).VariableFactorη 1 η 2 η 3WrdMean 0.96 −0.08 0.04SntComp 0.81 0.15 −0.13OddWrds 0.75 0.05 0.12MxdArit −0.06 1.01 −0.03Remndrs 0.09 0.75 0.09MissNum 0.09 0.80 0.06Gloves −0.03 0.20 0.56Boots 0.07 0.00 0.68Hatchts −0.01 −0.01 0.85N = 100. WrdMean, word meaning; SntComp, sentence completion; OddWrds,odd words; MxdArit, mixed arithmetic; Remndrs, remainders; MissNum, missingnumbers; Hatchts, hatchets. Primary loadings for each observed variable are inbold.But taking a step back, given that factor analysis is basicallyan analysis of correlations, all of <strong>the</strong> principles about correlationthat are typically covered in introductory statistics coursesare also relevant to factor analysis. Foremost is that a productmomentcorrelation measures <strong>the</strong> linear relationship between twovariables. Two variables may be strongly related to each o<strong>the</strong>r,but <strong>the</strong> actual correlation between <strong>the</strong>m might be close to zero if<strong>the</strong>ir relationship is poorly approximated by a straight line (forexample, a U-shaped relationship). In o<strong>the</strong>r situations, <strong>the</strong>re maybe a clear curvilinear relationship between two variables, but aresearcher decides that a straight line is still a reasonable modelfor that relationship. Thus, some amount of subjective judgmentmay be necessary to decide whe<strong>the</strong>r a product-moment correlationis an adequate summary of a given bivariate relationship. If not,variables involved in non-linear relationships may be transformedprior to fitting <strong>the</strong> factor model or alternative correlations, suchas Spearman rank-order correlations, may be factor analyzed (seeGorsuch, 1983, pp. 297–309 for a discussion of <strong>the</strong>se strategies, butsee below for methods for item-level categorical variables).Visual inspection of simple scatterplots (or a scatterplot matrixcontaining many bivariate scatterplots) is an effective method forassessing linearity, although formal tests of linearity are possible(e.g., <strong>the</strong> RESET test of Ramsey, 1969). If <strong>the</strong>re is a large numberof variables, <strong>the</strong>n it may be overly tedious to inspect every bivariaterelationship. In this situation, one might focus on scatterplotsinvolving variables with odd univariate distributions (e.g.,stronglyskewed or bimodal) or r<strong>and</strong>omly select several scatterplots to scrutinizeclosely. Note that it is entirely possible for two variables tobe linearly related even when one or both of <strong>the</strong>m is non-normal;conversely, if two variables are normally distributed, <strong>the</strong>ir bivariaterelationship is not necessarily linear. Returning to our <strong>data</strong>example, <strong>the</strong> scatterplot matrix in Figure 1 shows that none of <strong>the</strong>bivariate relationships among <strong>the</strong>se nine variables has any cleardeparture from linearity.Scatterplots can also be effective for identifying unusual cases,although relying on scatterplots alone for this purpose is not<strong>Frontiers</strong> in Psychology | Quantitative Psychology <strong>and</strong> Measurement March 2012 | Volume 3 | Article 55 | 104

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!