
Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers


Kasper and Ünlü | Assumptions of factor analytic approaches

We decided to analyze released items of the PIRLS 2006 study (IEA, 2007) to have an empirical basis for the selection of skewness values for $\omega$ ($= \nu$). We used a data set of dichotomously scored responses of 7,899 German students to 125 test items. Figure 1 displays the distribution of the PIRLS items' (empirical) skewness values. [6]

We decided to simulate under three conditions for the distributions of $\omega$. Under the first condition, $\omega_m$ ($m = 1, \ldots, k$) are normal with $\mu_{1m} = 0$, $\mu_{2m} = 1$, $\mu_{3m} = 0$, and $\mu_{4m} = 3$. Under the second condition, $\omega_m$ ($m = 1, \ldots, k$) are slightly skewed with $\mu_{1m} = 0$, $\mu_{2m} = 1$, $\mu_{3m} = -0.20$, and $\mu_{4m} = 3$. Under the third condition, $\omega_m$ ($m = 1, \ldots, k$) are strongly skewed with $\mu_{1m} = 0$, $\mu_{2m} = 1$, $\mu_{3m} = -2$, and $\mu_{4m} = 9$. The error terms were assumed to be unit normal, that is, we specified $\mu_{1h} = 0$, $\mu_{2h} = 1$, $\mu_{3h} = 0$, and $\mu_{4h} = 3$ for $\omega_h$ ($h = k + 1, \ldots, k + p$). Skewness and kurtosis of any $z_i$ under each of the three conditions were computed using Mattson's method (Section 4.1). The values are reported in Tables 1 and 2 for the four and eight dimensional factor spaces, respectively.

Under the slightly skewed distribution condition, the theoretical values of skewness for the manifest variables range between −0.060 and −0.005, a condition that captured approximately 20% of the considered PIRLS test items.
Under the strongly skewed distribution condition, the theoretical values of skewness lie between −0.599 and −0.047, a condition that covered circa 30% of the PIRLS items (cf. Figure 1). Based on these theoretical skewness and kurtosis statistics, we can see to what extent under these model specifications the distributions of the manifest variables deviate from the normal distribution.

[6] All figures of this paper were produced using the R statistical computing environment (R Development Core Team, 2011; www.r-project.org). The source files are freely available from the authors.

How to generate variates $\omega_i$ ($i = 1, \ldots, k + p$) such that they possess predetermined moments $\mu_{1i}$, $\mu_{2i}$, $\mu_{3i}$, and $\mu_{4i}$? To simulate values for $\omega_i$ with predetermined moments, we used the generalized lambda distribution (Ramberg et al., 1979)

$$\omega_i = \lambda_1 + \frac{u^{\lambda_3} - (1 - u)^{\lambda_4}}{\lambda_2},$$

where $u$ is uniform (0, 1), $\lambda_1$ is a location parameter, $\lambda_2$ a scale parameter, and $\lambda_3$ and $\lambda_4$ are shape parameters. To realize the desired distribution conditions for the simulation study (normal, slightly skewed, strongly skewed) using this general distribution, its parameters $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ had to be specified accordingly. Ramberg et al. (1979) tabulate the required values for the $\lambda$ parameters for different values of $\mu$. In particular, for a (more or less) normal distribution with $\mu_1 = 0$, $\mu_2 = 1$, $\mu_3 = 0$, and $\mu_4 = 3$, the corresponding values are $\lambda_1 = 0$, $\lambda_2 = 0.197$, $\lambda_3 = 0.135$, and $\lambda_4 = 0.135$.
For a slightly skewed distribution with $\mu_1 = 0$, $\mu_2 = 1$, $\mu_3 = -0.20$, and $\mu_4 = 3$, the values are $\lambda_1 = 0.237$, $\lambda_2 = 0.193$, $\lambda_3 = 0.167$, and $\lambda_4 = 0.107$. For a strongly skewed distribution with $\mu_1 = 0$, $\mu_2 = 1$, $\mu_3 = -2$, and $\mu_4 = 9$, the parameter values are given by $\lambda_1 = 0.993$, $\lambda_2 = -0.108 \cdot 10^{-2}$, $\lambda_3 = -0.108 \cdot 10^{-2}$, and $\lambda_4 = -0.041 \cdot 10^{-3}$.

Remark. Indeed, various distributions are possible (see Mattson, 1997); however, the generalized lambda distribution proves to be special. It performs very well in comparison to other distributions when theoretical moments calculated according to the Mattson formulae are compared to their corresponding empirical moments computed from data simulated under a factor model (based on that distribution). For details, see Reinartz et al. (2002). These authors have also studied the effects of the use of different (pseudo) random number generators for realizing the uniform distribution in such a comparison study. Out of three compared random number generators (RANUNI from SAS, URAND from PRELIS, and RANDOM from Mathematica), the generator RANUNI performed relatively well or better. In this paper, we used the SAS program for our simulation study. [7]

Besides the number of factors and the distributions of the latent variables, sample size was varied. In the small sample case, every $z_i$ consisted of $n = 200$ observations, and in the large sample case $z_i$ contained $n = 600$ observations. Table 3 summarizes the design of the simulation study.
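The generalized lambda transform described above can be sketched in a few lines. This is only an illustration, not the authors' SAS program: it uses Python's standard `random` generator in place of RANUNI, the function and dictionary names are hypothetical, and the λ values are the rounded figures quoted in the text.

```python
import random

# (lambda1, lambda2, lambda3, lambda4) as quoted in the text from
# Ramberg et al. (1979); the dictionary keys are illustrative names.
GLD_PARAMS = {
    "normal": (0.0, 0.197, 0.135, 0.135),
    "slightly_skewed": (0.237, 0.193, 0.167, 0.107),
    "strongly_skewed": (0.993, -0.108e-2, -0.108e-2, -0.041e-3),
}


def gld_quantile(u, lam1, lam2, lam3, lam4):
    """Generalized lambda transform of a uniform value u in (0, 1)."""
    return lam1 + (u ** lam3 - (1.0 - u) ** lam4) / lam2


def sample_gld(n, params, rng):
    """Draw n variates by pushing uniform draws through the transform."""
    draws = []
    while len(draws) < n:
        u = rng.random()  # uniform on [0, 1)
        if u == 0.0:      # 0 raised to a negative exponent is undefined
            continue
        draws.append(gld_quantile(u, *params))
    return draws


def sample_moments(xs):
    """Return (mean, variance, skewness, kurtosis), population formulas."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2


if __name__ == "__main__":
    rng = random.Random(2013)  # arbitrary seed, for reproducibility only
    for name, params in GLD_PARAMS.items():
        for n in (200, 600):   # the two sample sizes used in the study
            mean, var, skew, kurt = sample_moments(sample_gld(n, params, rng))
            print(f"{name:16s} n={n}: mean={mean:.3f} var={var:.3f} "
                  f"skew={skew:.3f} kurt={kurt:.3f}")
```

Because the λ values are rounded to three digits, the simulated moments should match the targets only approximately, and at n = 200 or n = 600 the empirical skewness and kurtosis will also fluctuate noticeably from run to run.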
Overall there

[7] For the factor analyses in this paper, we used the SAS program and its PROC FACTOR implementation of the methods PCA, EFA, and PAA. More precisely, variation of the PROC FACTOR statements, run in their default settings, yields the performed procedures PCA, EFA, and PAA (e.g., EFA if METHOD = ML).

FIGURE 1 | Distribution of the skewness values for the 125 PIRLS test items. [Density plot; x-axis: skewness value; y-axis: density.]

www.frontiersin.org March 2013 | Volume 4 | Article 109 | 128
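The text computes the theoretical skewness and kurtosis of each manifest variable with Mattson's method, whose formulae (Section 4.1) are not reproduced on this page. As a rough sketch of the idea, the code below uses the standard additivity of cumulants for a weighted sum of independent, zero-mean, unit-variance variables. Treating this as equivalent to Mattson's formulae is an assumption, and the weights in the example are hypothetical, not the loadings used in the study.

```python
def linear_combination_moments(weights, skews, kurts):
    """Skewness and kurtosis of z = sum_j w_j * omega_j.

    Assumes the omega_j are independent with mean 0 and variance 1,
    with skewness skews[j] and (non-excess) kurtosis kurts[j].
    """
    var = sum(w ** 2 for w in weights)
    # Third cumulants add up, each scaled by the cubed weight.
    third = sum(w ** 3 * s for w, s in zip(weights, skews))
    # Fourth cumulants (excess kurtosis) add up, scaled by w ** 4.
    fourth = sum(w ** 4 * (k - 3.0) for w, k in zip(weights, kurts))
    return third / var ** 1.5, 3.0 + fourth / var ** 2


if __name__ == "__main__":
    # Hypothetical example: one strongly skewed factor (skewness -2,
    # kurtosis 9) with weight 0.6 plus a unit normal error term with
    # weight 0.8, so that the variance of z is 1.
    skew, kurt = linear_combination_moments(
        weights=[0.6, 0.8], skews=[-2.0, 0.0], kurts=[9.0, 3.0]
    )
    print(f"skewness={skew:.4f} kurtosis={kurt:.4f}")
```

The sketch shows why the manifest variables are much less skewed than the latent ones: each weight enters cubed in the skewness and to the fourth power in the kurtosis, and the normal error terms contribute variance but no skewness or excess kurtosis.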
