13.07.2015 Views

Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers

Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers

Sweating the Small Stuff: Does data cleaning and testing ... - Frontiers

SHOW MORE
SHOW LESS
  • No tags were found...

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

ORIGINAL RESEARCH ARTICLEpublished: 27 March 2013doi: 10.3389/fpsyg.2013.00109On <strong>the</strong> relevance of assumptions associated with classicalfactor analytic approaches †Daniel Kasper <strong>and</strong> Ali Ünlü*Chair for Methods in Empirical Educational Research, TUM School of Education <strong>and</strong> Centre for International Student Assessment, Technische Universität München,Munich, GermanyEdited by:Jason W. Osborne, Old DominionUniversity, USAReviewed by:Andrew Jones, American Board ofSurgery, USAEvgueni Borokhovski, ConcordiaUniversity, Canada*Correspondence:Ali Ünlü, Chair for Methods inEmpirical Educational Research, TUMSchool of Education <strong>and</strong> Centre forInternational Student Assessment,Technische Universität München,Lothstrasse 17, 80335 Munich,Germany.e-mail: ali.uenlue@tum.de†The research reported in this paper isbased on <strong>the</strong> dissertation <strong>the</strong>sis byKasper (2012).A personal trait, for example a person’s cognitive ability, represents a <strong>the</strong>oretical conceptpostulated to explain behavior. Interesting constructs are latent, that is, <strong>the</strong>y cannot beobserved. Latent variable modeling constitutes a methodology to deal with hypo<strong>the</strong>ticalconstructs. Constructs are modeled as r<strong>and</strong>om variables <strong>and</strong> become components of astatistical model. As r<strong>and</strong>om variables, <strong>the</strong>y possess a probability distribution in <strong>the</strong> populationof reference. In applications, this distribution is typically assumed to be <strong>the</strong> normaldistribution. The normality assumption may be reasonable in many cases, but <strong>the</strong>re aresituations where it cannot be justified. For example, this is true for criterion-referencedtests or for background characteristics of students in large scale assessment studies. Never<strong>the</strong>less,<strong>the</strong> normal procedures in combination with <strong>the</strong> classical factor analytic methodsare frequently pursued, despite <strong>the</strong> effects of violating this “implicit” assumption are notclear in general. In a simulation study, we investigate whe<strong>the</strong>r classical factor analyticapproaches can be instrumental in estimating <strong>the</strong> factorial structure <strong>and</strong> properties of <strong>the</strong>population distribution of a latent personal trait from educational test <strong>data</strong>, when violationsof classical assumptions as <strong>the</strong> aforementioned are present.The results indicate thathaving a latent non-normal distribution clearly affects <strong>the</strong> estimation of <strong>the</strong> distribution of<strong>the</strong> factor scores <strong>and</strong> properties <strong>the</strong>reof. Thus, when <strong>the</strong> population distribution of a personaltrait is assumed to be non-symmetric, we recommend avoiding those factor analyticapproaches for estimation of a person’s factor score, even though <strong>the</strong> number of extractedfactors <strong>and</strong> <strong>the</strong> estimated loading matrix may not be strongly affected. An application to <strong>the</strong>Progress in International Reading Literacy Study (PIRLS) is given. Comments on possibleimplications for <strong>the</strong> Programme for International Student Assessment (PISA) complete <strong>the</strong>presentation.Keywords: factor analysis, latent variable model, normality assumption, factorial structure, criterion-referenced test,large scale educational assessment, Programme for International Student Assessment, Progress in InternationalReading Literacy Study1. INTRODUCTIONEducational research is concerned with <strong>the</strong> study of processes oflearning <strong>and</strong> teaching. Typically, <strong>the</strong> investigated processes are notobservable, <strong>and</strong> to unveil <strong>the</strong>se, manifest human behavior in testsituations is recorded. According to Lienert <strong>and</strong> Raatz (1998, p. 1)“a test [. . .] is a routine procedure for <strong>the</strong> investigation of one ormore empirically definable personality traits” (translated by <strong>the</strong>authors), <strong>and</strong> to satisfy a minimum of quality criteria, a test isrequired to be objective, reliable, <strong>and</strong> valid.In this paper we deal with factor analytic methods for assessingconstruct validity of a test, in <strong>the</strong> sense of its factorial validity (e.g.,Cronbach <strong>and</strong> Meehl, 1955; Lienert <strong>and</strong> Raatz, 1998). Factorialvalidity refers to <strong>the</strong> factorial structure of <strong>the</strong> test, that is, to <strong>the</strong>number (<strong>and</strong> interpretation) of underlying factors, <strong>the</strong> correlationstructure among <strong>the</strong> factors, <strong>and</strong> <strong>the</strong> correlations of each test itemwith <strong>the</strong> factors. There are a number of latent variable models thatmay be used to analyze <strong>the</strong> factorial structure of a test – for generalizedlatent variable modeling covering a plethora of models asspecial cases of a much broader framework, see Bartholomew et al.(2011) <strong>and</strong> Skrondal <strong>and</strong> Rabe-Hesketh (2004). This paper focuseson classical factor analytic approaches, <strong>and</strong> it examines how accuratelydifferent methods of classical factor analysis can estimate<strong>the</strong> factorial structure of test <strong>data</strong>, if assumptions associated with<strong>the</strong> classical approaches are not satisfied. The methods of classicalfactor analysis will include principal component analysis (PCA;Pearson, 1901; Hotelling, 1933a,b; Kelley, 1935), exploratory factoranalysis (EFA; Spearman, 1904; Burt, 1909; Thurstone, 1931,1965), <strong>and</strong> principal axis analysis (PAA; Thurstone, 1931, 1965).More recent works on factor analysis <strong>and</strong> related methods are Harman(1976), McDonald (1985), Cudeck <strong>and</strong> MacCallum (2007),<strong>and</strong> Mulaik (2009). Fur<strong>the</strong>r references, to more specific topics infactor analysis, are given below, later in <strong>the</strong> text. 11 For <strong>the</strong> sake of simplicity <strong>and</strong> for <strong>the</strong> purpose <strong>and</strong> analysis of this paper, we wantto refer to all of <strong>the</strong>se approaches (PCA, EFA, PAA) collectively as classical factoranalysis/analytic methods. Albeit it is known that PCA differs from factor analysisin important aspects, <strong>and</strong> that PAA ra<strong>the</strong>r represents an alternative estimationprocedure for EFA. PCA <strong>and</strong> EFA are different technically <strong>and</strong> conceptually. PCAwww.frontiersin.org March 2013 | Volume 4 | Article 109 | 122

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!