First Draft of the paper - University of Toronto

3 Modelling measurement error

The clear implication of Section 1 is that measurement error in the independent variables should be modelled, not ignored. Here, we present the simplest approach we know, using classical structural equation models of the sort described by Jöreskog (1978) and Bollen (1989). These models are quite general; special cases include confirmatory factor analysis and path analysis as well as regression with measurement error.

The classical structural equation models have no intercepts, and assume that all independent variables and error terms have expected value zero. In practice, one centers all variables by subtracting off the sample means, which for large samples is approximately the same as subtracting off the population means. Since all the inference is asymptotic anyway, there is no serious problem with this. We will discuss models with intercepts, and argue that intercepts are often more trouble than they are worth. A multivariate normal assumption is common, but easy to relax.

We prefer to discuss these relatively primitive methods (rather than those described, for example, by Fuller, 1987) because the calculations are easier to present to students and clients, and also because they are close to the default settings in widely available commercial software for structural equation modelling.

3.1 Model identification

In our experience, the greatest obstacle to using structural equation models in practice is that it is quite easy to come up with scientifically plausible models that are not identified. Thus, to apply even the simplest structural equation models to measurement error in regression, we need to discuss model identification.

Suppose we have a vector of observable data $D = (D_1, \ldots, D_n)$ and a statistical model (a set of assertions implying a probability distribution) for $D$, and this model depends on a parameter $\theta$, which is usually a vector. If the probability distribution of the data corresponds uniquely to $\theta$, then we say that the model is identified.

It is possible for certain functions of the parameter vector to be identified, even when the entire model is not identified. If full knowledge of the probability distribution of the data implies knowledge of some function of the parameter vector, then that function is said to be identified, and consistent
estimation of it is a possibility. One example is the so-called “estimable functions” of the parameters of the over-parameterized linear models described by Scheffé (1959).

To show that a model is not identified, one need only produce two distinct parameter values that give rise to the same probability distribution. For example, let $D_1, \ldots, D_n$ be i.i.d. Poisson random variables with mean $\lambda_1 + \lambda_2$, where $\lambda_1 > 0$ and $\lambda_2 > 0$. The parameter is the pair $\theta = (\lambda_1, \lambda_2)$. The model is not identified because any pair of $\lambda$ values satisfying $\lambda_1 + \lambda_2 = c$ will produce exactly the same probability distribution. Notice also how maximum likelihood estimation will fail in this case; the likelihood function will have a ridge, a non-unique maximum along the line $\lambda_1 + \lambda_2 = \bar{y}$, where $\bar{y}$ denotes the sample mean. The function $\lambda = \lambda_1 + \lambda_2$, of course, is identified.

For any statistical model, the probability distribution of the data is a function of the parameter. If the parameter is also a function of the probability distribution, then the function is one-to-one and the model is identified.

Now, in the classical structural equation models, $D_1, \ldots, D_n$ are i.i.d. multivariate normal with mean zero, so that their joint probability distribution is completely determined by their common variance-covariance matrix. Following standard practice, we will denote this matrix by $\Sigma = \Sigma(\theta)$, where $\theta$ is a vector of the model parameters. As the notation indicates, $\Sigma$ is a function of $\theta$. If it is also possible to solve for the elements of $\theta$ uniquely in terms of the elements of $\Sigma$, so that $\theta$ is also a function of $\Sigma$, then the structural equation model is identified. Otherwise, it is not.

In Section 1, we gave a model for multiple regression with two independent variables measured with error, represented by Equations (2).
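
As a brief aside before the regression model is examined, the Poisson example earlier in this section can be checked numerically. The sketch below is only an illustration: the sample `data`, the function name `poisson_loglik`, and the particular parameter pairs are hypothetical choices that do not come from the paper. It evaluates the log-likelihood at two distinct pairs $(\lambda_1, \lambda_2)$ with the same sum, and at one pair with a different sum.

```python
import math


def poisson_loglik(lam1, lam2, data):
    """Log-likelihood of i.i.d. Poisson data whose mean is lam1 + lam2."""
    mean = lam1 + lam2
    return sum(y * math.log(mean) - mean - math.lgamma(y + 1) for y in data)


# Hypothetical sample, chosen only for illustration (not from the paper).
data = [3, 5, 4, 6, 2]
ybar = sum(data) / len(data)  # sample mean of the illustrative data

# Two distinct parameter values lying on the line lam1 + lam2 = ybar.
theta_a = (1.0, 3.0)
theta_b = (2.5, 1.5)
ll_a = poisson_loglik(*theta_a, data)
ll_b = poisson_loglik(*theta_b, data)

# A parameter value off that line, for comparison.
ll_off = poisson_loglik(1.0, 2.0, data)

print("log-likelihood at theta_a:", ll_a)
print("log-likelihood at theta_b:", ll_b)
print("log-likelihood off the ridge:", ll_off)
```

The first two values coincide because the likelihood depends on the parameter only through the sum, so the data cannot distinguish `theta_a` from `theta_b`. The third value is lower because its sum differs from the sample mean.
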
For simplicity, suppose that all the intercepts and expected values equal zero, all error terms are uncorrelated with the latent variables and with each other, and everything is multivariate normal.

We have $D_i = (Y_i, X_{i,1}, X_{i,2})$, so that $\Sigma$ has six unique elements. The parameter $\theta$ has eight elements: $\gamma_1$, $\gamma_2$, $\psi$, three more for the unique elements of $\Phi$, and two more for the error variances $\theta_{1,1}$ and $\theta_{2,2}$. Attempting to recover these eight parameter values from the six elements of $\Sigma$ amounts to solving six equations in eight unknowns. No unique solution is possible, and hence the model is not identified. We see from this simple example that for the kind of data set usually encountered in regression analysis, even a very modest model for measurement error in the independent variables will not be identified in general. How should we proceed?

When a structural equation model is determined not to be identified (and
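
The counting argument for the two-variable regression model can also be illustrated numerically. Equations (2) are not reproduced in this section, so the sketch below assumes the usual form $Y = \gamma_1 \xi_1 + \gamma_2 \xi_2 + \zeta$, $X_1 = \xi_1 + \delta_1$, $X_2 = \xi_2 + \delta_2$, with $\mathrm{Var}(\zeta) = \psi$, latent covariance matrix $\Phi$, and error variances $\theta_{1,1}$ and $\theta_{2,2}$. Under that assumption, the hypothetical helper `sigma` maps the eight parameters to the six unique elements of $\Sigma$, and two different parameter vectors (chosen by hand for illustration) are shown to give the same $\Sigma$.

```python
def sigma(gamma1, gamma2, psi, phi11, phi12, phi22, theta11, theta22):
    """Six unique elements of Cov(Y, X1, X2) under the ASSUMED model
    Y = gamma1*xi1 + gamma2*xi2 + zeta, X1 = xi1 + delta1, X2 = xi2 + delta2,
    with all error terms uncorrelated with the latent variables and each other.
    """
    cov_y_x1 = phi11 * gamma1 + phi12 * gamma2   # Cov(Y, X1) = Cov(Y, xi1)
    cov_y_x2 = phi12 * gamma1 + phi22 * gamma2   # Cov(Y, X2) = Cov(Y, xi2)
    var_y = gamma1 * cov_y_x1 + gamma2 * cov_y_x2 + psi
    var_x1 = phi11 + theta11
    cov_x1_x2 = phi12
    var_x2 = phi22 + theta22
    return (var_y, cov_y_x1, cov_y_x2, var_x1, cov_x1_x2, var_x2)


# Two distinct, admissible parameter vectors (all variances positive).
# Order: gamma1, gamma2, psi, phi11, phi12, phi22, theta11, theta22.
theta_a = (1.0, 1.0, 1.0, 1.0, 0.5, 1.0, 1.0, 1.0)
theta_b = (0.75, 0.75, 1.75, 1.5, 0.5, 1.5, 0.5, 0.5)

sigma_a = sigma(*theta_a)
sigma_b = sigma(*theta_b)

print("Sigma elements from theta_a:", sigma_a)
print("Sigma elements from theta_b:", sigma_b)
print("Same covariance matrix:", sigma_a == sigma_b)
```

Because the two parameter vectors differ but imply the same covariance matrix, no amount of data from this model could tell them apart, which is what non-identifiability means here.
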
