
First Draft of the paper - University of Toronto

Since $\xi_1$ and $\xi_2$ are not available, we might use $X_1$ and $X_2$, fitting the model (1) and testing the coefficient of $X_2$ with the usual t or F test. This is what the textbooks are implicitly telling us to do. But unless $\xi_1$ is also unrelated to $Y$, or the covariance of $\xi_1$ and $\xi_2$ is zero, the Type I error rate is inflated. As the covariance between $\xi_1$ and $\xi_2$ increases, the problem becomes worse. As the strength of the relationship between $Y$ and $\xi_1$ increases, the problem becomes worse. As the amount of error in the measurement of $\xi_1$ increases, the problem becomes worse. As the amount of error in the measurement of $\xi_2$ increases, the problem becomes somewhat less severe. As the sample size increases, the problem becomes worse, with Type I error rates approaching one for large samples under most conditions. The distributions of the latent variables and measurement errors do not matter much. Full details are given in Sections 1.1 and 1.2.

Specialists in measurement error models will not really be surprised by the inflation of Type I error, but for some reason they seem averse to advertising it. Perhaps they have a tradition of being polite. For example, Cochran (1968), after showing that the ordinary least squares estimates are biased, remarks that “Depending on the signs of the terms . . . , this bias can produce either an overestimate or an underestimate, and will disturb the Type I errors of tests of significance.” (page 653). Fuller (1978) knows too, of course. In a discussion of instrumental variables (more on this later), he mentions that “. . . variables whose theoretical coefficients are zero are sometimes significant in ordinary least squares regression” (page 55).

So, presumably all the experts know that measurement error in the independent variables can inflate Type I error rates, but they seem to be very calm about it. In contrast, many mainstream statisticians and users of multiple regression may be appalled at the magnitude of the effect. It gets worse.
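The mechanism behind these claims can be checked with a short Python sketch. It is not the authors' code, and it assumes a simplified version of the setup with hypothetical function and parameter names: $Y = \beta_1\xi_1 + \beta_2\xi_2 + \epsilon$ with $\operatorname{Var}(\epsilon) = 1$, observed $X_j = \xi_j + \delta_j$, standardized latent variables with correlation `phi12`, measurement error variances `theta1` and `theta2`, and all variables normal and mutually independent apart from the correlation between $\xi_1$ and $\xi_2$. `ols_limit` returns the values the least squares slopes converge to, $(\Phi + \Theta)^{-1}\Phi\beta$ in matrix form, and `type1_rate` estimates by simulation how often the usual test of $X_2$ rejects when $\beta_2 = 0$, using 1.96 as a large-sample stand-in for the t critical value.

```python
import math
import random


def ols_limit(beta1, beta2, phi12, theta1, theta2):
    """Probability limit of the OLS slopes from regressing Y on X1 and X2.

    Hypothetical helper. Assumes Var(xi1) = Var(xi2) = 1, Corr(xi1, xi2) = phi12,
    and measurement error variances theta1, theta2 (errors independent of Y).
    Computes (Phi + Theta)^{-1} Phi beta for the 2 x 2 case.
    """
    # Cov(X) = Phi + Theta
    s11, s22, s12 = 1.0 + theta1, 1.0 + theta2, phi12
    # Cov(X, Y) = Phi beta
    c1 = beta1 + phi12 * beta2
    c2 = phi12 * beta1 + beta2
    det = s11 * s22 - s12 ** 2
    return ((s22 * c1 - s12 * c2) / det, (s11 * c2 - s12 * c1) / det)


def type1_rate(n=200, n_sims=500, beta1=1.0, phi12=0.75,
               theta1=0.5, theta2=0.5, crit=1.96, seed=1):
    """Monte Carlo rejection rate of H0: beta2 = 0 when beta2 really is 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x1, x2, y = [], [], []
        for _ in range(n):
            xi1 = rng.gauss(0.0, 1.0)
            # xi2 has variance 1 and correlation phi12 with xi1
            xi2 = phi12 * xi1 + math.sqrt(1.0 - phi12 ** 2) * rng.gauss(0.0, 1.0)
            y.append(beta1 * xi1 + rng.gauss(0.0, 1.0))  # xi2 has no effect on Y
            x1.append(xi1 + math.sqrt(theta1) * rng.gauss(0.0, 1.0))
            x2.append(xi2 + math.sqrt(theta2) * rng.gauss(0.0, 1.0))
        m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
        # Centered sums of squares and cross-products
        s11 = sum((a - m1) ** 2 for a in x1)
        s22 = sum((b - m2) ** 2 for b in x2)
        s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
        s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
        s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
        syy = sum((c - my) ** 2 for c in y)
        det = s11 * s22 - s12 ** 2
        b1 = (s22 * s1y - s12 * s2y) / det
        b2 = (s11 * s2y - s12 * s1y) / det
        mse = (syy - b1 * s1y - b2 * s2y) / (n - 3)  # intercept + 2 slopes
        t = b2 / math.sqrt(mse * s11 / det)          # usual OLS t statistic
        if abs(t) > crit:
            rejections += 1
    return rejections / n_sims
```

With these assumptions, `ols_limit(1.0, 0.0, 0.75, 0.5, 0.5)` returns about (0.556, 0.222), that is 5/9 and 2/9: the $X_2$ coefficient converges to a nonzero value even though $\beta_2 = 0$, so its t statistic grows roughly like $\sqrt{n}$ and the rejection rate climbs toward one as $n$ increases. With `theta1=0.0` the limit is exactly zero and `type1_rate` stays near the nominal 0.05. In this example, increasing `theta2` shrinks the limit, and `ols_limit(1.0, -0.1, 0.75, 0.5, 0.5)` gives a positive limit of 1/6 for a negative true $\beta_2$, the situation taken up in the next paragraph.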
Suppose that $\xi_1$ and $\xi_2$ are positively correlated, $\xi_1$ is measured with error, the coefficient corresponding to $\xi_1$ is greater than zero, and the coefficient corresponding to $\xi_2$ is not zero this time, but actually somewhat less than zero. Especially for small negative values, the test of $X_2$ controlling for $X_1$ will be statistically significant at a high rate by the usual tests, and this will be accompanied by a positive estimated regression coefficient for $X_2$. In this case, one would mistakenly conclude that there is a positive (partial) association between $\xi_2$ and $Y$, when in fact it is negative.

There is some good news (Lord, 1960). In a designed experiment with random assignment to conditions, analysis of covariance is not subject to any particular problems, even when the covariates are measured with error and the model is not formally identified. This is because the independent variables for which we are “controlling” are unrelated to those we want to test.

There is another case where ordinary least squares regression may be useful even when the independent variables are measured with error. In certain applications (possibly in business), the primary interest may be to build a regression model entirely for purposes of prediction, while interpretation of the regression equation may be of secondary interest at best. Here, ordinary least squares may be a useful tool even in the presence of measurement error. But when the goal of a regression analysis is to understand the processes underlying the data, there appears to be little excuse for using ordinary least squares when the independent variables are measured with error.

Clearly, the solution is to use models that formally incorporate measurement error. We avoid the temptation to merely observe that such models exist, give a few references, and suggest that people use them. Our suggestion surely would not be followed, because it is not just a matter of applying a different statistical method to the same old data. In many cases, a different kind of data set is required. The reason is that for even the simplest measurement error models, most data sets do not provide unique identification of the model parameters, and all the usual methods of parameter estimation and testing will fail. In particular, the likelihood function will not have a unique maximum, and good numerical maximization algorithms will not converge or, worse, will simply stop somewhere in a region where the likelihood is flat.

In Section 3.1, we discuss the identification problem, and give a simple solution for regression with measurement error: measure each independent variable twice, preferably on two different occasions and using different methods or measuring instruments.
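A minimal Python sketch of why a second measurement helps, for the simplest case of a single latent independent variable. It is an illustration under stated assumptions, not the model of Section 3.1: it assumes $Y = \beta\xi + \epsilon$ and two measurements $W_1 = \xi + \delta_1$, $W_2 = \xi + \delta_2$ (notation introduced here), with $\delta_1$, $\delta_2$, $\epsilon$ and $\xi$ mutually uncorrelated, i.e. uncorrelated errors on the two occasions. Then $\operatorname{Cov}(W_1, W_2) = \operatorname{Var}(\xi)$ and $\operatorname{Cov}(W_2, Y) = \beta\operatorname{Var}(\xi)$, so the ratio of the two sample covariances is a consistent method-of-moments estimate of $\beta$; it is the instrumental-variables estimator with the second measurement as the instrument for the first. The function names `cov` and `retest_slopes` and their parameters are hypothetical.

```python
import random


def cov(u, v):
    """Sample covariance with divisor n - 1."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)


def retest_slopes(n=20000, beta=1.0, error_var=1.0, seed=2):
    """Return (naive OLS slope of Y on W1, test-retest IV slope).

    Assumes xi ~ N(0, 1), independent N(0, error_var) errors on the two
    measurement occasions, and N(0, 1) equation error.
    """
    rng = random.Random(seed)
    sd = error_var ** 0.5
    w1, w2, y = [], [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)                 # latent variable, Var = 1
        y.append(beta * xi + rng.gauss(0.0, 1.0))
        w1.append(xi + rng.gauss(0.0, sd))       # first measurement
        w2.append(xi + rng.gauss(0.0, sd))       # second, independent error
    naive = cov(w1, y) / cov(w1, w1)             # attenuated toward zero
    retest = cov(w2, y) / cov(w2, w1)            # W2 serves as instrument for W1
    return naive, retest
```

Under these assumptions the naive slope should come out near 0.5, because least squares is attenuated by the reliability $\operatorname{Var}(\xi)/\operatorname{Var}(W_1) = 1/2$, while the test-retest estimate should be near the true $\beta = 1$.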
If it can be assumed that the measurement errors on the two occasions are uncorrelated, the model will be identified in the structural parameters, regardless of the number of independent variables. We call this kind of data configuration a test-retest design. It is just a special case of instrumental variables; for example, see Fuller (1968). But we advance this terminology (inspired by “test-retest reliability” in psychometrics) in the hope that it will facilitate the collection of data sets that allow practitioners to carry out successful measurement error regressions without having to struggle with model identification.

By following the “test-retest design” recipe, scientists and undergraduates without much mathematical background should have no trouble performing
