
First Draft of the paper - University of Toronto

Since ξ1 and ξ2 are not available, we might use X1 and X2, fitting model (1) and testing the coefficient of X2 with the usual t or F test. This is what the textbooks are implicitly telling us to do. But unless ξ1 is also unrelated to Y, or the covariance of ξ1 and ξ2 is zero, the Type I error rate is inflated. As the covariance between ξ1 and ξ2 increases, the problem becomes worse. As the strength of the relationship between Y and ξ1 increases, the problem becomes worse. As the amount of error in the measurement of ξ1 increases, the problem becomes worse. As the amount of error in the measurement of ξ2 increases, the problem becomes somewhat less severe. As the sample size increases, the problem becomes worse, with Type I error rates approaching one for large samples under most conditions. The distributions of the latent variables and measurement errors do not matter much. Full details are given in Sections 1.1 and 1.2.

Specialists in measurement error models will not really be surprised by the inflation of Type I error, but for some reason they seem averse to advertising it. Perhaps they have a tradition of being polite. For example, Cochran (1968), after showing that the ordinary least squares estimates are biased, remarks that “Depending on the signs of the terms . . . , this bias can produce either an overestimate or an underestimate, and will disturb the Type I errors of tests of significance” (page 653). Fuller (1978) knows too, of course. In a discussion of instrumental variables (more on this later), he mentions that “. . .
variables whose theoretical coefficients are zero are sometimes significant in ordinary least squares regression” (page 55).

So, presumably all the experts know that measurement error in the independent variables can inflate Type I error rates, but they seem to be very calm about it. In contrast, many mainstream statisticians and users of multiple regression may be appalled at the magnitude of the effect.

It gets worse. Suppose that ξ1 and ξ2 are positively correlated, ξ1 is measured with error, the coefficient corresponding to ξ1 is greater than zero, and the coefficient corresponding to ξ2 is not zero this time, but actually somewhat less than zero. Especially for small negative values, the test of X2 controlling for X1 will be statistically significant at a high rate by the usual tests, and this will be accompanied by a positive estimated regression coefficient for X2. In this case, one would mistakenly conclude that there is a positive (partial) association between ξ2 and Y, when in fact it is negative.

There is some good news (Lord, 1960). In a designed experiment with random assignment to conditions, analysis of covariance is not subject to any particular problems, even when the covariates are measured with error
and the model is not formally identified. This is because the independent variables for which we are “controlling” are unrelated to those we want to test.

There is another case where ordinary least squares regression may be useful even when the independent variables are measured with error. In certain applications (possibly in business), the primary interest may be to build a regression model entirely for purposes of prediction, while interpretation of the regression equation may be of secondary interest at best. Here, ordinary least squares may be a useful tool even in the presence of measurement error. But when the goal of a regression analysis is to understand the processes underlying the data, there appears to be little excuse for using ordinary least squares when the independent variables are measured with error.

Clearly, the solution is to use models that formally incorporate measurement error. We avoid the temptation to merely observe that such models exist, give a few references, and suggest that people use them. Our suggestion surely would not be followed, because it is not just a matter of applying a different statistical method to the same old data. In many cases, a different kind of data set is required. The reason is that for even the simplest measurement error models, most data sets do not provide unique identification of the model parameters, and all the usual methods of parameter estimation and testing will fail. In particular, the likelihood function will not have a unique maximum, and good numerical maximization algorithms will not converge; or worse, they will simply stop somewhere in a region where the likelihood is flat.

In Section 3.1, we discuss the identification problem and give a simple solution for regression with measurement error: measure each independent variable twice, preferably on two different occasions and using different methods or measuring instruments. If it can be assumed that the measurement errors on the two occasions are uncorrelated, the model will be identified in the structural parameters, regardless of the number of independent variables. We call this kind of data configuration a test-retest design. It is just a special case of instrumental variables; for example, see Fuller (1968). But we advance this terminology (inspired by “test-retest reliability” in psychometrics) in the hope that it will facilitate the collection of data sets that allow practitioners to carry out successful measurement error regressions without having to struggle with model identification.

By following the “test-retest design” recipe, scientists and undergraduates without much mathematical background should have no trouble performing
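To see informally why a second measurement identifies the model, consider a minimal single-predictor sketch (our own toy setup with illustrative parameter values, not the paper's general model). With two measurements X1 = ξ + δ1 and X2 = ξ + δ2 whose errors are independent, Cov(X1, X2) = Var(ξ) and Cov(Y, X2) = β Var(ξ), so the ratio Cov(Y, X2)/Cov(X1, X2) is consistent for β, while the ordinary least squares slope is attenuated:

```python
# Test-retest identification sketch: the second measurement acts as an
# instrument for the first. All parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, beta = 100_000, 1.0

xi = rng.normal(size=n)                # latent independent variable, Var = 1
y  = beta * xi + rng.normal(size=n)    # outcome; true slope is beta = 1
x1 = xi + rng.normal(size=n)           # measurement on occasion 1 (error Var = 1)
x2 = xi + rng.normal(size=n)           # measurement on occasion 2, independent error

# Naive OLS slope of y on x1: attenuated toward zero (plim = 0.5 here,
# because the reliability of x1 is Var(xi)/Var(x1) = 1/2).
ols = np.cov(y, x1)[0, 1] / np.var(x1, ddof=1)

# Method-of-moments / instrumental-variables slope using x2 as instrument:
# consistent for beta because the two measurement errors are uncorrelated.
iv = np.cov(y, x2)[0, 1] / np.cov(x1, x2)[0, 1]

print(f"OLS slope: {ols:.3f}  (attenuated)")
print(f"Test-retest IV slope: {iv:.3f}  (true value {beta})")
```

The same moment argument extends to several predictors measured twice, which is what identifies the structural parameters in the test-retest design.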

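The inflation of the Type I error rate described above is easy to reproduce numerically. The following Monte Carlo sketch uses our own illustrative parameter values (the function name and its defaults are ours, not the paper's): the true coefficient of ξ2 is zero, yet the usual t-test on X2, controlling for X1, rejects far more often than the nominal five percent:

```python
# Monte Carlo sketch of Type I error inflation under measurement error.
# Illustrative values: Var(xi) = Var(delta) = Var(eps) = 1, rho = 0.5.
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)

def rejection_rate(n=200, reps=2000, rho=0.5, beta1=1.0, alpha=0.05):
    """Fraction of replications in which the usual t-test rejects
    H0: slope of X2 = 0, even though xi2 is truly unrelated to Y."""
    crit = stats.t.ppf(1 - alpha / 2, df=n - 3)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    rejections = 0
    for _ in range(reps):
        xi = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        y = beta1 * xi[:, 0] + rng.normal(size=n)   # xi2 coefficient is zero
        X1 = xi[:, 0] + rng.normal(size=n)          # both predictors observed
        X2 = xi[:, 1] + rng.normal(size=n)          # with measurement error
        X = np.column_stack([np.ones(n), X1, X2])
        # Ordinary least squares and the usual t-test for the X2 slope
        beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
        resid = y - X @ beta_hat
        s2 = resid @ resid / (n - 3)
        se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[2, 2])
        if abs(beta_hat[2] / se) > crit:
            rejections += 1
    return rejections / reps

print(rejection_rate())   # well above the nominal 0.05
```

Increasing `n`, `rho`, or `beta1` in this sketch drives the empirical rejection rate higher still, matching the qualitative pattern described in Section 1.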