
First Draft of the paper - University of Toronto

Again there are two correlated latent variables $\xi_1$ and $\xi_2$, only this time they are binary. The corresponding observable variables $X_1$ and $X_2$ are also binary. There is a binary dependent variable $Y$ that depends on $\xi_1$ and is conditionally independent of $\xi_2$ given $\xi_1$. The components of the measurement error model are two-way tables of the joint probabilities of $\xi_1$ and $\xi_2$, $\xi_1$ and $X_1$, and $\xi_2$ and $X_2$. The values we used are given in Table 5.

Table 5: Joint probabilities for the classification error model

| | $\xi_2 = 0$ | $\xi_2 = 1$ |
|---|---|---|
| $\xi_1 = 0$ | 0.40 | 0.10 |
| $\xi_1 = 1$ | 0.10 | 0.40 |

| | $X_1 = 0$ | $X_1 = 1$ |
|---|---|---|
| $\xi_1 = 0$ | 0.30 | 0.20 |
| $\xi_1 = 1$ | 0.20 | 0.30 |

| | $X_2 = 0$ | $X_2 = 1$ |
|---|---|---|
| $\xi_2 = 0$ | 0.45 | 0.05 |
| $\xi_2 = 1$ | 0.05 | 0.45 |

The data were constructed by first sampling a $(\xi_1, \xi_2)$ pair from a multinomial distribution, and then simulating $X_1$ conditionally on $\xi_1$ and $X_2$ conditionally on $\xi_2$. Finally, we generated $Y$ conditionally on $\xi_1$ using $P(Y = 0 \mid \xi_1 = 0) = P(Y = 1 \mid \xi_1 = 1) = 0.80$. Repeating this process $n = 250$ times yielded a simulated data set of $(X_1, X_2, Y)$ triples. We then tested for conditional independence of $X_2$ and $Y$ given $X_1$, as a surrogate for the conditional independence of $\xi_2$ and $Y$ given $\xi_1$. Specifically, we used R’s loglin function to fit a hierarchical loglinear model with an association between $X_1$ and $X_2$, and between $X_1$ and $Y$. Comparing this to a saturated model, we calculated a large-sample likelihood ratio test of conditional independence with two degrees of freedom. In 1,000 independent repetitions of this process, the null hypothesis was incorrectly rejected 983 times at the 0.05 level.

**Factorial ANOVA with classification error.** In an unbalanced factorial design with a quantitative dependent variable, the usual approach (say, using the Type III sums of squares of SAS proc glm; SAS Institute Inc., 1999) is to test each main effect controlling for all the others as well as the interactions.
We now report a quick simulation showing that in a two-factor design, if factor level membership is subject to classification error in one of the independent variables, then Type I error may be inflated in testing for a main effect of the other independent variable.
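
Before turning to the factorial ANOVA simulation, the loglinear simulation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses Python with NumPy instead of R, so it does not call loglin. Instead it relies on the model $[X_1X_2][X_1Y]$ being decomposable, so its fitted counts have the closed form $\hat m_{ijk} = n_{ij+}\, n_{i+k} / n_{i++}$ (the values iterative proportional fitting converges to), and on the upper tail of a chi-square distribution with two degrees of freedom being $e^{-G^2/2}$. The function names `simulate`, `lr_test_conditional_independence` and `rejection_count`, and the default seed, are hypothetical.

```python
import numpy as np

# Sketch only: NumPy stands in for R's loglin, and all function names are hypothetical.

# Table 5: joint probabilities of the classification error model (rows: latent variable).
P_XI1_XI2 = np.array([[0.40, 0.10], [0.10, 0.40]])  # P(xi1, xi2)
P_XI1_X1 = np.array([[0.30, 0.20], [0.20, 0.30]])   # P(xi1, X1)
P_XI2_X2 = np.array([[0.45, 0.05], [0.05, 0.45]])   # P(xi2, X2)


def simulate(n, rng):
    """Draw n (X1, X2, Y) triples; Y equals xi1 with probability 0.80."""
    # Sample (xi1, xi2) from the flattened 2x2 table as a 4-category multinomial.
    cell = rng.choice(4, size=n, p=P_XI1_XI2.ravel())
    xi1, xi2 = cell // 2, cell % 2
    # P(X = 1 | xi) = joint probability of (xi, X = 1) divided by P(xi).
    p_x1 = P_XI1_X1[:, 1] / P_XI1_X1.sum(axis=1)
    p_x2 = P_XI2_X2[:, 1] / P_XI2_X2.sum(axis=1)
    x1 = (rng.random(n) < p_x1[xi1]).astype(int)
    x2 = (rng.random(n) < p_x2[xi2]).astype(int)
    y = np.where(rng.random(n) < 0.80, xi1, 1 - xi1)
    return x1, x2, y


def lr_test_conditional_independence(x1, x2, y):
    """G^2 for [X1 X2][X1 Y] against the saturated model (2 degrees of freedom)."""
    counts = np.zeros((2, 2, 2))
    np.add.at(counts, (x1, x2, y), 1)
    n_ij = counts.sum(axis=2, keepdims=True)       # n_{ij+}
    n_ik = counts.sum(axis=1, keepdims=True)       # n_{i+k}
    n_i = counts.sum(axis=(1, 2), keepdims=True)   # n_{i++}
    fitted = n_ij * n_ik / n_i                     # closed-form fit of the decomposable model
    observed = counts > 0                          # empty cells contribute 0 to G^2
    g2 = 2.0 * np.sum(counts[observed] * np.log(counts[observed] / fitted[observed]))
    p_value = np.exp(-g2 / 2.0)                    # chi-square(2) upper tail
    return g2, p_value


def rejection_count(reps=1000, n=250, alpha=0.05, seed=2024):
    """Replications where the surrogate test rejects, although xi2 and Y are independent given xi1."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        _, p = lr_test_conditional_independence(*simulate(n, rng))
        rejections += int(p < alpha)
    return rejections
```

Calling `rejection_count()` runs 1,000 replications with $n = 250$. The exact count depends on the seed and the random number generator, so it will not reproduce the 983 reported above digit for digit, but the rejection rate should sit far above the nominal 0.05.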

We started with two correlated binary latent independent variables $\xi_1$ and $\xi_2$, and their corresponding observable versions $X_1$ and $X_2$, constructed according to the same classification error model we used for loglinear models; see Table 5. We then generated the dependent variable as $Y = 1 + \xi_1 + \zeta$, where $\zeta$ is Normal with mean zero and variance $\frac{1}{4}$. Because $\xi_1$ is Bernoulli with probability one-half, its variance is also $\frac{1}{4}$, and it accounts for half the variance in $Y$. Conditionally upon the latent (true) independent variable $\xi_1$, $Y$ is independent of $\xi_2$ and there is no interaction.

Repeating this process $n = 200$ times yielded a simulated data set of $(X_1, X_2, Y)$ triples. As usual, we conducted the analysis using the observable variables $X_1$ and $X_2$ in place of $\xi_1$ and $\xi_2$ respectively, ignoring the measurement error. We fit a regression model with effect coding and a product term for the interaction, and tested for a main effect of $X_2$ at the 0.05 level with the usual $F$ test. Again, this is equivalent to the test based on Type III sums of squares in SAS proc glm. Conducting this test on 1,000 simulated data sets, we incorrectly rejected the null hypothesis 995 times.

**Discarding data to get equal sample sizes in factorial ANOVA.** In Section 1, we saw that inflation of Type I error arises not just from measurement error in the independent variables, but from the combination of correlated independent variables and measurement error in the one for which one is attempting to “control.” Now, sometimes researchers (not statisticians, we hope) randomly discard data from observational studies to obtain balanced factorial designs, and it might be tempting to try this here to eliminate the correlation between independent variables.
It doesn’t work, though, because it is the association between the latent independent variables that is the real source of the problem. To verify this, we simulated random sets of data exactly as in the last example, except that when one of the four combinations of $X_1, X_2$ values reached 50 observations, we discarded all subsequent observations in that cell, continuing until we had 50 data values in each of the four cells. Then we tested for a main effect of $X_2$ (as a surrogate for $\xi_2$) exactly as before. The result was that we wrongly rejected the null hypothesis 919 times in 1,000 simulations.

**Cox proportional hazards regression with additive measurement error.** The last mini-simulation shows that the problem of inflated Type
