
First Draft of the paper - University of Toronto



Again there are two correlated latent variables ξ1 and ξ2, only this time they are binary. The corresponding observable variables X1 and X2 are also binary. There is a binary dependent variable Y that depends on ξ1 and is conditionally independent of ξ2 given ξ1.

The components of the measurement error model are two-way tables of the joint probabilities of ξ1 and ξ2, ξ1 and X1, and ξ2 and X2. The values we used are given in Table 5.

Table 5: Joint probabilities for the classification error model

              ξ1
  ξ2       0      1
  0      0.40   0.10
  1      0.10   0.40

              X1
  ξ1       0      1
  0      0.30   0.20
  1      0.20   0.30

              X2
  ξ2       0      1
  0      0.45   0.05
  1      0.05   0.45

The data were constructed by first sampling a (ξ1, ξ2) pair from a multinomial distribution, and then simulating X1 conditionally on ξ1 and X2 conditionally on ξ2. Finally, we generated Y conditionally on ξ1 using P(Y = 0 | ξ1 = 0) = P(Y = 1 | ξ1 = 1) = 0.80. Repeating this process n = 250 times yielded a simulated data set of (X1, X2, Y) triples. We then tested for conditional independence of X2 and Y given X1, as a surrogate for the conditional independence of ξ2 and Y given ξ1. Specifically, we used R's loglin function to fit a hierarchical loglinear model with an association between X1 and X2, and between X1 and Y. Comparing this to a saturated model, we calculated a large-sample likelihood ratio test of conditional independence with two degrees of freedom. In 1,000 independent repetitions of this process, the null hypothesis was incorrectly rejected 983 times at the 0.05 level.

Factorial ANOVA with classification error. In an unbalanced factorial design with a quantitative dependent variable, the usual approach, say using the Type III sums of squares of SAS proc glm (SAS Institute Inc., 1999), is to test each main effect controlling for all the others as well as the interactions.
We now report a quick simulation showing that in a two-factor design, if factor level membership is subject to classification error in one of the independent variables, then Type I error may be inflated in testing for a main effect of the other independent variable.
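As a rough illustration of the classification error model in Table 5 and the loglinear test described earlier, the following Python sketch generates (X1, X2, Y) triples and computes the likelihood ratio statistic for conditional independence of X2 and Y given X1. It is not the code used for the paper, which relied on R's loglin function. Several things here are assumptions: the function names (`simulate_triples`, `g2_conditional_independence`, `rejection_rate`) are hypothetical, the conditional probabilities 0.6 and 0.9 are derived by dividing the Table 5 joint probabilities by the marginal 0.5, and the fitted values use the closed form n(i,j,+) n(i,+,k) / n(i,+,+) for this model rather than iterative fitting.

```python
import math
import random

# Joint probabilities of the latent pair (xi1, xi2), from Table 5.
P_LATENT = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.10, (1, 1): 0.40}
# Conditional probabilities derived from Table 5 (joint / marginal of 0.5).
P_X1_CORRECT = 0.30 / 0.50   # P(X1 = xi1 | xi1) = 0.6
P_X2_CORRECT = 0.45 / 0.50   # P(X2 = xi2 | xi2) = 0.9
P_Y_MATCH = 0.80             # P(Y = xi1 | xi1), as stated in the text


def noisy_copy(value, p_correct, rng):
    """Return value with probability p_correct, else the other binary level."""
    return value if rng.random() < p_correct else 1 - value


def simulate_triples(n, rng):
    """Draw n observable (X1, X2, Y) triples; the latent pair is discarded."""
    pairs = list(P_LATENT)
    weights = [P_LATENT[p] for p in pairs]
    data = []
    for xi1, xi2 in rng.choices(pairs, weights=weights, k=n):
        x1 = noisy_copy(xi1, P_X1_CORRECT, rng)
        x2 = noisy_copy(xi2, P_X2_CORRECT, rng)
        y = noisy_copy(xi1, P_Y_MATCH, rng)
        data.append((x1, x2, y))
    return data


def g2_conditional_independence(data):
    """Likelihood ratio statistic for X2 independent of Y given X1 (2 df)."""
    n = {}
    for triple in data:
        n[triple] = n.get(triple, 0) + 1
    g2 = 0.0
    for i in (0, 1):
        n_i = sum(n.get((i, j, k), 0) for j in (0, 1) for k in (0, 1))
        for j in (0, 1):
            n_ij = sum(n.get((i, j, k), 0) for k in (0, 1))
            for k in (0, 1):
                n_ik = sum(n.get((i, jj, k), 0) for jj in (0, 1))
                obs = n.get((i, j, k), 0)
                if obs > 0:  # empty cells contribute nothing
                    # Fitted value is n_ij * n_ik / n_i.
                    g2 += 2.0 * obs * math.log(obs * n_i / (n_ij * n_ik))
    return g2


def p_value_2df(g2):
    """Upper tail of chi-squared with 2 df, which equals exp(-x / 2)."""
    return math.exp(-g2 / 2.0)


def rejection_rate(reps, n, seed):
    """Proportion of simulated data sets rejecting at the 0.05 level."""
    rng = random.Random(seed)
    rejections = sum(
        p_value_2df(g2_conditional_independence(simulate_triples(n, rng))) < 0.05
        for _ in range(reps)
    )
    return rejections / reps
```

Calling `rejection_rate(1000, 250, seed=1)` mirrors the design reported above (1,000 data sets of n = 250). The exact proportion depends on the random number generator, so it should be close to, but not identical to, the 983 rejections reported.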

We started with two correlated binary latent independent variables ξ1 and ξ2, and their corresponding observable versions X1 and X2, constructed according to the same classification error model we used for loglinear models; see Table 5. We then generated the dependent variable as Y = 1 + ξ1 + ζ, where ζ is Normal with mean zero and variance 1/4. Because ξ1 is Bernoulli with probability one-half, its variance is also 1/4, and it accounts for half the variance in Y. Conditionally upon the latent (true) independent variable ξ1, Y is independent of ξ2 and there is no interaction.

Repeating this process n = 200 times yielded a simulated data set of (X1, X2, Y) triples. As usual, we conducted the analysis using the observable variables X1 and X2 in place of ξ1 and ξ2 respectively, ignoring the measurement error. We fit a regression model with effect coding and a product term for the interaction, and tested for a main effect of X2 at the 0.05 level with the usual F test. Again, this is equivalent to the test based on Type III sums of squares in SAS proc glm. Conducting this test on 1,000 simulated data sets, we incorrectly rejected the null hypothesis 995 times.

Discarding data to get equal sample sizes in factorial ANOVA. In Section 1, we saw that inflation of Type I error arises not just from measurement error in the independent variables, but from the combination of correlated independent variables and measurement error in the one for which one is attempting to “control.” Now sometimes, researchers (not statisticians, we hope) randomly discard data from observational studies to obtain balanced factorial designs, and it might be tempting to try this here to eliminate the correlation between independent variables.
It doesn’t work, though, because it is association between the latent independent variables that is the real source of the problem.

To verify this, we simulated random sets of data exactly as in the last example, except that when one of the four combinations of X1, X2 values reached 50 observations, we discarded all subsequent observations in that cell, continuing until we had 50 data values in each of the four cells. Then we tested for a main effect of X2 (as a surrogate for ξ2) exactly as before. The result was that we wrongly rejected the null hypothesis 919 times in 1,000 simulations.

Cox proportional hazards regression with additive measurement error. The last mini-simulation shows that the problem of inflated Type
