
First Draft of the paper - University of Toronto

This is illustrated in Figure 1, which shows histograms of 20,000 simulated X₁ values for each of the four base distributions, with φ₁,₂ = 0.75 and a reliability of 0.90 (these values are part of the "mild" parameter configuration used in some simulations that come later in the paper).

[Figure 1: Simulated values of X₁. Four histogram panels (relative frequency versus X₁, plotted over −15 to 15), one for each base distribution: Normal, Pareto, t, and Uniform.]

As intended, the t base distribution yields heavy-tailed symmetric distributions, the Pareto yields heavy-tailed nonsymmetric distributions, and the uniform yields light-tailed distributions. One thing that is not apparent in Figure 1 is the high outliers generated by the Pareto base distribution, and the high and low outliers generated by the t base distribution. For the Pareto base distribution, simulated values of X₁ range from −1.19 to 14.53; for the t, they range from −14.44 to 12.00. Of course the true variance of X₁ is the same regardless of the base distribution; in this case it is 10/9 ≈ 1.11, and the sample variances of the simulated data are all close to this value.
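The common variance of 10/9 follows from the design: the latent variable ξ₁ has variance 1, and a reliability of 0.90 forces the measurement error to have variance 1/9, so Var(X₁) = 1 + 1/9. A minimal sketch of this construction is below; the t degrees of freedom (5) and Pareto shape (3) are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(2024)
N = 20_000

def standardize(z):
    # Rescale to mean 0, variance 1 so every base distribution
    # contributes the same variance to X1.
    return (z - z.mean()) / z.std()

# Base draws. df=5 for t and shape a=3 for the Pareto are guesses for
# illustration only; the paper does not state them in this excerpt.
bases = {
    "normal":  rng.standard_normal(N),
    "t":       rng.standard_t(df=5, size=N),
    "pareto":  rng.pareto(a=3.0, size=N),
    "uniform": rng.uniform(-1.0, 1.0, size=N),
}

variances = {}
for name, z in bases.items():
    xi = standardize(z)                                 # latent ξ1, variance 1
    delta = standardize(rng.standard_normal(N)) / 3.0   # error, variance 1/9
    x1 = xi + delta                                     # reliability 0.9
    variances[name] = x1.var()
    print(f"{name:8s} sample variance of X1: {variances[name]:.3f}")
```

Each printed variance should sit close to 10/9 ≈ 1.11, matching the observation above that the sample variances agree across base distributions.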

1.2.2 Results

Again, this is a complete factorial experiment with 5 × 5 × 3 × 5 × 5 × 4 = 7,500 treatment combinations. Within each treatment combination, we independently generated 10,000 random sets of data, yielding 75 million simulated data sets in all. For each one, we ignored measurement error, fitted Model (4) and tested H₀: β₂ = 0 with the usual "extra sum of squares" F-test. The proportion of simulated data sets for which the null hypothesis was rejected at α = 0.05 is a Monte Carlo estimate of the Type I error rate.

Considerations of space do not permit us to reproduce the entire set of results here. Instead, we give excerpts that tell the main part of the story, referring the reader to www.utstat.toronto.edu/~brunner/MeasurementError for the rest. On the Web, the full set of results is available in the form of a six-dimensional table with 7,500 cells, and also in the form of a data file with 7,500 lines, suitable as input data for further analysis. Complete source code for the special-purpose Fortran programs we wrote is also available for download, along with other supporting materials.

Table 1 shows the results when all the variables are normally distributed and the reliabilities of both independent variables equal 0.90; that is, only 10% of the variance of the independent variables arises from measurement error. In the social and behavioral sciences, a reliability of 0.90 would be considered impressively high, and one might think there was little to worry about.

Table 1 shows that except when the latent independent variables ξ₁ and ξ₂ are uncorrelated, applying ordinary least squares regression to the corresponding observable variables X₁ and X₂ results in a substantial inflation of the Type I error rate.
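The Monte Carlo procedure described above can be sketched in a few lines: simulate correlated latent variables, add measurement error at the stated reliability, fit the full and reduced regressions, and record how often the extra-sum-of-squares F-test rejects H₀: β₂ = 0. This is our own illustrative reconstruction (Model (4) itself and the paper's exact settings are not reproduced here); the sample size, slope, and replication count are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def rejects(n=200, phi12=0.75, rel=0.9, beta1=1.0, alpha=0.05):
    # Latent ξ1, ξ2 with correlation phi12; Y depends on ξ1 only,
    # so the true β2 is 0 and any rejection is a Type I error.
    cov = np.array([[1.0, phi12], [phi12, 1.0]])
    xi = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    y = beta1 * xi[:, 0] + rng.standard_normal(n)
    # Observed X_j = ξ_j + δ_j, with Var(δ) chosen to give reliability rel.
    err_sd = np.sqrt(1.0 / rel - 1.0)
    x = xi + err_sd * rng.standard_normal((n, 2))
    # Full vs. reduced OLS fits; extra-sum-of-squares F-test of H0: β2 = 0.
    X_full = np.column_stack([np.ones(n), x])
    X_red = X_full[:, :2]
    def sse(M):
        b, *_ = np.linalg.lstsq(M, y, rcond=None)
        r = y - M @ b
        return r @ r
    f = (sse(X_red) - sse(X_full)) / (sse(X_full) / (n - 3))
    return stats.f.sf(f, 1, n - 3) < alpha

reps = 2000
rate = sum(rejects() for _ in range(reps)) / reps
print(f"Estimated Type I error rate: {rate:.3f} (nominal 0.05)")
```

Even at these fairly tame settings the estimated rate lands well above the nominal 0.05, consistent with the inflation reported in Table 1.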
As one would predict from Expression (5) with θ₁,₂ = 0, the problem becomes more severe as ξ₁ and ξ₂ become more strongly related, as ξ₁ and Y become more strongly related, and as the sample size increases. We view these Type I error rates as shockingly high, even for fairly moderate sample sizes and modest relationships among variables.

This pattern of results holds for all four base distributions, and for all twenty-five combinations of reliabilities of the independent variables. In addition, the Type I error rates increased with decreasing reliability of X₁, and decreased with decreasing reliability of X₂, the variable being tested. The distribution of the error terms and independent variables did not matter much, though average Type I error rates were slightly lower when the base distribution was the skewed and heavy-tailed Pareto. These trends are summarized in Table 1.2.2.
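The trend with the latent correlation can also be seen asymptotically: standard errors-in-variables algebra gives the large-sample limit of the OLS coefficient on X₂, which is nonzero whenever ξ₁ and ξ₂ are correlated and grows with that correlation. The sketch below is our own illustration of this limit, not the paper's Expression (5); it assumes Var(ξ_j) = 1 and a true slope of 1 on ξ₁.

```python
import numpy as np

def limiting_beta2(phi, rel=0.9):
    # Large-sample OLS limit when Y = ξ1 + ε (true β2 = 0) but we regress
    # on X_j = ξ_j + δ_j.  With Var(ξ_j) = 1, reliability rel implies
    # Var(X_j) = 1/rel, while Cov(X1, X2) = phi and Cov(X_j, Y) = Cov(ξ_j, Y).
    Sxx = np.array([[1.0 / rel, phi], [phi, 1.0 / rel]])
    Sxy = np.array([1.0, phi])
    return np.linalg.solve(Sxx, Sxy)[1]

for phi in (0.0, 0.25, 0.5, 0.75):
    print(f"phi = {phi:4.2f}  ->  limiting beta2 = {limiting_beta2(phi):.4f}")
```

The limit is exactly 0 when φ = 0 and increases steadily with φ, matching the pattern that the inflation vanishes for uncorrelated latent variables and worsens as they become more strongly related.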
