11.07.2015 Views

[U] User's Guide


[U] 20 Estimation and postestimation commands

are not the same as the model-based β. So, what we are estimating is different, but we still need standard errors that allow us to make valid statistical inference. So, if the process that we used to collect the data caused (x_i, e_i) to be independently but not identically distributed, then we need to use the robust standard errors to make valid statistical inference about the population parameters b.

The robust estimator of variance has one feature that the conventional estimator does not have: the ability to relax the assumption of independence of the observations. That is, if you specify the vce(cluster clustvar) option, it can produce “correct” standard errors (in the measurement sense), even if the observations are correlated.

For the automobile data, it is difficult to believe that the models of the various manufacturers are truly independent. Manufacturers, after all, use common technology, engines, and drive trains across their model lines. The VW Dasher in the above regression has a measured residual of −2.80. Having been told that, do you really believe that the residual for the VW Rabbit is as likely to be above 0 as below? (The residual is −2.32.) Similarly, the measured residual for the Chevrolet Malibu is 1.27. Does that provide information about the expected value of the residual of the Chevrolet Monte Carlo (which turns out to be 1.53)?

We need to be careful about picking examples from data; we have not told you about the Datsun 210 and 510 (residuals +8.28 and −1.01) or the Cadillac Eldorado and Seville (residuals −1.99 and +7.58), but you should, at least, question the assumption of independence. It may be believable that the measured mpg given the weight of one manufacturer’s vehicles is independent of other manufacturers’ vehicles, but it is at least questionable whether a manufacturer’s vehicles are independent of one another.

In commands with the vce(robust) option, another option, vce(cluster clustvar), relaxes the independence assumption and requires only that the observations be independent across the clusters:

    . regress mpg weight foreign, vce(cluster manufacturer)

    Linear regression                               Number of obs     =         74
                                                    F( 2, 22)         =      90.93
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.6627
                                                    Root MSE          =     3.4071

                              (Std. Err. adjusted for 23 clusters in manufacturer)
    ------------------------------------------------------------------------------
                 |               Robust
             mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          weight |  -.0065879   .0005339   -12.34   0.000    -.0076952   -.0054806
         foreign |  -1.650029   1.039033    -1.59   0.127    -3.804852    .5047939
           _cons |    41.6797   1.844559    22.60   0.000     37.85432    45.50508
    ------------------------------------------------------------------------------

It turns out that, in these data, whether or not we specify vce(cluster clustvar) makes little difference. The VW and Chevrolet examples above were not representative; had they been, the confidence intervals would have widened. (In the above, manufacturer is a variable that takes on values such as “Chev.” or “VW”, recording the manufacturer of the vehicle. This variable was created from variable make, which contains values such as “Chev. Malibu” or “VW Rabbit”, by extracting the first word.)

As a demonstration of how well clustering can work, in [R] regress we fit a random-effects model with regress, vce(robust) and then compared the results with ordinary least squares and the GLS random-effects estimator. Here we will simply summarize the results.

We start with a dataset on 4,711 women aged 14–46 years. Subjects appear an average of 6.056 times in the data; there are a total of 28,534 observations. The model we use is log wage on age,
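
To make the idea behind vce(cluster clustvar) more concrete, the following is a minimal Python sketch of a cluster-robust ("sandwich") variance calculation for a regression with an intercept and one regressor. It is not Stata's implementation. The function name ols_cluster_se and the toy data are hypothetical, and the finite-sample factor (N−1)/(N−k) × M/(M−1), with M the number of clusters, is an assumption about the usual adjustment for linear regression. The point it illustrates is the one made above: residual contributions are summed within each cluster before being squared, so only independence across clusters is required.

```python
from collections import defaultdict


def ols_cluster_se(x, y, cluster):
    """OLS of y on an intercept and x, with cluster-robust standard errors.

    Returns (intercept, slope, se_intercept, se_slope).
    Hypothetical helper for illustration only.
    """
    n, k = len(y), 2  # k = number of coefficients (intercept and slope)

    # X'X for the design matrix with rows (1, x_i), and its 2x2 inverse.
    sx, sxx = sum(x), sum(v * v for v in x)
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]

    # OLS coefficients b = (X'X)^{-1} X'y.
    sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy

    # Sum the score contributions e_i * (1, x_i) within each cluster.
    scores = defaultdict(lambda: [0.0, 0.0])
    for xi, yi, g in zip(x, y, cluster):
        e = yi - b0 - b1 * xi
        scores[g][0] += e
        scores[g][1] += e * xi

    # "Meat" of the sandwich: sum over clusters of u_g u_g'.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for u in scores.values():
        for r in range(2):
            for c in range(2):
                meat[r][c] += u[r] * u[c]

    # Assumed finite-sample factor: (N-1)/(N-k) * M/(M-1), M = clusters.
    m = len(scores)
    q = (n - 1) / (n - k) * m / (m - 1)

    # Sandwich: (X'X)^{-1} * meat * (X'X)^{-1}, scaled by q.
    var = [[q * sum(inv[r][i] * meat[i][j] * inv[j][c]
                    for i in range(2) for j in range(2))
            for c in range(2)] for r in range(2)]
    return b0, b1, var[0][0] ** 0.5, var[1][1] ** 0.5


# Hypothetical toy data: six observations in three clusters.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.3, 5.9]
groups = ["A", "A", "B", "B", "C", "C"]
print(ols_cluster_se(x, y, groups))
```

If every observation is placed in its own cluster, M equals N and the factor reduces to N/(N−k), so the calculation collapses to a heteroskedasticity-robust variance with no allowance for within-group correlation.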
