Model Quality Report in Business Statistics - Harvard ...

More documents

Recommendations

Info

This is because the weights are still constant within strata. Using the same (ratio) weights in SUDAAN produces a variance estimator different from the number raised one, but not very different. Averaged over 50 samples, the ratio of the variance estimates (naïve estimator to true variance estimator) is from 0.53 to 1.23, but this translates into a ratio of cvs of only 0.88 to 1.08. Nevertheless this suggests that the naïve ratio variance is too close to the expansion variance, and not appropriate to ratio estimation. Following the same approach with regression estimation is not possible because it produces negative weights in some strata, and neither STATA nor SUDAAN allow negative weights. An alternative is to use the constraining options within GES (in this case) to produce strictly positive weights wherever a solution to the calibration equations exists, and to replace the weights (for a whole stratum) with the expansion weights where such a solution does not exist. This guarantees no negative weights, but the extent of replacement and constraining can potentially cause a large increase in the variance. The results are shown in Table 3.5; note, however, that the standard deviation of the point estimates of total seem inconsistent with others presented here for the “regression” estimator. This may be due to the constraining process (see Hedlin, Falvey, Chambers & Kokic 1998). The main indication from this table seems to be that the naïve “regression” variance severely underestimates the actual variability of the estimates obtained with regression weighting. None of the naïve variance estimators considered here seems to offer a good approximation to the underlying variance of the estimates. This means that, where ratio or regression estimators are used in business surveys, appropriate software which explicitly takes account of the estimation model is necessary, and that software which uses only techniques for expansion weighting cannot be safely used. domain Number of simulations Standard deviation of point estimators (% of true total) Average estimated standard error (% of true total) Bias of standard error estimate (% of sd of point estimators) expansion 1000 184.66 185.13 0.26 “regression” 733 14.26 1.59 -88.84 expansion 1000 0.08 0.13 53.97 “regression” 732 6.84 0.03 -99.58 expansion 1000 9.56 9.42 -1.42 “regression” 732 9.67 3.12 -67.78 expansion 1000 0.36 0.35 -3.04 “regression” 733 23.06 0.02 -99.93 total ind1 ind2 reg1 Table 3.5 Comparison of the properties of variance estimators obtained from STATA using the usual expansion weights, and using constrained or adjusted weights (see text for full description) to give a pseudo-regression estimator. 3.3.3 Comparison of software package outputs Taylor variances: We will first look at the only estimation method common to all the packages, expansion estimation, and the “Taylor” (SYG) variance estimator. All the packages 30
produce identical solutions for the variance with the expansion estimator. The two packages which have appropriate processing for the ratio and regression estimators, CLAN and GES, also produce identical estimates for this estimator in standard cases, although the treatment of samples containing zero values of auxiliary variables can give rise to slight differences. Using SUDAAN twice as described in section 2.2.4 to produce a quasi-ratio estimator gives a theoretically correct variance estimator (but doesn’t give a point estimate at all), but with a rather indirect implementation. However, the solution is not the same as the CLAN/GES one. The differences are shown in Table 3.6. The general impression that the industry estimates are quite close whereas the region estimates are way out is broadly indicative of the trends in the remaining parts of the dataset. In general it seems that SUDAAN, even when apparently using a fix to give the correct form of the variance, is not appropriate software for calculating the variance of ratio estimates. Jackknife variances: Only SUDAAN and GES allow the use of jackknife variance estimators in stratified designs. The default in SUDAAN is the drop-one jackknife estimator, and in GES the user must set up jackknife groups. Since the drop-one estimator is preferred (Wolter 1985 p164), this has been used here. In this case, the jackknife estimators of variance from SUDAAN and GES are identical for the number-raised estimator; neither includes the fpc, which must be added later if it is required (see section 3.3.2, above). Domain Total Industry 1 Industry 2 Region 1 Number of simulations Standard deviation of point estimators (% of true total) Average estimated standard error (% of true total) Bias of standard error estimate (% of sd of point estimators 4 ) GES ratio 663 165.37 166.01 0.38 SUDAAN ratio 150 na 314.28 90.04 GES ratio 663 0.11 0.11 -0.70 SUDAAN ratio 150 na 0.11 6.33 GES ratio 663 9.27 8.46 -8.72 SUDAAN ratio 150 na 6.10 -34.18 GES ratio 663 0.43 0.44 1.79 SUDAAN ratio 150 na 37.75 8,634.68 Table 3.6 Summary of the properties of SYG variance estimators for ratio estimation from GES and using an apparently correct fix in SUDAAN. 3.4 General conclusions 1. The variance estimators which are common to several packages do in fact produce the same results in each case, with rounding error contributing only after many significant figures. 4 The SUDAAN figures have been compared with the variance of the GES point estimators; that is, the divisor for biases from both packages is the same in this column. 31
Page 1 and 2: Model Quality Report in Business St
Page 3 and 4: Outline of Model Quality Report Vol
Page 5 and 6: Contents 1 Introduction ...........
Page 7 and 8: The requirements for a variance est
Page 9 and 10: where y k is the value of the study
Page 11 and 12: where n hq is the size of the part
Page 13 and 14: The parameters (2.1)-(2.4) are tota
Page 15 and 16: previous sections. However, in orde
Page 17 and 18: The finite population correction (f
Page 19 and 20: esponse homogeneity groups in such
Page 21 and 22: The four main packages investigated
Page 23 and 24: There is a two-volume manual which
Page 25 and 26: 2.2.6.1 Initial reactions of new us
Page 27 and 28: knowledge from using the software i
Page 29 and 30: log ( y ) = β + β log( x + 1) +
Page 31 and 32: The main conclusion seems to be tha
Page 33: The jackknife estimators are extrem
Page 37 and 38: 4 Variances in STATA/SUDAAN compare
Page 39 and 40: where δ hi ⎧1 if sample unit k i
Page 41: SÄRNDAL, C.-E. & SWENSSON, B. (198

Model Quality Report in Business Statistics - Harvard ...

Create successful ePaper yourself

Delete template?

Save as template?