
Monte Carlo integration with Markov chain - Department of Statistics

Z. Tan / Journal of Statistical Planning and Inference 138 (2008) 1967–1980

Table 3. Comparison of estimators of E(x₁) and pr(x₁ > 1.645), n = 5000

|  | CMC | Lik | Subsampling b = 10, Basic | Subsampling b = 10, δ = 10⁻⁵ | Subsampling b = 10, a = 1.5 |
|---|---|---|---|---|---|
| **E(x₁)** |  |  |  |  |  |
| Bias | −.00094 | −.00014 | −.000012 | −.00019 | −.000082 |
| Std Dev | .0604 | .00679 | .0190 | .00736 | .00959 |
| Sqrt MSE | .0604 | .00679 | .0190 | .00736 | .00959 |
| Approx Err | .0141 | .0143 | .0207 | .0146 | .0149 |
| **pr(x₁ > 1.645)** |  |  |  |  |  |
| Bias | −.00024 | −.00037 | −.000041 | −.00024 | −.000028 |
| Std Dev | .00855 | .00232 | .00425 | .00241 | .00254 |
| Sqrt MSE | .00855 | .00235 | .00425 | .00241 | .00254 |
| Approx Err | .00307 | .00313 | .00431 | .00322 | .00309 |

Table 4. Comparison of estimators of log Z, N = 500 + 5000 (columns: Chib, Lik, Subsampling with b = 2, b = 5, b = 10; time ratio)

Table 5. Comparison of estimators of posterior expectations, N = 500 + 5000

|  | E(β₂\|y): CMC | Lik | b = 10 | pr(β₂ > 2\|y): CMC | Lik | b = 10 |
|---|---|---|---|---|---|---|
| Mean | 1.2893 | 1.2886 | 1.2895 | .06794 | .06713 | .06787 |
| Std Dev | .0134 | .00470 | .00532 | .00564 | .00306 | .00351 |
| Approx Err | .00666 | .00667 | .00712 | .00356 | .00357 | .00393 |

Note: the true values are E(β₂|y) = 1.28988 ± .00002 and pr(β₂ > 2|y) = .06813 ± .00001.

…in precision per CPU second, defined as the reciprocal of the product of the total time and the variance. This comparison does not take account of the bias of the likelihood estimator, which is in fact of a greater magnitude than the standard deviation. There remain two questions about the likelihood method. Can the computational cost be reduced with some, but not substantial, sacrifice in statistical efficiency? Can the appreciable bias be made negligible relative to the standard deviation? Remarkably, the two goals can be achieved simultaneously by appropriate subsampling; see Table 5. For b = 10, the computational cost is lowered by about 10 times. The bias of the subsampled estimator is substantially reduced and is negligible in comparison with the standard deviation. Subsampling achieves a better precision per CPU second than Chib's method by a factor of (.0211/.00285)²/6.5 ≈ 8.4. The posterior expectations are also estimated, and the results for the regression coefficient β₂ for X-ray reading are partly summarized in Table 5. Compared with the crude Monte Carlo estimator, the subsampled estimator (b = 10) reduces variance by factors of 6.3 and 2.6 for the posterior mean of β₂ and the posterior probability pr(β₂ > 2).
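The precision-per-CPU-second comparison can be checked with a few lines of Python (a sketch; the standard deviations .0211 and .00285 and the relative computing time 6.5 are the figures quoted above, in arbitrary time units):

```python
# Precision per CPU second = 1 / (total time * variance).
def precision_per_second(std_dev, time):
    return 1.0 / (time * std_dev ** 2)

# Chib's method (reference time 1) vs. the subsampled likelihood
# estimator, whose relative computing time is taken as 6.5.
chib = precision_per_second(0.0211, 1.0)
subsampled = precision_per_second(0.00285, 6.5)
factor = subsampled / chib
print(round(factor, 1))  # -> 8.4
```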
The variance reduction is computationally worthwhile if the baseline measure is already estimated for computing the normalizing constant, but otherwise is nearly offset by the time ratio of about 5 for computing the estimated measure against Gibbs sampling (in C programming). However, the trade-off of variance and computational time is definitely in favor of the subsampled estimator if programming is done in MATLAB, where the time ratio is essentially zero.

3.2.2. Generalized Gibbs sampling

The Gibbs sampler using the standard data augmentation may mix slowly in some situations: the chain has strong autocorrelation and fails to move freely over the target distribution. A working parameter σ² can be introduced to improve mixing (Liu and Wu, 1999; van Dyk and Meng, 2001). Assuming α = 0, the generalized Gibbs sampler has the following steps in each iteration:

- sample uᵢᵗ ∼ TN(xᵢ⊤β_{t−1}, 1, yᵢ) independently;
- sample σ²ₜ ∼ Rₜ/χ²ₘ;
- sample βₜ ∼ N(β̃ₜ/σₜ, Ã),

where Rₜ = ∑ᵢ₌₁ᵐ (uᵢᵗ − xᵢ⊤β̃ₜ)² + β̃ₜ⊤A⁻¹β̃ₜ, β̃ₜ = ÃX⊤uₜ, and Ã = (A⁻¹ + X⊤X)⁻¹. Then the marginal sequence (β₁, …, βₙ) converges to the posterior β|y. Identifying uₜ as ξₜ and βₜ as xₜ would require evaluating the corresponding transition density p(·; ξₜ), which is nonstandard. Instead, we identify (uₜ, σ²ₜ) as ξₜ and βₜ as xₜ. Then the transition density p(·; ξₜ) is normal with mean β̃ₜ/σₜ and variance Ã.

Consider Haas's data on the occurrence of latent membranous lupus nephritis in 55 patients (van Dyk and Meng, 2001). For probit regression with the intercept and two covariates, let the prior on β be trivariate normal with zero mean and variance diag(100², 100², 100²). The new algorithm offers a clear improvement over the standard Gibbs sampler for posterior sampling. We compare different estimators using the new algorithm; see Tables 6 and 7.
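The three sampling steps of the generalized Gibbs sampler can be sketched in Python (a minimal illustration, not the paper's implementation; the data, the prior precision A⁻¹, the zero initialization, and the use of scipy's `truncnorm` for the truncated-normal draws are all assumptions):

```python
import numpy as np
from scipy.stats import truncnorm


def generalized_gibbs(X, y, A_inv, n_iter=2000, seed=0):
    """Generalized Gibbs sampler for probit regression with a working
    parameter sigma^2 (steps as described in the text)."""
    rng = np.random.default_rng(seed)
    m, p = X.shape
    A_tilde = np.linalg.inv(A_inv + X.T @ X)   # Ã = (A^{-1} + X'X)^{-1}
    L = np.linalg.cholesky(A_tilde)            # for N(., Ã) draws
    beta = np.zeros(p)                         # illustrative starting value
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # 1. Latent utilities: u_i ~ TN(x_i' beta, 1, y_i), truncated to
        #    (0, inf) when y_i = 1 and to (-inf, 0) when y_i = 0.
        mu = X @ beta
        lo = np.where(y == 1, -mu, -np.inf)    # standardized lower bounds
        hi = np.where(y == 1, np.inf, -mu)     # standardized upper bounds
        u = truncnorm.rvs(lo, hi, loc=mu, scale=1.0, random_state=rng)
        # 2. Working parameter: sigma_t^2 ~ R_t / chi^2_m, with
        #    R_t = sum_i (u_i - x_i' beta~)^2 + beta~' A^{-1} beta~.
        beta_tilde = A_tilde @ (X.T @ u)       # beta~ = Ã X' u
        R = np.sum((u - X @ beta_tilde) ** 2) + beta_tilde @ A_inv @ beta_tilde
        sigma = np.sqrt(R / rng.chisquare(m))
        # 3. Regression coefficients: beta_t ~ N(beta~ / sigma, Ã).
        beta = beta_tilde / sigma + L @ rng.standard_normal(p)
        draws[t] = beta
    return draws
```

The stochastic rescaling by 1/σₜ in step 3 is what lets the center of the normal kernel move freely, which is the source of the improved mixing discussed next.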
In our 1000 simulations, the algorithm is started at Ã⁻¹X⊤y and run for N = n₀ + n iterations with the first n₀ discarded. For this example, the normal kernel p(·; ξₜ) is much more narrowly spread than the posterior, and its center β̃ₜ/σₜ moves freely due to the stochastic rescaling by σₜ⁻¹. The situation is similar to scheme (ii) in Section 2.2. For N = 500 + 5000, the likelihood estimator log Z̃ has a skewed distribution with a heavy right tail and considerable bias relative to its standard deviation. The basic subsampled estimator has larger variance and more serious bias. Regulation reduces the
