Monte Carlo integration with Markov chain
Z. Tan, Journal of Statistical Planning and Inference 138 (2008) 1967–1980

The asymptotic variance of $\tilde Z$ can be estimated by

$$
n^{-1}\left\{\int\left(\frac{q(x)}{\sum_{\xi}\tilde\pi(\xi)\,p(x;\xi)}-\tilde Z\right)^{2}\mathrm{d}\hat P \;-\; \sum_{\xi}\tilde\pi(\xi)\left(\int\frac{q(x)\,p(x;\xi)}{\left[\sum_{\xi'}\tilde\pi(\xi')\,p(x;\xi')\right]^{2}}\,\mathrm{d}\hat P-\tilde Z\right)^{2}\right\}.
$$

Similar results can be obtained for the estimator $\tilde E(\phi)$.

**Theorem 2.** For a real-valued function $q(x)$ whose integral $Z$ is nonzero, assume that $\operatorname{var}_{p_*}[\phi(x)q(x)/p_*(x)]<\infty$. The estimator $\tilde E(\phi)$ is consistent and asymptotically normal with variance

$$
n^{-1}\left\{\operatorname{var}_{p_*}\!\left[\frac{\phi^{\dagger}(x)\,q(x)}{p_*(x)}\right]-\sum_{\xi}\pi(\xi)\,E_{p_*}^{2}\!\left[\frac{\phi^{\dagger}(x)\,q(x)\,p(x;\xi)}{p_*^{2}(x)}\right]\right\}\Big/\,Z^{2},
$$

where $\phi^{\dagger}(x)=\phi(x)-E_{p}(\phi)$.

In general, the set $\Xi$ is not finite and $\xi_1,\ldots,\xi_n$ are sequentially generated. A complete large-sample theory remains to be established. However, a key idea is that under suitable regularity conditions, the Markov chain $[(\xi_1,x_1),\ldots,(\xi_n,x_n)]$ can be closely approximated by the regression process in which $\xi_1,\ldots,\xi_n$ are fixed realizations and $x_1,\ldots,x_n$ are independent with distributions $p(\cdot;\xi_1),\ldots,p(\cdot;\xi_n)$, respectively; see Neumann and Kreiss (1998) and the references therein. This approximation suggests that the point estimators $\tilde Z$ and $\tilde E(\phi)$ are adaptive to how the indices $\xi_1,\ldots,\xi_n$ are sequentially generated, just as the least-squares estimators of regression coefficients are adaptive to whether the regressors are fixed, random, or auto-correlated.

The asymptotic variances of $\tilde Z$ and $\tilde E(\phi)$ depend on the design by which the indices $\xi_1,\ldots,\xi_n$ are generated. However, by virtue of the above approximation, we propose estimating the asymptotic variance of $\tilde Z$ by

$$
n^{-1}\int\left(\frac{q(x)}{n^{-1}\sum_{j=1}^{n}p(x;\xi_j)}-\tilde Z\right)^{2}\mathrm{d}\hat P,
$$

and that of $\tilde E(\phi)$ by

$$
n^{-1}\int\left[\phi(x)-\tilde E(\phi)\right]^{2}\left(\frac{q(x)}{n^{-1}\sum_{j=1}^{n}p(x;\xi_j)}\right)^{2}\mathrm{d}\hat P\;\Big/\;\tilde Z^{2}.
$$

The variance estimators involve no knowledge of the design in their formulas, and are adaptive to different designs, just as the usual variance estimators for least-squares estimators are adaptive to whether the regressors are fixed, random, or auto-correlated. The asymptotic variance of a subsampled, regulated, or amplified estimator can be estimated in a similar manner.
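These two formulas translate directly into code. Below is a minimal sketch (Python; all names are ours), assuming uniform weights $n^{-1}$ on the sampled indices in place of the fitted mixture weights $\tilde\pi$, and taking $\tilde E(\phi)$ as the natural ratio estimator $\int \phi \cdot q/(n^{-1}\sum_j p(\cdot;\xi_j))\,\mathrm{d}\hat P \,/\, \tilde Z$:

```python
import numpy as np

def likelihood_estimates(x, xis, q, p, phi=None):
    """Point and variance estimates based on the mixture n^{-1} sum_j p(.; xi_j).

    x    : (n, d) array of draws x_1, ..., x_n
    xis  : sequence of the n sampled indices xi_1, ..., xi_n
    q    : callable, q evaluated row-wise on x -> (n,) array
    p    : callable, p(x, xi) evaluated row-wise for a fixed xi -> (n,) array
    phi  : optional callable, for estimating E_p(phi)
    """
    n = x.shape[0]
    # denominator n^{-1} sum_j p(x_i; xi_j); an O(n^2) pass, fine for a sketch
    mix = np.mean([p(x, xi) for xi in xis], axis=0)
    w = q(x) / mix                        # q(x_i) / (n^{-1} sum_j p(x_i; xi_j))
    Z = w.mean()                          # point estimate of the integral Z
    var_Z = np.mean((w - Z) ** 2) / n     # n^{-1} int (q/mix - Z)^2 dP_hat
    result = {"Z": Z, "var_Z": var_Z}
    if phi is not None:
        E = np.mean(phi(x) * w) / Z       # ratio estimate of E_p(phi)
        # n^{-1} int [phi - E]^2 (q/mix)^2 dP_hat / Z^2
        var_E = np.mean((phi(x) - E) ** 2 * w ** 2) / (n * Z ** 2)
        result.update({"E_phi": E, "var_E_phi": var_E})
    return result
```

Nothing in this function refers to how $\xi_1,\ldots,\xi_n$ were generated, which is precisely the design-adaptivity noted above.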

3. Examples

First we present an example where analytical answers are available. Then we apply our method to Bayesian computation for probit regression. We provide two examples where different data augmentation schemes are used for posterior sampling.

3.1. Illustration

Consider the bivariate normal distribution with zero mean and variance

$$
V=\begin{pmatrix}1 & 4\\ 4 & 5^{2}\end{pmatrix}.
$$

Let $q(x)$ be $\exp(-x^{\top}V^{-1}x/2)$. The normalizing constant $Z$ is $2\pi\sqrt{\det(V)}$. Consider the latent variable $u$, which is normal $(0,2^{2})$ and has correlation $.9$ with each component of $x=(x_1,x_2)$. The Gibbs sampler has two steps in each iteration: sample $u_t\sim u\,|\,x_{t-1}$ and sample $x_t\sim x\,|\,u_t$, where both conditional distributions are normal (a code sketch of this sampler is given at the end of this subsection). In our 5000 simulations, the Gibbs sampler is started at $(0,0)$ and run for $n$ iterations.

Chib's estimator and the likelihood estimator $\log\tilde Z$ are compared in Table 1. For $n=5000$, the mean squared error of the likelihood estimator is smaller than that of Chib's estimator by a factor of $(.0363/.00148)^{2}\approx 600$, while the total computational time (for simulation and evaluation) of the likelihood estimator is 16.8 times that of Chib's estimator. Both the absolute bias and the standard deviation of the likelihood estimator appear to decrease at rate $n^{-1}$. As a result, the bias becomes appreciable relative to the standard deviation.

The effects of subsampling, regulation, and amplification are presented in Table 2.

Table 2. Effects of subsampling, regulation, and amplification (log Z), n = 5000.

| | Subsampling, b = 2 | Subsampling, b = 5 | Subsampling, b = 10 | Regulation, δ = 10⁻⁵ (b = 10) | Regulation, δ = 10⁻⁴ (b = 10) | Amplification, a = 1.5 (b = 10) | Amplification, a = 3 (b = 10) |
|---|---|---|---|---|---|---|---|
| Bias | −.00089 | −.00047 | −.00016 | −.00059 | −.0016 | −.00014 | −.000069 |
| Std Dev | .000965 | .00171 | .00506 | .00131 | .00118 | .00662 | .0169 |
| Sqrt MSE | .00131 | .00178 | .00506 | .00144 | .00203 | .00662 | .0169 |
| Approx Err | .00138 | .00202 | .00483 | .00178 | .00162 | .00662 | .0170 |

For b from 2 to 10, the bias of the subsampled estimator is reduced but the variance increases gradually. In fact, the subsampled estimator (b = 10) has a skewed distribution with a few extreme overestimates among the 5000 simulations. The skewness is effectively reduced, and both the bias and the variance are controlled, by regulation (δ = 10⁻⁵) or amplification (a = 1.5). Further increasing the smoothing parameter δ keeps down the variance but increases the absolute bias, while further increasing the scaling parameter a keeps down the absolute bias but increases the variance. Compared with Chib's estimator, the regulated estimator (δ = 10⁻⁵) has mean squared error reduced by a factor of (.0363/.00144)² ≈ 635, and the amplified estimator (a = 1.5) by a factor of (.0363/.00662)² ≈ 30, under subsampling (b = 10). This improvement is achieved in total computational time only 2.6 times that of Chib's estimator. Thus it is necessary, and can be considerably beneficial, to combine subsampling with amplification or regulation appropriately.

Unlike $\log\tilde Z$, the likelihood estimator $\tilde E(\phi)$ converges at the standard rate $n^{-1/2}$, and its bias is negligible relative to the standard deviation. Compared with the crude Monte Carlo estimator, the likelihood estimator has mean squared error reduced by an average factor of 80 for the two marginal means, 18 for three second-order moments, and 25 for 38 marginal probabilities (ranging from .05 to .95 by .05). The reduction factors are almost the same for the regulated estimator (δ = 10⁻⁵), or 40, 12, and 15 for the amplified estimator (a = 1.5) under subsampling (b = 10). Either estimator requires total computational time only 2.6 times that of the crude Monte Carlo estimator. The results are partly summarized in Table 3.
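For concreteness, here is a minimal sketch of the two-step Gibbs sampler in this illustration (Python; names are ours). The conditional moments follow from the joint normality of $(u,x)$ implied by the stated variances and the $.9$ correlations, which give $\operatorname{cov}(u,x)=(1.8,\,9)$:

```python
import numpy as np

rng = np.random.default_rng(0)

V = np.array([[1.0, 4.0], [4.0, 25.0]])   # target covariance of x
c = np.array([1.8, 9.0])                  # cov(u, x): .9 * 2 * sd(x_k)
var_u = 4.0                               # u ~ N(0, 2^2)

# conditional moments implied by the joint normal distribution of (u, x)
b_ux = np.linalg.solve(V, c)              # E(u | x) = b_ux @ x, here (1, 0.2)
s2_ux = var_u - c @ b_ux                  # var(u | x) = 0.4
B_xu = c / var_u                          # E(x | u) = B_xu * u
S_xu = V - np.outer(c, c) / var_u         # cov(x | u)
L = np.linalg.cholesky(S_xu)

def gibbs(n, x0=np.zeros(2)):
    """Run n iterations of the two-step Gibbs sampler, started at x0."""
    x = x0
    draws = np.empty((n, 2))
    for t in range(n):
        u = rng.normal(b_ux @ x, np.sqrt(s2_ux))   # sample u_t ~ u | x_{t-1}
        x = B_xu * u + L @ rng.standard_normal(2)  # sample x_t ~ x | u_t
        draws[t] = x
    return draws
```

Feeding these draws to the `likelihood_estimates` sketch above, with $q(x)=\exp(-x^{\top}V^{-1}x/2)$ and $p(x;u)$ the conditional density of $x$ given $u$, targets $Z=2\pi\sqrt{\det(V)}=6\pi$.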
Finally, the square root of the mean of the variance estimates is near or even above the standard deviation for all the basic and modified likelihood estimators. For comparison, the sample variance divided by n is computed as a variance estimate for the crude Monte Carlo estimator. As expected, the square root of the mean of these variance estimates falls seriously below the standard deviation of the crude Monte Carlo estimator.

3.2. Probit regression

In probit regression, the responses are independent Bernoulli random variables with $\Pr(y_i=1)=\Phi(x_i^{\top}\beta)$, where $x_i$ is the vector of covariates and $\beta$ is the parameter. Further suppose that the prior on $\beta$ is $N(\alpha,A)$, multivariate normal with mean $\alpha$ and covariance matrix $A$.
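The specific data augmentation schemes compared in the paper are set out on earlier pages. As a baseline, here is a minimal sketch of the standard Albert and Chib (1993) scheme for this model, which augments each $y_i$ with a latent $z_i\sim N(x_i^{\top}\beta,1)$ and alternates exact conditional draws (Python; names are ours):

```python
import numpy as np
from scipy.stats import truncnorm

def probit_da_gibbs(y, X, alpha, A, n_iter, seed=0):
    """Data augmentation Gibbs sampler for probit regression.

    Model: y_i = 1{z_i > 0}, z_i ~ N(x_i' beta, 1), prior beta ~ N(alpha, A).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A_inv = np.linalg.inv(A)
    B = np.linalg.inv(A_inv + X.T @ X)       # posterior covariance of beta | z
    L = np.linalg.cholesky(B)
    beta = np.array(alpha, dtype=float)
    draws = np.empty((n_iter, d))
    for t in range(n_iter):
        m = X @ beta
        # z_i | beta, y_i: N(m_i, 1) truncated to (0, inf) if y_i = 1, else (-inf, 0)
        lo = np.where(y == 1, -m, -np.inf)   # standardized lower bounds
        hi = np.where(y == 1, np.inf, -m)    # standardized upper bounds
        z = m + truncnorm.rvs(lo, hi, size=n, random_state=rng)
        # beta | z ~ N(B (A^{-1} alpha + X'z), B)
        beta = B @ (A_inv @ alpha + X.T @ z) + L @ rng.standard_normal(d)
        draws[t] = beta
    return draws
```

Both conditional draws are exact, so no Metropolis correction is required; the $\beta$ step exploits the conjugacy of the normal prior $N(\alpha,A)$ with the latent-variable likelihood.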
