Monte Carlo integration with Markov chain
1974 Z. Tan / Journal of Statistical Planning and Inference 138 (2008) 1967 – 1980

The asymptotic variance of $\tilde Z$ can be estimated by

$$
n^{-1}\left\{ \int \left( \frac{q(x)}{\sum_{\xi} \tilde\pi(\xi)\, p(x;\xi)} - \tilde Z \right)^{2} d\hat P
\;-\; \sum_{\xi} \tilde\pi(\xi) \left( \int \frac{q(x)\, p(x;\xi)}{\left[\sum_{\xi'} \tilde\pi(\xi')\, p(x;\xi')\right]^{2}}\, d\hat P - \tilde Z \right)^{2} \right\}.
$$

Similar results hold for the estimator $\tilde E(\phi)$.

Theorem 2. For a real-valued function $q(x)$ whose integral $Z$ is nonzero, assume that $\operatorname{var}_{p_*}[\phi(x) q(x)/p_*(x)] < \infty$. The estimator $\tilde E(\phi)$ is consistent and asymptotically normal with variance

$$
n^{-1}\left\{ \operatorname{var}_{p_*}\!\left[\frac{\phi^{\dagger}(x)\, q(x)}{p_*(x)}\right] - \sum_{\xi} \pi(\xi)\, E^{2}_{p_*}\!\left[\frac{\phi^{\dagger}(x)\, q(x)\, p(x;\xi)}{p_*^{2}(x)}\right] \right\} \Big/ Z^{2},
$$

where $\phi^{\dagger}(x) = \phi(x) - E_p(\phi)$.

In general, the set $\Xi$ is not finite and $\xi_1,\ldots,\xi_n$ are sequentially generated. A complete large-sample theory remains to be established. However, a key idea is that, under suitable regularity conditions, the Markov chain $[(\xi_1,x_1),\ldots,(\xi_n,x_n)]$ can be closely approximated by the regression process in which $\xi_1,\ldots,\xi_n$ are fixed realizations and $x_1,\ldots,x_n$ are independent with distributions $p(\cdot;\xi_1),\ldots,p(\cdot;\xi_n)$, respectively; see Neumann and Kreiss (1998) and the references therein. This approximation suggests that the point estimators $\tilde Z$ and $\tilde E(\phi)$ are adaptive to how the indices $\xi_1,\ldots,\xi_n$ are sequentially generated, just as the least-squares estimators of regression coefficients are adaptive to whether the regressors are fixed, random, or autocorrelated. The asymptotic variances of $\tilde Z$ and $\tilde E(\phi)$ depend on the design, that is, on how the indices $\xi_1,\ldots,\xi_n$ are generated. However, by virtue of the above approximation, we propose estimating the asymptotic variance of $\tilde Z$ by the first term,

$$
n^{-1} \int \left( \frac{q(x)}{n^{-1}\sum_{j=1}^{n} p(x;\xi_j)} - \tilde Z \right)^{2} d\hat P,
$$

and estimating that of $\tilde E(\phi)$ by

$$
n^{-1} \int [\phi(x) - \tilde E(\phi)]^{2} \left( \frac{q(x)}{n^{-1}\sum_{j=1}^{n} p(x;\xi_j)} \right)^{2} d\hat P \Big/ \tilde Z^{2}.
$$

These variance estimators involve no knowledge of the design in their formulas and are adaptive to different designs, just as the usual variance estimators of least-squares estimators are adaptive to whether the regressors are fixed, random, or autocorrelated. The asymptotic variance of a subsampled, regulated, or amplified estimator can be estimated in a similar manner.

3. Examples

First we present an example where analytical answers are available. Then we apply our method to Bayesian computation for probit regression, with two examples in which different data augmentation schemes are used for posterior sampling.

3.1. Illustration

Consider the bivariate normal distribution with zero mean and variance

$$
V = \begin{pmatrix} 1 & 4 \\ 4 & 5^{2} \end{pmatrix}.
$$
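Before turning to the numerical results, the point estimators $\tilde Z$ and $\tilde E(\phi)$ and the design-free variance estimates proposed above can be sketched in Python/NumPy. This is a minimal illustration, not the author's code: the function name `likelihood_estimates` is hypothetical, and the caller is assumed to supply $q(x_i)$ and the matrix $P_{ij} = p(x_i;\xi_j)$. The toy chain is also an assumption chosen for checkability: a one-dimensional two-step Gibbs sampler with $\xi_t \sim N(\rho x_{t-1}, 1-\rho^2)$ and $x_t \sim p(\cdot;\xi_t) = N(\rho\xi_t, 1-\rho^2)$, with $q(x) = \exp(-x^2/2)$, so that $Z = \sqrt{2\pi}$ and, for $\phi(x) = x^2$, $E_p(\phi) = 1$.

```python
import numpy as np


def likelihood_estimates(q_vals, P, phi_vals=None):
    """Likelihood estimators Z~ and E~(phi) with design-free variance estimates.

    q_vals   : shape (n,), q(x_i) at the draws x_1, ..., x_n
    P        : shape (n, n), P[i, j] = p(x_i; xi_j)
    phi_vals : optional, shape (n,), phi(x_i)
    """
    n = len(q_vals)
    p_bar = P.mean(axis=1)            # n^{-1} sum_j p(x_i; xi_j)
    w = q_vals / p_bar                # q(x_i) / p_bar(x_i)
    Z = w.mean()                      # Z~: integral of q / p_bar against P^
    out = {"Z": Z, "var_Z": np.mean((w - Z) ** 2) / n}
    if phi_vals is not None:
        E = np.sum(phi_vals * w) / np.sum(w)          # E~(phi)
        out["E"] = E
        out["var_E"] = np.mean((phi_vals - E) ** 2 * w ** 2) / n / Z ** 2
    return out


# Hypothetical toy chain: two-step Gibbs sampler for a standard bivariate
# normal (xi, x) with correlation rho, so p(x; xi) = N(x; rho*xi, 1 - rho^2).
rng = np.random.default_rng(0)
rho, n = 0.5, 2000
s = np.sqrt(1.0 - rho ** 2)
xi, x = np.empty(n), np.empty(n)
x_prev = 0.0
for t in range(n):
    xi[t] = rng.normal(rho * x_prev, s)   # xi_t given x_{t-1}
    x[t] = rng.normal(rho * xi[t], s)     # x_t ~ p(.; xi_t)
    x_prev = x[t]

# P[i, j]: normal density of x_i with mean rho * xi_j and standard deviation s
P = np.exp(-0.5 * ((x[:, None] - rho * xi[None, :]) / s) ** 2) / (s * np.sqrt(2 * np.pi))
res = likelihood_estimates(np.exp(-x ** 2 / 2), P, phi_vals=x ** 2)
print(res["Z"], np.sqrt(2 * np.pi), res["E"])
```

Note that both variance estimates are computed from the draws and $P$ alone; nothing about how the $\xi_j$ were generated enters the formulas.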
Let $q(x) = \exp(-x^{\top} V^{-1} x/2)$. The normalizing constant is $Z = 2\pi\sqrt{\det(V)}$. Consider the latent variable $u$, which is normal $(0, 2^{2})$ and has correlation .9 with each component of $x = (x_1, x_2)$. The Gibbs sampler has two steps in each iteration: sample $u_t \sim u \mid x_{t-1}$, then sample $x_t \sim x \mid u_t$, where both conditional distributions are normal. In each of our 5000 simulations, the Gibbs sampler is started at $(0, 0)$ and run for $n$ iterations.

Chib's estimator and the likelihood estimator $\log \tilde Z$ are compared in Table 1. For $n = 5000$, the mean squared error of the likelihood estimator is smaller than that of Chib's estimator by a factor of $(.0363/.00148)^{2} \approx 600$, while the total computational time (for simulation and evaluation) of the likelihood estimator is 16.8 times that of Chib's estimator. Both the absolute bias and the standard deviation of the likelihood estimator appear to decrease at rate $n^{-1}$. As a result, the bias is appreciable relative to the standard deviation.

The effects of subsampling, regulation, and amplification are presented in Table 2. For $b$ from 2 to 10, the bias of the subsampled estimator is reduced but the variance increases gradually. In fact, the subsampled estimator ($b = 10$) has a skewed distribution, with a few extreme overestimates among the 5000 simulations. The skewness is effectively reduced, and both the bias and the variance are controlled, by regulation ($\delta = 10^{-5}$) or amplification ($a = 1.5$).

Table 2
Effects of subsampling, regulation, and amplification (log Z), n = 5000

              Subsampling                  Subsampling b = 10
                                           Regulation              Amplification
              b = 2     b = 5     b = 10   δ = 10^-5   δ = 10^-4   a = 1.5    a = 3
Bias          −.00089   −.00047   −.00016  −.00059     −.0016      −.00014    −.000069
Std Dev       .000965   .00171    .00506   .00131      .00118      .00662     .0169
Sqrt MSE      .00131    .00178    .00506   .00144      .00203      .00662     .0169
Approx Err    .00138    .00202    .00483   .00178      .00162      .00662     .0170
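The Gibbs sampler and the basic likelihood estimator $\log \tilde Z$ for this example can be sketched in Python/NumPy as below. This is an illustration under stated assumptions, not the author's code: it reads $V$ as [[1, 4], [4, 25]], takes the index $\xi_t$ to be the latent draw $u_t$ so that $p(x; u)$ is the normal density of $x$ given $u$, derives both conditionals from the joint Gaussian distribution of $(u, x)$, and reports a variance for $\log \tilde Z$ through the delta method (our addition). The names `gibbs` and `log_Z_tilde`, the run length $n = 2000$, and the seed are arbitrary; subsampling, regulation, and amplification are not implemented.

```python
import numpy as np

# Assumed reading of the example: V = [[1, 4], [4, 25]], u ~ N(0, 2^2),
# corr(u, x_k) = .9 for k = 1, 2.
V = np.array([[1.0, 4.0], [4.0, 25.0]])
var_u = 4.0
c = 0.9 * np.sqrt(var_u) * np.sqrt(np.diag(V))   # Cov(x_k, u) = corr * sd(u) * sd(x_k)
Vinv = np.linalg.inv(V)

# Conditionals of the joint Gaussian (u, x)
w_ux = Vinv @ c                       # E[u | x] = w_ux . x
var_u_given_x = var_u - c @ w_ux      # Var(u | x)
beta = c / var_u                      # E[x | u] = beta * u
S = V - np.outer(c, c) / var_u        # Cov(x | u)
Sinv, detS = np.linalg.inv(S), np.linalg.det(S)
L = np.linalg.cholesky(S)


def gibbs(n, rng):
    """Two-step Gibbs sampler started at x_0 = (0, 0); returns u_1..u_n, x_1..x_n."""
    u, x = np.empty(n), np.empty((n, 2))
    x_prev = np.zeros(2)
    for t in range(n):
        u[t] = rng.normal(w_ux @ x_prev, np.sqrt(var_u_given_x))   # u_t ~ u | x_{t-1}
        x[t] = beta * u[t] + L @ rng.standard_normal(2)             # x_t ~ x | u_t
        x_prev = x[t]
    return u, x


def log_Z_tilde(u, x):
    """Basic likelihood estimator log Z~ with p(x; u) = N(x; beta*u, S)."""
    n = len(u)
    q = np.exp(-0.5 * np.einsum("ik,kl,il->i", x, Vinv, x))
    R = x[:, None, :] - u[None, :, None] * beta              # R[i, j] = x_i - beta*u_j
    Q = np.einsum("ijk,kl,ijl->ij", R, Sinv, R)
    p_bar = np.exp(-0.5 * Q).mean(axis=1) / (2 * np.pi * np.sqrt(detS))
    w = q / p_bar
    Z = w.mean()
    var_Z = np.mean((w - Z) ** 2) / n          # design-free variance estimate of Z~
    return np.log(Z), var_Z / Z ** 2           # delta method for log Z~ (assumption)


rng = np.random.default_rng(1)
u, x = gibbs(2000, rng)
est, var_est = log_Z_tilde(u, x)
exact = np.log(2 * np.pi * np.sqrt(np.linalg.det(V)))   # log(6*pi)
print(est, exact, np.sqrt(var_est))
```

Under this reading, $u \mid x$ has mean $x_1 + 0.2x_2$ and variance 0.4, and $x \mid u$ has mean $(0.45u, 2.25u)$ and covariance [[0.19, −0.05], [−0.05, 4.75]]; comparing `est` with `exact` = log(6π) ≈ 2.9365 shows the error of a single run.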
Further increasing the smoothing parameter δ keeps down the variance but increases the absolute bias, whereas further increasing the scaling parameter $a$ keeps down the absolute bias but increases the variance. Compared with Chib's estimator, the regulated estimator ($\delta = 10^{-5}$) has mean squared error reduced by a factor of $(.0363/.00144)^{2} \approx 635$, and the amplified estimator ($a = 1.5$) by a factor of $(.0363/.00662)^{2} \approx 30$, both under subsampling ($b = 10$). This improvement is achieved with total computational time only 2.6 times that of Chib's estimator. Thus, combining subsampling appropriately with amplification or regulation is necessary and can be considerably beneficial.

Unlike $\log \tilde Z$, the likelihood estimator $\tilde E(\phi)$ converges at the standard rate $n^{-1/2}$, and its bias is negligible relative to its standard deviation. Compared with the crude Monte Carlo estimator, the likelihood estimator has mean squared error reduced by an average factor of 80 for two marginal means, 18 for three second-order moments, and 25 for 38 marginal probabilities (ranging from .05 to .95 by .05). The reduction factors are almost the same for the regulated estimator ($\delta = 10^{-5}$), and are 40, 12, and 15 for the amplified estimator ($a = 1.5$), both under subsampling ($b = 10$). Either estimator requires total computational time only 2.6 times that of the crude Monte Carlo estimator. The results are partly summarized in Table 3.

Finally, the square root of the mean of the variance estimates is near or even above the standard deviation for all the basic and modified likelihood estimators. For comparison, the sample variance divided by $n$ is computed as a variance estimate for the crude Monte Carlo estimator. As expected, because this naive formula ignores the autocorrelation of the Gibbs draws, the square root of the mean of these variance estimates is far below the standard deviation of the crude Monte Carlo estimator.

3.2. Probit regression

In probit regression, the responses are independent Bernoulli random variables with $\mathrm{pr}(y_i = 1) = \Phi(x_i^{\top}\beta)$, where $x_i$ is the vector of covariates and $\beta$ is the parameter. Further suppose that the prior on $\beta$ is $N(\alpha, A)$, multivariate normal