Monte Carlo integration with Markov chain

Z. Tan / Journal of Statistical Planning and Inference 138 (2008) 1967–1980

[Figure 1: two boxplot panels, Scheme (i) and Scheme (ii); vertical axis: log Z.]

Fig. 1. Boxplots of estimators of log Z. Left to right: basic estimator, subsampled estimator ($b = 10$), and regulated estimator ($\delta = 10^{-4}$) and amplified estimator under subsampling ($a = 1.5$).

2.2. Regulation and amplification

The performance of the estimator $\tilde Z$ or its subsampled version $\tilde Z_b$ depends on the properties of the Markov chain, such as how quickly the chain mixes and how closely the stationary density $p_*(x)$ matches the integrand $q(x)$. For Gibbs sampling, the stationary density is made perfectly proportional to the integrand, and the problem of speeding up the Gibbs sampler has been studied in the MCMC literature (e.g. Gilks et al., 1996; Liu, 2001).

We address one additional factor that affects the performance of the estimator $\tilde Z$ or its subsampled version $\tilde Z_b$. For example, let $\Xi$ and $\mathcal{X}$ be the real line and $q(x)$ be $\exp(-x^2/2)/\sqrt{2\pi}$. Consider two sampling schemes, where the sequence $(x_1, \ldots, x_n)$ converges to the standard normal distribution $N(0, 1)$:

(i) $\xi_t \mid x_{t-1} \sim N(\rho x_{t-1}, 1 - \rho^2)$ and $x_t \mid \xi_t \sim N(\rho \xi_t, 1 - \rho^2)$.
(ii) $\xi_t \sim N(0, 1)$ and $x_t \mid \xi_t \sim N(\rho \xi_t, 1 - \rho^2)$.

Scheme (ii) with $\rho \approx 1$ is an extreme case of situations in which the Markov chain $[(\xi_1, x_1), \ldots, (\xi_n, x_n)]$ mixes well but the transition density $p(\cdot\,; \xi)$ is narrowly spread; see a generalized Gibbs sampling example in Section 3.2.

Under scheme (i), as $\rho$ increases to one, the Markov chain mixes more slowly, and the estimator $\tilde Z$ has larger variance and more serious bias. Under scheme (ii), the Markov chain mixes perfectly and the estimator $\tilde Z$ is unbiased for any $0 < \rho < 1$, but the variance of $\tilde Z$ depends on the value of $\rho$. If $\rho$ is close to one, the estimator $\tilde Z$ has a skewed distribution with a heavy right tail.

Now consider the subsampled estimator $\tilde Z_b$.
Under scheme (i), the subsampled sequence becomes approximately independent for a large subsampling interval $b$. The estimator $\tilde Z_b$ has reduced bias but increased variance, compared with the estimator $\tilde Z$. If $\rho$ is near one, the estimator $\tilde Z_b$ tends to yield large overestimates. Under scheme (ii), the estimator $\tilde Z_b$ has an even more skewed distribution than the estimator $\tilde Z$. These different performances are illustrated in Fig. 1, which is based on 5000 simulations of size 1000 with $\rho = 0.9$.

The preceding discussion makes it clear that poor performance may be caused by the narrow spread of the transition density $p(\cdot\,; \xi)$. We propose two modifications of the basic $\tilde\mu$ or the subsampled $\tilde\mu_b$. Only those of $\tilde\mu$ are presented; those of $\tilde\mu_b$ should be understood in a similar manner. As seen from Fig. 1, these techniques are helpful for variance reduction by removing the extreme estimates.

First, the spread of the transition density $p(\cdot\,; \xi)$ is relevant because it affects how uniformly the average $n^{-1} \sum_{j=1}^{n} p(x; \xi_j)$ converges to the stationary density $p_*(x)$ on a multiplicative scale for $x \in \mathcal{X}$. Nonuniformity

is most likely to occur in the region of $\mathcal{X}$ where $p_*(x)$ is close to zero. We consider the regulated estimator

$$\tilde\mu_\delta(\{x\}) = \frac{\hat P(\{x\})}{\delta \vee \bigl[ n^{-1} \sum_{j=1}^{n} p(x; \xi_j) \bigr]},$$

obtained by censoring $n^{-1} \sum_{j=1}^{n} p(x; \xi_j)$ from below at $\delta \ge 0$. For a real-valued function $q(x)$, the estimator $\int q(x)\, d\tilde\mu_\delta$ has asymptotic bias

$$E_{p_*}\!\left[ \frac{q(x)}{\delta \vee p_*(x)} \right] - Z = E_{p_*}\!\left[ q(x) \left( \frac{1}{\delta} - \frac{1}{p_*(x)} \right) 1\{ p_*(x) < \delta \} \right]$$
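The two sampling schemes and the regulated estimator of this section can be explored numerically. The following Python sketch is illustrative only and is not taken from the paper. It assumes that $\hat P$ is the empirical distribution of $(x_1, \ldots, x_n)$, so that the basic estimate is $\tilde Z = n^{-1} \sum_i q(x_i) / [n^{-1} \sum_j p(x_i; \xi_j)]$ and the regulated estimate replaces the denominator by its maximum with $\delta$. The function names (`simulate`, `z_estimate`, `replicate`), the stationary start of the chain under scheme (i), and the simulation sizes are assumptions chosen for a quick run; they are smaller than the 5000 simulations of size 1000 behind Fig. 1. The subsampled and amplified estimators are omitted because their definitions do not appear in this section.

```python
import numpy as np


def simulate(scheme, n, rho, rng):
    """Draw (xi_1, x_1), ..., (xi_n, x_n) under scheme "i" or "ii"."""
    sd = np.sqrt(1.0 - rho**2)
    if scheme == "ii":
        xi = rng.standard_normal(n)                 # xi_t ~ N(0, 1), independent
        x = rho * xi + sd * rng.standard_normal(n)  # x_t | xi_t ~ N(rho*xi_t, 1-rho^2)
        return xi, x
    if scheme != "i":
        raise ValueError("scheme must be 'i' or 'ii'")
    xi = np.empty(n)
    x = np.empty(n)
    prev = rng.standard_normal()  # assumption: chain starts from N(0, 1)
    for t in range(n):
        xi[t] = rho * prev + sd * rng.standard_normal()  # xi_t | x_{t-1}
        x[t] = rho * xi[t] + sd * rng.standard_normal()  # x_t | xi_t
        prev = x[t]
    return xi, x


def z_estimate(xi, x, rho, delta=0.0):
    """Estimate Z for q = N(0, 1) density; delta = 0 gives the basic estimate."""
    var = 1.0 - rho**2
    # n-by-n matrix of transition densities p(x_i; xi_j)
    diff = x[:, None] - rho * xi[None, :]
    trans = np.exp(-0.5 * diff**2 / var) / np.sqrt(2.0 * np.pi * var)
    mix = trans.mean(axis=1)  # n^{-1} sum_j p(x_i; xi_j)
    q = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    # censor the averaged transition density from below at delta
    return float(np.mean(q / np.maximum(delta, mix)))


def replicate(scheme, reps, n, rho, delta, seed=0):
    """Return a (reps, 2) array of (basic, regulated) estimates of Z."""
    rng = np.random.default_rng(seed)
    out = np.empty((reps, 2))
    for r in range(reps):
        xi, x = simulate(scheme, n, rho, rng)
        out[r, 0] = z_estimate(xi, x, rho)
        out[r, 1] = z_estimate(xi, x, rho, delta)
    return out


if __name__ == "__main__":
    for scheme in ("i", "ii"):
        est = replicate(scheme, reps=200, n=500, rho=0.9, delta=1e-4)
        for name, log_z in zip(("basic", "regulated"), np.log(est).T):
            print(f"scheme ({scheme}) {name:9s} "
                  f"median log Z = {np.median(log_z):+.4f}, "
                  f"max log Z = {log_z.max():+.4f}")
```

Because $q \ge 0$ here and the censored denominator is never smaller than the uncensored one, the regulated estimate can never exceed the basic estimate on the same sample, which is how censoring trims the large overestimates described above. The true value is $Z = 1$, so $\log Z = 0$ is the reference line for the printed summaries.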

