
Monte Carlo integration with Markov chain

Z. Tan / Journal of Statistical Planning and Inference 138 (2008) 1967–1980

[Table 6: comparison of estimators of log Z (N = 500 + 5000) — Chib's method versus the likelihood estimator with subsampling (b = 10), basic (δ = 10⁻⁶), and amplified (a = 4) variants, with CPU time ratios; the column layout was lost in extraction. A surviving entry reads "…12 | y) = .07657 ± .00002".]

…variance but makes the bias even more serious. However, amplification helps in such a situation. Under subsampling (b = 10), the amplified estimator (a = 4) essentially removes the bias and reduces the variance. Subsampling and amplification together achieve a better precision per CPU second than Chib's method, by a factor of (.203/.0161)²/7.1 ≈ 22.4.

For the posterior expectations of β₁, the amplified estimator (a = 4) performs well under subsampling (b = 10): not only is the computational time lowered by a factor of about 10, but the bias and the variance are also controlled. In fact, it has smaller variance than the crude Monte Carlo estimator by factors of 7.0 and 3.4 for the posterior mean of β₁ and the posterior probability of (β₁ > 12), respectively. This improvement is computationally worthwhile if the baseline measure is already estimated for computing the normalizing constant or if Gibbs sampling iteration is time consuming, for example, in MATLAB programming.

4. Summary

We develop a new method for Monte Carlo integration using a Markov chain simulated by an MCMC algorithm. While taking the likelihood approach of Kong et al. (2003), we basically treat the Markov chain scheme as a random design and define a stratified estimator of the baseline measure. The method has the following useful features: (i) it provides not only a point estimate but also an associated variance estimate; (ii) it is applicable for estimating simultaneously the normalizing constant and the expectations of a probability distribution; (iii) it can yield substantially improved accuracy compared with Chib's estimator and the crude Monte Carlo estimator.
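The stratified-design idea in (iii) builds on the classical fact that stratifying a sampling design reduces Monte Carlo variance. The following is a minimal, self-contained sketch of that classical effect on a toy integral over [0, 1] — not the paper's estimator or its Markov chain setting:

```python
import math
import random

def crude_mc(f, n, rng):
    # Crude Monte Carlo: average f at n independent uniform draws on [0, 1].
    return sum(f(rng.random()) for _ in range(n)) / n

def stratified_mc(f, n, rng):
    # Stratified Monte Carlo: one uniform draw inside each of n
    # equal-width strata of [0, 1], then average.
    return sum(f((i + rng.random()) / n) for i in range(n)) / n

rng = random.Random(0)
f = math.exp                   # integrate e^x over [0, 1]; true value is e - 1
true_value = math.e - 1
crude = crude_mc(f, 1000, rng)
strat = stratified_mc(f, 1000, rng)
print(abs(crude - true_value), abs(strat - true_value))
```

With the same budget of 1000 evaluations, the stratified estimate is typically far closer to e − 1, since stratification removes the between-stratum component of the variance.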
At the same time, there remain important questions on the theory and applications of the method. In regard to (i), a large-sample theory is needed to formally confirm the appropriateness of the point and variance estimators found in our simulation studies. In regard to (iii), real improvement is obtained only if variance reduction dominates the time increase for computing the stratified estimator. The method is expected to achieve a worthwhile trade-off when the transition density is relatively easy to evaluate. Finally, the techniques of subsampling, regulation, and amplification can be employed individually or jointly. The benefits depend on the tuning parameters in addition to the structure of each problem in practice. Further work is desirable on how to automate the choices of these tuning parameters.
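Of the three tuning techniques, subsampling is the simplest to picture: keep every b-th draw of the chain, trading raw sample size for lower autocorrelation and fewer transition-density evaluations. A toy illustration with b = 10 as in the experiments above (the AR(1) trace standing in for an MCMC sampler is an assumption for illustration only):

```python
import random

def ar1_chain(n, rho, rng):
    # A toy AR(1) sequence standing in for a highly autocorrelated
    # MCMC trace (illustrative; not the sampler from the paper).
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def lag1_autocorr(xs):
    # Sample lag-1 autocorrelation.
    n = len(xs)
    m = sum(xs) / n
    var = sum((v - m) ** 2 for v in xs) / n
    cov = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(n - 1)) / n
    return cov / var

rng = random.Random(1)
chain = ar1_chain(50_000, rho=0.9, rng=rng)
b = 10                         # subsampling interval, as in the source
thinned = chain[::b]
print(lag1_autocorr(chain), lag1_autocorr(thinned))
```

The full trace has lag-1 autocorrelation near 0.9, while the thinned trace's drops to roughly 0.9¹⁰ ≈ 0.35, which is why a b-fold smaller sample can cost much less than a b-fold loss in effective precision.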

Acknowledgment

This work was part of the author's doctoral thesis at the University of Chicago. The author thanks Peter McCullagh and Xiao-Li Meng for advice and support, and also thanks Wei-Biao Wu for pointing to the approximation theory in Section 2.3 and two referees for helpful comments.

Appendix A.

A.1. Proof of Theorem 1

Without loss of generality, assume that $\pi(\xi) > 0$ for all $\xi \in \Xi$. The consistency of $\tilde Z$ follows from

$$
|\tilde Z - Z| \le \left| \frac{1}{n}\sum_{i=1}^n \frac{q(x_i)}{\sum_\xi \tilde\pi(\xi)\,p(x_i;\xi)} - \frac{1}{n}\sum_{i=1}^n \frac{q(x_i)}{p_*(x_i)} \right| + \left| \frac{1}{n}\sum_{i=1}^n \frac{q(x_i)}{p_*(x_i)} - Z \right|
$$

$$
\le \sup_x \left| \frac{p_*(x)}{\sum_\xi \tilde\pi(\xi)\,p(x;\xi)} - 1 \right| \cdot \frac{1}{n}\sum_{i=1}^n \frac{q(x_i)}{p_*(x_i)} + \left| \frac{1}{n}\sum_{i=1}^n \frac{q(x_i)}{p_*(x_i)} - Z \right|
$$

and the uniform convergence of $\sum_\xi \tilde\pi(\xi)\,p(x;\xi)$ to $p_*(x)$ on a multiplicative scale for $x \in \mathcal{X}$:

$$
\left| \frac{\sum_\xi \tilde\pi(\xi)\,p(x;\xi)}{p_*(x)} - 1 \right| \le \frac{\sum_\xi |\tilde\pi(\xi) - \pi(\xi)|\,p(x;\xi)}{p_*(x)} \le \frac{\sum_\xi |\tilde\pi(\xi) - \pi(\xi)|}{\min_\xi \pi(\xi)}.
$$

To prove the asymptotic normality, expand $\tilde Z$ in $\tilde\pi(\xi)$ around $\pi(\xi)$:

$$
\tilde Z = \frac{1}{n}\sum_{i=1}^n \left\{ \frac{q(x_i)}{p_*(x_i)} - \sum_\xi \frac{q(x_i)\,p(x_i;\xi)}{p_*^2(x_i)}\,[\tilde\pi(\xi) - \pi(\xi)] \right\} + o_p(n^{-1/2}),
$$

because the remainder term is bounded by

$$
\sum_\xi \left\{ \frac{2}{n}\sum_{i=1}^n \frac{q(x_i)\,p^2(x_i;\xi)}{\bigl[\sum_{\xi'} \pi^*_{x_i}(\xi')\,p(x_i;\xi')\bigr]^3} \right\} [\tilde\pi(\xi) - \pi(\xi)]^2,
$$

where $\pi^*_{x_i}(\xi)$ lies between $\pi(\xi)$ and $\tilde\pi(\xi)$. The term of first order is

$$
\frac{1}{n}\sum_{i=1}^n \frac{q(x_i)}{p_*(x_i)} - \sum_\xi \left\{ \frac{1}{n}\sum_{i=1}^n \frac{q(x_i)\,p(x_i;\xi)}{p_*^2(x_i)} \right\} [\tilde\pi(\xi) - \pi(\xi)]
= \frac{1}{n}\sum_{i=1}^n \frac{q(x_i)}{p_*(x_i)} - \sum_\xi E_{p_*}\!\left[ \frac{q(x)\,p(x;\xi)}{p_*^2(x)} \right] [\tilde\pi(\xi) - \pi(\xi)] + o_p(n^{-1/2}).
$$

The asymptotic variance follows from direct calculation.

A.2. Proof of Theorem 2

Rewrite

$$
\sqrt{n}\,[\tilde E(\varphi) - E_p(\varphi)] = \left[ \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{\varphi^\dagger(x_i)\,q(x_i)}{\sum_\xi \tilde\pi(\xi)\,p(x_i;\xi)} \right] \Big/ \tilde Z.
$$

By Theorem 1, the numerator converges to a normal distribution and the denominator converges to the constant $Z$. The result follows from Slutsky's theorem.
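The ratio-plus-Slutsky argument in the proof of Theorem 2 — a numerator averaging weighted draws over a denominator that estimates $Z$ — has the same shape as self-normalized importance sampling. A minimal numerical sketch of that shape (the Gaussian target and proposal are illustrative assumptions, not the paper's mixture $\sum_\xi \tilde\pi(\xi)\,p(x;\xi)$):

```python
import math
import random

rng = random.Random(2)

def q(x):
    # Unnormalized target: standard normal kernel, true Z = sqrt(2*pi).
    return math.exp(-0.5 * x * x)

def g_pdf(x, s=2.0):
    # Proposal density N(0, s^2), playing the role of the estimated
    # mixture density in the denominator of the ratio estimator.
    return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2 * math.pi))

n = 100_000
xs = [rng.gauss(0.0, 2.0) for _ in range(n)]
ws = [q(x) / g_pdf(x) for x in xs]
z_hat = sum(ws) / n                           # denominator: estimates Z
num = sum(w * x * x for w, x in zip(ws, xs)) / n
e_hat = num / z_hat                           # ratio estimate of E_p[x^2] = 1
print(z_hat, e_hat)
```

The numerator and denominator are each ordinary averages with CLT behavior, and their ratio inherits consistency exactly as in the Slutsky step of the proof.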
