1978 Z. Tan / Journal of Statistical Planning and Inference 138 (2008) 1967–1980

[Table 6: Comparison of estimators of log Z (N = 500 + 5000): Chib's method versus the likelihood method; subsampling (b = 10); basic, δ = 10⁻⁶, and amplified (a = 4) variants; time ratio.]

P(β₁ > 12 | y) = .07657 ± .00002.

variance but makes the bias even more serious. However, amplification helps in such a situation. Under subsampling (b = 10), the amplified estimator (a = 4) essentially removes the bias and reduces the variance. Subsampling and amplification together achieve a better precision per CPU second by a factor of (.203/.0161)²/7.1 ≈ 22.4, compared with Chib's method. For the posterior expectations of β₁, the amplified estimator (a = 4) performs well under subsampling (b = 10): not only is the computational time lowered by a factor of about 10, but the bias and the variance are also controlled. In fact, it has smaller variance than the crude Monte Carlo estimator, by factors of 7.0 and 3.4 for the posterior mean of β₁ and the posterior probability of (β₁ > 12), respectively. This improvement is computationally worthwhile if the baseline measure is already estimated for computing the normalizing constant, or if Gibbs sampling iteration is time consuming, for example, in MATLAB programming.

4. Summary

We develop a new method for Monte Carlo integration using a Markov chain simulated by an MCMC algorithm. While taking the likelihood approach of Kong et al. (2003), we basically treat the Markov chain scheme as a random design and define a stratified estimator of the baseline measure. The method has the following useful features: (i) it provides not only a point estimate but also an associated variance estimate; (ii) it is applicable for estimating simultaneously the normalizing constant and the expectations of a probability distribution; (iii) it can yield substantially improved accuracy compared with Chib's estimator and the crude Monte Carlo estimator. At the same time, there remain important questions on the theory and applications of the method.
In regard to (i), a large-sample theory is needed to formally confirm the appropriateness of the point and variance estimators observed in our simulation studies. In regard to (iii), real improvement is obtained only if the variance reduction outweighs the additional time required to compute the stratified estimator. The method is expected to achieve a worthwhile trade-off when the transition density is relatively easy to evaluate. Finally, the techniques of subsampling, regulation, and amplification can be employed individually or jointly. Their benefits depend on the tuning parameters as well as on the structure of each problem in practice. Further work is desirable on how to automate the choice of these tuning parameters.
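As a rough sketch, the stratified estimator of the normalizing constant described above takes only a few lines once the design densities p(x; ξ) can be evaluated at every draw. The function name and array layout below are illustrative, not from the paper:

```python
import numpy as np

def stratified_Z(q_vals, p_mat, pi_tilde):
    """Estimate Z by (1/n) * sum_i q(x_i) / sum_xi pi~(xi) p(x_i; xi).

    q_vals   : (n,) unnormalized target density q at the draws x_i
    p_mat    : (n, m) design densities p(x_i; xi_j) for the m design points
    pi_tilde : (m,) estimated weights of the baseline mixture
    """
    denom = p_mat @ pi_tilde  # estimated baseline density at each draw
    return float(np.mean(q_vals / denom))
```

When the estimated baseline mixture coincides with the true sampling density, each ratio q(x_i)/p_*(x_i) is an ordinary importance weight, and the average recovers Z directly.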

Acknowledgment

This work was part of the author's doctoral thesis at the University of Chicago. The author thanks Peter McCullagh and Xiao-Li Meng for advice and support, and also thanks Wei-Biao Wu for pointing to the approximation theory in Section 2.3 and two referees for helpful comments.

Appendix A.

A.1. Proof of Theorem 1

Without loss of generality, assume that π(ξ) > 0 for all ξ ∈ Ξ. The consistency of $\tilde Z$ follows from
$$
\begin{aligned}
|\tilde Z - Z| &\le \left| \frac{1}{n}\sum_{i=1}^{n} \frac{q(x_i)}{\sum_{\xi}\tilde\pi(\xi)p(x_i;\xi)} - \frac{1}{n}\sum_{i=1}^{n} \frac{q(x_i)}{p_*(x_i)} \right| + \left| \frac{1}{n}\sum_{i=1}^{n} \frac{q(x_i)}{p_*(x_i)} - Z \right| \\
&\le \sup_x \left| \frac{p_*(x)}{\sum_{\xi}\tilde\pi(\xi)p(x;\xi)} - 1 \right| \cdot \frac{1}{n}\sum_{i=1}^{n} \frac{q(x_i)}{p_*(x_i)} + \left| \frac{1}{n}\sum_{i=1}^{n} \frac{q(x_i)}{p_*(x_i)} - Z \right|
\end{aligned}
$$
and the uniform convergence of $\sum_{\xi}\tilde\pi(\xi)p(x;\xi)$ to $p_*(x)$ on a multiplicative scale for x ∈ X:
$$
\left| \frac{\sum_{\xi}\tilde\pi(\xi)p(x;\xi)}{p_*(x)} - 1 \right| \le \frac{\sum_{\xi}|\tilde\pi(\xi)-\pi(\xi)|\,p(x;\xi)}{p_*(x)} \le \frac{\sum_{\xi}|\tilde\pi(\xi)-\pi(\xi)|}{\min_{\xi}\pi(\xi)}.
$$
To prove the asymptotic normality, note that the Taylor expansion of $\tilde Z$ for $\tilde\pi(\xi)$ around $\pi(\xi)$ gives
$$
\tilde Z = \frac{1}{n}\sum_{i=1}^{n}\left\{ \frac{q(x_i)}{p_*(x_i)} - \sum_{\xi}\frac{q(x_i)p(x_i;\xi)}{p_*^2(x_i)}\,[\tilde\pi(\xi)-\pi(\xi)] \right\} + o_p(n^{-1/2})
$$
because the remainder term is bounded by
$$
\sum_{\xi}\left| \frac{2}{n}\sum_{i=1}^{n}\frac{q(x_i)p^2(x_i;\xi)}{[\sum_{\xi'}\pi^*_{x_i}(\xi')p(x_i;\xi')]^3} \right| [\tilde\pi(\xi)-\pi(\xi)]^2,
$$
where $\pi^*_{x_i}(\xi)$ lies between $\pi(\xi)$ and $\tilde\pi(\xi)$. The term of first order is
$$
\begin{aligned}
&\frac{1}{n}\sum_{i=1}^{n}\frac{q(x_i)}{p_*(x_i)} - \sum_{\xi}\left\{ \frac{1}{n}\sum_{i=1}^{n}\frac{q(x_i)p(x_i;\xi)}{p_*^2(x_i)} \right\}[\tilde\pi(\xi)-\pi(\xi)] \\
&\quad = \frac{1}{n}\sum_{i=1}^{n}\frac{q(x_i)}{p_*(x_i)} - \sum_{\xi} \mathrm{E}_{p_*}\!\left[ \frac{q(x)p(x;\xi)}{p_*^2(x)} \right][\tilde\pi(\xi)-\pi(\xi)] + o_p(n^{-1/2}).
\end{aligned}
$$
The asymptotic variance follows from direct calculation.

A.2. Proof of Theorem 2

Rewrite
$$
\sqrt{n}\,[\tilde E(\phi) - E_p(\phi)] = \left[ \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\phi^\dagger(x_i)\,q(x_i)}{\sum_{\xi}\tilde\pi(\xi)p(x_i;\xi)} \right] \Big/ \tilde Z.
$$
By Theorem 1, the numerator converges to a normal distribution and the denominator converges to the constant Z. The result follows from Slutsky's theorem.
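The ratio form in the proof of Theorem 2 corresponds to a self-normalized estimator of the expectation $E_p(\phi)$: the same weights $w_i = q(x_i)/\sum_{\xi}\tilde\pi(\xi)p(x_i;\xi)$ that average to $\tilde Z$ are reused to weight $\phi$. A minimal sketch under the same assumed array layout as above (the function name is illustrative):

```python
import numpy as np

def lik_expectation(phi_vals, q_vals, p_mat, pi_tilde):
    """Self-normalized estimate of E_p(phi):
    sum_i phi(x_i) w_i / sum_i w_i, with
    w_i = q(x_i) / sum_xi pi~(xi) p(x_i; xi).
    """
    w = q_vals / (p_mat @ pi_tilde)  # same weights as in the estimate of Z
    return float(np.sum(phi_vals * w) / np.sum(w))
```

Centering $\phi$ in the numerator and dividing by $\tilde Z$, as in the proof, is algebraically the same as this weighted average minus $E_p(\phi)$, which is why Slutsky's theorem transfers the normal limit of the numerator to the ratio.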