Monte Carlo integration with Markov chain
2. Markov chain simulation

For the simplest case (i), i.e. $\xi_t = x_{t-1}$, Kong et al.'s (2003) model assumes that $x_t$ conditional on $x_{t-1}$ has distribution
\[
p(\cdot\,; x_{t-1})\,\mathrm{d}\mu \Big/ \int p(x; x_{t-1})\,\mathrm{d}\mu,
\]
where $\mu$ is a nonnegative measure on $X$. The likelihood is
\[
\prod_{i=1}^n \left[ p(x_i; x_{i-1})\,\mu(\{x_i\}) \Big/ \int p(x; x_{i-1})\,\mathrm{d}\mu \right].
\]
For the general Markov chain scheme, even though the sequence $[(\xi_1, x_1), \ldots, (\xi_n, x_n)]$ is a Markov chain, we consider the model only specifying that $x_t$ conditional on $\xi_t$ has distribution
\[
p(\cdot\,; \xi_t)\,\mathrm{d}\mu \Big/ \int p(x; \xi_t)\,\mathrm{d}\mu,
\]
where $\mu$ is a nonnegative measure on $X$. The partial likelihood (Cox, 1975) is
\[
\prod_{i=1}^n \left[ p(x_i; \xi_i)\,\mu(\{x_i\}) \Big/ \int p(x; \xi_i)\,\mathrm{d}\mu \right].
\]
If $\xi_t$ is deterministic given $x_{t-1}$, for example $\xi_t = x_{t-1}$, this likelihood coincides with the full likelihood.

Under conditions of support and connectivity similar to those in Vardi (1985), the maximum likelihood estimate has finite support $\{x_1, \ldots, x_n\}$ and satisfies
\[
\hat{\mu}(\{x\}) = \frac{\hat{P}(\{x\})}{n^{-1} \sum_{j=1}^n \hat{Z}^{-1}(\xi_j)\, p(x; \xi_j)},
\]
where $\hat{Z}(\xi_j) = \int p(x; \xi_j)\,\mathrm{d}\hat{\mu}$. This equation involves $n$ unknowns, $\hat{\mu}(\{x_1\}), \ldots, \hat{\mu}(\{x_n\})$, and practically defies numerical solution for large $n$. However, the difficulty can be overcome by substituting the true value $Z(\xi_j) = \int p(x; \xi_j)\,\mathrm{d}\mu_0$, known to be one, in the above likelihood equation. The resulting estimator is given by $\tilde{\mu}$ in Section 1.

In retrospect, the Markov chain scheme basically provides a random design: an index $\xi_i$ is stochastically selected and then a draw $x_i$ is made from $p(\cdot\,; \xi_i)$ for $1 \le i \le n$. The estimator $\tilde{Z}$ is a stratified importance sampling estimator using one observation $x_i$ per distribution $p(\cdot\,; \xi_i)$. The estimator $\tilde{E}(\phi)$ is a weighted Monte Carlo estimator: the observations $x_i$ have weights proportional to
\[
\frac{q(x_i)}{n^{-1} \sum_{j=1}^n p(x_i; \xi_j)}.
\]
These insights are important in themselves, independent of the model formulation, and are relevant to most of the development here.

Now consider Gibbs sampling [case (iii)], where the sequence $(x_1, \ldots, x_n)$ converges to the target distribution $p(x)$. By the detailed balance equation, it follows that the average of successive transition densities $n^{-1} \sum_{j=1}^n p(x; \xi_j)$ converges to $p(x)$ pointwise. This result suggests that the estimator $\tilde{Z}$ converges faster than at the standard rate $n^{-1/2}$, because $n^{-1} \sum_{j=1}^n p(x; \xi_j)$ serves as the stratified importance sampling density and is asymptotically proportional to the integrand $q(x)$. This super-efficiency for estimating the normalizing constant was observed in Kong et al. (2003) and in our simulation studies (e.g. Table 1). Moreover, the convergence result implies that the ratio
\[
\frac{q(x)}{n^{-1} \sum_{j=1}^n p(x; \xi_j)}
\]
converges to the normalizing constant $Z$ at any fixed $x \in X$. For Gibbs sampling, Chib (1995) proposed evaluating the above ratio at a high-density point to estimate $Z$. Therefore, the estimator $\tilde{Z}$ appears to be an average of Chib's ratios.
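To make the construction concrete, the following is a minimal sketch, not the paper's code, of the estimators $\tilde{Z}$ and $\tilde{E}(\phi)$ for a systematic-scan Gibbs sampler targeting a bivariate normal with correlation rho, under case (i) where $\xi_t = x_{t-1}$. All names (q, trans_density, Z_tilde, Z_chib, the choice rho = 0.8, n = 2000) are illustrative assumptions, and Chib's single-point ratio is included only as an informal counterpart to the averaged ratio.

import numpy as np

# Sketch (assumed example, not from the paper): Gibbs sampling for a bivariate
# normal and the likelihood estimator built from averaged transition densities.
rng = np.random.default_rng(0)
rho, n = 0.8, 2000
s2 = 1.0 - rho**2                      # conditional variance of each Gibbs draw

def q(x1, x2):
    # Unnormalized target density; true Z = 2*pi*sqrt(1 - rho^2).
    return np.exp(-(x1**2 - 2.0 * rho * x1 * x2 + x2**2) / (2.0 * s2))

def trans_density(x1, x2, xi2):
    # Density p(x; xi) of one Gibbs cycle started at xi = (xi1, xi2):
    # x1 | xi2 ~ N(rho*xi2, s2), then x2 | x1 ~ N(rho*x1, s2); xi1 drops out.
    d1 = np.exp(-(x1 - rho * xi2) ** 2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
    d2 = np.exp(-(x2 - rho * x1) ** 2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
    return d1 * d2

# Run the chain; for case (i), xi_t is simply the previous state x_{t-1}.
xs, xis = np.empty((n, 2)), np.empty((n, 2))
x = np.zeros(2)
for t in range(n):
    xis[t] = x
    x1 = rng.normal(rho * x[1], np.sqrt(s2))
    x2 = rng.normal(rho * x1, np.sqrt(s2))
    x = np.array([x1, x2])
    xs[t] = x

# Averaged transition density n^{-1} sum_j p(x_i; xi_j) at each draw x_i.
avg_dens = np.array([trans_density(xs[i, 0], xs[i, 1], xis[:, 1]).mean()
                     for i in range(n)])

# Z~ averages the ratios q(x_i) / [n^{-1} sum_j p(x_i; xi_j)], whereas Chib's
# (1995) estimator evaluates the same ratio at one high-density point.
ratios = q(xs[:, 0], xs[:, 1]) / avg_dens
Z_tilde = ratios.mean()
Z_chib = q(0.0, 0.0) / trans_density(0.0, 0.0, xis[:, 1]).mean()

# Weighted Monte Carlo estimate E~(phi), here with phi(x) = x1^2 (true value 1).
weights = ratios / ratios.sum()
E_phi = np.sum(weights * xs[:, 0] ** 2)

print("log Z~ =", np.log(Z_tilde))
print("log Z (Chib) =", np.log(Z_chib))
print("true log Z =", np.log(2.0 * np.pi * np.sqrt(s2)))
print("E~(x1^2) =", E_phi)

The two printed estimates of log Z loosely mirror the comparison of the Chib and likelihood ("Lik") estimators reported in Table 1; no burn-in is discarded in this sketch for brevity.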
Table 1
Comparison of estimators of log Z

              n = 500        n = 1000       n = 2500       n = 5000
              Chib    Lik    Chib    Lik    Chib    Lik    Chib    Lik    Time ratio