
# Monte Carlo integration with Markov chain - Department of Statistics



Z. Tan / Journal of Statistical Planning and Inference 138 (2008) 1967–1980

without requiring the value of its normalizing constant $Z$. A common approach is then to use these points, if $n$ is sufficiently large, as an approximate and dependent sample from the distribution $p(x)$ for Monte Carlo integration. For example, the expectation $E_p(\varphi)$ of a function $\varphi(x)$ with respect to $p(x)$ can be estimated by the sample average, or crude Monte Carlo estimator,

$$\bar{E}(\varphi) = \frac{1}{n} \sum_{i=1}^{n} \varphi(x_i).$$

However, there are two challenging problems for this inferential approach. First, the Monte Carlo variability of $\bar{E}(\varphi)$ is generally underestimated by the sample variance

$$\frac{1}{n} \sum_{i=1}^{n} \left[\varphi(x_i) - \bar{E}(\varphi)\right]^2$$

divided by $n$, and the amount of underestimation can sometimes be substantial due to serial dependency. Specialized methods that run multiple chains or use batch means are needed to assess the Monte Carlo error (e.g. Gelman and Rubin, 1992; Geyer, 1992). Second, the crude Monte Carlo estimator is not directly applicable to the normalizing constant $Z$, which may be of interest and part of the computational goal. For example, if $q(x)$ is the product of a prior and a likelihood in Bayesian inference, $Z$ is the predictive density of the data. As a result, various methods have been proposed for computing normalizing constants, but most of them involve a matching density or additional simulation (e.g. DiCiccio et al., 1997; Gelman and Meng, 1998).

We consider a general Markov chain scheme and develop a new method for estimating simultaneously the normalizing constant $Z$ and the expectation $E_p(\varphi)$. Suppose that a Markov chain $[(\xi_1, x_1), \ldots, (\xi_n, x_n)]$ is generated as follows.

General Markov chain scheme: at each time $t$, update $\xi_t$ given $(\xi_{t-1}, x_{t-1})$ and sample $x_t$ from $p(\cdot\,; \xi_t)$.

It is helpful to think of $\xi_t$ as an index and $x_t$ as a draw given the index. Denote by $\Xi$ the index set.
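The first problem noted above, that the sample variance divided by $n$ understates the Monte Carlo error of the crude estimator under serial dependence, can be illustrated with a small sketch. The AR(1) chain, its parameter `rho`, and the function names below are assumptions chosen for illustration only; they are not taken from the paper. The batch-means calculation is one simple variant of the technique the text cites.

```python
import numpy as np

def crude_estimate(phi_values):
    """Crude Monte Carlo estimator: the sample average of phi(x_i)."""
    return float(np.mean(phi_values))

def naive_std_error(phi_values):
    """sqrt(sample variance / n); ignores serial dependence in the chain."""
    phi_values = np.asarray(phi_values, dtype=float)
    return float(np.sqrt(np.var(phi_values) / phi_values.size))

def batch_means_std_error(phi_values, n_batches=50):
    """Standard error from non-overlapping batch means."""
    phi_values = np.asarray(phi_values, dtype=float)
    batch_size = phi_values.size // n_batches
    trimmed = phi_values[: batch_size * n_batches]
    means = trimmed.reshape(n_batches, batch_size).mean(axis=1)
    return float(np.sqrt(np.var(means, ddof=1) / n_batches))

def ar1_chain(n, rho=0.9, seed=0):
    """Hypothetical dependent sample: stationary AR(1) with N(0, 1) marginals."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=np.sqrt(1.0 - rho**2), size=n)
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = rho * x[t - 1] + noise[t]
    return x

chain = ar1_chain(50_000)          # phi(x) = x, so E_p(phi) = 0
estimate = crude_estimate(chain)
se_naive = naive_std_error(chain)
se_batch = batch_means_std_error(chain)
print(estimate, se_naive, se_batch)
```

For a strongly autocorrelated chain such as this one, the batch-means standard error should come out several times larger than the naive one, which is the underestimation the text describes.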
The details of updating $\xi_t$ are not needed for estimation, even though they are essential to simulation. In contrast, the transition density $p(\cdot\,; \xi)$ is assumed to be completely known on $\mathcal{X}$ with respect to $\mu_0$ for $\xi \in \Xi$. This scheme is very flexible and encompasses the following individual cases:

(i) A Markov chain $(x_1, x_2, \ldots)$ with transition density $p(\cdot\,; x_{t-1})$, by letting $\xi_t = x_{t-1}$.

(ii) A subsampled chain $(x_1, x_{b+1}, x_{2b+1}, \ldots)$, by letting $(\xi_t, x_t)$ be $(x_{tb}, x_{tb+1})$ in the original Markov chain.

(iii) Gibbs sampling, where the chain $(\xi_t, x_t)$ has a joint stationary distribution $p(\xi, x)$, $\xi_t$ is sampled from the conditional distribution $p(\xi; x_{t-1})$, and $x_t$ is sampled from the conditional distribution $p(x; \xi_t)$.

(iv) Generalized Gibbs sampling, where a stochastic transformation $g_t$ of $\xi_t$ is inserted after $\xi_t$ is sampled and then $x_t$ is sampled from $p(\cdot\,; g_t(\xi_t))$; the index $\xi_t$ is expanded to be $(\xi_t, g_t)$.

(v) Metropolis–Hastings sampling, by identifying the current state as $\xi_t$ and the proposal as $x_t$, even though the chain commonly referred to as the Metropolis–Hastings chain, that is $(\xi_1, \ldots, \xi_n)$, does not admit a transition density.

Case (i) is conceptually important but may be restrictive in practice, as we shall explain. In case (ii), the subsampled chain $(x_1, x_{b+1}, x_{2b+1}, \ldots)$ is Markovian, but its transition density is complicated, being the $b$-step convolution of the original transition density. In case (iii), the joint chain is Markovian on $\Xi \times \mathcal{X}$ with transition density $p(x; \xi)\,p(\xi; x_{t-1})$, but integrals of interest are defined on $\mathcal{X}$. The marginal chain $(x_1, \ldots, x_n)$ is also Markovian, but its transition density involves integrating $\xi$ out of $p(x; \xi)\,p(\xi; x_{t-1})$, which is typically difficult. A familiar example of case (iii) is data augmentation in Bayesian computation, where $x$ is a parameter, $\xi$ is a latent variable, and $q(x)$ is the product of a prior and a likelihood (Gelfand and Smith, 1990; Tanner and Wong, 1987). Case (iv) is represented by a recent development in data augmentation, where $\xi$ includes not only a latent variable but also a working parameter (Liu and Sabatti, 2000; Liu and Wu, 1999; van Dyk and Meng, 2001).
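To make the index/draw view of case (iii) concrete, the following is a minimal sketch of a two-component Gibbs sampler recorded as pairs $(\xi_t, x_t)$. The toy target (a standard bivariate normal with correlation `rho`), the starting value, and the function names are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def gibbs_index_draw_chain(n, rho=0.5, seed=0):
    """Gibbs sampling for a standard bivariate normal, stored as (xi_t, x_t).

    xi_t is drawn from p(xi; x_{t-1}) = N(rho * x_{t-1}, 1 - rho^2) and
    x_t from p(x; xi_t) = N(rho * xi_t, 1 - rho^2).
    """
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1.0 - rho**2)
    xi = np.empty(n)
    x = np.empty(n)
    x_prev = 0.0  # arbitrary starting value (an assumption of this sketch)
    for t in range(n):
        xi[t] = rng.normal(rho * x_prev, sd)  # update the index
        x[t] = rng.normal(rho * xi[t], sd)    # draw given the index
        x_prev = x[t]
    return xi, x

def transition_density(x, xi, rho=0.5):
    """p(x; xi), completely known in closed form as the scheme requires."""
    var = 1.0 - rho**2
    return np.exp(-0.5 * (x - rho * xi) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

xi, x = gibbs_index_draw_chain(5000)
print(x.mean(), x.var())
```

Only the recorded pairs and the closed-form `transition_density` would be needed for estimation; how `xi` was updated plays no further role, in line with the remark that the details of updating the index are not needed.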

The general description of the scheme allows the situation where $x_t$ is sampled component by component, for example by sampling $x_t^1$ from $p_1(x^1; \xi_t)$ and then $x_t^2$ from $p_{2|1}(x^2; x_t^1, \xi_t)$. Then $p(x; \xi_t)$ is the product $p_1(x^1; \xi_t)\,p_{2|1}(x^2; x^1, \xi_t)$. As a result, case (iii) or (iv) is not restricted to two-component Gibbs sampling as it appears to be.

Assume that the sequence $(x_1, \ldots, x_n)$ converges to a probability distribution as $n \to \infty$. Denote by $p^*(x)$ the stationary density with respect to $\mu_0$. It is not necessary that $p^*(x)$ be identical to the target density $p(x)$. The sequence $(x_1, \ldots, x_n)$ typically converges to $p^*(x) = p(x)$ in Gibbs sampling and its generalization [cases (iii) and (iv)]. In contrast, this sequence does not converge to $p(x)$, i.e. $p^*(x) \neq p(x)$, in Metropolis–Hastings sampling, where $(\xi_1, \ldots, \xi_n)$ converges to $p(x)$ [case (v)].

We develop a general method in Section 2 and illustrate the application to Gibbs sampling and its variation in Section 3. Applications of the general method to rejection sampling and Metropolis–Hastings sampling are presented in Tan (2006). We take the likelihood approach of Kong et al. (2003) in developing the method; see Tan (2004) for the optimality of the likelihood approach in two situations of independence sampling. In that approach, the baseline measure is treated as the parameter in a model and estimated as a discrete measure by maximum likelihood. Consequently, integrals of interest are estimated as finite sums by substituting the estimated measure. Kong et al. (2003) considered independence sampling and the simplest, individual case (i) of Markov chain sampling. We extend the likelihood approach to the general Markov chain scheme by using partial likelihood (Cox, 1975).
The approximate maximum likelihood estimator of the baseline measure is

$$\tilde{\mu}(\{x\}) = \frac{\hat{P}(\{x\})}{n^{-1} \sum_{j=1}^{n} p(x; \xi_j)},$$

where $\hat{P}$ is the empirical distribution placing mass $n^{-1}$ at each of the points $x_1, \ldots, x_n$. The integral $Z = \int q(x)\,d\mu_0$ is estimated by

$$\tilde{Z} = \int q(x)\,d\tilde{\mu} = \sum_{i=1}^{n} \frac{q(x_i)}{\sum_{j=1}^{n} p(x_i; \xi_j)}.$$

Note that the same estimator also holds for a real-valued integrand $q(x)$. The expectation $E_p(\varphi) = \int \varphi(x) q(x)\,d\mu_0 \big/ \int q(x)\,d\mu_0$ is estimated by

$$\tilde{E}(\varphi) = \frac{\int \varphi(x) q(x)\,d\tilde{\mu}}{\int q(x)\,d\tilde{\mu}} = \sum_{i=1}^{n} \frac{\varphi(x_i)\,q(x_i)}{\sum_{j=1}^{n} p(x_i; \xi_j)} \Bigg/ \sum_{i=1}^{n} \frac{q(x_i)}{\sum_{j=1}^{n} p(x_i; \xi_j)}.$$

Further, we modify the basic estimator $\tilde{\mu}$ and propose a subsampled estimator $\tilde{\mu}_b$, a regulated estimator $\tilde{\mu}_\delta$, an amplified estimator $\tilde{\mu}_a$, and their combined versions. Finally, we introduce approximate variance estimators for the point estimators.

Our method can require less total computational time (for simulating the Markov chain and computing the estimators) than statistically inefficient estimation with a brute-force increase of sample size to achieve the same degree of accuracy. In three examples, we find that the basic estimator $\log \tilde{Z}$ has smaller variance than Chib's estimator by a factor of 100, and the basic estimator $\tilde{E}(\varphi)$ has smaller variance than the crude Monte Carlo estimator by a factor of 1–100. Subsampling helps to reduce computational cost, while regulation or amplification helps to reinforce statistical efficiency, so that our method achieves overall computational efficiency. We also find that the approximate variance estimators agree with the empirical variances of the corresponding point estimators in all our examples.
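The basic estimators of $Z$ and $E_p(\varphi)$ defined above can be sketched numerically as follows. This is an illustration under stated assumptions, not the paper's implementation: the toy Gibbs chain, the constant `RHO`, the unnormalized target $q(x) = \exp(-x^2/2)$ (for which $Z = \sqrt{2\pi}$), the choice $\varphi(x) = x^2$, and all function names are hypothetical. The subsampled, regulated, and amplified variants are not shown.

```python
import numpy as np

RHO = 0.5  # hypothetical correlation of the toy bivariate normal

def simulate_chain(n, seed=1):
    """Toy Gibbs chain recorded as (xi_t, x_t); x has N(0, 1) as stationary law."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1.0 - RHO**2)
    xi, x = np.empty(n), np.empty(n)
    x_prev = 0.0
    for t in range(n):
        xi[t] = rng.normal(RHO * x_prev, sd)
        x[t] = rng.normal(RHO * xi[t], sd)
        x_prev = x[t]
    return xi, x

def p(x, xi):
    """Known transition density p(x; xi) = N(RHO * xi, 1 - RHO^2)."""
    var = 1.0 - RHO**2
    return np.exp(-0.5 * (x - RHO * xi) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def q(x):
    """Unnormalized target density; its integral is sqrt(2 * pi)."""
    return np.exp(-0.5 * x**2)

def basic_estimates(x, xi, phi):
    """Return (Z_tilde, E_tilde(phi)) from the recorded pairs."""
    dens = p(x[:, None], xi[None, :])   # dens[i, j] = p(x_i; xi_j)
    weights = q(x) / dens.sum(axis=1)   # q(x_i) / sum_j p(x_i; xi_j)
    z_tilde = weights.sum()
    e_tilde = (phi(x) * weights).sum() / z_tilde
    return float(z_tilde), float(e_tilde)

xi, x = simulate_chain(2000)
z_tilde, e_tilde = basic_estimates(x, xi, phi=lambda v: v**2)
print(z_tilde, np.sqrt(2.0 * np.pi))  # estimate of Z next to its true value
print(e_tilde)                         # estimate of E_p(x^2), true value 1
```

The double sum makes the cost grow as $n^2$ evaluations of $p(x_i; \xi_j)$, which is the kind of cost that subsampling is said to reduce.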
