30.09.2014 Views

final report - probability.ca


1.4 Roberts and Rosenthal (1998): The MALA Algorithm

Results similar to those of the paper discussed in the previous section can be obtained for other Metropolis-Hastings algorithms. Following the approach of [RGG97], Roberts and Rosenthal (1998) [RR98] demonstrate that the Metropolis-Adjusted Langevin Algorithm (MALA, see 1.2.1), applied to target distributions of the form (1.8), is optimized by fixing the variance of the proposal distribution to be $O(d^{-1/3})$, which yields an asymptotically optimal overall acceptance rate close to 0.574 and requires $O(d^{1/3})$ steps to converge. This compares favorably to the RWM algorithm, which requires $O(d)$ steps to explore the same class of target measures. The trade-off is the need to calculate the gradient of the log target density at every iteration, which adds to the computational burden.
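To make the scaling and the per-iteration gradient cost concrete, here is a minimal sketch of a single MALA update in Python. It assumes the standard Langevin proposal $Y = x + \frac{\sigma^2}{2} \nabla \log \pi(x) + \sigma Z$, which is taken to match the definition in 1.2.1 (not reproduced here). The names `mala_step`, `log_f` and `grad_log_f` are hypothetical, and the standard normal target is chosen only for illustration.

```python
import math
import random


def mala_step(x, log_f, grad_log_f, sigma2, rng):
    """One MALA update for a product target pi(x) = prod_i f(x_i).

    Assumes the standard Langevin proposal
    Y = x + (sigma2 / 2) * grad log pi(x) + sqrt(sigma2) * Z,  Z ~ N(0, I).
    Returns the next state and the acceptance probability of the move.
    """
    sigma = math.sqrt(sigma2)
    # The gradient of log pi is needed coordinate-wise at every iteration.
    y = [xi + 0.5 * sigma2 * grad_log_f(xi) + sigma * rng.gauss(0.0, 1.0)
         for xi in x]

    log_ratio = 0.0
    for xi, yi in zip(x, y):
        # log pi(y) - log pi(x)
        log_ratio += log_f(yi) - log_f(xi)
        # log q(y, x) - log q(x, y) for the Gaussian proposal densities
        fwd = yi - xi - 0.5 * sigma2 * grad_log_f(xi)
        bwd = xi - yi - 0.5 * sigma2 * grad_log_f(yi)
        log_ratio += (fwd * fwd - bwd * bwd) / (2.0 * sigma2)

    accept_prob = math.exp(min(0.0, log_ratio))
    if rng.random() < accept_prob:
        return y, accept_prob
    return x, accept_prob


# Illustrative target (an assumption): f = N(0, 1), so g(x) = -x^2 / 2 up to a constant.
def log_f(x):
    return -0.5 * x * x


def grad_log_f(x):
    return -x


rng = random.Random(0)
d, l = 100, 1.0
sigma2 = l * l * d ** (-1.0 / 3.0)            # proposal variance of order d^(-1/3)
x = [rng.gauss(0.0, 1.0) for _ in range(d)]   # start in stationarity
for _ in range(200):
    x, _ = mala_step(x, log_f, grad_log_f, sigma2, rng)
```

Each update evaluates `grad_log_f` at both the current and the proposed point, which is the extra cost relative to RWM mentioned above.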

1.4.1 Target distribution and assumptions

Consider Metropolis-adjusted discrete approximations $\{X_t\}$ to the Langevin diffusion for $\pi$ as above, with

\[
\pi\left(x^{(d)}, d\right) = \prod_{i=1}^{d} f(x_i) = \prod_{i=1}^{d} \exp\{g(x_i)\}. \tag{1.12}
\]

Assume $X_0 \sim \pi$, i.e., the chain starts out in stationarity. Furthermore, we assume that $g$ is a $C^8$ function with derivatives $g^{(i)}$ satisfying

\[
|g(x)|, \; \left|g^{(i)}(x)\right| \le M_0(x),
\]

for $i = 1, \dots, 8$ and some polynomial $M_0(\cdot)$, and that

\[
\int_{\mathbb{R}} x^k f(x)\, dx < \infty,
\]

for $k = 1, 2, 3, \dots$. To be able to apply results stemming from SDE theory, we also assume that $g'$ is a Lipschitz function.

To compare the discrete approximations to limiting continuous-time processes, it is helpful to define the discrete approximations as jump processes with exponential holding times. In particular, let $\{J_t\}$ be a Poisson process with rate $d^{1/3}$, and let $\Gamma^{(d)} = \{\Gamma^{(d)}_t\}_{t \ge 0}$ be the $d$-dimensional jump process defined by $\Gamma^{(d)}_t = X_{J_t}$, where we set $\sigma_d^2 = l^2 d^{-1/3}$ in the definitions above, with $l$ an arbitrary positive constant. It is assumed throughout that $\{X_t\}$ is non-explosive. Let

\[
\alpha(l, d) = \int\!\!\int \pi(x)\, q(x, y)\, \alpha(x, y, d)\, dx\, dy = E\left(\min\left\{\frac{\pi(Y)\, q(Y, X)}{\pi(X)\, q(X, Y)}, 1\right\}\right), \tag{1.13}
\]

denote the π-average acceptance rate of the algorithm generating Γ.<br />
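The expectation in (1.13) can be approximated by plain Monte Carlo: draw $X \sim \pi$, draw $Y \sim q(X, \cdot)$, and average the acceptance probability. The sketch below does this under two assumptions that are not part of the text above: the target is the standard normal product density ($g(x) = -x^2/2$ up to a constant), and $q$ is the standard Langevin proposal with variance $\sigma_d^2 = l^2 d^{-1/3}$. The function name `average_acceptance` is hypothetical.

```python
import math
import random


def average_acceptance(l, d, n_samples=2000, seed=0):
    """Monte Carlo estimate of alpha(l, d) for an assumed N(0, I_d) target."""
    rng = random.Random(seed)
    sigma2 = l * l * d ** (-1.0 / 3.0)
    sigma = math.sqrt(sigma2)
    total = 0.0
    for _ in range(n_samples):
        log_ratio = 0.0
        for _ in range(d):
            x = rng.gauss(0.0, 1.0)                 # X_i ~ f, stationarity
            mean_x = x + 0.5 * sigma2 * (-x)        # g'(x) = -x
            y = mean_x + sigma * rng.gauss(0.0, 1.0)
            mean_y = y + 0.5 * sigma2 * (-y)
            # log pi(Y) - log pi(X), one coordinate
            log_ratio += 0.5 * (x * x - y * y)
            # log q(Y, X) - log q(X, Y), one coordinate
            log_ratio += ((y - mean_x) ** 2 - (x - mean_y) ** 2) / (2.0 * sigma2)
        total += math.exp(min(0.0, log_ratio))
    return total / n_samples


if __name__ == "__main__":
    for d in (10, 100, 1000):
        print(d, average_acceptance(1.5, d, n_samples=500))
```

Small values of $l$ give estimates near 1, while large values drive the estimate towards 0; the estimate is noisy and only indicative for moderate `n_samples`.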

The first main result of the paper gives an analytic formula for the asymptotic acceptance probability of the algorithm.

1.4.2 Main Results

Theorem 1.13. The sequence (1.13) converges as $d \to \infty$,

\[
\lim_{d \to \infty} \alpha(l, d) = \alpha(l),
\]

with $\alpha(l) = 2\Phi\left(-K l^3 / 2\right)$, where $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt$, and

\[
K = \sqrt{E\left(\frac{5 g'''(X)^2 - 3 g''(X)^3}{48}\right)} > 0,
\]

where the expectation is taken over $X$ having density $f = \exp(g)$.
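As a worked example of the constant $K$, consider the standard normal density. This choice of $f$ is an illustration only and is not taken from the text.

```latex
\[
f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}
\quad\Longrightarrow\quad
g(x) = -\frac{x^2}{2} - \frac{1}{2}\log(2\pi), \qquad
g''(x) = -1, \qquad g'''(x) = 0,
\]
\[
K = \sqrt{E\left(\frac{5 \cdot 0^2 - 3 \cdot (-1)^3}{48}\right)}
  = \sqrt{\frac{3}{48}} = \frac{1}{4},
\qquad
\alpha(l) = 2\Phi\left(-\frac{l^3}{8}\right).
\]
```

In this case the integrand is constant, so the expectation is trivial and $K > 0$ holds automatically.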

The second result of the paper shows that, for the target density $\pi$ given above and large values of $d$, the speed of the process generated by the MALA algorithm with proposal variance $\sigma_d^2$ and $l = \sigma_d d^{1/6}$ is given by $h(l) = 2 l^2 \Phi\left(-K l^3 / 2\right)$. Furthermore, the optimal value $\hat{l}$ of $l$, which maximizes this speed function, is such that the asymptotic acceptance probability satisfies $\alpha(\hat{l}) \approx 0.574$.
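The value 0.574 can be checked numerically by maximizing $h(l) = 2 l^2 \Phi(-K l^3/2)$ and evaluating $\alpha$ at the maximizer. The sketch below uses a golden-section search, which is a choice made here and not the paper's method. The function names and the search interval $[0, 2K^{-1/3}]$ are assumptions, and the values of $K$ are arbitrary test inputs.

```python
import math


def Phi(x):
    """Standard normal CDF, written via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def acceptance(l, K):
    """Asymptotic acceptance probability alpha(l) = 2 * Phi(-K * l^3 / 2)."""
    return 2.0 * Phi(-K * l ** 3 / 2.0)


def speed(l, K):
    """Speed function h(l) = 2 * l^2 * Phi(-K * l^3 / 2) = l^2 * alpha(l)."""
    return l * l * acceptance(l, K)


def optimal_l(K, tol=1e-9):
    """Maximize h by golden-section search on [0, 2 * K^(-1/3)] (assumed bracket)."""
    a, b = 0.0, 2.0 / K ** (1.0 / 3.0)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    e = a + inv_phi * (b - a)
    while b - a > tol:
        if speed(c, K) > speed(e, K):
            b = e        # maximum lies in [a, e]
        else:
            a = c        # maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        e = a + inv_phi * (b - a)
    return 0.5 * (a + b)


if __name__ == "__main__":
    for K in (0.25, 1.0, 3.0):
        l_hat = optimal_l(K)
        print(K, l_hat, acceptance(l_hat, K))
```

Because $h$ depends on $l$ only through $u = K l^3 / 2$ up to a factor $K^{-2/3}$, the maximizing $u$, and hence the optimal acceptance probability, does not depend on $K$; only $\hat{l}$ does.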
