final report - probability.ca
final report - probability.ca
final report - probability.ca
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
1.4 Roberts and Rosenthal (1998): The MALA Algorithm<br />
Results similar to the paper discussed in the previous section <strong>ca</strong>n be obtained for other Metropolis-Hastings algorithms.<br />
Following the approach of [RGG97], Roberts and Rosenthal (1998) [RR98] demonstrate that the Metropolis-<br />
Adjusted Langevin Algorithm (MALA, see 1.2.1), applied to target distributions of the form (1.8), is optimized<br />
by fixing the variance of the proposal distribution to be O ( d −1/3) , which yields an overall asymptoti<strong>ca</strong>lly optimal<br />
acceptance rate close to 0.574 and requires O ( d 1/3) steps to converge. This compares favorably to the RWM algorithm,<br />
which require O (d) steps to explore the same class of target measures. The trade off being the need to<br />
<strong>ca</strong>lculate the gradient of the Markov chain at every iteration, adding to the computational burden.<br />
1.4.1 Target distribution and assumptions<br />
Consider Metropolis-adjusted discrete approximations {X t } to the Langevin diffusion for π as above, with<br />
( )<br />
π x (d) , d =<br />
d∏<br />
f (x i ) =<br />
i=1<br />
d∏<br />
exp {g (x i )} . (1.12)<br />
Assume X 0 ∼ π, i.e., the chain starts out in stationarity. Furthermore, we assume that g is a C 8 function with<br />
derivatives g (i) satisfying<br />
∣<br />
|g (x)| , ∣g (i) (x) ∣ ≤ M 0 (x) ,<br />
for i ∈ [1, 8] for some polynomial M 0 (·), and that<br />
ˆ<br />
R<br />
i=1<br />
x k f (x) dx < ∞,<br />
for k = 1, 2, 3, . . . . To be able apply results stemming from SDE theory, we assume that g ′ is a Lipschitz function.<br />
To compare the discrete approximations to limiting continuous-time processes, it is helpful to define the discrete<br />
approximations as jump processes { with } exponential holding times. In particular, let {J t } be a Poisson process<br />
with rate d 1/3 , and let Γ (d) = Γ (d)<br />
t be the d-dimensional jump process defined by Γ (d)<br />
t = X Jt , where we set<br />
t≥0<br />
σd 2 = l2 nd −1/3 in the definitions above, with l an arbitrary positive constant. It is assumed throughout that {X t }<br />
is non-explosive. Let<br />
ˆ ˆ<br />
( { })<br />
π (Y ) q (Y , X)<br />
α (l, d) = π (x) q (x, y) α (x, y, d) dxdy = E min<br />
π (X) q (X, Y ) , 1 , (1.13)<br />
denote the π-average acceptance rate of the algorithm generating Γ.<br />
The first primary result of the paper gives an analytic formula for the asymptotic acceptance <strong>probability</strong> of the<br />
algorithm.<br />
1.4.2 Main Results<br />
Theorem 1.13. The sequence (1.13) converges as d → ∞,<br />
lim α (l, d) = α (l) ,<br />
d→∞<br />
with α (l) = 2Φ ( −Kl 3 /2 ) , where Φ (x) = 1 √<br />
2π<br />
´ x<br />
−∞ e−t2 /2 dt, and<br />
(<br />
)<br />
K = √ 5g ′′′ (X 2 ) − 3g ′′ (X) 3<br />
E > 0,<br />
48<br />
where the expectation is taken over X having density f = exp (g).<br />
The second result of the paper shows that for the target density π as given above and large values of d, a<br />
corresponding MALA algorithm for the target π with proposal variances given by σ (d), and l = σ d d 1/6 , the speed<br />
of the process is given by h (l) = 2l 2 Φ ( −Kl 3 /2 ) . Furthermore, the<br />
)<br />
optimal value ˆl of l which maximizes this speed<br />
function is such that the asymptotic acceptance <strong>probability</strong> α<br />
(ˆl ≈ 0.574.<br />
14