final report - probability.ca
2.4 Metropolis-within-Gibbs algorithms
In this section we shift our attention from Metropolis algorithms that update all components simultaneously to partially updating algorithms, in which the updates are chosen to be of lower dimension than the target density itself. Optimal scaling in this context requires two choices: the scaling of the proposal density, and the dimensionality of the proposal. The paper by Neal and Roberts, [NR05], proves optimality results for the Metropolis-within-Gibbs algorithm with random walk and Langevin updates separately. In the first instance, they find that full-dimensional random walk Metropolis updates are asymptotically no better than lower-dimensional updating schemes; since the computational cost per iteration is a non-decreasing function of the proportion of the density being updated, it is therefore always optimal to use lower-dimensional updates when possible. On the other hand, they find that full-dimensional Langevin algorithms are worthwhile in general, although one needs to weigh the computational overhead associated with their implementation when deciding on optimality.
For RWM/MALA-within-Gibbs algorithms, each iteration of the MCMC algorithm chooses $d \cdot c_d$ components at random, where $c_d$ denotes the fixed updating proportion, and then attempts to update them jointly according to the RWM or MALA mechanism, respectively. Hence, the two algorithms (RWM and MALA, respectively) propose new values for coordinate $i$ according to
\[
Y_i^{(d)} = x_i^{(d)} + \chi_i^{(d)} \sigma_{d,c_d} Z_i,
\]
\[
Y_i^{(d)} = x_i^{(d)} + \chi_i^{(d)} \left\{ \sigma_{d,c_d} Z_i + \frac{\sigma_{d,c_d}^2}{2} \frac{\partial}{\partial x_i} \log \pi_d\left(x^{(d)}\right) \right\},
\]
where $1 \le i \le d$. The random variables $\{Z_i\}_{i \in [1,d]}$ are i.i.d. with distribution $Z \sim N(0,1)$, and each of the characteristic functions $\chi_i^{(d)}$ is chosen independently of the $Z_i$'s. At each iteration, a new random subset $A$ is chosen from $\{1, 2, \ldots, d\}$, and $\chi_i^{(d)} = 1$ if $i \in A$, and zero otherwise. The Metropolis-Hastings acceptance rule for the two setups above, when applied to the i.i.d. components target (1.8), yields the acceptance probability
\[
\alpha_d^{c_d}\left(x^{(d)}, Y^{(d)}\right) = \min\left\{1, \frac{\pi\left(Y^{(d)}\right) q\left(Y^{(d)}, x^{(d)}\right)}{\pi\left(x^{(d)}\right) q\left(x^{(d)}, Y^{(d)}\right)}\right\},
\]
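The joint update of the randomly chosen coordinates can be sketched in code. The following is an illustrative implementation of a single RWM-within-Gibbs iteration, not the paper's code: the function name, the standard normal i.i.d. target, and the parameter values are assumptions. For the RWM variant the proposal is symmetric, so the $q$ terms cancel in the acceptance ratio.

```python
import numpy as np

def rwm_within_gibbs_step(x, log_pi, c_d, l, rng):
    """One RWM-within-Gibbs iteration (illustrative sketch).

    Chooses a random subset A of ceil(c_d * d) coordinates and updates
    them jointly with a Gaussian random-walk proposal of scale
    sigma = l / sqrt(d), i.e. sigma_{d,c_d}^2 = l^2 d^{-s} with s = 1.
    """
    d = len(x)
    sigma = l / np.sqrt(d)
    A = rng.choice(d, size=max(1, int(np.ceil(c_d * d))), replace=False)
    y = x.copy()
    y[A] = x[A] + sigma * rng.standard_normal(len(A))
    # Metropolis-Hastings accept/reject; symmetric proposal, so q cancels
    if np.log(rng.uniform()) < log_pi(y) - log_pi(x):
        return y, True
    return x, False

# Example: i.i.d. standard normal target, pi(x) = prod_i phi(x_i)
rng = np.random.default_rng(0)
log_pi = lambda x: -0.5 * np.sum(x**2)
x = rng.standard_normal(50)              # start at stationarity
x, accepted = rwm_within_gibbs_step(x, log_pi, c_d=0.2, l=2.4, rng=rng)
```

On rejection the function returns the current state unchanged, matching the convention $X_m^{(d)} = X_{m-1}^{(d)}$ below.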
where $q(\cdot, \cdot)$ is the chosen proposal density. In case of rejection, we set $X_m^{(d)} = X_{m-1}^{(d)}$. As in the cases considered previously, we can use the notion of average acceptance rate as a means of measuring the efficiency and optimality of the algorithm. We denote
\[
a_d^{c_d}(l) = \mathbb{E}_\pi\left[\alpha_d^{c_d}\left(X^d, Y^d\right)\right] = \mathbb{E}_\pi\left[\min\left\{1, \frac{\pi_d\left(Y^d\right) q\left(Y^d, x^d\right)}{\pi_d\left(x^d\right) q\left(x^d, Y^d\right)}\right\}\right],
\]
where $\sigma_{d,c_d}^2 = l^2 d^{-s}$. Furthermore, as always, it is assumed throughout that the algorithm starts at stationarity. The scaling exponent $s$ is given by $s = 1$ for RWM and $s = \frac{1}{3}$ for MALA.
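With these conventions, $a_d^{c_d}(l)$ can be estimated directly by Monte Carlo: draw $X \sim \pi$ (stationarity), generate a proposal $Y$, and average the acceptance probability. The following is a minimal sketch for the RWM case on an i.i.d. standard normal target; the target choice, function name, and parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def avg_acceptance_rate(d, c_d, l, s=1.0, n_mc=20000, seed=1):
    """Monte Carlo estimate of a_d^{c_d}(l) = E_pi[alpha_d^{c_d}(X, Y)]
    for RWM-within-Gibbs on the i.i.d. standard normal target,
    with sigma_{d,c_d}^2 = l^2 d^{-s} and X drawn from pi."""
    rng = np.random.default_rng(seed)
    sigma = l * d ** (-s / 2)
    k = max(1, int(np.ceil(c_d * d)))    # number of updated coordinates
    acc = np.empty(n_mc)
    for m in range(n_mc):
        x = rng.standard_normal(d)       # X ~ pi (stationarity)
        A = rng.choice(d, size=k, replace=False)
        y = x.copy()
        y[A] += sigma * rng.standard_normal(k)
        # log pi(Y) - log pi(X); symmetric proposal, so q cancels
        log_ratio = -0.5 * (np.sum(y**2) - np.sum(x**2))
        acc[m] = 1.0 if log_ratio >= 0 else np.exp(log_ratio)
    return acc.mean()

rate = avg_acceptance_rate(d=100, c_d=0.2, l=2.4)
```

Only the $k$ updated coordinates contribute to the log ratio, so the variability of the acceptance probability is governed by $c_d\, l^2$ rather than $l^2$ alone, which is why the optimal $l$ depends on the updating proportion.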
2.4.1 RWM-within-Gibbs
Consider first the RWM-within-Gibbs algorithm applied to the i.i.d. target density. As was done in [RR98], it is helpful to express the density in the exponential target form
\[
\pi\left(x^{(d)}\right) = \prod_{i=1}^d f\left(x_i^{(d)}\right) = \prod_{i=1}^d \exp\left\{ g\left(x_i^{(d)}\right) \right\},
\]
The proposal standard deviation is denoted by $\sigma^{(d)} = \frac{l}{\sqrt{d-1}}$, for some $l > 0$. Since the target is again formed of i.i.d. components, all individual components of the target density are equivalent. For convenience, it will again be useful to consider the first component of the resulting algorithm, with time sped up by a factor of $d$, denoted by $U_{t,1}^{(d)}$ and obtained from
\[
U_t^{(d)} = \left( X_{[dt],1}, X_{[dt],2}, \ldots, X_{[dt],d} \right).
\]
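The construction of the sped-up first component can be illustrated in code: run the chain for $[dt]$ steps, so that one chain step corresponds to $1/d$ units of continuous time, and record only the first coordinate. This is a sketch under stated assumptions (standard normal i.i.d. target, hypothetical function name and parameter values), not the paper's construction.

```python
import numpy as np

def sped_up_first_component(d, l, c_d, t_max, rng):
    """Simulate RWM-within-Gibbs on the i.i.d. standard normal target
    and return the time-rescaled first coordinate U^(d)_{t,1} = X_{[dt],1},
    sampled on the grid t = 0, 1/d, 2/d, ..., t_max."""
    sigma = l / np.sqrt(d - 1)           # proposal scale sigma^(d)
    k = max(1, int(np.ceil(c_d * d)))    # coordinates updated per step
    x = rng.standard_normal(d)           # start at stationarity
    n_steps = int(d * t_max)             # [d t] chain steps cover [0, t_max]
    path = np.empty(n_steps + 1)
    path[0] = x[0]
    for m in range(1, n_steps + 1):
        A = rng.choice(d, size=k, replace=False)
        y = x.copy()
        y[A] += sigma * rng.standard_normal(k)
        if np.log(rng.uniform()) < -0.5 * (np.sum(y**2) - np.sum(x**2)):
            x = y
        path[m] = x[0]
    return path

rng = np.random.default_rng(2)
u = sped_up_first_component(d=50, l=2.4, c_d=0.2, t_max=1.0, rng=rng)
```

As $d \to \infty$, this rescaled first component is the object whose diffusion limit drives the optimal scaling results.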