
2.4 Metropolis-within-Gibbs algorithms

In this section we move from Metropolis algorithms that update all components simultaneously to partially updating algorithms, in which the updates are chosen to be lower dimensional than the target density itself. Optimal scaling in this context requires two choices: one for the scaling of the proposal density, and a second for the dimensionality of the proposal. The paper by Neal and Roberts, [NR05], proves optimality results for the Metropolis-within-Gibbs algorithm with random walk and Langevin updates separately. In the first instance, they find that full-dimensional random walk Metropolis updates are asymptotically no better than lower dimensional updating schemes; thus, given the added computational cost of higher dimensional updates (the computational overhead at every iteration is a non-decreasing function of the proportion of the density being updated), it is always optimal to use lower dimensional updates when possible. On the other hand, they find that full dimensional Langevin algorithms are worthwhile in general, although one needs to consider the computational overhead associated with their implementation when deciding on optimality.

For RWM/MALA-within-Gibbs algorithms, each iteration of the MCMC algorithm requires choosing $d \cdot c_d$ components at random, where $c_d$ denotes the fixed updating proportion, and then attempting to update them jointly according to the RWM or MALA mechanism, respectively. Hence, the two algorithms (RWM and MALA, respectively) propose new values for coordinate $i$ according to

$$Y_i^{(d)} = x_i^{(d)} + \chi_i^{(d)} \sigma_{d,c_d} Z_i,$$

$$Y_i^{(d)} = x_i^{(d)} + \chi_i^{(d)} \left\{ \sigma_{d,c_d} Z_i + \frac{\sigma_{d,c_d}^2}{2} \frac{\partial}{\partial x_i} \log \pi_d\left(x^{(d)}\right) \right\},$$

where $1 \le i \le d$. The random variables $\{Z_i\}_{i \in [1,d]}$ are i.i.d. with distribution $Z \sim N(0,1)$, and the characteristic functions $\chi_i^{(d)}$ are chosen independently of the $Z_i$'s: at each iteration, a new random subset $A$ is chosen from $\{1, 2, \ldots, d\}$, and $\chi_i^{(d)} = 1$ if $i \in A$ and zero otherwise. The Metropolis-Hastings acceptance rule for the two setups above, when applied to the i.i.d. components target (1.8), yields the acceptance probability

$$\alpha_d^{c_d}\left(x^{(d)}, Y^{(d)}\right) = \min\left\{1, \frac{\pi\left(Y^{(d)}\right) q\left(Y^{(d)}, x^{(d)}\right)}{\pi\left(x^{(d)}\right) q\left(x^{(d)}, Y^{(d)}\right)}\right\},$$

where $q(\cdot, \cdot)$ is the chosen proposal density. In case of rejection, we set $X_m^{(d)} = X_{m-1}^{(d)}$. As for the cases considered previously, we can use the average acceptance rate as a means of measuring the efficiency and optimality of the algorithm. We denote

$$a_d^{c_d}(l) = \mathbb{E}_\pi\left[\alpha_d^{c_d}\left(X^d, Y^d\right)\right] = \mathbb{E}_\pi\left[\min\left\{1, \frac{\pi_d\left(Y^d\right) q\left(Y^d, x^d\right)}{\pi_d\left(x^d\right) q\left(x^d, Y^d\right)}\right\}\right],$$

where $\sigma_{d,c_d}^2 = l^2 d^{-s}$. Furthermore, as always, it is assumed throughout that the algorithm starts at stationarity. The scaling exponent $s$ is given by $s = 1$ for RWM and $s = \frac{1}{3}$ for MALA.
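To make the partial-updating mechanism concrete, here is a minimal sketch of one iteration of each scheme, assuming an i.i.d. product target with standard normal components (so the gradient is available in closed form); the function names, the subset-drawing convention, and the rounding of $d \cdot c_d$ to an integer are illustrative choices, not prescriptions from [NR05].

```python
import numpy as np

# Illustrative i.i.d. product target: log pi_d(x) = sum_i g(x_i), with
# g the standard normal log-density (an assumption for this sketch).
def log_pi(x):
    return -0.5 * np.sum(x ** 2)

def grad_log_pi(x):
    return -x

def rwm_within_gibbs_step(x, c_d, l, rng):
    """One RWM-within-Gibbs iteration: update a random subset of
    round(c_d * d) coordinates with a symmetric Gaussian proposal."""
    d = len(x)
    sigma = l * d ** -0.5                       # sigma^2_{d,c_d} = l^2 d^{-s}, s = 1
    A = rng.choice(d, size=max(1, int(round(c_d * d))), replace=False)
    y = x.copy()
    y[A] += sigma * rng.standard_normal(len(A))
    # Symmetric proposal: the q terms cancel in the acceptance ratio.
    if np.log(rng.uniform()) < log_pi(y) - log_pi(x):
        return y                                # accept
    return x                                    # reject: X_m = X_{m-1}

def mala_within_gibbs_step(x, c_d, l, rng):
    """One MALA-within-Gibbs iteration: Langevin drift plus noise on the
    chosen coordinates, with the full Metropolis-Hastings ratio."""
    d = len(x)
    sigma = l * d ** (-1.0 / 6.0)               # sigma^2 = l^2 d^{-s}, s = 1/3
    A = rng.choice(d, size=max(1, int(round(c_d * d))), replace=False)
    y = x.copy()
    y[A] += sigma * rng.standard_normal(len(A)) + 0.5 * sigma ** 2 * grad_log_pi(x)[A]

    def log_q(frm, to):
        # Gaussian transition density of the Langevin proposal on A
        # (normalizing constant omitted: it cancels in the ratio).
        mean = frm[A] + 0.5 * sigma ** 2 * grad_log_pi(frm)[A]
        return -0.5 * np.sum((to[A] - mean) ** 2) / sigma ** 2

    log_alpha = log_pi(y) + log_q(y, x) - log_pi(x) - log_q(x, y)
    if np.log(rng.uniform()) < log_alpha:
        return y
    return x
```

Averaging the accept/reject indicator over many such iterations gives a Monte Carlo estimate of the average acceptance rate $a_d^{c_d}(l)$ defined above.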

2.4.1 RWM-within-Gibbs<br />

Consider first the RWM-within-Gibbs algorithm applied to the i.i.d. target density. As was done in [RR98], it is helpful to express the density in the exponential target form

$$\pi\left(x^{(d)}\right) = \prod_{i=1}^{d} f\left(x_i^{(d)}\right) = \prod_{i=1}^{d} \exp\left\{g\left(x_i^{(d)}\right)\right\}.$$

The proposal standard deviation is denoted by $\sigma^{(d)} = \frac{l}{\sqrt{d-1}}$, for some $l > 0$. Since the target is again formed of i.i.d. components, all individual components of the target density are equivalent. For convenience, it will again be useful to consider the first component of the resulting algorithm, sped up in time by a factor of $d$, which is denoted by $U_{t,1}^{(d)}$ and obtained from

$$U_t^{(d)} = \left(X_{[dt],1}, X_{[dt],2}, \ldots, X_{[dt],d}\right).$$
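As a small illustration of this time rescaling, the following sketch (reusing the hypothetical `rwm_within_gibbs_step` from the earlier code block) records the first coordinate of the chain so that $d$ iterations of the algorithm correspond to one unit of time $t$ in $U_{t,1}^{(d)}$; the run length and the choice $l = 2.38$ are illustrative only.

```python
rng = np.random.default_rng(0)
d, c_d, l = 50, 1.0, 2.38            # illustrative dimension, proportion, scaling
x = rng.standard_normal(d)           # start (approximately) at stationarity

T = 10.0                             # units of rescaled time to simulate
path = np.empty(int(T * d))
for m in range(len(path)):
    x = rwm_within_gibbs_step(x, c_d, l, rng)
    path[m] = x[0]                   # U_{t,1}^{(d)} with t = m / d
```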
