
4.2 MCMC Sampling algorithms for IBLF models

Figure 4.6: Pictorial representation of slice sampling for the IBP. The slice value is sampled uniformly at random between 0 and the stick length of the last active feature. The slice is cut through the stick pieces. The representation is extended to include all features with stick lengths higher than the slice.

where $\mu^*$ is a function of $\mu_{(1:\infty)} = \{\mu_{(k)}\}_{k=1}^{\infty}$ and $Z$, and is chosen to be the length of the stick for the last feature with a non-zero entry,

$$\mu^* = \min\Big\{1,\ \min_{k:\,\exists i,\, z_{ik}=1} \mu_{(k)}\Big\}. \tag{4.47}$$
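For concreteness, (4.47) amounts to taking the smallest stick length among the currently active columns of $Z$, capped at 1 (and defaulting to 1 if no feature is active). A minimal NumPy sketch, with illustrative names `mu_star`, `Z` and `mu` that are not from the thesis, might look as follows:

```python
import numpy as np

def mu_star(Z, mu):
    """Stick length of the last feature with a non-zero entry, eq. (4.47).

    Z  : (N, K) binary feature matrix over the represented features.
    mu : (K,) represented stick lengths mu_(1), ..., mu_(K).
    """
    mu = np.asarray(mu)
    active = np.flatnonzero(Z.any(axis=0))   # columns k with some z_ik = 1
    if active.size == 0:
        return 1.0                           # no active feature: the min over an empty set is taken as 1
    return min(1.0, mu[active].min())
```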

The joint distribution of $Z$ and the auxiliary variable $s$ is

$$p(Z, s \mid x, \mu_{(1:\infty)}) = p(Z \mid x, \mu_{(1:\infty)})\, p(s \mid Z, \mu_{(1:\infty)}) \tag{4.48}$$

where

$$p(s \mid Z, \mu_{(1:\infty)}) = \frac{1}{\mu^*}\, \mathbb{I}\{0 \le s \le \mu^*\}. \tag{4.49}$$
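Drawing the auxiliary variable according to (4.49) is then just a uniform draw on $[0, \mu^*]$; a small sketch reusing the hypothetical `mu_star` helper above:

```python
def sample_slice(Z, mu, rng=None):
    """Draw s | Z, mu_(1:K) ~ Uniform[0, mu*], i.e. the density in eq. (4.49)."""
    if rng is None:
        rng = np.random.default_rng()
    return rng.uniform(0.0, mu_star(Z, mu))
```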

Clearly, integrating out $s$ preserves the original distribution over $Z$, while conditioned on $Z$ and $\mu_{(1:\infty)}$, $s$ is simply drawn from (4.46). Given $s$, the distribution of $Z$ becomes

$$p(Z \mid x, s, \mu_{(1:\infty)}) \propto p(Z \mid x, \mu_{(1:\infty)})\, \frac{1}{\mu^*}\, \mathbb{I}\{s \le \mu^*\}, \tag{4.50}$$

which forces all columns $k$ of $Z$ for which $\mu_{(k)} < s$ to be zero. This can be interpreted as the auxiliary variable $s$ cutting a slice through the feature presence probabilities, leaving only those with a larger weight than $s$ to be updated. Since $s$ is sampled uniformly between 0 and the height of the last active feature, the features below the slice are fixed to zero.

Let $\tilde{K}$ be the maximal feature index with $\mu_{(\tilde{K})} > s$. Thus $z_{ik} = 0$ for all $k > \tilde{K}$, and we need only consider updating those features $k \le \tilde{K}$. Notice that $\tilde{K}$ serves as a truncation level insofar as it limits the computational costs to a finite amount without approximation.
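As an illustration, $\tilde{K}$ can be read off from the represented stick lengths once $s$ is drawn; the helper name and the assumption that the stick lengths are stored in decreasing order are mine, not the thesis':

```python
def truncation_level(mu, s):
    """Maximal feature index K~ with mu_(K~) > s (1-based; 0 if none).

    Assumes mu holds the represented stick lengths mu_(1) >= mu_(2) >= ...
    Columns beyond K~ are forced to zero by (4.50), so a Gibbs sweep over Z
    only needs to touch Z[:, :K~].
    """
    mu = np.asarray(mu)
    above = np.flatnonzero(mu > s)
    return int(above[-1]) + 1 if above.size else 0
```

In such a sketch, only the block `Z[:, :truncation_level(mu, s)]` would be updated during a sweep, which is what keeps the cost finite without approximation.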

Let $K^{\dagger}$ be an index such that all active features have index $k < K^{\dagger}$ (note that $K^{\dagger}$ itself would be an empty feature). The computational representation for the slice sampler consists of the slice variables and the first $K^{\dagger}$ features: $\{s, \tilde{K}, K^{\dagger}, Z_{1:N,1:K^{\dagger}}, \mu_{(1:K^{\dagger})}, \theta_{1:K^{\dagger}}\}$. The slice sampler proceeds by updating all variables in turn.

The slice variable is drawn from (4.46). If the value of $s$ is less than the last represented feature presence probability $\mu_{(K^{\dagger})}$, then we need to extend the representation.
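A minimal sketch of one way this extension step could be organized, assuming the IBP stick-breaking recursion $\mu_{(k)} = \nu_k\, \mu_{(k-1)}$ with $\nu_k \sim \mathrm{Beta}(\alpha, 1)$ as a stand-in for the exact conditional on the new sticks, and with a hypothetical `draw_theta` callable supplying prior draws of feature parameters:

```python
def extend_representation(mu, theta, s, alpha, draw_theta, rng=None):
    """Grow the represented sticks until the last one falls below the slice s.

    mu         : list of represented stick lengths mu_(1), ..., mu_(K+), assumed non-empty.
    theta      : list of corresponding feature parameters.
    alpha      : IBP concentration parameter.
    draw_theta : callable returning a fresh prior draw of a feature parameter.
    """
    if rng is None:
        rng = np.random.default_rng()
    mu, theta = list(mu), list(theta)
    while mu[-1] > s:                      # still above the slice: more features must be represented
        nu = rng.beta(alpha, 1.0)          # stick-breaking proportion (prior; see lead-in caveat)
        mu.append(nu * mu[-1])
        theta.append(draw_theta(rng))      # parameters of the newly instantiated feature
    return mu, theta
```

After this loop the last represented feature has stick length below the slice and is therefore empty, matching the convention that $K^{\dagger}$ indexes an empty feature.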

