
4.2 MCMC Sampling algorithms for IBLF models

Algorithm 13 Metropolis-Hastings sampling for IBP

The state of the Markov chain consists of the infinite feature matrix Z and the set of infinitely many parameters Θ = {θ_k}_{k=1}^∞. Only the K‡ active columns of Z and the corresponding parameters are represented.

Repeatedly sample:
for all rows i = 1, . . . , N do {Feature updates}
    for all columns k = 1, . . . , K‡ do
        if m_{−i,k} > 0 then
            Update z_ik by sampling from its conditional posterior, eq. (4.36).
        end if
    end for
    Propose a number K̂ of unique features from the prior Poisson(α/N).
    Sample K̂ parameters Θ_K̂ for the unique features.
    Evaluate the proposal using the acceptance ratio, eq. (4.40).
end for
for all active columns k = 1, . . . , K‡ do {Parameter updates}
    Update θ_k by sampling from its conditional posterior, eq. (4.35).
end for
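To make the control flow of Algorithm 13 concrete, the following is a minimal Python sketch of one sweep. The helpers cond_prob_z, sample_theta_prior, accept_ratio, and sample_theta_posterior are hypothetical stand-ins for eq. (4.36), the parameter prior, eq. (4.40), and eq. (4.35), which depend on the particular likelihood model; the sketch also assumes the common proposal scheme in which row i's current unique features are replaced by the K̂ newly proposed ones.

```python
import numpy as np

def mh_sweep(Z, theta, alpha, X, rng):
    """One sweep of Algorithm 13 (sketch, under the assumptions above).

    Z     : (N, K_active) binary matrix; only active columns are stored.
    theta : list of K_active parameters, one per active column.
    """
    N = Z.shape[0]
    for i in range(N):
        # Feature updates: only columns used by at least one other row.
        for k in range(Z.shape[1]):
            m_minus_ik = Z[:, k].sum() - Z[i, k]            # m_{-i,k}
            if m_minus_ik > 0:
                p = cond_prob_z(i, k, Z, theta, X)          # eq. (4.36)
                Z[i, k] = int(rng.random() < p)
        # Columns currently used by row i alone (its unique features).
        unique = [k for k in range(Z.shape[1])
                  if Z[i, k] == 1 and Z[:, k].sum() == 1]
        # Propose replacing them with K_hat fresh features from the prior.
        K_hat = rng.poisson(alpha / N)
        theta_hat = [sample_theta_prior(rng) for _ in range(K_hat)]
        a = accept_ratio(i, unique, theta_hat, Z, theta, X)  # eq. (4.40)
        if rng.random() < min(1.0, a):
            keep = [k for k in range(Z.shape[1]) if k not in unique]
            Z, theta = Z[:, keep], [theta[k] for k in keep]
            new_cols = np.zeros((N, K_hat), dtype=int)
            new_cols[i, :] = 1              # unique features belong to row i
            Z = np.hstack([Z, new_cols])
            theta = theta + theta_hat
    # Parameter updates for every active column.
    for k in range(Z.shape[1]):
        theta[k] = sample_theta_posterior(k, Z, X, rng)      # eq. (4.35)
    return Z, theta
```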

The stick weights of the IBP decrease at an exponential rate, which suggests adapting the truncation used for approximating the DP to the IBP. The bound on the error introduced by the truncation for the DP stick-breaking construction has been calculated by Ishwaran and James (2001). Noting the correspondences between the stick weights of the IBP and the DP, a similar approach can be used in this case.

Let M be the truncation level. Setting µ_(M+1) = 0 constrains all µ_(k) = 0 for k > M, while the joint density for µ_(1:M) is given as

$$
p(\mu_{(1:M)}) = \prod_{k=1}^{M} p(\mu_{(k)} \mid \mu_{(k-1)})
= \alpha^{M} \mu_{(M)}^{\alpha} \prod_{k=1}^{M} \mu_{(k)}^{-1}\,
\mathbb{I}\{0 \leq \mu_{(M)} \leq \cdots \leq \mu_{(1)} \leq 1\}.
\tag{4.42}
$$
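Equation (4.42) is exactly the joint density induced by the recursion µ_(k) = ν_k µ_(k−1) with ν_k ~ Beta(α, 1) i.i.d. and µ_(0) = 1, so the truncated stick weights can be drawn with a single cumulative product. A minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def truncated_ibp_sticks(alpha, M, rng=None):
    """Draw mu_(1) >= ... >= mu_(M) from the density in eq. (4.42):
    mu_(k) = nu_1 * ... * nu_k with nu_j ~ Beta(alpha, 1) i.i.d."""
    rng = rng or np.random.default_rng()
    nu = rng.beta(alpha, 1.0, size=M)
    return np.cumprod(nu)   # decreasing weights; mu_(M+1) is set to 0

mu = truncated_ibp_sticks(alpha=2.0, M=20)
```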

Inference using Gibbs sampling is straightforward to implement on the truncated model. The entries of Z are independent given µ_(1:M), thus

$$
p(Z \mid \mu_{(1:M)}) = \prod_{i=1}^{N} \prod_{k=1}^{M}
\mu_{(k)}^{z_{ik}} \, (1 - \mu_{(k)})^{1 - z_{ik}}.
\tag{4.43}
$$

Since the entries in a column are independent given the feature presence probabilities, we do not need to worry about whether the other data points have the feature being updated or not. That is, we do not need separate update rules for separate cases in this case.
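For illustration, here is a sketch of the resulting Gibbs step for a single entry: the prior contribution is simply µ_(k) against 1 − µ_(k), regardless of what the other rows do, and only a model-specific likelihood term distinguishes the two values. The log_lik callback is a hypothetical stand-in for the model's log-likelihood.

```python
import numpy as np

def gibbs_update_zik(i, k, Z, mu, log_lik, rng):
    """Resample z_ik given mu_(1:M) and the data.
    `log_lik(Z)` is a hypothetical model log-likelihood; the prior
    factor follows eq. (4.43) and needs no case distinction."""
    logp = np.empty(2)
    for v in (0, 1):
        Z[i, k] = v
        prior = mu[k] if v == 1 else 1.0 - mu[k]
        logp[v] = np.log(prior) + log_lik(Z)
    p_one = 1.0 / (1.0 + np.exp(logp[0] - logp[1]))
    Z[i, k] = int(rng.random() < p_one)
    return Z
```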

