
Nonparametric Bayesian Discrete Latent Variable Models for ...


4 Indian Buffet Process Models

Algorithm 16 Slice sampling for the semi-ordered IBP

The state of the Markov chain consists of the infinite feature matrix Z, the feature presence probabilities µ1:∞ = µ1, . . . , µ∞ corresponding to each feature column, and the set of infinitely many parameters Θ = {θk}∞k=1. Only the K‡ active columns of Z up to and including the last active column and the corresponding parameters are represented.

Repeatedly sample:
  Change to SB representation:
    Sample µs for active features (µ+) from their posterior, eq. (4.56)
    Sample µs for inactive components (µ◦) using eq. (4.57) until the smallest µ+ is larger than the smallest µ◦
    Sort columns to have µs in decreasing order
  for all i = 1, . . . , N do
    Do feature updates in the SB representation
  end for
  Change to IBP representation:
    Remove feature presence probabilities from the representation
    Remove inactive feature columns from the representation
  for all columns k = 1, . . . , K† do {Parameter updates}
    Update θk by sampling from its conditional posterior, eq. (4.35)
  end for
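The "change to SB representation" step above can be sketched in code. This is a minimal illustration of the control flow only: the helper names are hypothetical, the Beta(mk, N − mk + 1) posterior for active µs is an assumed stand-in for eq. (4.56), and the inactive µs are drawn from the simple prior stick-breaking recursion µnew = µprev · ν with ν ∼ Beta(α, 1), which ignores the extra likelihood terms of the exact conditional in eq. (4.57).

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, N = 2.0, 10  # IBP concentration and number of data points (illustrative)

# Toy active feature matrix; first row forced to 1 so every column is active.
Z = (rng.random((N, 3)) < 0.5).astype(int)
Z[0, :] = 1

def sample_active_mu(Z, rng):
    """Assumed stand-in for eq. (4.56): mu_k | z_:,k ~ Beta(m_k, N - m_k + 1),
    where m_k is the number of data points using feature k."""
    m = Z.sum(axis=0)
    return rng.beta(m, 1.0 + Z.shape[0] - m)

def sample_inactive_mu(mu_prev, alpha, rng):
    """Assumed stand-in for eq. (4.57): prior-only stick-breaking step,
    mu_new = mu_prev * nu with nu ~ Beta(alpha, 1); decreasing by construction."""
    return mu_prev * rng.beta(alpha, 1.0)

# Sample mus for active features from their (assumed) posterior.
mu_active = sample_active_mu(Z, rng)

# Extend the decreasing chain of inactive mus until the smallest active mu
# is larger than the smallest inactive mu, as in Algorithm 16.
mu_inactive = []
mu_prev = 1.0
while not mu_inactive or mu_inactive[-1] >= mu_active.min():
    mu_prev = sample_inactive_mu(mu_prev, alpha, rng)
    mu_inactive.append(mu_prev)

# Sort all columns so the mus are in decreasing order.
mu_all = np.sort(np.concatenate([mu_active, mu_inactive]))[::-1]
```

The termination of the while loop is guaranteed almost surely because each Beta(α, 1) multiplier is strictly less than one, so the inactive chain decreases geometrically in expectation.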

results to have a sense of the accuracy of the approximation. Since in both cases conjugate Gibbs sampling is taken as the basis of comparison, we use a conjugate model. We choose the linear-Gaussian binary latent feature model of Griffiths and Ghahramani (2005). The model is summarized below; see the referenced paper for a detailed description. Each data point xi is assumed to be generated by a combination of a subset of the rows of A, distorted by spherical Gaussian noise,

xi = zi A + ε, (4.59)

with ε ∼ N(0, σx² I). The infinite-dimensional binary vector zi encodes which features contribute to xi, and A is a matrix (with infinitely many rows) whose kth row corresponds to the parameters for the kth feature. This model can be interpreted as a binary factor analyzer with an infinite-dimensional latent space. The distribution of the whole data matrix X can be written as a matrix Gaussian,

X | Z, A, σx ∼ N(ZA, σx² I). (4.60)

Entries of A are drawn i.i.d. from a zero-mean Gaussian with variance σA². We can
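Sampling from this generative model is straightforward once the infinite feature matrix is truncated. The sketch below uses the finite Beta–Bernoulli approximation to the IBP prior (each of K columns gets πk ∼ Beta(α/K, 1), which recovers the IBP as K → ∞, per Griffiths and Ghahramani, 2005); the dimensions, truncation level, and hyperparameter values are illustrative, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 100, 6, 4              # data points, dimensions, truncation level (illustrative)
alpha, sigma_A, sigma_x = 2.0, 1.0, 0.1

# Finite approximation to the IBP prior on Z: pi_k ~ Beta(alpha/K, 1),
# z_ik ~ Bernoulli(pi_k) independently for each entry.
pi = rng.beta(alpha / K, 1.0, size=K)
Z = (rng.random((N, K)) < pi).astype(float)

# Rows of A drawn i.i.d. from a zero-mean Gaussian with variance sigma_A^2.
A = sigma_A * rng.standard_normal((K, D))

# X = ZA + spherical Gaussian noise with variance sigma_x^2, eqs. (4.59)-(4.60).
X = Z @ A + sigma_x * rng.standard_normal((N, D))
```

Each row of X is thus a sum of the feature parameter vectors (rows of A) selected by the binary entries of the corresponding row of Z, plus noise, matching eq. (4.59).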

