
4 Indian Buffet Process Models


Figure 4.5: Graphical representation of a general latent feature model. The data xi are generated by the latent features zi, the parameters associated with the latent features, θk, and other parameters Φ, all of which are assigned prior distributions. The hierarchy can be extended by specifying priors on the hyperparameters.

summarized as follows:

    (xi | zi, Θ, Φ) ∼ F(xi | zi, Θ, Φ),   i = 1, …, N
    Z ∼ IBP(α)
    θk ∼ G0(θk | ξ)
    Φ ∼ H(Φ | γ)                                             (4.34)

where F(·) is the distribution of the data and G0 and H are prior distributions for the parameters, specified by hyperparameters. See Figure 4.5 for a graphical representation. The model can be extended by adding more layers to the hierarchy, specifying priors for the hyperparameters. Note that there are infinitely many latent features, but since the data distribution does not depend on the inactive features, inference in this model is still tractable.
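To make the hierarchy in (4.34) concrete, the following sketch forward-samples the model for one hypothetical choice of the distributions: a linear-Gaussian likelihood F, a Gaussian prior G0 on the feature parameters θk, and a fixed noise scale standing in for Φ. These choices, and all names in the code, are illustrative assumptions rather than the specification used in the text; Z is drawn with the sequential Indian buffet construction.

    import numpy as np

    def sample_ibp(N, alpha, rng):
        """Sequential (Indian buffet) construction of Z ~ IBP(alpha)."""
        Z = np.zeros((N, 0), dtype=int)
        for i in range(N):
            counts = Z.sum(axis=0)                      # how often each dish was taken so far
            old = (rng.random(Z.shape[1]) < counts / (i + 1)).astype(int)
            k_new = rng.poisson(alpha / (i + 1))        # number of new dishes for customer i
            Z = np.pad(Z, ((0, 0), (0, k_new)))         # append empty columns for the new dishes
            Z[i, :] = np.concatenate([old, np.ones(k_new, dtype=int)])
        return Z

    def sample_model(N, D, alpha, sigma_theta=1.0, sigma_x=0.5, seed=0):
        """Forward-sample the hierarchy in (4.34) under illustrative Gaussian choices:
        G0 = N(0, sigma_theta^2), F = N(zi Theta, sigma_x^2 I); sigma_x plays the role of Phi."""
        rng = np.random.default_rng(seed)
        Z = sample_ibp(N, alpha, rng)                   # Z ~ IBP(alpha)
        K = Z.shape[1]
        Theta = sigma_theta * rng.standard_normal((K, D))       # theta_k ~ G0(theta_k | xi)
        X = Z @ Theta + sigma_x * rng.standard_normal((N, D))   # xi ~ F(xi | zi, Theta, Phi)
        return X, Z, Theta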

We will consider updating each set of variables sequentially. Given the nonparametric part of the model concerning the latent variables, i.e. given Z and Θ, the updates for the rest of the parameters Φ are just as in parametric hierarchical models. Furthermore, although Θ is infinite dimensional, Z will only have finitely many nonzero columns. Since the likelihood does not depend on the zero columns, the posterior for the θk corresponding to the inactive columns of Z is the same as the prior. Therefore, we only need to update those θk that are associated with the active features. This is again simply sampling from the conditional posterior of θk,


    P(θk | X, Z, Φ) ∝ G0(θk | ξ) F(X | Z, Θ, Φ)                              (4.35)
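Under the same illustrative linear-Gaussian assumptions as in the sketch above, the conditional posterior (4.35) for the parameters of the active features is Gaussian and can be sampled in closed form; for a non-conjugate choice of F and G0 one would instead use, for example, a Metropolis–Hastings step inside the Gibbs sweep. The function below is a minimal sketch under those assumptions: it updates only the θk associated with active columns of Z, since, as noted above, the θk of inactive features would simply be redrawn from G0.

    def sample_theta_posterior(X, Z, sigma_theta=1.0, sigma_x=0.5, rng=None):
        """Draw Theta for the active features from P(theta_k | X, Z, Phi) in (4.35),
        assuming the linear-Gaussian model of sample_model (an illustrative choice)."""
        rng = np.random.default_rng() if rng is None else rng
        active = Z.sum(axis=0) > 0                      # likelihood only involves active columns
        Za = Z[:, active]
        K, D = Za.shape[1], X.shape[1]
        A = np.linalg.inv(Za.T @ Za + (sigma_x ** 2 / sigma_theta ** 2) * np.eye(K))
        mean = A @ Za.T @ X                             # (K, D) posterior mean
        cov = sigma_x ** 2 * A                          # posterior covariance, shared across dimensions
        L = np.linalg.cholesky(cov)
        return mean + L @ rng.standard_normal((K, D)), active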
