

…associate parameters of the existing unique features of object $i$ with some of the auxiliary parameters, and draw values from the prior $G_0(\hat{\theta}_j \mid \xi)$ for the rest of the auxiliary parameters.
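As a small illustration of this step, the sketch below pairs the parameters of the existing unique features with auxiliary slots and fills the remaining slots with fresh draws from the prior. The function and argument names (`init_auxiliary_params`, `prior_draw`) are hypothetical, and $G_0$ is represented by an arbitrary sampling callable.

```python
import numpy as np

def init_auxiliary_params(theta_unique, K_hat, prior_draw):
    """Associate existing unique-feature parameters with auxiliary slots
    and draw the remaining auxiliary parameters from the prior G0.

    theta_unique : parameters of the current unique features of object i
    K_hat        : number of auxiliary features (truncation level)
    prior_draw   : callable returning one draw from G0(. | xi)
    """
    assert len(theta_unique) <= K_hat
    theta_aux = list(theta_unique)          # reuse existing parameter values
    while len(theta_aux) < K_hat:
        theta_aux.append(prior_draw())      # fill the rest with draws from G0
    return theta_aux

rng = np.random.default_rng(0)
# e.g. one existing unique feature, truncation K_hat = 3, Gaussian G0
theta_hat = init_auxiliary_params([0.7], K_hat=3, prior_draw=rng.standard_normal)
```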

Since each feature has a parameter associated with it, the particular order of the columns is important, unlike the case in the previous section. There are $2^{\hat{K}}$ possible combinations for the $\hat{K}$ unique features of object $i$. We denote the part of the matrix with active features that are not unique to object $i$ by $Z^{-K(i)}$, and the auxiliary features that are not associated with any object other than (possibly) object $i$ by $\hat{Z}_l$, where the index $l \in \{1, \ldots, 2^{\hat{K}}\}$ denotes the possible feature settings. (Note that all entries of $\hat{Z}_l$ are zero except the $i$th row.) We evaluate the posterior probabilities of all possible $\hat{Z}_l$ and sample from this distribution to decide which setting to include.

The joint posterior for the $i$th row of $\hat{Z}_l$ will consist of the prior Bernoulli probabilities of setting each feature in the $i$th row to 0 or 1 (with probability $\frac{\alpha/\hat{K}}{N+\alpha/\hat{K}}$ of being 1), and the probability of the data given $Z^{-K(i)}$, $\Theta^{-K(i)}$, $\hat{Z}_l$ and $\hat{\Theta}$:

$$P\big(\hat{Z}_l \mid X, Z^{-K(i)}, \Theta^{-K(i)}, \hat{\Theta}\big) \propto P\big(\hat{Z}_l\big)\, P\big(X \mid \hat{Z}_l, Z^{-K(i)}, \Theta^{-K(i)}, \hat{\Theta}\big). \qquad (4.39)$$
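As a concrete sketch of this Gibbs step (names such as `sample_unique_features` and the `log_lik` closure are assumptions, not the thesis's notation): enumerate all $2^{\hat{K}}$ settings of the $i$th row, score each by the Bernoulli prior above times the data likelihood as in Eq. (4.39), and sample one setting from the normalized posterior. Since the enumeration grows as $2^{\hat{K}}$, this is practical only for small truncation levels.

```python
import itertools
import numpy as np

def sample_unique_features(K_hat, N, alpha, log_lik, rng):
    """Sample the unique-feature row hat{Z}_l of object i (cf. Eq. 4.39).

    log_lik : callable z -> log P(X | hat{Z}_l = z, rest), with Z^{-K(i)},
              Theta^{-K(i)} and hat{Theta} held fixed inside the closure.
    """
    p1 = (alpha / K_hat) / (N + alpha / K_hat)        # prior P(z_k = 1)
    settings = list(itertools.product((0, 1), repeat=K_hat))
    log_post = np.empty(len(settings))
    for l, z in enumerate(settings):
        z = np.asarray(z)
        log_prior = np.sum(z * np.log(p1) + (1 - z) * np.log1p(-p1))
        log_post[l] = log_prior + log_lik(z)          # unnormalized log posterior
    probs = np.exp(log_post - log_post.max())         # stabilized normalization
    probs /= probs.sum()
    return np.asarray(settings[rng.choice(len(settings), p=probs)])
```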

The IBLF models correspond closely to the DPM models, which allow infinitely many components in the mixture model. The unique features of an object can be thought of as the singleton components in the DP, and the inactive features as the mixture components that have no data associated with them. The sampling scheme described in this section (summarized as Algorithm 12) is inspired by Algorithm 8 of Neal (2000) for inference in Dirichlet process models with non-conjugate priors (described in Section 3.2.2, Algorithm 5). However, note that Neal's algorithm for DP models is exact, whereas the algorithm described here for the IBLF models uses an approximation. The quality of the approximation improves with larger truncation levels; however, the stationary distribution of the sampler differs from the true posterior because of the approximation.

4.2.3 Metropolis-Hastings Sampling

As discussed above, the lack of conjugacy is a problem only when Gibbs sampling the unique features of the data point under consideration. Meeds et al. (2007) suggest using Gibbs sampling for the features with $m_{-i,k} > 0$ and treating the unique features separately using Metropolis-Hastings sampling, which yields a sampler whose stationary distribution is the true posterior. We describe the algorithm below and give a summary in Algorithm 13.

The set of features with $m_{-i,k} = 0$ contains the finitely many unique features of data point $i$ and the infinitely many features that do not belong to any of the data points. Instead of calculating the posterior over the number of new features and sampling directly from this distribution as in Section 4.2.1, we can propose the number of unique features $\hat{K}$ and the set of parameters associated with them, $\hat{\Theta} = \hat{\theta}_{1:\hat{K}}$, from a proposal distribution $Q(\hat{K}, \hat{\Theta})$, and accept or reject this proposal with a Metropolis-Hastings acceptance probability.
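A minimal sketch of one such update, under the assumption (one common choice, not necessarily the one used by Meeds et al.) that the proposal $Q$ is the conditional prior itself, $\hat{K}^* \sim \mathrm{Poisson}(\alpha/N)$ and $\hat{\theta}^*_j \sim G_0$; proposal and prior terms then cancel, and the acceptance probability reduces to the likelihood ratio. All names here are illustrative.

```python
import numpy as np

def mh_unique_features(K_cur, Theta_cur, alpha, N, log_lik, prior_draw, rng):
    """One Metropolis-Hastings update of (hat{K}, hat{Theta}) for object i.

    Proposal: K* ~ Poisson(alpha / N), Theta* ~ G0 (i.e. Q equals the prior),
    so the acceptance ratio is P(X | K*, Theta*) / P(X | K_cur, Theta_cur).

    log_lik : callable (K, Theta) -> log-likelihood of the data, with the
              non-unique features and their parameters held fixed.
    """
    K_prop = int(rng.poisson(alpha / N))
    Theta_prop = [prior_draw() for _ in range(K_prop)]
    log_ratio = log_lik(K_prop, Theta_prop) - log_lik(K_cur, Theta_cur)
    if np.log(rng.random()) < log_ratio:    # accept with prob min(1, ratio)
        return K_prop, Theta_prop
    return K_cur, Theta_cur                 # reject: keep the current state
```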

