Nonparametric Bayesian Discrete Latent Variable Models for ...


4.2 MCMC Sampling algorithms for IBLF models

as in the parametric models. Updating Z is more involved, since we have to deal with the infinite dimensionality. Therefore, in the rest of this section, we will focus on updates concerning Z.

In the following, we describe several different MCMC algorithms for inference on the IBLF models. Conjugacy of the distribution of the feature parameters to the likelihood function makes inference computationally easier. However, requiring conjugacy limits the use of the IBP. We start by describing sampling for conjugate IBLF models and continue with other sampling algorithms that do not require conjugacy.

As for the DP, the sampling algorithms for IBLF models may be divided into two subgroups: the ones that integrate out the feature presence probabilities µk, and the ones that explicitly represent these probabilities using the stick-breaking construction. We present algorithms that use these different approaches in the following subsections.

All methods described below update each entry of Z incrementally. Note that in the IBP, the customers use different criteria for choosing dishes already sampled and for choosing new dishes. As a consequence, the methods that use the representation where µk is integrated out employ different update rules for the features owned by other data points and for the (existing or new) unique features of the data point being considered. On the other hand, for the methods that use the stick-breaking construction, we need not make such a distinction.
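This two-track update for a single data point can be sketched as follows, assuming the representation with µk integrated out; `gibbs_update` and `sample_new_features` are hypothetical callbacks standing in for the model-specific steps described in the subsections below:

```python
import numpy as np

def sweep_point(Z, i, alpha, rng, gibbs_update, sample_new_features):
    """One sweep over row i of Z when the feature probabilities mu_k are
    integrated out: features also owned by other data points get a per-entry
    Gibbs update, while the features unique to point i are resampled
    separately, mirroring the IBP's distinction between dishes already
    sampled and new dishes."""
    N = Z.shape[0]
    m_minus_i = Z.sum(axis=0) - Z[i]          # column counts excluding row i

    for k in np.where(m_minus_i > 0)[0]:      # features owned by others
        Z[i, k] = gibbs_update(Z, i, k)

    # In the IBP a customer samples Poisson(alpha / N) new dishes, so the
    # number of features unique to point i is resampled against that prior.
    n_new = rng.poisson(alpha / N)
    return sample_new_features(Z, i, n_new)
```

The split matters because the Gibbs conditional for a shared feature depends on the counts m−i,k, which are zero for the unique features, so those require a different proposal mechanism.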

In the following sections, Z denotes the binary matrix (with an unbounded number of columns), and zik an entry of Z. The index i = 1, . . . , N runs over the rows, and k = 1, . . . , K runs over the columns of Z. Similar to the terminology used in the previous chapter for the DPM models, the columns of Z with non-zero entries will be referred to as the active features, and the columns of zeros as the inactive features. Although Z has infinitely many columns, only finitely many of them will be active. We denote the number of active features with K‡ and the number of features explicitly represented for the computations (which includes all active features and possibly some inactive features) with K†. The variable m is used to denote the number of entries that are set to 1 in a specified part of Z, e.g. m·,k means the number of 1s in the kth column, and m−i,k the number of 1s in the kth column excluding row i. The conditional probability of zik = 1 for a feature with m−i,k > 0 is proportional to the product of the prior given in eq. (4.7) and the likelihood;

p(zik = 1 | Z−ik, X, Θ, Φ) ∝ (m−i,k / N) F(X | zik = 1, Z−ik, Θ, Φ). (4.36)
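As a concrete illustration (the matrix and likelihood values below are illustrative, not from the text), the counts m·,k and m−i,k and the normalized version of eq. (4.36) can be computed as:

```python
import numpy as np

# A small Z with N = 4 rows; the last column is inactive (all zeros).
Z = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 0]])
N = Z.shape[0]

m_col = Z.sum(axis=0)               # m_{.,k}: number of 1s in column k -> [3 2 1 0]
K_active = int((m_col > 0).sum())   # number of active features -> 3

i = 1
m_minus_i = m_col - Z[i]            # m_{-i,k}: counts excluding row i -> [2 1 1 0]

def p_zik_one(m_minus_ik, N, lik1, lik0):
    """Normalized version of eq. (4.36): the prior m_{-i,k}/N for z_ik = 1
    (and 1 - m_{-i,k}/N for z_ik = 0) times the caller-supplied likelihood
    values F(X | z_ik = 1, ...) and F(X | z_ik = 0, ...)."""
    p1 = (m_minus_ik / N) * lik1
    p0 = (1.0 - m_minus_ik / N) * lik0
    return p1 / (p1 + p0)

# With equal likelihoods the posterior reduces to the prior m_{-i,k}/N:
print(p_zik_one(m_minus_i[0], N, 1.0, 1.0))   # 0.5
```

When the likelihood favours neither value, the update is driven entirely by the popularity of the feature among the other data points, exactly as in the IBP prior.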

