
3 Dirichlet Process Mixture Models

Thus, the problem of sampling from an arbitrary distribution reduces to sampling from uniform distributions (Neal, 2003).
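To make this reduction concrete, the following minimal Python sketch shows a univariate slice sampler in the spirit of Neal (2003): given the current point x, a height u is drawn uniformly under f(x), and the next point is drawn uniformly from the horizontal slice {x : f(x) > u}. The density f, the interval bounds, and the rejection step for the horizontal draw are our own illustrative choices, not taken from the source.

```python
import random

def slice_sample(f, x0, lo, hi, n_samples=1000):
    """Minimal slice sampler for an (unnormalised) density f on [lo, hi]:
    alternates two uniform draws, one vertical and one horizontal."""
    samples, x = [], x0
    for _ in range(n_samples):
        u = random.uniform(0.0, f(x))        # vertical: u ~ U(0, f(x))
        while True:                          # horizontal: x ~ U over the slice
            x_new = random.uniform(lo, hi)
            if f(x_new) > u:                 # accept only points inside the slice
                x = x_new
                break
        samples.append(x)
    return samples

# Example: a triangular density on [0, 1], known only up to a constant.
draws = slice_sample(lambda x: min(x, 1.0 - x), x0=0.5, lo=0.0, hi=1.0)
```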

In Section 3.2.2 we described an algorithm by Walker and Damien (1998) that uses auxiliary variables to limit the sampling space in the Pólya urn representation. The auxiliary variable in that algorithm is chosen such that it has a uniform distribution defined by the likelihood value. Therefore, given the auxiliary variable, sampling from the posterior reduces to sampling from a truncated version of the prior.

In this section, we describe a similar idea, applied by Walker (2006) to the stick-breaking construction of the DP, which results in an elegant and widely applicable algorithm.

The parameters, the mixing proportions, and the indicator variables are repeatedly updated. We introduce the temporary slice variable s when updating the indicators, and discard it after the indicator update.

The distribution of the auxiliary variable s is defined such that the joint prior of s and c_i is a two-dimensional uniform distribution. Conditioned on s, c_i is uniformly distributed on a limited part of the prior space. Combining this with the likelihood gives the conditional posterior of c_i.

Recall that the prior probability of assigning an observation to one of the components is given by the mixing proportions π = {π_1, . . . , π_∞},

\[
P(c_i \mid \pi) = \pi_{c_i}. \tag{3.48}
\]

Multiplying by the likelihood, the posterior is

\[
P(c_i \mid \pi, x_i, \theta) \propto \pi_{c_i} \, F(x_i \mid \theta_{c_i}). \tag{3.49}
\]

We introduce the auxiliary slice variable s such that the joint posterior of the indicator variable and s is

\[
P(c_i, s \mid \pi, x_i, \theta) \propto \mathbb{I}\{s < \pi_{c_i}\} \, F(x_i \mid \theta_{c_i}). \tag{3.50}
\]

Thus, the distribution of s given π and c_i is uniform,

\[
(s \mid \pi, c_i) \sim U(0, \pi_{c_i}) = \mathbb{I}\{s < \pi_{c_i}\} \, \pi_{c_i}^{-1}, \tag{3.51}
\]

and the distribution of c_i conditioned also on s is

\[
P(c_i \mid s, \pi, x_i, \theta) \propto
\begin{cases}
F(x_i \mid \theta_{c_i}) & \text{if } s < \pi_{c_i}, \\
0 & \text{otherwise.}
\end{cases} \tag{3.52}
\]
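Integrating s out of (3.50) recovers (3.49), since ∫ I{s < π_{c_i}} ds = π_{c_i}, so the auxiliary variable leaves the target posterior intact. As an illustration of the indicator update (3.52), here is a minimal Python sketch; the function name, the `log_f` likelihood handle, and the array layout are our own assumptions, not from the source.

```python
import numpy as np

def sample_indicator(x_i, s_i, pi, theta, log_f, rng=np.random.default_rng()):
    """Draw c_i from eq. (3.52): uniform over the prior region {k : pi_k > s_i},
    reweighted by the likelihood F(x_i | theta_k)."""
    # Finite candidate set; always contains the current c_i when s_i ~ U(0, pi[c_i]).
    candidates = np.flatnonzero(pi > s_i)
    log_w = np.array([log_f(x_i, theta[k]) for k in candidates])
    w = np.exp(log_w - log_w.max())            # stabilise before normalising
    return int(rng.choice(candidates, p=w / w.sum()))
```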

That is, the probability of assigning c_i to a component whose mixing proportion is less than the slice variable is 0. Therefore, we only need to consider assignment to components whose stick lengths are larger than the slice variable s. These are clearly finite in number, rather than the infinitely many components of the full model.

Using slice sampling, we only need to represent the mixing proportions and the parameters of the K† represented components, and we allocate new components only when needed. The slice value is sampled uniformly between 0 and π_{c_i}. Note that the stick lengths are not ordered by size, so we must keep breaking sticks until the mass remaining beyond the represented components falls below the slice value.
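Because the stick lengths are not monotone in k, finding every component with π_k > s requires a stopping rule. A sufficient one, sketched below under our own naming and the usual Beta(1, α) stick-breaking prior for a DP with concentration α, is to break new sticks until the leftover mass, which upper-bounds every unrepresented π_k, falls below the smallest slice value among the observations.

```python
import numpy as np

def extend_sticks(v, alpha, s_min, rng=np.random.default_rng()):
    """Grow the stick-breaking representation until the unassigned mass
    prod_k (1 - v_k) falls below s_min, so that no unrepresented component
    can have a mixing proportion exceeding the smallest slice value."""
    v = list(v)
    leftover = float(np.prod(1.0 - np.asarray(v))) if v else 1.0
    while leftover >= s_min:
        v_k = rng.beta(1.0, alpha)             # v_k ~ Beta(1, alpha) under the DP prior
        v.append(v_k)
        leftover *= 1.0 - v_k
    stick, pi = 1.0, []
    for v_k in v:                              # pi_k = v_k * prod_{j<k} (1 - v_j)
        pi.append(v_k * stick)
        stick *= 1.0 - v_k
    return np.array(v), np.array(pi)
```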

