

the components are not exchangeable in this representation. Therefore, care must be taken when mixing over the cluster labels to avoid clustering bias. Porteous et al. (2006) discuss this in detail and introduce moves that permute or swap the labels to improve mixing over the cluster labels.
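To make this concrete, the sketch below implements a Metropolis–Hastings move that proposes swapping two adjacent component labels in a collapsed stick-breaking representation, with the stick weights $v_j \sim \mathrm{Beta}(1, \alpha)$ integrated out. It is an illustration in the spirit of such label moves, not the exact algorithm of Porteous et al. (2006); the function names and the adjacent-swap proposal are assumptions of this sketch.

```python
import numpy as np
from scipy.special import betaln

def log_prior_counts(counts, alpha):
    """Log-probability of the cluster sizes under the stick-breaking prior
    with v_j ~ Beta(1, alpha) integrated out. Because the stick weights are
    size-biased ordered, this probability depends on the label order."""
    # rest[j] = total count of all clusters with labels greater than j
    rest = np.concatenate([np.cumsum(counts[::-1])[::-1][1:], [0]])
    return sum(betaln(1 + n, alpha + m) - betaln(1, alpha)
               for n, m in zip(counts, rest))

def swap_move(counts, params, j, alpha, rng):
    """Propose swapping labels j and j+1. The component parameters travel
    with their clusters (and the indicators would be relabeled to match),
    so the likelihood is unchanged and the MH acceptance ratio reduces to
    the ratio of the label-order-dependent prior terms."""
    prop = counts.copy()
    prop[j], prop[j + 1] = prop[j + 1], prop[j]
    log_r = log_prior_counts(prop, alpha) - log_prior_counts(counts, alpha)
    if np.log(rng.uniform()) < log_r:
        params[j], params[j + 1] = params[j + 1], params[j]
        return prop, params, True
    return counts, params, False
```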

3.3 Empirical Study on the Choice of the Base Distribution

In the previous section, we described several algorithms for inference in DPM models with both conjugate and non-conjugate base distributions. We have seen that inference in conjugate DPM models is relatively straightforward. For some models, it is not possible to specify priors conjugate to the likelihood; the question is whether one should use conjugate priors at all when they are available. It is known that conjugacy generally limits the flexibility of a model. Is the computational ease worth the price of worse modeling performance? And is using conjugate models really computationally cheaper than the non-conjugate alternatives? How does modeling performance change with the choice of prior? In this section, we seek to address these questions empirically using a Dirichlet process mixture of Gaussians model (DPMoG). We choose the DPMoG because it is one of the most widely used DPM models, and one for which it is possible to employ either a conjugate or a conditionally conjugate base distribution.
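To fix ideas before the detailed formulations, the two specifications differ in whether the prior on a component mean depends on that component's precision. Schematically, and using hyperparameter symbols $\xi$, $\rho$, $R$, $\nu$, $W$ as placeholder notation for this sketch, a conjugate (Normal–Wishart) base distribution takes
\[
S_j \sim \mathcal{W}(\nu, W), \qquad \mu_j \mid S_j \sim \mathcal{N}\big(\xi, (\rho S_j)^{-1}\big),
\]
coupling the prior covariance of the mean to the component precision, whereas a conditionally conjugate base distribution takes
\[
\mu_j \sim \mathcal{N}(\xi, R^{-1}), \qquad S_j \sim \mathcal{W}(\nu, W)
\]
independently. The latter is not jointly conjugate to the Gaussian likelihood, but each full conditional remains conjugate, which is what conditional conjugacy refers to.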

In the following, we give model formulations for both a conjugate and a conditionally conjugate base distribution. For both prior specifications, we define hyperpriors on $G_0$ for robustness. We refer to the models with the conjugate and the conditionally conjugate base distributions in short as the conjugate model and the conditionally conjugate model, respectively. After specifying the model structure, we discuss in detail how to do inference on both models. We show that the mixing performance of the non-conjugate sampler can be improved substantially by exploiting the conditional conjugacy. We then present experimental results comparing the modeling performance of the two models and the computational cost of the samplers on several data sets.

3.3.1 The Dirichlet Process Gaussian Mixture Model

The finite Gaussian mixture model may be written as:
\[
p(x_i \mid \theta_1, \ldots, \theta_K) = \sum_{j=1}^{K} \pi_j \, \mathcal{N}\!\left(x_i \mid \mu_j, S_j^{-1}\right) \tag{3.53}
\]

where $\theta_j = \{\mu_j, S_j\}$ is the set of parameters for component $j$, the $\pi_j$ are the mixing proportions (which must be positive and sum to one), $\mu_j$ is the mean vector for component $j$, and $S_j$ is the component precision (inverse covariance matrix).
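As a concrete check of equation (3.53), the snippet below evaluates the finite mixture density at a point; it is a minimal sketch, with all parameter values chosen arbitrarily for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, pis, mus, Ss):
    """Evaluate p(x | theta) = sum_j pi_j N(x | mu_j, S_j^{-1}) as in
    equation (3.53). Ss holds precision (inverse covariance) matrices,
    so each one is inverted to obtain the covariance."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=np.linalg.inv(S))
               for pi, mu, S in zip(pis, mus, Ss))

# Toy two-component mixture in two dimensions (arbitrary values).
pis = [0.3, 0.7]                           # mixing proportions: positive, sum to one
mus = [np.zeros(2), np.array([3.0, 3.0])]  # component means
Ss  = [np.eye(2), 2.0 * np.eye(2)]         # component precisions

x = np.array([1.0, 1.0])
print(mixture_density(x, pis, mus, Ss))
```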

Defining a joint prior distribution $G_0$ on the component parameters and introducing indicator variables, the

