

different models are depicted in Figure 3.14. This figure shows that the distribution of the number of components used by the CCDP is much broader and centered on higher values. The number of data points assigned to the components, averaged over different positions in the chain, is depicted in Figure 3.15.

3.3.4 Conclusions

The Dirichlet process mixture of Gaussians model is one of the most widely used DPM models. We have presented and compared the conjugate and conditionally conjugate hierarchical Dirichlet process Gaussian mixture models. We presented two new MCMC schemes that improve the convergence and mixing properties of the MCMC algorithm of Neal (2000) for the conditionally conjugate Dirichlet process mixture model. The convergence and mixing properties of the samplers have been demonstrated on example data sets, and the modeling properties of the conjugate and the conditionally conjugate model have been empirically compared. The predictive accuracy of the CCDP model is found to be better than that of the CDP model for all data sets considered, the difference being larger in high dimensions. It is also interesting to note that the CDP model tends to use fewer components than the CCDP model.

In light of the empirical results, we conclude that marginalizing over one of the parameters by exploiting conditional conjugacy leads to considerably faster mixing in the conditionally conjugate model. When this trick is used, the fully conjugate model is not necessarily computationally cheaper. The DP Gaussian mixture model with the more flexible prior specification (the conditionally conjugate prior) can be used on higher-dimensional density estimation problems, resulting in better density estimates than the model with the conjugate prior specification.
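To make the marginalization concrete, the following sketch (illustrative code, not from the thesis; the toy Normal model and all names are assumptions) contrasts drawing a conjugate parameter explicitly with integrating it out analytically, which is the kind of collapsing that conditional conjugacy enables:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate setting: x_i ~ N(mu, 1) with prior mu ~ N(0, tau2).
x = rng.normal(2.0, 1.0, size=20)
n, tau2 = len(x), 10.0

# By conjugacy, the posterior of mu is N(post_mean, post_var).
post_var = 1.0 / (n + 1.0 / tau2)
post_mean = post_var * x.sum()

# (a) Explicit Gibbs step: draw mu and condition on the sampled value.
mu = rng.normal(post_mean, np.sqrt(post_var))

# (b) Collapsed alternative: integrate mu out analytically, so the
#     predictive for a new point carries the full posterior uncertainty,
#     N(post_mean, 1 + post_var), instead of a single sampled mu.
x_new = rng.normal(post_mean, np.sqrt(1.0 + post_var))
```

Integrating a parameter out in this way removes one source of dependence between successive Gibbs updates, which is the intuition behind the faster mixing reported above.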

3.4 Dirichlet Process Mixtures of Factor Analyzers

Factor analysis (FA) is a well-known latent variable model that models the correlation structure in the data. The mixture of factor analyzers (MFA) model combines FA with a mixture model, allowing each component to have a different latent representation. The generative model for FA is given by x = Λz + µ + ε, where z is the hidden factor, Λ the factor loading matrix, and ε the measurement noise. The factors and the noise are assumed to be Gaussian distributed, z ∼ N(0, I) and ε ∼ N(0, Ψ), where Ψ is a diagonal matrix. Therefore, x is also Gaussian distributed with mean µ and covariance ΛΛ^T + Ψ.
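As a quick check of this property, the sketch below (illustrative code, not part of the thesis; the dimensions and variable names are arbitrary) samples from the FA generative model and compares the empirical covariance against ΛΛ^T + Ψ:

```python
import numpy as np

rng = np.random.default_rng(1)
d, q = 5, 2                                    # observed and latent dimensions
Lam = rng.normal(size=(d, q))                  # factor loading matrix Λ
mu = rng.normal(size=d)                        # mean µ
Psi = np.diag(rng.uniform(0.1, 0.5, size=d))   # diagonal noise covariance Ψ

def sample_fa(n):
    """Draw n points from x = Λz + µ + ε with z ~ N(0, I), ε ~ N(0, Ψ)."""
    z = rng.normal(size=(n, q))
    eps = rng.multivariate_normal(np.zeros(d), Psi, size=n)
    return z @ Lam.T + mu + eps

# The empirical covariance approaches the implied marginal ΛΛ^T + Ψ.
X = sample_fa(100_000)
print(np.allclose(np.cov(X.T), Lam @ Lam.T + Psi, atol=0.05))
```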

Considering a mixture of factor analyzers (MFA) with K components, the data distribution becomes

p(x) = Σ_{j=1}^{K} π_j N(µ_j, Λ_j Λ_j^T + Ψ),   (3.68)

where π_j denote the mixing proportions. Each component has a separate mean parameter µ_j and factor loading matrix Λ_j. The diagonal uniqueness matrix Ψ is common to all components, capturing the measurement noise in the data.

