
Nonparametric Bayesian Discrete Latent Variable Models for ...


3.5 Discussion

In this chapter, we have considered DPM models for density estimation and clustering. The DP was introduced by Ferguson (1973) and has been used extensively, especially since the development of MCMC methods for inference. There are several different ways to define the DP; at the beginning of this chapter we summarized some of these constructions to give insight into the DP and its properties. These different ways of defining the same distribution have led to several different inference algorithms for DPM models. We outlined some of the MCMC algorithms developed for inference in DPM models. The list of algorithms described is not exhaustive, but we believe it gives a good overview of how the techniques have developed.

We compared DPMoG models with conjugate and conditionally conjugate base distributions. We showed that, for the inference algorithm we used, the mixing time of the samplers can be vastly improved for the conditionally conjugate model by integrating out one of the parameters while conditioning on the other. With this improved sampling scheme, inference for the conditionally conjugate model is not always computationally more expensive than for the conjugate one; the relative mixing performance depends on the data being modeled. The density-estimation performance of the conditionally conjugate model was found to be consistently better than that of the fully conjugate one, the difference being significant in higher dimensions. These results suggest that one does not have to resort to a conjugate base distribution when it does not represent the prior beliefs.
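The collapsed step described above can be illustrated with a minimal one-dimensional sketch. The function below is an assumption for illustration only (its name, the Normal prior on the mean, and the hyperparameters `mu0`, `tau0` are not the thesis's exact setup): given a cluster's data and its precision, the cluster mean is integrated out analytically, which is what lets the sampler mix faster than sampling both parameters.

```python
import math

def predictive_logpdf(x, cluster_xs, tau, mu0=0.0, tau0=1.0):
    """Log predictive density of x under one Gaussian cluster with the
    cluster mean integrated out, while conditioning on the precision tau.

    Prior on the mean: Normal(mu0, 1/tau0). Given tau this prior is
    conjugate for the mean, so the posterior over the mean stays Gaussian
    and the marginal over x is available in closed form.
    (Illustrative priors and names, not the thesis's exact model.)
    """
    n = len(cluster_xs)
    # Posterior over the cluster mean, given the cluster's data and tau
    tau_post = tau0 + n * tau
    mu_post = (tau0 * mu0 + tau * sum(cluster_xs)) / tau_post
    # Marginalizing the mean: x ~ Normal(mu_post, 1/tau + 1/tau_post)
    var = 1.0 / tau + 1.0 / tau_post
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu_post) ** 2 / var)
```

In a Gibbs sweep, this predictive would be evaluated for each existing cluster (and for an empty one) to resample a point's assignment without ever instantiating the cluster means.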

We have introduced the DPMFA model for modeling high dimensional data that is believed to have a low dimensional representation, as a mixture of Gaussians with constrained covariance matrices. We demonstrated the modeling performance of DPMFA on a challenging clustering problem. Although the resulting clustering is successful, the incremental sampling algorithm used is not feasible for practical use of the model on high dimensional data and should be improved.
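The covariance constraint behind a mixture of factor analyzers can be sketched in a few lines: each component's covariance is restricted to low-rank-plus-diagonal form, Lambda Lambda^T + Psi, which is what makes the model tractable in high dimensions. The dimensions and values below are illustrative assumptions, not taken from the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
d, q = 10, 2  # observed dimension and (much smaller) latent dimension

# One factor-analyzer component: covariance = Lambda @ Lambda.T + diag(psi),
# i.e. rank-q shared structure plus independent per-dimension noise.
Lambda = rng.standard_normal((d, q))   # factor loadings (illustrative)
psi = rng.uniform(0.1, 1.0, size=d)    # diagonal noise variances
cov = Lambda @ Lambda.T + np.diag(psi)

# The constrained form needs d*q + d free parameters per component
# instead of the d*(d+1)/2 of an unconstrained covariance.
print(d * q + d, d * (d + 1) // 2)
```

For d = 10 and q = 2 this is 30 parameters versus 55, and the gap grows quadratically with d, which is the point of using constrained covariances for high dimensional data.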

The difference between DPM models and mixtures with a finite but unknown number of components is that the DPM accounts for the possibility that a yet-unobserved data point comes from an unrepresented component, through the concentration parameter of the DPM, both during training and when the predictive probabilities are calculated. Although RJMCMC or BDMCMC gives a distribution over the possible number of components, once an iteration is complete and a number of components has been chosen, only those components are used to explain new data. Comparing the performance of these models remains an interesting question.
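The role of the concentration parameter in the predictive can be made concrete with the standard Chinese restaurant process view of the DPM; the function name below is an assumption for illustration. An existing component gets weight proportional to its occupancy, and the as-yet-unrepresented component gets weight proportional to the concentration parameter alpha, so new data always has nonzero probability of opening a new component.

```python
def crp_predictive(counts, alpha):
    """Predictive assignment probabilities under a DP mixture
    (Chinese restaurant process view): existing component k gets
    weight n_k, and an unrepresented component gets weight alpha.
    """
    n = sum(counts)
    probs = [c / (n + alpha) for c in counts]
    probs.append(alpha / (n + alpha))  # probability of a brand-new component
    return probs
```

For example, with occupancies [5, 3] and alpha = 1, a new point joins the existing components with probabilities 5/9 and 3/9 and opens a new one with probability 1/9; a finite mixture fixed after an RJMCMC/BDMCMC iteration assigns that last event probability zero.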

