26.10.2013 Views

Nonparametric Bayesian Discrete Latent Variable Models for ...


3 Dirichlet Process Mixture Models

[Figure 3.12: The Old Faithful geyser data set and its density modelled by CDP, CCDP and KDE. The two-dimensional data consist of the durations of consecutive eruptions of the Old Faithful geyser. Panels: Old Faithful data set, conjugate model, conditionally conjugate model, kernel density estimation. Axes: previous duration, current duration.]

As a measure of modeling performance, we use the average leave-one-out predictive density. That is, for each data set considered, we leave out one observation, model the density using all the others, and calculate the predictive density at the left-out data point. We repeat this for all data points in the training set and report the average predictive density.
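The leave-one-out procedure can be sketched in a few lines. The following is a minimal illustration, assuming a kernel density estimator with an isotropic Gaussian kernel and a fixed bandwidth; the function names, the bandwidth value and the toy points are hypothetical and are not taken from the experiments reported here. For the DP mixture models, the call to `kde_density` would be replaced by the predictive density obtained from the sampler.

```python
import math

def gaussian_kernel(x, center, bandwidth):
    """Isotropic Gaussian kernel density at x, centred on `center`."""
    d = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    norm = (2.0 * math.pi * bandwidth ** 2) ** (d / 2.0)
    return math.exp(-0.5 * sq_dist / bandwidth ** 2) / norm

def kde_density(x, train, bandwidth):
    """KDE predictive density at x given the training points."""
    return sum(gaussian_kernel(x, c, bandwidth) for c in train) / len(train)

def average_loo_density(data, bandwidth):
    """Average leave-one-out predictive density over all points in `data`."""
    total = 0.0
    for i, held_out in enumerate(data):
        rest = data[:i] + data[i + 1:]  # fit on every point except the i-th
        total += kde_density(held_out, rest, bandwidth)
    return total / len(data)

# Toy two-dimensional points, made up for illustration (not the geyser data).
toy = [(1.8, 3.6), (3.3, 1.8), (4.5, 3.3), (2.3, 4.5), (4.1, 2.0), (4.3, 4.2)]
print(average_loo_density(toy, bandwidth=0.5))
```

The sketch also returns the per-point structure implicitly: collecting the individual `kde_density` values instead of summing them gives the distribution of leave-one-out densities for each model.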

The mixing of all samplers is equally fast for the two-dimensional Geyser data set. There is also no significant difference in predictive performance; see Tables 3.1 and 3.2. However, we can see from the plots in Figure 3.12 that the resulting density estimates differ between the models.

For all data sets, the KDE model has the lowest average leave-one-out predictive density, and the conditionally conjugate model has the highest (see Table 3.1). To compare the distributions of the leave-one-out densities, p-values for a paired t-test are given in Table 3.2. For the Spiral, Iris and Wine data sets, the differences between the predictive densities of KDE and both DP models are statistically significant.
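A paired t-test compares two models on the same held-out points by testing whether the mean of the per-point differences is zero. The sketch below computes only the test statistic and its degrees of freedom, using the standard library; the function name and the toy numbers are hypothetical. Turning the statistic into a p-value requires the Student t distribution, which a library routine such as `scipy.stats.ttest_rel` provides.

```python
import math

def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a paired t-test on samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Unbiased sample variance of the paired differences.
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean / math.sqrt(var / n)
    return t, n - 1

# Made-up leave-one-out densities for two models on the same five points.
model_a = [0.21, 0.35, 0.18, 0.40, 0.27]
model_b = [0.15, 0.30, 0.20, 0.31, 0.22]
t_stat, dof = paired_t_statistic(model_a, model_b)
print(t_stat, dof)
```

Pairing matters here because both models are evaluated on the same left-out observations, so the per-point differences remove the variation that is shared between them.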

The main objective of the models presented in this paper is density estimation, but the models can be used for clustering as well by observing the assignment of data points to model components. Since the number of components changes over the chain, one

