
[Figure: layered graphical model with hyperpriors (normal, Wishart, inverse gamma) at the top, hyperparameters ξ, R, W, β below them, mixture parameters µ_k, S_k (normal and Wishart) in the next layer, and observations x_i (normal) at the bottom; plates over the K components and N observations.]

Figure 3.11: Graphical representation of the layered structure of the hierarchical priors in the conditionally conjugate model. The distribution of the component means is independent of the component precisions.

The hyperpriors depend on the data only through its empirical mean µ_x and covariance Σ_x, which makes the model invariant to translations, rotations and rescaling of the data. One could equivalently use unit priors and scale the data before the analysis, since finding the overall mean and covariance of the data is not the primary concern of the analysis; rather, we wish to find structure within the data.
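To make this equivalence concrete, here is a minimal sketch (our own illustration, not code from the thesis) that whitens the data to zero mean and identity covariance, after which unit hyperpriors (µ_x = 0, Σ_x = I) play the role of the data-dependent priors below:

```python
import numpy as np

def whiten(X):
    """Transform data to zero mean and identity covariance.

    X: (N, D) array of observations. After this transform, unit
    priors (mu_x = 0, Sigma_x = I) are equivalent to priors based
    on the empirical mean and covariance of the raw data.
    """
    mu_x = X.mean(axis=0)
    Sigma_x = np.cov(X, rowvar=False)
    # Inverse matrix square root of the covariance via eigendecomposition.
    vals, vecs = np.linalg.eigh(Sigma_x)
    Sigma_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return (X - mu_x) @ Sigma_inv_sqrt
```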

In detail, for both models the hyperparameters W and β associated with the component precisions S_j are given the following priors, keeping in mind that β should be greater than D − 1:

W ∼ W(D, (1/D) Σ_x),   1/(β − D + 1) ∼ G(1, 1/D). (3.61)
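As a minimal sketch of drawing from these hyperpriors, assuming the Wishart is parametrized by degrees of freedom and scale matrix, and using the shape–scale gamma convention (which coincides with a shape–mean reading when the shape is 1); the function name is ours, not the thesis':

```python
import numpy as np
from scipy.stats import wishart

def sample_precision_hyperpriors(mu_x, Sigma_x, rng=np.random.default_rng()):
    """Draw W and beta from the hyperpriors in Eq. (3.61).

    W ~ Wishart(D, Sigma_x / D), and
    1/(beta - D + 1) ~ Gamma(shape=1, mean=1/D),
    so the constraint beta > D - 1 holds by construction.
    """
    D = len(mu_x)
    W = wishart.rvs(df=D, scale=Sigma_x / D, random_state=rng)
    # For shape 1, the gamma mean equals the scale parameter.
    u = rng.gamma(shape=1.0, scale=1.0 / D)
    beta = 1.0 / u + D - 1.0
    return W, beta
```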

For the conjugate model, the priors for the hyperparameters ξ and ρ associated with the mixture means µ_j are Gaussian and gamma:

ξ ∼ N(µ_x, Σ_x),   ρ ∼ G(1/2, 1/2). (3.62)

For the non-conjugate case, the component means have a mean vector ξ and a precision matrix R as hyperparameters. We put a Gaussian prior on ξ and a Wishart prior on the precision matrix:

ξ ∼ N(µ_x, Σ_x),   R ∼ W(D, (D Σ_x)⁻¹). (3.63)
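A companion sketch for Equations (3.62) and (3.63); reading G(1/2, 1/2) as shape 1/2 with mean 1/2 (i.e. scale 1) is an assumption about the gamma convention, so adjust the scale argument if a shape–scale reading is intended:

```python
import numpy as np
from scipy.stats import wishart

def sample_mean_hyperpriors(mu_x, Sigma_x, conjugate=True,
                            rng=np.random.default_rng()):
    """Draw the hyperparameters of the component means, Eqs. (3.62)-(3.63).

    Conjugate model:     xi ~ N(mu_x, Sigma_x), rho ~ G(1/2, 1/2)
    Non-conjugate model: xi ~ N(mu_x, Sigma_x), R ~ Wishart(D, (D Sigma_x)^{-1})
    """
    D = len(mu_x)
    xi = rng.multivariate_normal(mu_x, Sigma_x)
    if conjugate:
        # G(1/2, 1/2) read as shape 1/2 with mean 1/2, i.e. scale = 1.
        rho = rng.gamma(shape=0.5, scale=1.0)
        return xi, rho
    R = wishart.rvs(df=D, scale=np.linalg.inv(D * Sigma_x), random_state=rng)
    return xi, R
```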

3.3.2 Inference Using Gibbs Sampling

In this section, we describe the MCMC algorithms we use for inference in these models. The Markov chain relies on Gibbs updates, where each variable is updated in turn by sampling from its posterior distribution conditional on all other variables. We repeatedly sample the parameters, hyperparameters and indicator variables from their conditional posterior distributions.
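As a structural sketch of one such sweep (the three update functions are hypothetical placeholders for the conditional posterior samplers just described, not the thesis' actual implementation):

```python
def gibbs_sweep(state, X, sample_component_params, sample_hyperparameters,
                sample_indicator):
    """One Gibbs scan: resample each variable from its posterior
    conditional on the current values of all other variables.
    The three sample_* callables are hypothetical placeholders."""
    # Mixture parameters: means mu_j and precisions S_j of each component.
    for j in state.active_components():
        state.mu[j], state.S[j] = sample_component_params(j, state, X)
    # Hyperparameters (xi, rho or R, W, beta) given the mixture parameters.
    state.hypers = sample_hyperparameters(state)
    # Indicator variables: reassign each observation to a component.
    for i in range(len(X)):
        state.z[i] = sample_indicator(i, state, X)
    return state
```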
