Nonparametric Bayesian Discrete Latent Variable Models for ...

3 Dirichlet Process Mixture Models

Bayesian inference requires assigning prior distributions to all unknown quantities in a model. In parametric models, the prior distributions are assumed to be of known parametric form, specified by hyperparameters. Hierarchical models are used when there is uncertainty about the values of the hyperparameters. Uncertainty about the parametric form of the prior distribution itself can be expressed by placing a prior distribution on the distribution functions, using Bayesian nonparametrics. Many Bayesian nonparametric priors have been developed; see for example (Freedman, 1963; Ferguson, 1974; Sibisi and Skilling, 1997). The Dirichlet process (DP) is one of the most prominent random probability measures due to its richness, computational ease, and interpretability.

The DP, first introduced by Ferguson (1973), can be defined from several different perspectives. Ferguson (1973) uses Kolmogorov consistency conditions to define the DP and also gives an alternative definition using the gamma process. Blackwell and MacQueen (1973) use exchangeability to show that a generalization of the Pólya urn scheme leads to the DP. A closely related sequential process is the Chinese restaurant process (CRP) (Aldous, 1985; Pitman, 2006), a distribution over partitions, which also results in the DP when each block of the partition is assigned an independent parameter drawn from a common distribution. A constructive definition of the DP was given by Sethuraman and Tiwari (1982), which leads to the characterization of the DP as a stick-breaking prior (Ishwaran and James, 2001). Additionally, the DP can be represented as a special case of many other random measures; see (Doksum, 1974; Pitman and Yor, 1997; Walker et al., 1999; Neal, 2001).
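The CRP mentioned above is easy to simulate directly. The sketch below (function and variable names are our own, not from any particular library) draws a random partition of n customers: customer i + 1 joins an existing table with probability proportional to its occupancy, or opens a new table with probability proportional to α.

```python
import random

def crp_partition(n, alpha, seed=0):
    """Sample a partition of n items from the Chinese restaurant process.

    Each customer joins an existing table with probability proportional
    to that table's occupancy, or starts a new table with probability
    proportional to alpha.
    """
    rng = random.Random(seed)
    tables = []       # tables[k] = number of customers at table k
    assignments = []  # assignments[i] = table index of customer i
    for i in range(n):
        # Total weight is i (customers seated so far) + alpha.
        r = rng.random() * (i + alpha)
        acc, k = 0.0, len(tables)
        for j, c in enumerate(tables):
            acc += c
            if r < acc:
                k = j
                break
        if k == len(tables):
            tables.append(1)   # open a new table
        else:
            tables[k] += 1
        assignments.append(k)
    return assignments, tables

assignments, tables = crp_partition(100, alpha=2.0)
print(len(tables))  # number of occupied tables (blocks of the partition)
```

Larger α tends to produce more occupied tables; the expected number of tables grows as O(α log n).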

The DP is defined by two parameters, a positive scalar α and a probability measure G0, referred to as the concentration parameter and the base measure, respectively. The base distribution G0 is the parameter on which the nonparametric distribution is centered and can be thought of as the prior guess (Antoniak, 1974). The concentration parameter α expresses the strength of belief in G0. For small values of α, samples from a DP are likely to be composed of a small number of atomic measures; for large values, samples are likely to be concentrated around G0.
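The effect of α can be seen in the stick-breaking representation, where a draw G from DP(α, G0) is approximated by finitely many weighted atoms. The following sketch (a truncated approximation; the names and the choice G0 = N(0, 1) are ours for illustration) compares the weight profiles for small and large α.

```python
import numpy as np

def stick_breaking(alpha, n_atoms, base_sampler, rng):
    """Truncated stick-breaking draw G ~ sum_k w_k * delta(phi_k) from DP(alpha, G0)."""
    # v_k ~ Beta(1, alpha); w_k = v_k * prod_{j<k} (1 - v_j)
    betas = rng.beta(1.0, alpha, size=n_atoms)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    weights = betas * remaining
    atoms = base_sampler(n_atoms)  # phi_k ~ G0
    return weights, atoms

rng = np.random.default_rng(0)
base = lambda k: rng.standard_normal(k)  # G0 = N(0, 1), an assumption for the demo

# Small alpha: most mass falls on a few atoms.
w_small, _ = stick_breaking(0.5, 200, base, rng)
# Large alpha: mass spreads over many atoms, so G resembles G0.
w_large, _ = stick_breaking(50.0, 200, base, rng)
print(w_small.max(), w_large.max())
```

With α = 0.5 the largest weight typically dominates, while with α = 50 no single atom carries much mass, matching the behavior described above.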

The hierarchical models in which the DP is used as a prior over the distribution of the parameters are referred to as Dirichlet process mixture (DPM) models, also called mixture of Dirichlet process models by some authors, a name due to Antoniak (1974). Placing a DP prior G on the distribution of the model parameters θ gives the following model:

xi | θi ∼ F (xi | θi)
θi ∼ G
G ∼ DP(α, G0).
(3.1)
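One way to generate data from model (3.1) is to integrate out G, assigning each observation to a cluster via the Pólya urn scheme and drawing a new parameter from G0 whenever a new cluster appears. A minimal sketch, assuming Gaussian choices G0 = N(0, 10²) over cluster means and F(· | θ) = N(θ, 1) (these choices and all names are ours, purely illustrative):

```python
import random

def sample_dpm(n, alpha, g0_sampler, f_sampler, seed=0):
    """Generate x_1..x_n from the DPM (3.1) with G integrated out (Polya urn)."""
    rng = random.Random(seed)
    thetas, counts, data = [], [], []
    for i in range(n):
        # Join an existing cluster w.p. proportional to its size,
        # or start a new one w.p. proportional to alpha.
        r = rng.random() * (i + alpha)
        acc, k = 0.0, len(counts)
        for j, c in enumerate(counts):
            acc += c
            if r < acc:
                k = j
                break
        if k == len(counts):
            thetas.append(g0_sampler(rng))  # new cluster parameter ~ G0
            counts.append(1)
        else:
            counts[k] += 1
        data.append(f_sampler(thetas[k], rng))  # x_i ~ F(. | theta_i)
    return data, thetas

g0 = lambda rng: rng.gauss(0.0, 10.0)          # G0: prior over cluster means
lik = lambda theta, rng: rng.gauss(theta, 1.0)  # F: unit-variance Gaussian
data, thetas = sample_dpm(50, 1.0, g0, lik)
print(len(thetas))  # number of distinct clusters among the 50 draws
```

Because G is almost surely discrete, the θi share values with positive probability, which is what makes the DPM a mixture model with a random number of components.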
