26.10.2013 Views

Nonparametric Bayesian Discrete Latent Variable Models for ...



the performance of the models using these different prior specifications has not been
compared. We present an empirical study on the choice of the base distribution for
the DPMoG model. We compare the computational cost and modeling performance
of using conjugate and conditionally conjugate base distributions. When the data is
believed to have a lower dimensional latent structure, it is possible to incorporate this
prior knowledge into the model structure using a mixture of factor analyzers (MFA) model.
In Section 3.4 we formulate the Dirichlet process mixture of factor analyzers (DPMFA)
model and present experimental results on a challenging clustering problem, known as
spike sorting. We conclude this chapter with a discussion in Section 3.5.

3.1 The Dirichlet Process

Let X be a space and A be a σ-field of subsets of X. A stochastic process G on (X, A)
is said to be a Dirichlet process (DP) with parameters α and G0 if, for any finite measurable
partition (A1, …, Ak) of the space of support of G0, the random vector (G(A1), …, G(Ak)) has
a k-dimensional Dirichlet distribution¹ with parameter (αG0(A1), …, αG0(Ak)), that is:

$$(G(A_1), \ldots, G(A_k)) \sim \mathcal{D}\big(\alpha G_0(A_1), \ldots, \alpha G_0(A_k)\big). \tag{3.2}$$

We denote the random probability measure G that has a DP distribution with concentration
parameter α and base distribution G0 by:

$$G \sim \mathrm{DP}(\alpha, G_0). \tag{3.3}$$

Ferguson (1973) establishes the existence of the DP by verifying the Kolmogorov consistency
conditions (see Appendix B.3).
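The defining property in eq. (3.2) can be illustrated numerically. The following is a minimal sketch, not the author's code: it assumes a standard normal base distribution G0 and a partition of the real line into intervals, and the function names (`normal_cdf`, `base_measure_of_partition`, `sample_dp_on_partition`) are hypothetical. It draws the vector (G(A1), …, G(Ak)) directly from the Dirichlet distribution with parameter (αG0(A1), …, αG0(Ak)), using the standard construction of a Dirichlet draw from normalised Gamma variables.

```python
import math
import random


def normal_cdf(x):
    """CDF of the standard normal, used here as an assumed base distribution G0."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def base_measure_of_partition(cut_points):
    """Return G0(A_1), ..., G0(A_k) for the intervals cut by sorted cut_points."""
    edges = [0.0] + [normal_cdf(c) for c in cut_points] + [1.0]
    return [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]


def sample_dirichlet(params, rng):
    """Draw from a Dirichlet distribution by normalising independent Gamma draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(gammas)
    return [g / total for g in gammas]


def sample_dp_on_partition(alpha, cut_points, rng):
    """Draw (G(A_1), ..., G(A_k)) for G ~ DP(alpha, G0), as in eq. (3.2)."""
    g0 = base_measure_of_partition(cut_points)
    return sample_dirichlet([alpha * p for p in g0], rng)


rng = random.Random(0)            # fixed seed, for reproducibility only
alpha = 5.0                       # illustrative concentration parameter
cut_points = [-1.0, 0.0, 1.0]     # partition of the real line into 4 intervals
g0_masses = base_measure_of_partition(cut_points)
draw = sample_dp_on_partition(alpha, cut_points, rng)
print("G0 masses:", g0_masses)
print("one draw of (G(A1), ..., G(A4)):", draw)
```

Each draw is a probability vector over the cells of the partition. Averaging many such draws gives values close to the G0 masses, and larger values of α make individual draws concentrate more tightly around them.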

Some authors define the DP using a single parameter by combining the two parameters
to form the measure α(·) = αG0(·). Denoting the space of support of G0 as X, the
mass of this measure is α = α(X), and the base distribution is
G0(·) = α(·)/α(X). In the following, we use the two-parameter notation to denote the
random distribution G with a DP prior, eq. (3.3).
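As a short worked step, the correspondence between the two notations can be written out as follows. This is a sketch under the assumption that G0 is a probability measure, so that G0(X) = 1; the symbol α(·) denotes the combined measure and α its total mass.

```latex
\begin{align*}
\alpha(\cdot) &= \alpha\, G_0(\cdot), \\
\alpha(\mathcal{X}) &= \alpha\, G_0(\mathcal{X}) = \alpha, \\
G_0(\cdot) &= \frac{\alpha(\cdot)}{\alpha(\mathcal{X})}, \\
(G(A_1), \ldots, G(A_k)) &\sim \mathcal{D}\big(\alpha(A_1), \ldots, \alpha(A_k)\big).
\end{align*}
```

The last line is eq. (3.2) rewritten in the single-parameter notation, since αG0(Ai) = α(Ai) for every cell of the partition.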

Some important properties of the DP are as follows:<br />

• The mean of the process is the base distribution, E{G} = G0. Thus, G0 can be<br />

thought of as the prior guess of the shape of the distribution of G.<br />

• Given samples θ1, …, θn from G, the posterior distribution is also a DP:

$$G \mid \theta_1, \ldots, \theta_n \sim \mathrm{DP}\!\left(\alpha + n,\; \frac{\alpha G_0 + \sum_{i=1}^{n} \delta_{\theta_i}(\cdot)}{\alpha + n}\right). \tag{3.4}$$

Note that the concentration parameter becomes α + n after observing n samples,<br />

and the contribution of the prior base distribution G0 is scaled by α. Thus, the<br />

¹ The definition of the Dirichlet distribution and some of its properties necessary for understanding the
DP are given in Appendix B.1.
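The posterior update in eq. (3.4) can be sketched in a few lines. This is an illustration under stated assumptions rather than the author's implementation: G0 is taken to be a standard normal, the sets are half-open intervals (lo, hi], and the function names `posterior_dp` and `posterior_mean_mass` are hypothetical. Because the mean of a DP is its base distribution, the posterior expectation of G(A) is the posterior base distribution evaluated on A.

```python
import math


def normal_cdf(x):
    """CDF of the standard normal, used here as an assumed base distribution G0."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def posterior_dp(alpha, thetas):
    """Weights of the posterior DP in eq. (3.4).

    Returns the updated concentration alpha + n, the weight alpha / (alpha + n)
    placed on G0, and the weight 1 / (alpha + n) placed on each observed atom.
    """
    n = len(thetas)
    concentration = alpha + n
    return concentration, alpha / concentration, 1.0 / concentration


def posterior_mean_mass(alpha, thetas, lo, hi):
    """E[G((lo, hi]) | thetas]: the posterior base distribution on (lo, hi]."""
    _, g0_weight, atom_weight = posterior_dp(alpha, thetas)
    g0_mass = normal_cdf(hi) - normal_cdf(lo)
    count = sum(1 for t in thetas if lo < t <= hi)
    return g0_weight * g0_mass + atom_weight * count


alpha = 2.0                       # illustrative concentration parameter
thetas = [0.3, 0.3, 1.7]          # illustrative samples from G (ties are allowed)
print(posterior_dp(alpha, thetas))
print(posterior_mean_mass(alpha, [], 0.0, 1.0))      # no data: equals G0((0, 1])
print(posterior_mean_mass(alpha, thetas, 0.0, 1.0))  # pulled toward the two atoms at 0.3
```

With no observations the expected mass of an interval equals its mass under G0. As n grows relative to α, the empirical distribution of the samples dominates.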
