
3.4 Dirichlet Process Mixtures of Factor Analyzers

[Figure 3.16 appears here: a graphical model showing the layered structure of the hierarchical priors — hyperpriors, hyperparameters, mixture parameters and observations — with normal, Wishart, gamma and inverse-gamma distributions over variables such as µ, Λ, Ψ, ν and β, and plates of size N, K and Q.]

Figure 3.16: Graphical representation of the layered structure of the hierarchical priors in the mixtures of factor analyzers model. Variables are labelled below by the name of their distribution, and the parameters of these distributions are given above. The number of observations (N), number of mixture components (K) and the latent factor dimension (Q) are denoted by the numerals in the lower left hand corner of the large rectangles.

condition on the latent factor z, which results in the likelihood:

p(x | z, c_j, µ_j, Λ_j, Ψ) = N(µ_j + Λ_j z, Ψ)   (3.76)

This function factorizes over the data dimensions. To make use of this factorization, we can express the prior distribution on the rows of Λ_j as p(Λ^j_{d·}) ∼ N(0, Υ_j^{−1}), where Λ^j_{d·} denotes the dth row of the jth factor loading matrix and Υ_j is the diagonal matrix which has ν_1, . . . , ν_Q as its entries.

The likelihood function in eq. (3.76) is used to compute the posteriors of Λ_j, Ψ and z. Once Λ_j and Ψ have been updated conditioned on z, we can compute the covariance matrix Σ_j = Λ_j Λ_j^T + Ψ and condition on the full covariance to update the means and the indicators using eq. (3.68) with the hidden factors integrated out. The conditional posteriors for these variables are of standard form and can therefore be sampled from easily.
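Integrating the hidden factor out of eq. (3.76), with z ∼ N(0, I), yields the marginal Gaussian x ∼ N(µ_j, Σ_j) with Σ_j = Λ_j Λ_j^T + Ψ. A minimal sketch of this computation (an illustration, not the thesis's sampler):

```python
import numpy as np

def marginal_cov_and_loglik(x, mu, Lambda, Psi_diag):
    """Marginal Gaussian likelihood with the hidden factor integrated out.

    With z ~ N(0, I), integrating z out of eq. (3.76) gives
    x ~ N(mu, Sigma_j), where Sigma_j = Lambda_j Lambda_j^T + Psi.
    """
    D = x.shape[0]
    Sigma = Lambda @ Lambda.T + np.diag(Psi_diag)  # full covariance Sigma_j
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    loglik = -0.5 * (D * np.log(2 * np.pi) + logdet
                     + diff @ np.linalg.solve(Sigma, diff))
    return Sigma, loglik
```

This marginal likelihood is what enters the updates of the means and the indicators once the factors are no longer conditioned on.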

The conditional posterior of the indicator c_i is obtained by combining the prior of the components with the likelihood. The likelihood of the components that have data (other than x_i) associated with them is Gaussian with mean µ_j and covariance Λ_j Λ_j^T + Ψ. The likelihood pertaining to the remaining infinitely many components is obtained by integrating over the prior distribution of the parameters,

p(c_i ≠ c_{i′} for all i′ ≠ i | c_{−i}, ξ, R, ν, β, w, α) ∝ α / (n − 1 + α) × ∫ p(x_i | µ, Λ, Ψ) p(µ, Λ | ξ, R, ν) dµ dΛ.   (3.77)

This integral is not analytically tractable; we therefore need to use an inference technique
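The structure of this indicator update can be sketched as follows. Existing components are weighted by their occupation counts, and the infinitely many empty components share the prior mass α/(n − 1 + α); since the integral in eq. (3.77) is intractable, this sketch approximates it by a Monte Carlo average over parameters drawn from the prior (an assumption here, in the spirit of auxiliary-variable samplers for DP mixtures — the thesis's actual choice of technique follows in the text):

```python
import numpy as np

def indicator_probs(counts, comp_loglik, new_loglik_mc, alpha, n):
    """Normalized conditional posterior over the indicator c_i.

    counts        : data counts n_{-i,j} of the existing components
    comp_loglik   : Gaussian log-likelihood of x_i under each component
    new_loglik_mc : log-likelihoods of x_i under parameters sampled from
                    the prior (Monte Carlo estimate of eq. (3.77))
    """
    weights = [n_j / (n - 1 + alpha) * np.exp(ll)
               for n_j, ll in zip(counts, comp_loglik)]
    # all empty components together get prior mass alpha / (n - 1 + alpha)
    weights.append(alpha / (n - 1 + alpha) * np.mean(np.exp(new_loglik_mc)))
    w = np.array(weights)
    return w / w.sum()
```

The last entry of the returned vector is the probability that x_i starts a new component.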
