26.10.2013 Views

Nonparametric Bayesian Discrete Latent Variable Models for ...


B Mathematical Appendix

The multinomial distribution is the distribution of a random variable that can take a finite number of different values. The conjugate prior for the multinomial distribution is the Dirichlet distribution, which is the multivariate generalization of the Beta distribution.

The marginal distribution of each πi is beta<br />

It follows from the definition of the Dirichlet distribution and the additive property of the gamma distribution that if $(\pi_1, \dots, \pi_k) \sim D(\alpha_1, \dots, \alpha_k)$ and $r_1, \dots, r_l$ are integers such that $0 < r_1 < \dots < r_l = k$, then

\[
\Bigl( \sum_{i=1}^{r_1} \pi_i,\ \sum_{i=r_1+1}^{r_2} \pi_i,\ \dots,\ \sum_{i=r_{l-1}+1}^{r_l} \pi_i \Bigm| \alpha_1, \dots, \alpha_k \Bigr)
\sim D\Bigl( \sum_{i=1}^{r_1} \alpha_i,\ \sum_{i=r_1+1}^{r_2} \alpha_i,\ \dots,\ \sum_{i=r_{l-1}+1}^{r_l} \alpha_i \Bigr).
\]

In particular, the marginal distribution of each $\pi_i$ is $\mathrm{Beta}\bigl(\alpha_i,\ \bigl(\sum_{j=1}^{k} \alpha_j\bigr) - \alpha_i\bigr)$.
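The aggregation property above can be checked numerically. The following is a minimal sketch, assuming the Dirichlet distribution is sampled by normalizing independent gamma variables; the function names (`aggregate_parameters`, `marginal_beta_parameters`, `sample_dirichlet`) and the zero-based indexing are illustrative choices, not part of the original text.

```python
import random


def aggregate_parameters(alphas, cuts):
    """Sum the alphas over the blocks (0, r1], (r1, r2], ..., (r_{l-1}, r_l].

    `cuts` holds r1 < r2 < ... < rl, with rl equal to len(alphas).
    """
    cuts = list(cuts)
    if not cuts or cuts != sorted(set(cuts)) or cuts[0] <= 0 or cuts[-1] != len(alphas):
        raise ValueError("cuts must satisfy 0 < r1 < ... < rl = k")
    bounds = [0] + cuts
    return [sum(alphas[a:b]) for a, b in zip(bounds, bounds[1:])]


def marginal_beta_parameters(alphas, i):
    """Beta parameters of the marginal of component i (zero-based index)."""
    return alphas[i], sum(alphas) - alphas[i]


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(alpha_j, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


if __name__ == "__main__":
    alphas = [1.0, 2.0, 3.0, 0.5]
    cuts = [2, 4]  # two blocks: components 1-2 and components 3-4
    block_alphas = aggregate_parameters(alphas, cuts)
    rng = random.Random(0)
    n = 20000
    # Empirical mean of the first aggregated component.
    empirical = sum(sum(sample_dirichlet(alphas, rng)[:2]) for _ in range(n)) / n
    print("aggregated parameters:", block_alphas)
    print("empirical mean:", empirical, "expected:", block_alphas[0] / sum(block_alphas))
```

The empirical mean of the summed components should lie close to the mean implied by the aggregated parameters, up to Monte Carlo error.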

Scale of the Dirichlet distribution<br />

The first and second moments of $\pi_j$ are given by

\[
E\{\pi_j\} = \frac{\alpha_j}{\alpha}, \qquad
\mathrm{Var}(\pi_j) = \frac{\alpha_j(\alpha - \alpha_j)}{\alpha^2(\alpha + 1)},
\tag{B.6}
\]

where $\alpha = \sum_{j=1}^{k} \alpha_j$ is referred to as the scale of the distribution. The mean of the distribution does not depend on the scale of the parameters; however, the scale determines the spread.
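A short sketch of how the scale affects the spread but not the mean, assuming the standard mean and variance formulas of the Dirichlet distribution; the function name `dirichlet_mean_and_variance` is a hypothetical helper introduced for illustration.

```python
def dirichlet_mean_and_variance(alphas):
    """Return per-component means and variances of a Dirichlet distribution."""
    scale = sum(alphas)  # the scale alpha = sum_j alpha_j
    means = [a / scale for a in alphas]
    variances = [a * (scale - a) / (scale ** 2 * (scale + 1.0)) for a in alphas]
    return means, variances


if __name__ == "__main__":
    base = [1.0, 1.0, 2.0]
    scaled = [10.0 * a for a in base]  # same proportions, ten times the scale
    print(dirichlet_mean_and_variance(base))
    print(dirichlet_mean_and_variance(scaled))
```

Multiplying every parameter by the same constant leaves the means unchanged and shrinks each variance, which is the sense in which the scale determines the spread.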

Posterior distribution<br />

The posterior distribution of the $\pi_j$ given multinomially distributed data is

\[
p(\pi_1, \dots, \pi_k \mid x_1, \dots, x_n) = D(\alpha_1 + n_1, \dots, \alpha_k + n_k),
\]

where $n_j$ is the number of occurrences of the $j$th event. The parameters of the Dirichlet distribution can be interpreted as pseudo-observations: for a larger scale, the prior is more concentrated around its mean and therefore has more effect on the posterior.
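The conjugate update can be sketched in a few lines. This is an illustrative example only: it assumes observations are encoded as zero-based category indices, and the names `posterior_parameters` and `posterior_mean` are hypothetical helpers rather than anything defined in the text.

```python
from collections import Counter


def posterior_parameters(alphas, observations):
    """Add the count n_j of each category to the prior parameter alpha_j."""
    k = len(alphas)
    if any(not 0 <= x < k for x in observations):
        raise ValueError("observations must be category indices in range(k)")
    counts = Counter(observations)
    return [a + counts[j] for j, a in enumerate(alphas)]


def posterior_mean(alphas, observations):
    """Mean of the posterior Dirichlet distribution."""
    post = posterior_parameters(alphas, observations)
    total = sum(post)
    return [a / total for a in post]


if __name__ == "__main__":
    data = [0, 2, 2, 1, 2]
    # A small-scale prior lets the data dominate; a large-scale prior
    # keeps the posterior mean close to the prior mean (1/3 each).
    print(posterior_mean([1.0, 1.0, 1.0], data))
    print(posterior_mean([100.0, 100.0, 100.0], data))
```

Comparing the two printed means shows the pseudo-observation reading: the prior with the larger scale behaves like many previously seen observations and moves less in response to the same data.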

We can marginalize out the multinomial parameters $\pi_j$ using the Dirichlet integral (B.11) and express the probability of the observations directly in terms of the Dirichlet parameters:

\[
p(x_1, \dots, x_n \mid \alpha_1, \dots, \alpha_k)
= \frac{\Gamma(\alpha)}{\Gamma\bigl(\sum_{l=1}^{k} \alpha_l + n\bigr)}
\prod_{j=1}^{k} \frac{\Gamma(\alpha_j + n_j)}{\Gamma(\alpha_j)}.
\]
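The marginal probability is usually evaluated in log space, since the gamma functions overflow quickly. The sketch below assumes the formula is read as the probability of one particular ordered sequence with counts $n_j$ (it contains no multinomial coefficient); `log_marginal_likelihood` is a hypothetical name.

```python
import math


def log_marginal_likelihood(alphas, counts):
    """Log probability of a sequence with the given category counts,
    with the multinomial parameters integrated out."""
    scale = sum(alphas)  # alpha = sum_l alpha_l
    n = sum(counts)      # total number of observations
    result = math.lgamma(scale) - math.lgamma(scale + n)
    for a, c in zip(alphas, counts):
        result += math.lgamma(a + c) - math.lgamma(a)
    return result


if __name__ == "__main__":
    # Two categories, uniform prior, one observation of each category.
    print(math.exp(log_marginal_likelihood([1.0, 1.0], [1, 1])))
```

Using `math.lgamma` rather than `math.gamma` keeps the computation stable for large counts or large parameter values.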

Thus, the conditional distribution for a new observation $x_{n+1}$ given the previous observations is

\[
p(x_{n+1} = j \mid x_1, \dots, x_n, \alpha_1, \dots, \alpha_k)
= \frac{\alpha_j + n_j}{\sum_{i=1}^{k} \alpha_i + n}.
\]
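As an illustration, the predictive rule can be applied one observation at a time, and by the chain rule the product of these conditional probabilities gives the probability of the whole sequence. The sketch below assumes zero-based category indices; `predictive_probabilities` and `sequence_probability` are hypothetical names.

```python
def predictive_probabilities(alphas, counts):
    """Probability of each category for the next observation, given counts so far."""
    total = sum(alphas) + sum(counts)
    return [(a + c) / total for a, c in zip(alphas, counts)]


def sequence_probability(alphas, observations):
    """Probability of an ordered sequence, built from successive predictive steps."""
    counts = [0] * len(alphas)
    prob = 1.0
    for x in observations:
        prob *= predictive_probabilities(alphas, counts)[x]
        counts[x] += 1  # the observed category gains one count
    return prob


if __name__ == "__main__":
    print(predictive_probabilities([1.0, 1.0, 1.0], [1, 1, 3]))
    # The order of the observations does not change the sequence probability.
    print(sequence_probability([1.0, 1.0], [0, 1]))
    print(sequence_probability([1.0, 1.0], [1, 0]))
```

The two orderings yield the same value, reflecting that the sequence probability depends on the data only through the counts $n_j$.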
