26.10.2013 Views

Nonparametric Bayesian Discrete Latent Variable Models for ...


4.1 The Indian Buffet Process

and Ghahramani (2005) show that the distribution over the equivalence classes of the matrices generated by this model is the same as that of the IBP; we briefly present the argument below. See the cited paper for details of the derivation.

Conditioned on $\mu_k$, the rows of the matrix are assumed to be independent; therefore, the joint distribution over a column is

$$
P(\mathbf{z}_k \mid \mu_k) = \mu_k^{m_{\cdot,k}} (1 - \mu_k)^{N - m_{\cdot,k}}, \tag{4.10}
$$

where $m_{\cdot,k} = \sum_{i=1}^{N} z_{ik}$ is the number of 1s in column $k$, i.e., the number of objects that share the $k$th feature. We can integrate over $\mu_k$ using the Dirichlet integral (B.11) and express the joint probability of the column directly in terms of the hyperparameter $\alpha$:

$$
\begin{aligned}
P(\mathbf{z}_k) &= \int P(\mathbf{z}_k \mid \mu_k)\, P(\mu_k)\, d\mu_k \\
&= \int \frac{\Gamma(\alpha/K + 1)}{\Gamma(\alpha/K)}\, \mu_k^{\alpha/K - 1}\, \mu_k^{m_{\cdot,k}} (1 - \mu_k)^{N - m_{\cdot,k}}\, d\mu_k \\
&= \frac{\alpha}{K}\, \frac{\Gamma(m_{\cdot,k} + \alpha/K)\, \Gamma(N - m_{\cdot,k} + 1)}{\Gamma(N + \alpha/K + 1)}.
\end{aligned}
\tag{4.11}
$$
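As a sanity check on eq. (4.11), the following minimal sketch compares the closed form against a midpoint-rule quadrature of the integrand, and confirms that the column probabilities sum to one over all $2^N$ binary columns. The function names, the quadrature rule, and the step count are illustrative assumptions and not part of the original derivation.

```python
import math


def log_column_prob(m, N, alpha, K):
    """Log of eq. (4.11): probability of one specific column with m ones among N rows."""
    a = alpha / K  # first parameter of the Beta(alpha/K, 1) prior on mu_k
    return (math.log(a) + math.lgamma(m + a) + math.lgamma(N - m + 1)
            - math.lgamma(N + a + 1))


def column_prob_by_quadrature(m, N, alpha, K, steps=100_000):
    """Midpoint-rule approximation of the integral in eq. (4.11) (illustrative check only)."""
    a = alpha / K
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        mu = (i + 0.5) * h
        # The Beta(a, 1) density is a * mu**(a - 1); the likelihood is eq. (4.10).
        total += a * mu ** (a - 1 + m) * (1.0 - mu) ** (N - m)
    return total * h


def total_column_mass(N, alpha, K):
    """Sum of P(z_k) over all 2**N binary columns, grouped by their number of ones."""
    return sum(math.comb(N, m) * math.exp(log_column_prob(m, N, alpha, K))
               for m in range(N + 1))
```

For a column with at least one nonzero entry the integrand is bounded and the two values should agree closely; for $m_{\cdot,k} = 0$ and $\alpha/K < 1$ the integrand has an integrable singularity at zero, so the simple quadrature converges slowly there and the closed form is the one to trust.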

Assuming the columns to be independent, the distribution over the whole matrix becomes

$$
P(\mathbf{Z}) = \prod_{k=1}^{K} P(\mathbf{z}_k) = \prod_{k=1}^{K} \frac{\alpha}{K}\, \frac{\Gamma(m_{\cdot,k} + \alpha/K)\, \Gamma(N - m_{\cdot,k} + 1)}{\Gamma(N + \alpha/K + 1)}. \tag{4.12}
$$
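Eq. (4.12) is straightforward to evaluate in log space. The sketch below is one possible way to do so, assuming the matrix is stored as a list of $N$ rows of length $K$; the function names and the brute-force normalization check are illustrative additions, not part of the original text.

```python
import math
from itertools import product


def log_matrix_prob(Z, alpha):
    """Log of eq. (4.12) for a binary matrix Z given as a list of N rows of length K."""
    N, K = len(Z), len(Z[0])
    a = alpha / K
    log_p = 0.0
    for k in range(K):
        m = sum(row[k] for row in Z)  # m_{.,k}: number of ones in column k
        log_p += (math.log(a) + math.lgamma(m + a) + math.lgamma(N - m + 1)
                  - math.lgamma(N + a + 1))
    return log_p


def total_matrix_mass(N, K, alpha):
    """Brute-force sum of P(Z) over all 2**(N*K) binary matrices (small N and K only)."""
    total = 0.0
    for bits in product((0, 1), repeat=N * K):
        Z = [list(bits[i * K:(i + 1) * K]) for i in range(N)]
        total += math.exp(log_matrix_prob(Z, alpha))
    return total
```

Working with `math.lgamma` rather than the Gamma function itself avoids overflow for larger $N$; the exhaustive sum is only practical for very small matrices and serves purely as a check that the probabilities are normalized.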

Considering the lof-equivalence classes of the matrices, the number of matrices $\mathbf{Z}$ defined by the above generative model that map to the same equivalence class $[\mathbf{Z}]$ is

$$
\frac{K!}{\prod_{h=1}^{2^N - 1} K_h!},
$$

which leads to the following distribution over $[\mathbf{Z}]$,

$$
P([\mathbf{Z}]) = \frac{K!}{\prod_{h=1}^{2^N - 1} K_h!} \prod_{k=1}^{K} \frac{\alpha}{K}\, \frac{\Gamma(m_{\cdot,k} + \alpha/K)\, \Gamma(N - m_{\cdot,k} + 1)}{\Gamma(N + \alpha/K + 1)}. \tag{4.13}
$$
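The class probability in eq. (4.13) can be sketched in code by counting how many columns share each distinct pattern. The version below assumes that every distinct column pattern, including the all-zero one, counts as a history with its own $K_h$, so that the prefactor is the multinomial coefficient over column patterns; this indexing is an assumption of the sketch, as are the function names.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement, product


def log_class_prob(Z, alpha):
    """Log of eq. (4.13) for the lof-equivalence class of Z (list of N rows of length K)."""
    N, K = len(Z), len(Z[0])
    a = alpha / K
    columns = [tuple(row[k] for row in Z) for k in range(K)]
    # K_h: how many columns share each distinct pattern (history) h.
    # Assumption: the all-zero pattern is counted as a history as well.
    history_counts = Counter(columns)
    log_cardinality = math.lgamma(K + 1) - sum(
        math.lgamma(c + 1) for c in history_counts.values())
    log_p_z = sum(
        math.log(a) + math.lgamma(sum(col) + a) + math.lgamma(N - sum(col) + 1)
        - math.lgamma(N + a + 1)
        for col in columns)
    return log_cardinality + log_p_z


def total_class_mass(N, K, alpha):
    """Sum of P([Z]) over one representative per equivalence class (small N and K only)."""
    patterns = list(product((0, 1), repeat=N))
    total = 0.0
    # A class is a multiset of K column patterns; enumerate each multiset once.
    for cols in combinations_with_replacement(patterns, K):
        Z = [[col[i] for col in cols] for i in range(N)]
        total += math.exp(log_class_prob(Z, alpha))
    return total
```

Because two matrices are lof-equivalent exactly when they contain the same multiset of columns, summing over multisets visits each class once, and the total should be one under the counting convention assumed here.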

This is the distribution of the equivalence classes for the binary matrix with $K < \infty$ columns. The distribution over the matrix with infinitely many columns can be obtained by separating the terms for non-zero columns (i.e., columns with $m_{\cdot,k} > 0$) and the zero columns and simply taking the limit as the number of columns $K \to \infty$. The limiting distribution is found to be equal to eq. (4.2).
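The limit can also be observed numerically. The sketch below evaluates eq. (4.13) for a fixed set of non-zero columns padded with $K - K_+$ zero columns, and compares it with the limiting form. Two assumptions are made here: the zero columns are counted as one additional history in the cardinality term, and eq. (4.2), which is not reproduced in this section, is taken to be the standard IBP distribution of Griffiths and Ghahramani (2005), $\frac{\alpha^{K_+}}{\prod_{h \geq 1} K_h!} \exp(-\alpha H_N) \prod_{k=1}^{K_+} \frac{(N - m_{\cdot,k})!\,(m_{\cdot,k} - 1)!}{N!}$ with $H_N$ the $N$th harmonic number. All function names are hypothetical.

```python
import math
from collections import Counter


def log_finite_class_prob(nonzero_columns, N, K, alpha):
    """Log of eq. (4.13) when the matrix has the given non-zero columns and K - K_+ zero columns."""
    a = alpha / K
    k_plus = len(nonzero_columns)
    k_zero = K - k_plus
    counts = Counter(nonzero_columns)
    # Class cardinality, with the zero columns treated as one more history (assumption).
    log_p = (math.lgamma(K + 1) - math.lgamma(k_zero + 1)
             - sum(math.lgamma(c + 1) for c in counts.values()))
    # Zero columns: (alpha/K) * Gamma(alpha/K) = Gamma(alpha/K + 1), stable for small alpha/K.
    log_p += k_zero * (math.lgamma(a + 1) + math.lgamma(N + 1) - math.lgamma(N + a + 1))
    for col in nonzero_columns:
        m = sum(col)
        log_p += (math.log(a) + math.lgamma(m + a) + math.lgamma(N - m + 1)
                  - math.lgamma(N + a + 1))
    return log_p


def log_ibp_class_prob(nonzero_columns, N, alpha):
    """Log of the limiting distribution, assumed to be the standard IBP form."""
    k_plus = len(nonzero_columns)
    counts = Counter(nonzero_columns)
    harmonic = sum(1.0 / j for j in range(1, N + 1))  # H_N
    log_p = k_plus * math.log(alpha) - alpha * harmonic
    log_p -= sum(math.lgamma(c + 1) for c in counts.values())
    for col in nonzero_columns:
        m = sum(col)
        # log of (N - m)! (m - 1)! / N!
        log_p += math.lgamma(N - m + 1) + math.lgamma(m) - math.lgamma(N + 1)
    return log_p
```

Calling `log_finite_class_prob` with increasing `K` for the same columns, for example `[(1, 1, 0), (1, 0, 0), (1, 0, 0), (0, 1, 1)]` with `N = 3`, should show the gap to `log_ibp_class_prob` shrinking as `K` grows.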

We repeat the argument of the previous section: even though $\mathbf{Z}$ has infinitely many columns, its distribution favors sparsity. We can compute the expected number of nonzero entries in $\mathbf{Z}$ for the finite case, and take the limit to obtain the expression for the
