
4.1 The Indian Buffet Process

number of occupied tables is limited by the number of customers. There is no such limit in the IBP, since customers can choose many dishes and the resulting matrix Z potentially has infinitely many non-zero columns. Nevertheless, the expected total number of dishes sampled remains finite, since the mean number of new dishes a customer samples decreases reciprocally: the ith customer was assumed to choose a Poisson(α/i) number of new dishes. We can use the additive property of the Poisson distribution to deduce that the total number of dishes sampled (the number of non-zero columns of Z) follows a Poisson(αH_N) distribution, where H_N = Σ_{i=1}^N 1/i is the Nth harmonic number. This means the effective dimension of the binary matrix is determined by the IBP parameter α and the number of customers.
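This additivity is easy to check by simulation. The following sketch (an illustration, not part of the original text; the function name `ibp_total_dishes` is ours) draws the number of new dishes for each customer and verifies that the empirical mean of the total matches αH_N:

```python
import math
import random

def ibp_total_dishes(alpha, n_customers, rng):
    """Draw the total number of dishes in one IBP realization.

    Customer i (1-based) samples a Poisson(alpha / i) number of new
    dishes, so the total is a sum of independent Poisson draws."""
    total = 0
    for i in range(1, n_customers + 1):
        # Poisson draw via inversion (fine for small means like alpha/i).
        lam = alpha / i
        k, p, threshold = 0, 1.0, math.exp(-lam)
        while True:
            p *= rng.random()
            if p <= threshold:
                break
            k += 1
        total += k
    return total

rng = random.Random(0)
alpha, n = 2.0, 50
h_n = sum(1.0 / i for i in range(1, n + 1))  # harmonic number H_N
draws = [ibp_total_dishes(alpha, n, rng) for _ in range(20000)]
mean_dishes = sum(draws) / len(draws)
print(mean_dishes, alpha * h_n)  # empirical mean should be close to alpha * H_N
```

Because a sum of independent Poisson variables is again Poisson, the total here is exactly Poisson(αH_N); the simulation simply confirms the mean.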

The exchangeability of both the customers and the dishes has been established by focusing on the equivalence classes of the generated matrices using the lof function. The process starts with the first customer selecting a Poisson(α) number of dishes. Making use of exchangeability, any customer can be taken to be the first one to choose. With this argument, we can see that the total number of dishes sampled by each customer follows a Poisson(α) distribution. Therefore, for a matrix with N rows (N customers), the number of non-zero entries in Z follows a Poisson(αN) distribution, which implies that the IBP produces sparse matrices.
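The sparsity claim can also be checked empirically by running the full buffet process and counting the ones in Z. The sketch below (our illustration; `ibp_matrix_ones` is a hypothetical helper) has customer i take each previously sampled dish k with probability m_k/i and then sample Poisson(α/i) new dishes, and checks that the mean number of non-zero entries is close to αN:

```python
import math
import random

def ibp_matrix_ones(alpha, n, rng):
    """Run one Indian buffet process and return the number of ones in Z.

    dish_counts[k] = number of customers who have taken dish k so far."""
    dish_counts = []
    ones = 0
    for i in range(1, n + 1):
        # Existing dishes: customer i takes dish k with probability m_k / i.
        for k in range(len(dish_counts)):
            if rng.random() < dish_counts[k] / i:
                dish_counts[k] += 1
                ones += 1
        # New dishes: a Poisson(alpha / i) number, each taken by this customer.
        lam = alpha / i
        p, threshold = 1.0, math.exp(-lam)
        new = 0
        while True:
            p *= rng.random()
            if p <= threshold:
                break
            new += 1
        dish_counts.extend([1] * new)
        ones += new
    return ones

rng = random.Random(1)
alpha, n = 1.5, 30
samples = [ibp_matrix_ones(alpha, n, rng) for _ in range(5000)]
mean_ones = sum(samples) / len(samples)
print(mean_ones, alpha * n)  # empirical mean should be close to alpha * N
```

However many columns the realized matrices have, the expected number of ones stays at αN, which is what "sparse" means here: the density of ones does not grow with the (unbounded) number of columns.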

4.1.1 A Distribution on Infinite Binary Matrices

The probability distribution over [Z] defined by the IBP that is given in eq. (4.2) can be derived as the limit of a distribution over finite-size binary matrices as the number of columns approaches infinity (Griffiths and Ghahramani, 2005). This approach is similar to deriving the distribution equivalent to Dirichlet process mixtures by considering a mixture model with infinitely many components (see Section 3.1.5). We start by defining the distribution over a finite binary matrix Z with N rows (customers) and K columns (dishes), then take the limit K → ∞. Recall that an entry zik = 1 means that the ith customer has sampled the kth dish, or equivalently, the ith object has the kth feature. The graphical representation of the hierarchical model is depicted in Figure 4.2.

Each entry in the kth column of the finite-dimensional matrix is assumed to have a probability µk of being 1,

(zik | µk) ∼ Bernoulli(µk). (4.3)

We will refer to µk as the feature presence probability for column k. Putting an independent beta prior on each µk,

µk ∼ Beta(α/K, 1), (4.4)

the posterior distribution for µk given the previous i − 1 entries of the kth column is also beta by conjugacy:

(µk | zjk, 1 ≤ j < i) ∼ Beta(α/K + m_{i−1,k}, i − m_{i−1,k}), (4.5)

where m_{i−1,k} = Σ_{j<i} zjk is the number of the first i − 1 customers that sampled dish k.
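The beta–Bernoulli conjugate update is mechanical once the counts are in hand. A minimal sketch (our illustration; the function name `beta_bernoulli_posterior` and the concrete numbers are ours, not from the text) computes the posterior parameters for one column:

```python
def beta_bernoulli_posterior(alpha, K, z_column):
    """Conjugate update for the feature presence probability mu_k.

    Prior: mu_k ~ Beta(alpha / K, 1). Given binary observations
    z_1k, ..., z_(i-1)k containing m ones, the posterior is
    Beta(alpha / K + m, 1 + (i - 1) - m)."""
    m = sum(z_column)   # number of customers who sampled dish k
    n = len(z_column)   # i - 1 observations seen so far
    a = alpha / K + m
    b = 1 + n - m
    return a, b

# Example: alpha = 2, K = 10 columns, and 3 of the first 4 customers took dish k.
a, b = beta_bernoulli_posterior(alpha=2.0, K=10, z_column=[1, 0, 1, 1])
posterior_mean = a / (a + b)  # E[mu_k | data] = a / (a + b)
print(a, b, posterior_mean)
```

Note that as K → ∞ the prior pseudo-count α/K vanishes, so the posterior for any given column is driven almost entirely by the observed counts, which is the mechanism behind the limiting argument in this section.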
