Nonparametric Bayesian Discrete Latent Variable Models for ...

4 Indian Buffet Process Models

[Graphical model omitted: nodes α, µ, z_ik, and x_i^(k).]

Figure 4.3: Graphical representation for the infinite latent feature model using the stick-breaking construction of the Indian buffet process. Note the strictly decreasing order of the feature presence probabilities.

infinite matrix:

E\{\mathbf{1}^T Z^T \mathbf{1}\} = E\Big\{\sum_{i=1}^{N} \sum_{k=1}^{K} z_{ik}\Big\} = K \sum_{i=1}^{N} E\{z_{ik}\} = K \sum_{i=1}^{N} \int_0^1 \mu_k \, p(\mu_k) \, d\mu_k = KN \, \frac{\alpha/K}{1 + \alpha/K} = \frac{N\alpha}{1 + \alpha/K} \qquad (4.14)

Taking the limit as K → ∞, the expected number of non-zero entries for the infinite matrix is found to be αN, consistent with the result of the previous section.
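This expectation is easy to check numerically: in the finite model, each µ_k is drawn from Beta(α/K, 1) and the entries of each column are independent Bernoulli(µ_k) draws. The sketch below (function name and sample counts are illustrative, not from the source) compares a Monte Carlo estimate of the expected number of non-zero entries with the closed form Nα/(1 + α/K):

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_nonzeros_finite(N, K, alpha, n_samples=2000):
    """Monte Carlo estimate of E{sum_ik z_ik} in the finite beta-Bernoulli
    model: mu_k ~ Beta(alpha/K, 1), z_ik ~ Bernoulli(mu_k) i.i.d. over rows."""
    total = 0.0
    for _ in range(n_samples):
        mu = rng.beta(alpha / K, 1.0, size=K)   # feature presence probabilities
        Z = rng.random((N, K)) < mu             # N x K binary feature matrix
        total += Z.sum()
    return total / n_samples

N, alpha = 10, 2.0
for K in (5, 50, 500):
    est = expected_nonzeros_finite(N, K, alpha)
    exact = N * alpha / (1.0 + alpha / K)       # closed form from Eq. (4.14)
    print(K, round(est, 2), round(exact, 2))
# Both columns approach alpha * N = 20 as K grows.
```

The estimate and the closed form agree for every K, and both tend to αN in the infinite limit.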

The derivation presented above starts with defining the distribution over a finite-dimensional matrix and uses the conjugacy of the beta distribution to the binomial distribution to integrate out the feature presence probabilities µ_k. Although the rows are independent given µ_k, marginalization couples the rows, and the probability of an entry is given in terms of the hyperparameter α and the number of other entries in that column that are set to one, i.e. the number of objects sharing the feature. For the distribution over the infinite matrix to be well defined, the column indices are ignored by focusing on the permutation-invariant equivalence classes of matrices using the lof(·) function before taking the infinite limit.
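The row coupling induced by marginalization can be seen concretely by sampling the matrix sequentially. In the standard Indian buffet culinary metaphor, customer i takes a previously sampled dish k with probability m_k/i, where m_k counts the earlier customers holding that dish, and then tries Poisson(α/i) new dishes. A minimal sketch (function name illustrative):

```python
import numpy as np

def sample_ibp(N, alpha, rng):
    """Sample a binary feature matrix via the Indian buffet metaphor:
    customer i takes existing dish k with probability m_k / i, then
    Poisson(alpha / i) new dishes of its own."""
    dishes = []                                  # dishes[k] = customers holding dish k
    for i in range(1, N + 1):
        for holders in dishes:                   # existing dishes: prob m_k / i
            if rng.random() < len(holders) / i:
                holders.append(i)
        for _ in range(rng.poisson(alpha / i)):  # brand-new dishes
            dishes.append([i])
    Z = np.zeros((N, len(dishes)), dtype=int)
    for k, holders in enumerate(dishes):
        Z[np.array(holders) - 1, k] = 1
    return Z

rng = np.random.default_rng(1)
Z = sample_ibp(N=10, alpha=2.0, rng=rng)
print(Z.shape, Z.sum())  # column count varies per draw; E{Z.sum()} = alpha * N
```

Note that the probability of taking an existing dish depends only on how many other customers share it, matching the exchangeable conditional probabilities obtained after integrating out µ_k.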

In the next section, we derive the same distribution with another approach, in which the feature presence probabilities are explicitly represented instead of being integrated out. To have a well-defined distribution over the matrix with infinitely many columns, a strictly decreasing ordering of the µ_k is imposed, which results in the so-called stick-breaking construction.
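In the standard IBP stick-breaking construction, the ordered probabilities are generated as µ_(k) = ν_k µ_(k-1) with ν_k ~ Beta(α, 1), so µ_(k) is the product of the first k stick fractions and the sequence is strictly decreasing. A truncated sketch (the truncation level is an assumption for illustration, not part of the exact construction):

```python
import numpy as np

def ibp_stick_breaking(N, alpha, K_trunc, rng):
    """Truncated stick-breaking sketch for the IBP: nu_k ~ Beta(alpha, 1),
    mu_(k) = prod_{j<=k} nu_j gives strictly decreasing feature presence
    probabilities; rows of Z are i.i.d. Bernoulli given mu."""
    nu = rng.beta(alpha, 1.0, size=K_trunc)
    mu = np.cumprod(nu)                               # mu_(1) > mu_(2) > ...
    Z = (rng.random((N, K_trunc)) < mu).astype(int)   # explicit mu, not integrated out
    return mu, Z

rng = np.random.default_rng(2)
mu, Z = ibp_stick_breaking(N=10, alpha=2.0, K_trunc=50, rng=rng)
assert np.all(np.diff(mu) <= 0)   # the decreasing ordering of Figure 4.3
```

Since E[ν_k] = α/(1 + α), the expected per-row count Σ_k E[µ_(k)] telescopes to α, again giving αN entries in expectation.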

4.1.2 Stick-Breaking Construction

In Section 3.1.4 we described the stick-breaking construction for the DP. In this section, we describe a similar construction for the IBP which allows the feature presence probabilities to be represented explicitly in the IBLF model; see Figure 4.3. The derivation starts

