Nonparametric Bayesian Discrete Latent Variable Models for ...


4.1 The Indian Buffet Process

from the distribution on a finite binary matrix and imposing a strictly decreasing ordering of the feature presence probabilities before taking the infinite limit. Denoting the kth largest feature presence probability with µ_(k), we will show that the stick-breaking construction for the IBP has the following law for the feature presence probabilities,

    µ_(k) = ν_k µ_(k−1) = ∏_{l=1}^{k} ν_l,  with µ_(0) = 1 and ν_k ∼ Beta(α, 1).   (4.15)

That is, each feature presence probability µ_(k) is the product of the previous one, µ_(k−1), and the random variable ν_k. This can be described metaphorically as starting with a stick of unit length (µ_(0) = 1) and, at each iteration, breaking off a piece of the stick, discarding that piece, and recursing on the piece that we kept. The breaking point is determined by the variable ν_k, and the feature presence probability µ_(k) is the stick length left after k iterations. Note that this procedure enforces a strictly decreasing ordering of the µ's, since µ_(k) is given by the stick length left after the kth iteration¹ and we recurse on the remaining piece at each iteration; see Figure 4.4 for a pictorial representation.
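As a concrete illustration, the recursion (4.15) can be simulated directly. The sketch below (function name and parameter choices are illustrative, not from the text) samples ν_k ∼ Beta(α, 1) by inverse-cdf transformation, using the fact that the cdf of Beta(α, 1) is x^α on [0, 1]:

```python
import random

def ibp_stick_breaking(alpha, K, rng=random):
    """Sample the first K ordered feature presence probabilities
    mu_(1) >= mu_(2) >= ... via the stick-breaking recursion (4.15).
    A minimal sketch; alpha is the IBP concentration parameter."""
    mus = []
    mu = 1.0  # mu_(0) = 1: start with a stick of unit length
    for _ in range(K):
        # nu_k ~ Beta(alpha, 1); its cdf is x**alpha, so inverse-cdf
        # sampling gives nu_k = U**(1/alpha) for U ~ Uniform(0, 1)
        nu = rng.random() ** (1.0 / alpha)
        mu *= nu  # mu_(k) = nu_k * mu_(k-1): keep the left piece
        mus.append(mu)
    return mus

probs = ibp_stick_breaking(alpha=2.0, K=10)
# The recursion enforces a strictly decreasing ordering of the mu's
assert all(a > b for a, b in zip(probs, probs[1:]))
```

Each returned value is the length of the stick remaining after one more break, so the ordering constraint holds by construction rather than by sorting.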

In the following, we present an outline of the derivation of the stick-breaking construction for the IBP. Details of the derivation are given in Appendix A. We start with the same distributional assumptions on the entries of a finite binary matrix with N rows and K columns as in Section 4.1.1. That is, for the unordered columns we assume:

    (z_ik | µ_k) ∼ Bernoulli(µ_k),
    µ_k ∼ Beta(α/K, 1).   (4.16)

The density of µ_k is given as:

    p(µ_k) = [Γ(α/K + 1) / (Γ(α/K) Γ(1))] µ_k^(α/K − 1) (1 − µ_k)^(1−1) I{0 ≤ µ_k ≤ 1}
           = (α/K) µ_k^(α/K − 1) I{0 ≤ µ_k ≤ 1},   (4.17)

where I{A} is the indicator function for a measurable set A; I{A} = 1 if A is true, and 0 otherwise. The cumulative distribution function (cdf) for µ_k is:

    F(µ_k) = ∫₀^{µ_k} (α/K) t^(α/K − 1) I{0 ≤ t ≤ 1} dt
           = µ_k^(α/K) I{0 ≤ µ_k ≤ 1} + I{1 < µ_k}.   (4.18)
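Because the cdf in (4.18) is available in closed form, µ_k can be sampled by inverse-cdf transformation. The following sketch (variable names are illustrative) draws such samples and checks their empirical cdf against µ^(α/K):

```python
import random

alpha, K = 3.0, 5
a = alpha / K          # Beta(alpha/K, 1) exponent
n = 200_000

# Inverse-cdf sampling: if F(x) = x**a on [0, 1],
# then x = U**(1/a) for U ~ Uniform(0, 1)
samples = [random.random() ** (1.0 / a) for _ in range(n)]

# The empirical cdf should match F(x) = x**(alpha/K) from (4.18)
for x in (0.2, 0.5, 0.8):
    empirical = sum(s <= x for s in samples) / n
    assert abs(empirical - x ** a) < 0.01
```

With α/K < 1 the density diverges at 0, yet the cdf stays well behaved, which is what makes the ordering argument below tractable.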

We define µ_(1) ≥ µ_(2) ≥ · · · ≥ µ_(K) to be the decreasing ordering of µ_1, . . . , µ_K. Thus, µ_(1) is defined as

    µ_(1) = max_{k=1,...,K} µ_k,   (4.19)
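Since the µ_k are iid, the cdf of the maximum is the product of the K individual cdfs, F(µ_(1)) = (µ^(α/K))^K = µ^α on [0, 1], i.e. µ_(1) ∼ Beta(α, 1) irrespective of K. A quick Monte Carlo check of this step (a sketch for illustration, not part of the original derivation):

```python
import random

alpha, K = 2.0, 50
n = 50_000

def sample_mu():
    # mu_k ~ Beta(alpha/K, 1) by inverse cdf, F(x) = x**(alpha/K)
    return random.random() ** (K / alpha)

# mu_(1) is the maximum over the K iid column probabilities (4.19)
maxima = [max(sample_mu() for _ in range(K)) for _ in range(n)]

# P(mu_(1) <= x) = (x**(alpha/K))**K = x**alpha, a Beta(alpha, 1) cdf
for x in (0.3, 0.6, 0.9):
    empirical = sum(m <= x for m in maxima) / n
    assert abs(empirical - x ** alpha) < 0.015
```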

¹ For the DP, the mixing proportions correspond to the length of the piece that we discard. This relation will be discussed further in the following section.

