26.10.2013 Views

Nonparametric Bayesian Discrete Latent Variable Models for ...


4 Indian Buffet Process Models

The cumulative distribution function (cdf) of the maximum of a set of independent random variables is the product of the cdfs of the individual variables. We obtain the cdf for $\mu_{(1)}$ by taking the product of the $K$ (identical) cdfs,

$$F(\mu_{(1)}) = \left( \mu_{(1)}^{\alpha/K}\, \mathbb{I}\{0 \le \mu_{(1)} \le 1\} + \mathbb{I}\{1 < \mu_{(1)}\} \right)^{K} = \mu_{(1)}^{\alpha}\, \mathbb{I}\{0 \le \mu_{(1)} \le 1\} + \mathbb{I}\{1 < \mu_{(1)}\}. \tag{4.20}$$

Differentiating, we obtain the probability density function (pdf) of $\mu_{(1)}$:

$$p(\mu_{(1)}) = \alpha\, \mu_{(1)}^{\alpha-1}\, \mathbb{I}\{0 \le \mu_{(1)} \le 1\}. \tag{4.21}$$

That is, $\mu_{(1)} \sim \mathrm{Beta}(\alpha, 1)$.
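The Beta(α, 1) result for the largest feature presence probability can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes the finite model draws K independent probabilities from Beta(α/K, 1), as implied by eq. (4.20), and the function name `sample_largest_probability` and the values of `alpha`, `K` and the sample size are illustrative choices.

```python
import numpy as np


def sample_largest_probability(alpha, K, n_samples, rng):
    """Draw n_samples of the maximum of K i.i.d. Beta(alpha/K, 1) variables.

    Assumption: Beta(alpha/K, 1) is the finite-model prior on each
    feature presence probability, as implied by eq. (4.20).
    """
    mu = rng.beta(alpha / K, 1.0, size=(n_samples, K))
    return mu.max(axis=1)


rng = np.random.default_rng(0)
alpha, K = 2.0, 50  # illustrative values
mu_max = sample_largest_probability(alpha, K, 20000, rng)

# First two moments of Beta(alpha, 1), for comparison with the samples.
beta_mean = alpha / (alpha + 1.0)
beta_second_moment = alpha / (alpha + 2.0)

print("empirical mean:", mu_max.mean(), "Beta(alpha, 1) mean:", beta_mean)
print("empirical E[x^2]:", (mu_max ** 2).mean(),
      "Beta(alpha, 1) E[x^2]:", beta_second_moment)
```

Because the product of the K cdfs equals the Beta(α, 1) cdf exactly for every K, the agreement should not depend on the particular K chosen here, up to Monte Carlo error.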

Following the same approach for the subsequent feature presence probabilities, the density of $\mu_{(k+1)}$ is obtained as

$$p(\mu_{(k+1)} \mid \mu_{(1:k)}) = p(\mu_{(k+1)} \mid \mu_{(k)}) = \alpha\, \frac{K-k}{K}\, \mu_{(k)}^{-\alpha \frac{K-k}{K}}\, \mu_{(k+1)}^{\alpha \frac{K-k}{K} - 1}\, \mathbb{I}\{0 \le \mu_{(k+1)} \le \mu_{(k)}\}. \tag{4.22}$$

This is the density of the $(k+1)$th largest value of $K$ random variables, all with the distribution given in eq. (4.16), conditioned on the $k$th largest value. To obtain the distribution of the probabilities corresponding to the columns of the infinite matrix, we take the limit as $K \to \infty$. In the limit, the density of $\mu_{(k+1)}$ becomes

$$p(\mu_{(k+1)} \mid \mu_{(k)}) = \alpha\, \mu_{(k)}^{-\alpha}\, \mu_{(k+1)}^{\alpha-1}\, \mathbb{I}\{0 \le \mu_{(k+1)} \le \mu_{(k)}\}. \tag{4.23}$$

Defining $\mu_{(0)} = 1$, the above equation gives the densities for the feature presence probabilities for all columns of the infinite dimensional binary matrix $Z$ sorted in a strictly decreasing order.
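The step from the finite model to the conditional density and its limit can be spelled out through the conditional cdf. The following is a sketch under the assumption that eq. (4.16) is the Beta(α/K, 1) prior with cdf $\mu^{\alpha/K}$ on $[0, 1]$, as implied by eq. (4.20); the dummy variable $x$ is introduced here for illustration only.

```latex
% Assumption: each of the K probabilities has cdf F(x) = x^{\alpha/K} on [0, 1].
% Given the k-th largest value \mu_{(k)}, the remaining K - k values are
% i.i.d. with the cdf truncated to [0, \mu_{(k)}], and \mu_{(k+1)} is their maximum.
\begin{align*}
P\bigl(\mu_{(k+1)} \le x \mid \mu_{(k)}\bigr)
  &= \left( \frac{x^{\alpha/K}}{\mu_{(k)}^{\alpha/K}} \right)^{K-k}
   = \left( \frac{x}{\mu_{(k)}} \right)^{\alpha \frac{K-k}{K}},
   \qquad 0 \le x \le \mu_{(k)}, \\
\frac{d}{dx} P\bigl(\mu_{(k+1)} \le x \mid \mu_{(k)}\bigr)
  &= \alpha\, \frac{K-k}{K}\, \mu_{(k)}^{-\alpha \frac{K-k}{K}}\,
     x^{\alpha \frac{K-k}{K} - 1}, \\
\lim_{K \to \infty} \frac{K-k}{K} = 1
  \;&\Longrightarrow\;
  p\bigl(\mu_{(k+1)} \mid \mu_{(k)}\bigr)
   = \alpha\, \mu_{(k)}^{-\alpha}\, \mu_{(k+1)}^{\alpha - 1}
   \quad \text{on } 0 \le \mu_{(k+1)} \le \mu_{(k)}.
\end{align*}
```

With $k = 0$ and $\mu_{(0)} = 1$, the second line reduces to the Beta(α, 1) density of eq. (4.21).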

Note that given $\mu_{(k)}$, $\mu_{(k+1)}$ is conditionally independent of all preceding values $\mu_{(1:k-1)}$. We introduce a set of variables $\nu_k = \mu_{(k)} / \mu_{(k-1)}$ to make use of this Markov property. Since $\mu_{(k)}$ has range $[0, \mu_{(k-1)}]$, $\nu_k$ has range $[0, 1]$. Using a change of variables, the distribution of $\nu_k$ can be obtained from eq. (4.23) as

$$p(\nu_k \mid \mu_{(k-1)}) = p(\mu_{(k)} \mid \mu_{(1:k-1)}) \left| \frac{d\mu_{(k)}}{d\nu_k} \right| = \alpha\, \nu_k^{\alpha-1}\, \mathbb{I}\{0 \le \nu_k \le 1\}. \tag{4.24}$$

Thus, $\nu_k$ is independent of $\mu_{(1:k-1)}$ and is simply $\mathrm{Beta}(\alpha, 1)$ distributed. We obtain the stick-breaking representation by expanding $\mu_{(k)}$,

$$\mu_{(k)} = \nu_k\, \mu_{(k-1)} = \prod_{l=1}^{k} \nu_l. \tag{4.25}$$
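The stick-breaking representation translates directly into a sampler for the ordered feature presence probabilities. The sketch below is illustrative rather than the author's implementation: the function name `stick_breaking_probabilities`, the truncation at `n_features` sticks and the value of `alpha` are assumptions made for the example.

```python
import numpy as np


def stick_breaking_probabilities(alpha, n_features, rng):
    """Sample mu_(1) > mu_(2) > ... > mu_(n_features) by stick-breaking.

    Each nu_l is drawn independently from Beta(alpha, 1), and
    mu_(k) is the running product nu_1 * ... * nu_k, as in eq. (4.25).
    Truncating at n_features sticks is an assumption of this sketch.
    """
    nu = rng.beta(alpha, 1.0, size=n_features)
    return np.cumprod(nu)


rng = np.random.default_rng(1)
mu = stick_breaking_probabilities(alpha=3.0, n_features=10, rng=rng)
print(mu)  # decreasing sequence of probabilities in [0, 1]
```

Since every $\nu_l$ lies in $[0, 1]$, the running product can only shrink, which is what yields the decreasing order of the columns without any explicit sorting.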
