26.10.2013 Views

Nonparametric Bayesian Discrete Latent Variable Models for ...


4 Indian Buffet Process Models

Poisson(α) nonzero entries, and the total number of nonzero entries in the matrix follows a Poisson(αN) distribution.

The equivalence classes have been defined using the lof(·) function. The ordering of the columns is not important for the lof representation, whereas the stick-breaking representation imposes a particular ordering: the columns are sorted by decreasing feature presence probability µ_(k). This representation allows the µ_(k) to be represented explicitly in computations rather than integrated out. The feature presence probabilities are constructed so that they decay at an exponential rate. Given a part of the matrix, this representation allows us to reason about the distribution of the entries in the rest of the matrix. In particular, after a column with feature presence probability µ_(k), the number of nonzero entries in the rest of the row has a Poisson(αµ_(k)) distribution, and the number of nonzero entries in the rest of the matrix has a Poisson(αµ_(k)N) distribution.
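The construction described above can be sketched in a few lines. This is a minimal illustration, assuming the standard IBP stick-breaking form ν_k ∼ Beta(α, 1) with µ_(k) equal to the product of the first k sticks, and assuming a finite number of columns purely for illustration. The function name `sample_ibp_stick_breaking` and its parameters are hypothetical, not taken from the text.

```python
import random


def sample_ibp_stick_breaking(alpha, num_rows, num_cols, seed=0):
    """Draw a column-truncated binary feature matrix via stick-breaking.

    Assumed construction (illustrative only):
        nu_k ~ Beta(alpha, 1)
        mu_(k) = nu_1 * nu_2 * ... * nu_k
        z_ik ~ Bernoulli(mu_(k)), independently for each row i
    """
    rng = random.Random(seed)
    mu = []
    stick = 1.0
    for _ in range(num_cols):
        # The stick that remains after each break is the next feature probability.
        stick *= rng.betavariate(alpha, 1.0)
        mu.append(stick)
    # Each row switches feature k on with probability mu_(k).
    Z = [[1 if rng.random() < m else 0 for m in mu] for _ in range(num_rows)]
    return mu, Z


alpha = 2.0
mu, Z = sample_ibp_stick_breaking(alpha=alpha, num_rows=100, num_cols=50)
# Poisson mean of the nonzero entries per row beyond the last kept column.
expected_tail_per_row = alpha * mu[-1]
```

Here `expected_tail_per_row` is the Poisson(αµ_(k)) mean from the paragraph above, evaluated at the last retained column. It gives a rough sense of how much of each row the finite matrix leaves out.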

The stick-breaking constructions for the IBP and for the DP are related in an interesting way. For the IBP, the feature presence probabilities µ_(k) correspond to the length of the stick that is left after breaking a piece off, whereas for the DP the mixing proportions π_k correspond to the piece that is discarded. For this reason, the stick-breaking construction for the IBP has strictly decreasing weights, while the DP weights have a size-biased ordering.
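The relation can be written out explicitly. The following is a sketch, assuming the usual DP stick-breaking parametrization with v_k ∼ Beta(1, α) (a notation not used in the text) and the convention µ_(0) = 1. Setting ν_k = 1 − v_k gives ν_k ∼ Beta(α, 1), and the two sets of weights share the same sticks:

```latex
\begin{align*}
\mu_{(k)} &= \prod_{l=1}^{k} \nu_l
  && \text{(IBP: length of the stick that remains)} \\
\pi_k &= v_k \prod_{l=1}^{k-1} (1 - v_l)
       = (1 - \nu_k) \prod_{l=1}^{k-1} \nu_l
       = \mu_{(k-1)} - \mu_{(k)}
  && \text{(DP: length of the piece broken off)}
\end{align*}
```

The µ_(k) are strictly decreasing because each ν_k lies in (0, 1). The π_k are differences of successive µ_(k) and need not be monotone.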

The direct correspondence to stick-breaking in the DP implies that a range of techniques for and extensions to the DP can be adapted for the IBP. For example, we can generalize the IBP by replacing the Beta(α, 1) distribution on the ν_k with other distributions. One possibility is a Pitman-Yor extension (Pitman and Yor, 1997) of the IBP, defined as

\[
\nu_k \sim \mathrm{Beta}(\alpha + kd,\; 1 - d), \qquad \mu_{(k)} = \prod_{l=1}^{k} \nu_l \tag{4.32}
\]

where d ∈ [0, 1) and α > −d. The Pitman-Yor IBP weights decrease in expectation as an O(k^{−1/d}) power law, which gives heavier tails for the distribution of the number of features. This may be a better fit for some naturally occurring data that have a larger number of features with significant but small weights (Goldwater et al., 2006).
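To see the effect of the discount parameter d, the expected weights can be computed in closed form. The sticks are independent, so E[µ_(k)] is the product of the Beta means (α + ld)/(α + ld + 1 − d) for l = 1, …, k. The sketch below does only that; the function name `expected_weights` is hypothetical.

```python
def expected_weights(alpha, d, num_features):
    """Return E[mu_(k)] for k = 1..num_features.

    Assumes independent sticks nu_l ~ Beta(alpha + l*d, 1 - d), so that
    E[mu_(k)] = prod_{l=1}^{k} (alpha + l*d) / (alpha + l*d + 1 - d).
    """
    weights = []
    running = 1.0
    for l in range(1, num_features + 1):
        a = alpha + l * d
        b = 1.0 - d
        running *= a / (a + b)  # mean of a Beta(a, b) random variable
        weights.append(running)
    return weights


# d = 0 recovers the Beta(alpha, 1) sticks of the IBP.
ibp = expected_weights(alpha=1.0, d=0.0, num_features=20)
# d = 0.5 is an arbitrary illustrative discount.
py_ibp = expected_weights(alpha=1.0, d=0.5, num_features=20)
```

With d = 0 the expected weights fall off geometrically, as (α/(α + 1))^k. With d > 0 they fall off much more slowly, so many more features keep a small but non-negligible weight.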

An example technique for the DP which we can adapt to the IBP is to truncate the stick-breaking construction after a certain number of break points and to perform inference in the reduced space. Ishwaran and James (2001) give a bound for the error introduced by the truncation in the DP case which can be used here as well.
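One simple way to choose a truncation level is sketched below. It is a heuristic built on the Poisson(αµ_(k)) statement earlier in this section, not the Ishwaran and James (2001) bound. It assumes ν_k ∼ Beta(α, 1), so that E[µ_(K)] = (α/(α + 1))^K, and stops once the expected number of features per row lost to truncation drops below a tolerance. The names `truncation_level`, `eps` and `max_level` are hypothetical.

```python
def truncation_level(alpha, eps, max_level=10_000):
    """Smallest K with alpha * (alpha / (alpha + 1))**K < eps.

    alpha * E[mu_(K)] is the expected number of nonzero entries per row
    that fall beyond column K, assuming nu_k ~ Beta(alpha, 1).
    Returns (K, expected leftover per row).
    """
    ratio = alpha / (alpha + 1.0)  # E[nu_k] for Beta(alpha, 1)
    level = 0
    leftover = alpha  # with no columns kept, all alpha expected features are lost
    while leftover >= eps and level < max_level:
        level += 1
        leftover *= ratio
    return level, leftover


K, leftover = truncation_level(alpha=1.0, eps=0.01)
```

Because the expected leftover shrinks geometrically in K, the required level grows only logarithmically as the tolerance `eps` decreases.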

Thibaux and Jordan (2007) show the connection between the beta process and the IBP. This suggests that the beta process, which has been extensively used in survival analysis, can be used for defining binary latent feature models. Furthermore, they use the connection between the IBP and the beta process to develop a new algorithm to sample beta processes with size-biased ordering of the weights.

Wolpert and Ickstadt (1998) show how to construct the beta process with strictly decreasing weights using the inverse Lévy measure. The stick-breaking construction is a neat way of describing this algorithm.

A direct consequence of the stick-breaking construction and the relation of the IBP to
