Nonparametric Bayesian Discrete Latent Variable Models for ...

4 Indian Buffet Process Models

matrix and taking the limit as the number of columns approaches infinity. Teh, Görür, and Ghahramani (2007) construct a stick-breaking representation for the IBP, and show interesting connections between the stick lengths of the stick-breaking construction for the DP and the IBP. Thibaux and Jordan (2007) show that the underlying de Finetti mixing distribution for the IBP is the beta process (Hjort, 1990). In the next section, we present the description of the IBP and summarize these different approaches for defining the distribution.

Exact analytical inference is not possible in IBLF models. Due to the similarity of the IBP and the DP, the inference methods for DPM models can be adapted to IBLF models. As in the DPM case, Gibbs sampling in conjugate IBLF models is fairly straightforward to implement, but conjugacy does not hold for many interesting IBLF models. In Section 4.2, we summarize the Gibbs sampling algorithm for conjugate IBLF models, as well as other MCMC algorithms that can be used for inference in non-conjugate IBLF models. In Section 4.3, we apply the described algorithms to a simple model on several synthetic data sets to empirically compare the performance of the different samplers.
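As a rough illustration of the kind of Gibbs update involved, the following sketch resamples the entries of Z for features already in use. It is not the exact algorithm of Section 4.2: the (marginal) likelihood is passed in as an abstract function, and the steps for deleting features whose usage drops to zero and for proposing new features are omitted.

```python
import numpy as np

def gibbs_sweep_existing(Z, X, log_lik, rng):
    """One Gibbs sweep over the entries of Z for already-represented features.

    Exchangeability of the IBP gives the prior P(z_ik = 1 | Z_-ik) = m_{-i,k}/N,
    where m_{-i,k} counts the *other* rows that use feature k.  Deleting
    features whose count drops to zero and proposing new features per row
    are omitted from this sketch.
    """
    N, K = Z.shape
    for i in range(N):
        for k in range(K):
            m = Z[:, k].sum() - Z[i, k]   # usage of feature k by other rows
            if m == 0:
                continue                   # singleton features handled separately
            log_p = np.empty(2)
            for v in (0, 1):
                Z[i, k] = v
                prior = m / N if v else 1.0 - m / N
                log_p[v] = np.log(prior) + log_lik(Z, X)
            # sample z_ik from its normalized full conditional
            p1 = 1.0 / (1.0 + np.exp(log_p[0] - log_p[1]))
            Z[i, k] = int(rng.random() < p1)
    return Z
```

In a conjugate model, `log_lik` would be the marginal likelihood with the feature parameters integrated out; in the non-conjugate case this quantity is unavailable in closed form, which is what motivates the alternative samplers.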

In Section 4.4, we use an IBLF model suggested by Griffiths and Ghahramani (2005) to learn the features of handwritten digits. We employ two-part latent vectors to represent the hidden features of the observations: a binary vector indicating the presence or absence of each latent dimension, and a real-valued vector giving the contribution of each latent dimension. Using an IBP prior over the binary part, we obtain an overcomplete dictionary of basis functions. In fact, we have infinitely many basis functions, only a small number of which are active for each data point. We show that the model can successfully reconstruct the images by finding features that make up the digits.
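A minimal generative sketch of such a two-part latent feature model follows. The sizes and sparsity level are illustrative assumptions, not the settings used in Section 4.4, and a finite K stands in for the infinite IBP dictionary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes: N images of D pixels each, K latent features.
N, D, K, sigma = 100, 36, 4, 0.1

Z = (rng.random((N, K)) < 0.3).astype(float)  # binary part: feature on/off
V = rng.normal(0.0, 1.0, (N, K))              # real part: feature contributions
A = rng.normal(0.0, 1.0, (K, D))              # dictionary of basis functions
X = (Z * V) @ A + rng.normal(0.0, sigma, (N, D))  # noisy observations
```

Each image is a weighted sum of only the basis functions switched on by its row of Z, which is what makes the representation sparse even though the dictionary is overcomplete.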

Another interesting application of IBLF models is modeling choice behavior, using latent features to represent the alternatives in a choice set (Görür et al., 2006). In Section 4.5, we describe this nonparametric choice model, which uses the IBP as a prior. We show that the features underlying the choice data can be inferred using the model.

There have been other interesting applications of the IBP to define flexible latent feature models. These include protein-protein interactions (Chu et al., 2006), the structure of causal graphs (Wood et al., 2006b), dyadic data for collaborative filtering (Meeds et al., 2007), human similarity judgements (Navarro and Griffiths, 2007), and document classification (Thibaux and Jordan, 2007).

4.1 The Indian Buffet Process

The Indian buffet process (IBP) (Griffiths and Ghahramani, 2005) defines a distribution on sparse binary matrices with infinitely many columns and one row for each observation. It is a sequential process, inspired by the Chinese restaurant process (see Section 3.1.2), that is described by imagining an Indian buffet restaurant offering an unbounded number of dishes. In this metaphor, each of the N rows of a binary matrix Z represents a customer arriving at the restaurant, and the infinitely many columns of the matrix represent the dishes offered, a finite number of which will
