26.10.2013 Views

Nonparametric Bayesian Discrete Latent Variable Models for ...


4 Indian Buffet Process Models

features, leaving only the $K^\ddagger$ active feature columns along with the corresponding parameters. To change from the IBP back to the stick-breaking representation, we have to draw the stick lengths and sort the features in order of decreasing stick length, introducing empty features if required.

We consider the finite binary matrix of Section 4.1.1 with the same distributional assumptions. We index the $K^\ddagger$ active features with $k = 1, \dots, K^\ddagger$. Let $Z_{1:K^\ddagger}$ be the feature presence matrix, that is, the matrix composed of only the active feature columns. Suppose that we have $K \gg K^\ddagger$ features in the finite model. For the active features, the posterior for the feature presence probabilities is simply

$$\mu_k^+ \mid z_{:,k} \sim \mathrm{Beta}\!\left(\frac{\alpha}{K} + m_{\cdot,k},\; 1 + N - m_{\cdot,k}\right), \tag{4.54}$$

see eq. (4.5). Taking the limit as $K \to \infty$, the posterior becomes

$$\mathrm{Beta}(m_{\cdot,k},\; 1 + N - m_{\cdot,k}). \tag{4.55}$$
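As a quick numerical check of this limit, the following Python sketch compares the mean of the finite-model posterior (4.54) with the mean of the limiting posterior (4.55) as $K$ grows. The function names and the example values of $\alpha$, $N$ and $m_{\cdot,k}$ are illustrative assumptions and do not come from the text.

```python
def finite_posterior_params(alpha, K, N, m):
    """Beta parameters of eq. (4.54) for a feature owned by m of N objects."""
    return alpha / K + m, 1 + N - m


def limit_posterior_params(N, m):
    """Beta parameters of eq. (4.55), the K -> infinity limit (needs m >= 1)."""
    return float(m), 1 + N - m


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


if __name__ == "__main__":
    alpha, N, m = 2.0, 10, 3  # illustrative values only
    limit_mean = beta_mean(*limit_posterior_params(N, m))
    for K in (10, 1000, 100000):
        finite_mean = beta_mean(*finite_posterior_params(alpha, K, N, m))
        # The gap between the two means shrinks as K increases.
        print(K, finite_mean - limit_mean)
```

The limit is only well defined for active features, since $m_{\cdot,k} \geq 1$ keeps the first Beta parameter positive once the $\alpha/K$ term vanishes.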

In the stick-breaking representation, we need to represent at least all features up to and including the last active feature. Therefore, the representation may include some inactive features. When changing the representation from the IBP to stick-breaking, it is sufficient to represent only those empty features with stick lengths larger than $\min_k \mu_k^+$.

Thus we consider a decreasing ordering $\mu^\circ_{(1)} > \mu^\circ_{(2)} > \cdots$ on the stick lengths of the inactive components. For each $\mu^\circ_{(k)}$, we condition on the fact that there are no active features beyond that feature. Thus, considering infinitely many features, the density for $\mu^\circ_{(k)}$ is given by (4.51). ARS can be used to draw $\mu^\circ_{(1:K^\circ)}$ until $\mu^\circ_{(K^\circ)} < \min_k \mu_k^+$.

We sample parameters for the newly represented inactive features from the prior. The stick-breaking representation is obtained by reordering $\mu^+_{1:K^\ddagger}, \mu^\circ_{(1:K^\circ)}$ in decreasing order, with the feature columns and parameters taking on the same ordering (columns and parameters corresponding to empty features are set to 0 and drawn from their prior, respectively), resulting in $K^\dagger = K^\ddagger + K^\circ$ features in the stick-breaking representation. The validity of this representation can be seen by referring to the connection of the IBP to the beta process (Thibaux and Jordan, 2007).
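The change of representation described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the first inactive draw is assumed to be bounded above by 1, and the sampler for the inactive sticks is passed in as an argument because the density (4.51) is not reproduced in this section. The `placeholder_sampler` below is only a stand-in that shrinks the previous stick by a Beta-distributed factor. It is not the ARS draw from (4.51).

```python
import random


def ibp_to_stick_breaking(active_sticks, draw_next_inactive, rng=random):
    """Merge active and newly drawn inactive sticks in decreasing order.

    active_sticks: stick lengths mu_k^+ of the active features.
    draw_next_inactive(prev, rng): hypothetical sampler returning the next
        inactive stick length given the previous one (in the text this is
        a draw from density (4.51), which is not implemented here).
    Returns (stick_length, source) pairs; source is the index of the active
    feature, or None for a newly introduced empty feature.
    """
    threshold = min(active_sticks)
    inactive = []
    prev = 1.0  # assumption: the first inactive draw is bounded above by 1
    while True:
        prev = draw_next_inactive(prev, rng)
        inactive.append(prev)
        if prev < threshold:  # stop once below the smallest active stick
            break
    labelled = [(mu, k) for k, mu in enumerate(active_sticks)]
    labelled += [(mu, None) for mu in inactive]
    labelled.sort(key=lambda pair: pair[0], reverse=True)
    return labelled


def placeholder_sampler(prev, rng, alpha=1.0):
    """Stand-in only: shrink `prev` by a Beta(alpha, 1) factor."""
    return prev * rng.betavariate(alpha, 1.0)


if __name__ == "__main__":
    for mu, source in ibp_to_stick_breaking([0.6, 0.3], placeholder_sampler):
        print(round(mu, 3), "empty" if source is None else f"active {source}")
```

The returned `source` indices give the permutation to apply to the feature columns and parameters. Entries marked `None` would receive an all-zero column and parameters drawn from the prior.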

Semi-Ordered Stick-Breaking<br />

In deriving the change of representations from the IBP to the stick-breaking representation, we made use of an intermediate representation whereby the active features are unordered, while the empty ones have an ordering of decreasing stick lengths. It is in fact possible to directly work with this representation, which we shall call semi-ordered stick-breaking.

The $Z$ matrix consists of $K^\ddagger$ active and unordered features, as well as an ordered sequence of infinitely many empty features. The stick lengths for the active features have conditional distributions:

$$\mu_k^+ \mid z_{:,k} \sim \mathrm{Beta}(m_{\cdot,k},\; 1 + N - m_{\cdot,k}) \tag{4.56}$$
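To make the conditional (4.56) concrete, here is a small Python sketch that computes the column counts $m_{\cdot,k}$ from a binary matrix and draws one stick length per active feature. The helper names and the list-of-rows layout for $Z$ are assumptions made for illustration. Only the standard library's `random.betavariate` is used.

```python
import random


def feature_counts(Z):
    """Column sums m_{.,k} of a binary matrix Z given as a list of rows."""
    return [sum(col) for col in zip(*Z)]


def sample_active_sticks(Z, rng=random):
    """Draw mu_k^+ ~ Beta(m_k, 1 + N - m_k) for each active column of Z."""
    N = len(Z)
    sticks = []
    for m in feature_counts(Z):
        if m == 0:
            # Eq. (4.56) applies to active features only (m >= 1).
            raise ValueError("all columns passed in must be active")
        sticks.append(rng.betavariate(m, 1 + N - m))
    return sticks


if __name__ == "__main__":
    Z = [[1, 0], [1, 1], [0, 1], [1, 0]]  # N = 4 objects, 2 active features
    print(feature_counts(Z))
    print(sample_active_sticks(Z, random.Random(0)))
```

Because the active features are unordered in this representation, each stick can be drawn independently given its own column.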
