
4 Indian Buffet Process Models

We sample more µ(k)'s until K̃ < K† and extend Z by adding the necessary number of columns of zeros; see Figure 4.6 for a pictorial explanation. In the appendix we show that the stick lengths µ(k) for new features k > K† can be drawn iteratively from the following distribution:

$$
p(\mu_{(k)} \mid \mu_{(k-1)}, Z_{:,>k} = 0) \;\propto\; \mu_{(k)}^{\alpha-1}\,(1-\mu_{(k)})^{N}\,\exp\!\Big(\alpha \sum_{i=1}^{N} \tfrac{1}{i}\,(1-\mu_{(k)})^{i}\Big)\;\mathbb{I}\{0 \le \mu_{(k)} \le \mu_{(k-1)}\},
\tag{4.51}
$$

which is obtained by conditioning on the fact that there are no non-zero entries beyond the kth column, eq. (4.29). We can use ARS to draw samples from (4.51) since it is log-concave in log µ(k). Finally, parameters for the new represented features are drawn from the prior θk ∼ H.
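
As a concrete illustration, here is a minimal Python sketch of this extension step; the function names are ours, and a simple grid-based inverse-CDF sampler stands in for the ARS draw from (4.51):

```python
import numpy as np

def sample_new_stick(mu_prev, alpha, N, rng, grid_size=5000):
    """Draw one new stick length from eq. (4.51), truncated to [0, mu_prev].

    The text uses ARS on log mu(k); here a grid-based inverse-CDF sampler
    stands in, which is cruder but fits in a few lines.
    """
    mu = np.linspace(mu_prev / grid_size, mu_prev, grid_size)
    i = np.arange(1, N + 1)
    # log of (4.51): (alpha-1) log mu + N log(1-mu) + alpha sum_i (1-mu)^i / i
    log_p = ((alpha - 1) * np.log(mu)
             + N * np.log1p(-mu)
             + alpha * ((1 - mu[:, None]) ** i / i).sum(axis=1))
    weights = np.exp(log_p - log_p.max())   # subtract max for stability
    cdf = np.cumsum(weights)
    return mu[np.searchsorted(cdf, rng.uniform() * cdf[-1])]

def extend_representation(mu, Z, s, alpha, rng):
    """Grow the representation until the last stick falls below the slice s
    (so that K-tilde < K-dagger), padding Z with all-zero columns as in
    Figure 4.6. Parameters for the new features would be drawn from the
    prior H (omitted here).
    """
    N = Z.shape[0]
    while mu[-1] > s:                        # last stick still above the slice
        mu.append(sample_new_stick(mu[-1], alpha, N, rng))
        Z = np.hstack([Z, np.zeros((N, 1), dtype=int)])
    return mu, Z
```

Here `mu` is a Python list of represented stick lengths and `Z` an N × K binary array; both conventions are ours.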

Given s, we only need to update zik for each i = 1, . . . , N and k = 1, . . . , K̃. The conditional probabilities are:

$$
p(z_{ik} = 1 \mid s, \mu_{(k)}, Z_{-ik}, \theta_{1:K^{\dagger}}) \;\propto\; \frac{\mu_{(k)}}{\mu^{*}}\, f(x_i \mid z_{ik} = 1, Z_{-ik}, \theta_{1:K^{\dagger}})
\tag{4.52}
$$

The µ* denominator is needed when different values of zik induce different values of µ* (e.g., if zik = 1 prior to the update and was the only non-zero entry in the last non-zero column of Z).
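
Eq. (4.52) states the zik = 1 case; by the same reasoning, the zik = 0 case carries (1 − µ(k)) in place of µ(k), with µ* recomputed under that setting. A minimal sketch of the update, assuming numpy and a user-supplied log-likelihood callable `log_f` (a placeholder, not part of the text):

```python
import numpy as np

def gibbs_update_zik(i, k, Z, mu, theta, x, log_f, rng):
    """Gibbs update of a single entry z_ik following eq. (4.52).

    log_f(x_i, z_i, theta) is a placeholder for the model's log-likelihood
    log f(x_i | z_i, theta); a concrete model supplies its own.
    """
    def log_mu_star():
        # mu* is the stick length of the last non-zero column, capped at 1.
        # Flipping z_ik can move this column, which is why the mu* term
        # in (4.52) cannot be dropped.
        active = np.flatnonzero(Z.sum(axis=0) > 0)
        return np.log(min(1.0, mu[active[-1]])) if active.size else 0.0

    logp = np.empty(2)
    for v in (0, 1):                          # score both settings of z_ik
        Z[i, k] = v
        log_prior = np.log(mu[k]) if v else np.log1p(-mu[k])
        logp[v] = log_prior - log_mu_star() + log_f(x[i], Z[i], theta)
    p_one = 1.0 / (1.0 + np.exp(logp[0] - logp[1]))   # p(z_ik = 1 | rest)
    Z[i, k] = int(rng.uniform() < p_one)
```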

For k = 1, . . . , K† − 1, combining (4.42) and (4.43), the conditional probability of µ(k) is

$$
p(\mu_{(k)} \mid \mu_{(k-1)}, \mu_{(k+1)}, Z) \;\propto\; \mu_{(k)}^{m_{\cdot,k}-1}\,(1-\mu_{(k)})^{N-m_{\cdot,k}}\;\mathbb{I}\{\mu_{(k+1)} \le \mu_{(k)} \le \mu_{(k-1)}\}
\tag{4.53}
$$

where $m_{\cdot,k} = \sum_{i=1}^{N} z_{ik}$. For k = K†, in addition to taking into account the probability of z:,K† = 0, we also have to take into account the probability that all columns of Z beyond K† are zero. The resulting conditional probability of µ(K†) is given by (4.51) with k = K†. Drawing from both (4.53) and (4.51) can be done using ARS. The conditional probability of θk for each k = 1, . . . , K† is as given before, eq. (4.35).
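
The text draws from (4.53) with ARS; note, though, that µ(k)^{m·,k−1}(1 − µ(k))^{N−m·,k} is the kernel of a Beta(m·,k, N − m·,k + 1) density, so when m·,k ≥ 1 the truncated draw can equivalently be done by inverting the Beta CDF. A sketch using scipy (the helper is ours):

```python
import numpy as np
from scipy.stats import beta

def sample_stick_inner(m_k, N, mu_lo, mu_hi, rng):
    """Draw mu_(k) for an inner feature k < K-dagger from eq. (4.53).

    Valid for m_k >= 1; an all-zero middle column (m_k = 0) has density
    proportional to mu^(-1) (1-mu)^N, for which ARS is still needed.
    """
    dist = beta(m_k, N - m_k + 1)              # Beta kernel matching (4.53)
    lo, hi = dist.cdf(mu_lo), dist.cdf(mu_hi)  # mass inside the truncation
    return float(dist.ppf(lo + rng.uniform() * (hi - lo)))
```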

The feature columns are sorted in a specific way in the stick-breaking representation and are therefore no longer exchangeable. This can result in poor mixing over the features, similar to the DP case as pointed out by Porteous et al. (2006). The problem becomes most apparent when a feature with a large index happens to contribute strongly to the likelihood. Such a feature would persist, which would require keeping all the unnecessary features in between the represented ones even though they are not active. We can consider moves that facilitate mixing over the feature indices and use Metropolis-Hastings sampling to evaluate the proposals, similar to the ones proposed for the DP case by Porteous et al. (2006). However, the additional constraint that the stick lengths be in strictly decreasing order complicates the formulation of such moves. In the following section, we describe an alternative method to facilitate mixing in the stick-breaking representation.

