
while for the empty features we have a Markov property:

$$
p(\mu^{\circ}_{(k)} \mid \mu^{\circ}_{(k-1)}) \;\propto\; (\mu^{\circ}_{(k)})^{\alpha-1}\,(1-\mu^{\circ}_{(k)})^{N}\,\exp\!\Big(\alpha \sum_{i=1}^{N} \tfrac{1}{i}\,(1-\mu^{\circ}_{(k)})^{i}\Big)\,\mathbb{I}\{0 \le \mu^{\circ}_{(k)} \le \mu^{\circ}_{(k-1)}\}. \tag{4.57}
$$

Note that this equation is the same as eq. (4.51); the conditioning on the rest of $Z$ being inactive is inherent in the definition of $\mu^{\circ}$.
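As an illustration of how a draw from (4.57) might be implemented, the following Python sketch uses a grid-based inverse-CDF approximation of the unnormalized density. The function name and the grid scheme are our own assumptions, not the sampling routine of the thesis; an exact sampler could be substituted.

```python
import numpy as np

def sample_empty_stick(mu_prev, alpha, N, rng, n_grid=2000):
    """Draw mu_(k) from the conditional (4.57) given mu_(k-1) = mu_prev,
    via a grid-based inverse-CDF approximation on (0, mu_prev]."""
    grid = np.linspace(1e-10, mu_prev, n_grid)
    i = np.arange(1, N + 1)
    # alpha * sum_{i=1}^N (1/i) * (1 - mu)^i, vectorized over the grid
    series = alpha * ((1.0 - grid)[:, None] ** i / i).sum(axis=1)
    log_p = (alpha - 1.0) * np.log(grid) + N * np.log1p(-grid) + series
    p = np.exp(log_p - log_p.max())   # stabilize before exponentiating
    cdf = np.cumsum(p)
    cdf /= cdf[-1]
    return grid[np.searchsorted(cdf, rng.uniform())]
```

Repeated calls, each conditioned on the previous draw, produce the strictly decreasing sequence of empty-feature stick lengths.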

Slice Sampler

To use the semi-ordered stick-breaking construction as a representation for inference, we can again use the slice sampler to adaptively truncate the representation for empty features. This gives an inference scheme which works in the non-conjugate case, is not approximate, and has an adaptive truncation level, without the restrictive ordering constraint of the stick-breaking construction; it is summarized in Algorithm 16.

The representation consists of only the active features, together with the parameters and stick lengths associated with these features. The slice variable is defined as

$$
s \sim \mathrm{Uniform}[0, \mu^{*}], \qquad \mu^{*} = \min\Big\{1,\; \min_{1 \le k \le K^{+}} \mu^{+}_{k}\Big\}. \tag{4.58}
$$
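Continuing the sketch above, the slice variable in (4.58) can be drawn as follows; `draw_slice` and `mu_active` are hypothetical names, with `mu_active` holding the stick lengths of the currently active features.

```python
def draw_slice(mu_active, rng):
    """Slice variable s ~ Uniform[0, mu*] from (4.58); mu* = 1 when
    there are no active features."""
    mu_star = min(1.0, float(np.min(mu_active))) if len(mu_active) > 0 else 1.0
    return rng.uniform(0.0, mu_star)
```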

Once a slice value is drawn, we extend the representation by generating $K^{\circ}$ empty features, with their stick lengths drawn from (4.57), until $\mu^{\circ}_{(K^{\circ}+1)} < s$. The associated feature columns $Z^{\circ}_{1:K^{\circ}}$ are initialized to 0 and the parameters $\theta^{\circ}_{1:K^{\circ}}$ are drawn from their prior. Sampling of the matrix entries and the parameters proceeds as before. Afterwards, we remove the zero columns and the corresponding parameters and stick lengths from the representation. Finally, the stick lengths for the new list of active features are drawn from their conditionals (4.56).
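The extension step can then be sketched as below, reusing the hypothetical `sample_empty_stick` helper from above. The starting value `mu_start` (the smallest stick currently represented) and the omitted bookkeeping for the columns of $Z^{\circ}$ and the parameters $\theta^{\circ}$ are simplifications of ours.

```python
def extend_empty_features(mu_start, s, alpha, N, rng):
    """Draw empty-feature stick lengths from (4.57), each conditioned
    on the previous draw, stopping once a draw falls below the slice
    value s; the draws at or above s become the K° new empty features."""
    sticks = []
    mu = mu_start
    while True:
        mu = sample_empty_stick(mu, alpha, N, rng)
        if mu < s:          # mu_(K°+1) < s: stop extending
            break
        sticks.append(mu)
    # Z columns for these features are initialized to 0 and their
    # parameters drawn from the prior (not shown here).
    return np.asarray(sticks)
```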

We have presented several MCMC algorithms for inference in IBLF models. An important question is which of these algorithms to use in practice. In the next section, we give an empirical comparison of some of the samplers.

4.3 Comparing Performances of the Samplers

In the previous section, we described several sampling algorithms for inference in models using the IBP. It is important to have an intuition about the comparative performance of the different samplers when choosing which one to use in practice. An especially interesting question is how the computational cost is affected when non-conjugate samplers are used. Therefore, we compare the mixing performance of the conjugate Gibbs sampler (described in Algorithm 11) to the performance of the slice sampler using the strictly decreasing ordering of the stick lengths (Algorithm 15) and using the semi-ordered stick-breaking representation (Algorithm 16). We also compare the results of the approximate Gibbs sampler (Algorithm 12) to the conjugate Gibbs sampler
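A common way to quantify mixing is the autocorrelation of a scalar summary of the chain, for instance the number of active features $K^{+}$ recorded at each iteration: the slower the autocorrelation decays, the worse the mixing. The helper below is a minimal sketch of such a diagnostic, not the evaluation code used in the thesis.

```python
def autocorrelation(trace, max_lag=100):
    """Empirical autocorrelation of a scalar MCMC trace (e.g. K+ per
    iteration); slow decay toward zero indicates poor mixing."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[x.size - 1:]
    return acf[:max_lag] / acf[0]
```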

