
4.3 Comparing Performances of the Samplers

denote the distribution over a finite (K × D) part of A also as a matrix Gaussian,

A | σ_A ∼ N(0, σ_A^2 I),    (4.61)

where 0 denotes a K × D matrix of zeros. We put an IBP(α) prior on the latent binary feature matrix Z, and a gamma prior on the IBP parameter, α ∼ G(1, 1), completing the model.
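As a concrete illustration of this generative process, the following sketch (not part of the original text; it assumes NumPy, assumes the linear-Gaussian likelihood X = ZA + noise used for this model earlier in the chapter, and the function names sample_ibp and generate_data are our own) draws Z from the IBP via its sequential restaurant-style construction, draws A from the Gaussian prior in (4.61), and then generates data:

import numpy as np

def sample_ibp(alpha, N, rng):
    # Draw a binary feature matrix Z ~ IBP(alpha) for N objects
    # using the sequential (restaurant-style) construction.
    Z = np.zeros((0, 0), dtype=int)
    for i in range(1, N + 1):
        m = Z.sum(axis=0)                           # counts of existing features
        old = (rng.random(Z.shape[1]) < m / i).astype(int)
        k_new = rng.poisson(alpha / i)              # number of brand-new features
        row = np.concatenate([old, np.ones(k_new, dtype=int)])
        Z = np.vstack([np.pad(Z, ((0, 0), (0, k_new))), row])
    return Z

def generate_data(alpha, sigma_A, sigma_x, N, D, rng):
    # X = Z A + noise under the linear-Gaussian IBP model.
    Z = sample_ibp(alpha, N, rng)
    K = Z.shape[1]
    A = sigma_A * rng.standard_normal((K, D))       # A | sigma_A ~ N(0, sigma_A^2 I)
    X = Z @ A + sigma_x * rng.standard_normal((N, D))
    return X, Z, A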

We compare the mixing performance of the two slice samplers (on the strictly decreasing weights, and on the semi-ordered weights) and Gibbs sampling described in the previous section. We chose to apply the samplers to simple synthetic data sets so that we can be assured of convergence to the true posterior and that mixing times can be estimated reliably in a reasonable amount of computation time for all samplers. We also chose to use a conjugate model, since Gibbs sampling requires conjugacy. However, note that our implementation of the two slice samplers did not make use of the conjugacy.

We generated 1-, 2- and 3-dimensional data sets from the model with the data variance fixed at σ_x^2 = 1, varying the strength parameter α ∈ {1, 2} and the latent feature variance σ_A^2 ∈ {1, 2, 4, 8}. For each combination of parameters we produced five data sets with 100 data points, giving 120 data sets in total. For all data sets, we fixed σ_x^2 and σ_A^2 to the generating values and learned the feature matrix Z and α.
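For concreteness, this grid of synthetic data sets could be produced as follows (a sketch only, reusing the generate_data function assumed in the sketch above; note that σ_A enters as a standard deviation, hence the square root):

import numpy as np

rng = np.random.default_rng(0)
datasets = []
for D in (1, 2, 3):                        # data dimensionality
    for alpha in (1, 2):                   # IBP strength parameter
        for var_A in (1, 2, 4, 8):         # latent feature variance sigma_A^2
            for rep in range(5):           # five data sets per setting
                X, Z_true, A_true = generate_data(alpha, np.sqrt(var_A),
                                                  sigma_x=1.0, N=100, D=D, rng=rng)
                datasets.append((D, alpha, var_A, rep, X))
assert len(datasets) == 120                # 3 x 2 x 4 x 5 settings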

We are interested in how the samplers on the nonparametric part of the model perform. Therefore, we fix the σ_x and σ_A values and learn Z, A and α in all cases. For each data set and each sampler, we performed 5 runs of 15,000 iterations. We used the autocorrelation coefficients of the number of represented features K^‡ and of α (with a maximum lag of 2500) as measures of mixing time. We found that mixing in K^‡ is slower than mixing in α for all data sets and all three samplers. We also found that in this regime the autocorrelation times do not vary with dimensionality or with σ_A^2. In Figure 4.7 we show the autocorrelation times of α and K^‡ over all runs, all data sets, and all three samplers. As expected, the slice sampler using the strictly decreasing stick-length ordering was always slower than the semi-ordered one. Surprisingly, we found that the semi-ordered slice sampler was just as fast as the Gibbs sampler, which fully exploits conjugacy. This is about as well as we could expect a more generally applicable non-conjugate sampler to perform. This result motivates using the semi-ordered slice sampler for inference on complex non-conjugate IBLF models.
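The autocorrelation-based summary used above can be computed from the per-iteration traces of K^‡ and α. The following is a generic sketch (assuming NumPy; the exact estimator used in the thesis is not spelled out in this excerpt) of the empirical autocorrelation coefficients up to a maximum lag of 2500 and the corresponding integrated autocorrelation time:

import numpy as np

def autocorrelation(trace, max_lag=2500):
    # Empirical autocorrelation coefficients rho(1), ..., rho(max_lag)
    # of a scalar MCMC trace (e.g. the number of represented features).
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:-t], x[t:]) / (len(x) * var)
                     for t in range(1, max_lag + 1)])

def integrated_act(trace, max_lag=2500):
    # Integrated autocorrelation time tau = 1 + 2 * sum_t rho(t);
    # larger values indicate slower mixing.
    return 1.0 + 2.0 * autocorrelation(trace, max_lag).sum()

Summaries of this kind, computed from the trace of each of the 5 runs of 15,000 iterations, are the sort of per-run mixing measures that Figure 4.7 aggregates over runs, data sets and samplers.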

The first algorithm introduced for inference in non-conjugate IBLF models is the approximate Gibbs sampling algorithm (Algorithm 12). Even though efficient sampling techniques that do not require an approximation have been developed, it is interesting to examine how the approximate method performs compared to the non-approximate ones. We compare the modeling performance of Algorithm 12 to the conjugate Gibbs sampling results, using the model described above.

We generated a synthetic data set of 6 × 6 images as described in Griffiths and Ghahramani (2005). The input images are composed of a combination of (a subset of) four latent features plus zero-mean Gaussian noise with standard deviation 0.5.
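This image construction can be sketched as follows (our illustration, assuming NumPy; the particular feature shapes and the probability of each feature being present are placeholders, since this excerpt does not repeat the exact patterns of Griffiths and Ghahramani (2005)):

import numpy as np

def make_images(n_images, noise_sd=0.5, p_on=0.5, seed=0):
    # Each 6x6 image is a superposition of a random subset of four
    # binary latent features plus zero-mean Gaussian noise.
    rng = np.random.default_rng(seed)
    features = np.zeros((4, 6, 6))
    features[0, :3, :3] = 1      # placeholder shapes: one block per corner
    features[1, :3, 3:] = 1
    features[2, 3:, :3] = 1
    features[3, 3:, 3:] = 1
    F = features.reshape(4, 36)
    Z = (rng.random((n_images, 4)) < p_on).astype(float)   # which features appear
    X = Z @ F + noise_sd * rng.standard_normal((n_images, 36))
    return X, Z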

We used the Gibbs sampler for conjugate models and the approximate Gibbs sampler for non-conjugate models to learn the latent structure. For the approximate scheme we used
