
Nonparametric Bayesian Discrete Latent Variable Models for ...


4 Indian Buffet Process Models

Algorithm 16 Slice sampling for the semi-ordered IBP

The state of the Markov chain consists of the infinite feature matrix Z, the feature presence probabilities µ1:∞ = µ1, . . . , µ∞ corresponding to each feature column, and the set of infinitely many parameters Θ = {θk}∞k=1. Only the K‡ active columns of Z up to and including the last active column and the corresponding parameters are represented.

Repeatedly sample:
  Change to SB representation:
    Sample µs for active features (µ+) from their posterior, eq. (4.56)
    Sample µs for inactive components (µ◦) using eq. (4.57) until the smallest µ+ is larger than the smallest µ◦
    Sort columns to have µs in decreasing order
  for all i = 1, . . . , N do
    Do feature updates in the SB representation
  end for
  Change to IBP representation:
    Remove feature presence probabilities from the representation
    Remove inactive feature columns from the representation
  for all columns k = 1, . . . , K† do {Parameter updates}
    Update θk by sampling from its conditional posterior, eq. (4.35)
  end for
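The "change to SB representation" step above can be sketched in code. This is a minimal illustration of the control flow only: the helper names are hypothetical, the Beta(mk, N − mk + 1) posterior for active µs is an assumed stand-in for eq. (4.56), and the inactive µs are drawn from the simple prior stick-breaking recursion µnew = µprev · ν with ν ∼ Beta(α, 1), which ignores the extra likelihood terms of the exact conditional in eq. (4.57).

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, N = 2.0, 10  # IBP concentration and number of data points (illustrative)

# Toy active feature matrix; first row forced to 1 so every column is active.
Z = (rng.random((N, 3)) < 0.5).astype(int)
Z[0, :] = 1

def sample_active_mu(Z, rng):
    """Assumed stand-in for eq. (4.56): mu_k | z_:,k ~ Beta(m_k, N - m_k + 1),
    where m_k is the number of data points using feature k."""
    m = Z.sum(axis=0)
    return rng.beta(m, 1.0 + Z.shape[0] - m)

def sample_inactive_mu(mu_prev, alpha, rng):
    """Assumed stand-in for eq. (4.57): prior-only stick-breaking step,
    mu_new = mu_prev * nu with nu ~ Beta(alpha, 1); decreasing by construction."""
    return mu_prev * rng.beta(alpha, 1.0)

# Sample mus for active features from their (assumed) posterior.
mu_active = sample_active_mu(Z, rng)

# Extend the decreasing chain of inactive mus until the smallest active mu
# is larger than the smallest inactive mu, as in Algorithm 16.
mu_inactive = []
mu_prev = 1.0
while not mu_inactive or mu_inactive[-1] >= mu_active.min():
    mu_prev = sample_inactive_mu(mu_prev, alpha, rng)
    mu_inactive.append(mu_prev)

# Sort all columns so the mus are in decreasing order.
mu_all = np.sort(np.concatenate([mu_active, mu_inactive]))[::-1]
```

The termination of the while loop is guaranteed almost surely because each Beta(α, 1) multiplier is strictly less than one, so the inactive chain decreases geometrically in expectation.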

results to have a sense of the accuracy of the approximation. Since in both cases conjugate Gibbs sampling is taken as the basis of comparison, we use a conjugate model. We choose the linear-Gaussian binary latent feature model of Griffiths and Ghahramani (2005). The model is summarized below; see the referenced paper for a detailed description. Each data point xi is assumed to be generated by a combination of a subset of the rows of A, distorted by spherical Gaussian noise,

xi = zi A + ε, (4.59)

with ε ∼ N(0, σx² I). The infinite-dimensional binary vector zi encodes which features contribute to xi, and A is a matrix (with infinitely many rows) whose kth row corresponds to the parameters for the kth feature. This model can be interpreted as a binary factor analyzer with an infinite-dimensional latent space. The distribution of the whole data matrix X can be written as a matrix Gaussian,

X | Z, A, σx ∼ N(ZA, σx² I). (4.60)

Entries of A are drawn i.i.d. from a zero-mean Gaussian with variance σA². We can
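Sampling from this generative model is straightforward once the infinite feature matrix is truncated. The sketch below uses the finite Beta–Bernoulli approximation to the IBP prior (each of K columns gets πk ∼ Beta(α/K, 1), which recovers the IBP as K → ∞, per Griffiths and Ghahramani, 2005); the dimensions, truncation level, and hyperparameter values are illustrative, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 100, 6, 4              # data points, dimensions, truncation level (illustrative)
alpha, sigma_A, sigma_x = 2.0, 1.0, 0.1

# Finite approximation to the IBP prior on Z: pi_k ~ Beta(alpha/K, 1),
# z_ik ~ Bernoulli(pi_k) independently for each entry.
pi = rng.beta(alpha / K, 1.0, size=K)
Z = (rng.random((N, K)) < pi).astype(float)

# Rows of A drawn i.i.d. from a zero-mean Gaussian with variance sigma_A^2.
A = sigma_A * rng.standard_normal((K, D))

# X = ZA + spherical Gaussian noise with variance sigma_x^2, eqs. (4.59)-(4.60).
X = Z @ A + sigma_x * rng.standard_normal((N, D))
```

Each row of X is thus a sum of the feature parameter vectors (rows of A) selected by the binary entries of the corresponding row of Z, plus noise, matching eq. (4.59).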

