
while for the empty features we have a Markov property:

$$
p(\mu^{\circ}_{(k)} \mid \mu^{\circ}_{(k-1)}) \;\propto\; (\mu^{\circ}_{(k)})^{\alpha-1}\,(1-\mu^{\circ}_{(k)})^{N}\,\exp\!\Big(\alpha \sum_{i=1}^{N} \tfrac{1}{i}\,(1-\mu^{\circ}_{(k)})^{i}\Big)\,\mathbb{I}\{0 \le \mu^{\circ}_{(k)} \le \mu^{\circ}_{(k-1)}\}. \tag{4.57}
$$

Note that this equation is the same as eq. (4.51); the conditioning on the rest of $Z$ being inactive is inherent in the definition of $\mu^{\circ}$.
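As an illustration of how a draw from (4.57) might be implemented, the following Python sketch uses a grid-based inverse-CDF approximation of the unnormalized density. The function name and the grid scheme are our own assumptions, not the sampling routine of the thesis; an exact sampler could be substituted.

```python
import numpy as np

def sample_empty_stick(mu_prev, alpha, N, rng, n_grid=2000):
    """Draw mu_(k) from the conditional (4.57) given mu_(k-1) = mu_prev,
    via a grid-based inverse-CDF approximation on (0, mu_prev]."""
    grid = np.linspace(1e-10, mu_prev, n_grid)
    i = np.arange(1, N + 1)
    # alpha * sum_{i=1}^N (1/i) * (1 - mu)^i, vectorized over the grid
    series = alpha * ((1.0 - grid)[:, None] ** i / i).sum(axis=1)
    log_p = (alpha - 1.0) * np.log(grid) + N * np.log1p(-grid) + series
    p = np.exp(log_p - log_p.max())   # stabilize before exponentiating
    cdf = np.cumsum(p)
    cdf /= cdf[-1]
    return grid[np.searchsorted(cdf, rng.uniform())]
```

Repeated calls, each conditioned on the previous draw, produce the strictly decreasing sequence of empty-feature stick lengths.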

Slice Sampler

To use the semi-ordered stick-breaking construction as a representation for inference, we can again use the slice sampler to adaptively truncate the representation for empty features. This gives an inference scheme which works in the non-conjugate case, is not approximate, and has an adaptive truncation level, without the restrictive ordering constraint of the stick-breaking construction; it is summarized in Algorithm 16.

The representation consists of only the active features, together with the parameters and stick lengths associated with these features. The slice variable is defined as

$$
s \sim \mathrm{Uniform}[0, \mu^{*}], \qquad \mu^{*} = \min\Big\{1,\; \min_{1 \le k \le K^{+}} \mu^{+}_{k}\Big\}. \tag{4.58}
$$
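Continuing the sketch above, the slice variable in (4.58) can be drawn as follows; `draw_slice` and `mu_active` are hypothetical names, with `mu_active` holding the stick lengths of the currently active features.

```python
def draw_slice(mu_active, rng):
    """Slice variable s ~ Uniform[0, mu*] from (4.58); mu* = 1 when
    there are no active features."""
    mu_star = min(1.0, float(np.min(mu_active))) if len(mu_active) > 0 else 1.0
    return rng.uniform(0.0, mu_star)
```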

Once a slice value is drawn, we extend the representation by generating $K^{\circ}$ empty features, with their stick lengths drawn from (4.57), until $\mu^{\circ}_{(K^{\circ}+1)} < s$. The associated feature columns $Z^{\circ}_{1:K^{\circ}}$ are initialized to 0 and the parameters $\theta^{\circ}_{1:K^{\circ}}$ are drawn from their prior. Sampling of the matrix entries and the parameters proceeds as before. Afterwards, we remove the zero columns and the corresponding parameters and stick lengths from the representation. Finally, the stick lengths for the new list of active features are drawn from their conditionals (4.56).
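The extension step can then be sketched as below, reusing the hypothetical `sample_empty_stick` helper from above. The starting value `mu_start` (the smallest stick currently represented) and the omitted bookkeeping for the columns of $Z^{\circ}$ and the parameters $\theta^{\circ}$ are simplifications of ours.

```python
def extend_empty_features(mu_start, s, alpha, N, rng):
    """Draw empty-feature stick lengths from (4.57), each conditioned
    on the previous draw, stopping once a draw falls below the slice
    value s; the draws at or above s become the K° new empty features."""
    sticks = []
    mu = mu_start
    while True:
        mu = sample_empty_stick(mu, alpha, N, rng)
        if mu < s:          # mu_(K°+1) < s: stop extending
            break
        sticks.append(mu)
    # Z columns for these features are initialized to 0 and their
    # parameters drawn from the prior (not shown here).
    return np.asarray(sticks)
```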

We have presented several MCMC algorithms for inference in IBLF models. An important question is which of these algorithms to use in practice. In the next section, we give an empirical comparison of some of the samplers.

4.3 Comparing Performances of the Samplers

In the previous section, we described several sampling algorithms for inference in models using the IBP. It is important to have an intuition about the comparative performance of the different samplers when choosing which one to use in practice. An especially interesting question is how the computational cost is affected when non-conjugate samplers are used. Therefore, we compare the mixing performance of the conjugate Gibbs sampler (described in Algorithm 11) to the performance of the slice sampler using the strictly decreasing ordering of the stick lengths (Algorithm 15) and using the semi-ordered stick-breaking representation (Algorithm 16). We also compare the results of the approximate Gibbs sampler (Algorithm 12) to the conjugate Gibbs sampler
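A common way to quantify mixing is the autocorrelation of a scalar summary of the chain, for instance the number of active features $K^{+}$ recorded at each iteration: the slower the autocorrelation decays, the worse the mixing. The helper below is a minimal sketch of such a diagnostic, not the evaluation code used in the thesis.

```python
def autocorrelation(trace, max_lag=100):
    """Empirical autocorrelation of a scalar MCMC trace (e.g. K+ per
    iteration); slow decay toward zero indicates poor mixing."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[x.size - 1:]
    return acf[:max_lag] / acf[0]
```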

