Nonparametric Bayesian Discrete Latent Variable Models for ...

3 Dirichlet Process Mixture Models

The update equation for $c_i$ is given as:

components for which $n_{-i,k} > 0$:
$$P(c_i = k \mid x_i, c_{-i}, \alpha) \propto \frac{n_{-i,k}}{N - 1 + \alpha} \int F(x_i \mid \phi)\, G_{-i,k}(\phi)\, d\phi,$$
another component:
$$P(c_i \neq c_{i'} \text{ for all } i \neq i' \mid x_i, c_{-i}, \alpha) \propto \frac{\alpha}{N - 1 + \alpha} \int F(x_i \mid \phi)\, G_0(\phi)\, d\phi, \tag{3.32}$$
where $G_{-i,k}$ is the posterior distribution obtained by updating the baseline prior with the observations assigned to component $k$, other than $x_i$.

Algorithm 3 Gibbs sampling for conjugate DPM models using indicator variables without parameter representations.
The state of the Markov chain consists of the indicator variables $c = \{c_1, \dots, c_N\}$.
Repeatedly sample:
    for all $i = 1, \dots, N$ do
        Update $c_i$ using eq. (3.32)
    end for

All the above methods produce ergodic Markov chains. However, they require marginalizing over the parameter $\theta$, which means that the integrals in eqs. (3.29), (3.30) and (3.32) must be analytically tractable. This restricts the choice of the baseline prior $G_0$ to be conjugate to the likelihood $F(X \mid \theta)$, and this requirement limits the family of DP models. West et al. (1994) suggest approximating the integral by numerical quadrature or Monte Carlo approximation, which provides only an approximation to the posterior. Methods for non-conjugate DP models that result in the correct stationary distribution were developed later. MacEachern and Müller (1998) present a method that attends to the identities of the indicator variables and uses model augmentation. Neal (2000) follows a similar approach and uses auxiliary variables which exist only temporarily. Walker and Damien (1998) present a different auxiliary variable sampling scheme that truncates the posterior to avoid calculating the integral.

In the following, we give an overview of these algorithms that do not require conjugacy, all of which can be seen as extensions of the Gibbs sampling algorithm for the conjugate case. Neal (2000) also presents a combination of Metropolis-Hastings proposals and Gibbs updates for non-conjugate models, which is also summarized below. After these algorithms, which use the Pólya urn representation, we describe the inference algorithms that use the stick-breaking construction.
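To make the conjugate update of eq. (3.32) and Algorithm 3 concrete, the following is a minimal sketch under assumptions that are not part of the text: the likelihood $F$ is taken to be Gaussian with known variance `sigma_sq`, and the baseline prior $G_0$ is a Gaussian on the component mean with parameters `mu0` and `tau0_sq`, so both integrals have closed forms. The function names (`predictive`, `gibbs_sweep`) and all default values are hypothetical and chosen only for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def predictive(x, n, s, mu0, tau0_sq, sigma_sq):
    """Integral of F(x | phi) against the posterior of phi given n points with sum s.

    With n = 0 this is the integral against G0 itself (the prior predictive).
    """
    prec = 1.0 / tau0_sq + n / sigma_sq          # posterior precision of phi
    mean = (mu0 / tau0_sq + s / sigma_sq) / prec  # posterior mean of phi
    return normal_pdf(x, mean, 1.0 / prec + sigma_sq)


def gibbs_sweep(x, c, alpha, mu0=0.0, tau0_sq=10.0, sigma_sq=1.0, rng=random):
    """One pass over all indicators using eq. (3.32); c is modified in place."""
    N = len(x)
    for i in range(N):
        # Count and sum of every component, leaving out x[i].
        stats = {}
        for j in range(N):
            if j != i:
                n, s = stats.get(c[j], (0, 0.0))
                stats[c[j]] = (n + 1, s + x[j])
        labels, weights = [], []
        for k, (n, s) in stats.items():
            labels.append(k)
            weights.append(n / (N - 1 + alpha)
                           * predictive(x[i], n, s, mu0, tau0_sq, sigma_sq))
        # "Another component": a label not used by any other data point.
        labels.append(max(stats, default=-1) + 1)
        weights.append(alpha / (N - 1 + alpha)
                       * predictive(x[i], 0, 0.0, mu0, tau0_sq, sigma_sq))
        c[i] = rng.choices(labels, weights=weights)[0]
    return c
```

Calling `gibbs_sweep(x, c, alpha=1.0)` repeatedly, starting for instance from `c = [0] * len(x)`, gives one possible realization of the "repeatedly sample" loop. The label values themselves carry no meaning beyond the grouping they induce.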
3.2 MCMC Inference in Dirichlet Process Mixture Models

3.2.2 Algorithms for non-Conjugate DP Models

No Gaps Algorithm

The algorithms described above use the indicator variables $c_i$, $i = 1, \dots, N$, to assign identical values of $\theta$ to the component parameters $\phi$. The numerical values of the $c_i$ have no significance beyond denoting the grouping. The "no gaps" algorithm of MacEachern and Müller (1998) constrains the labels of the active components to cover the integers from 1 to $K^\ddagger$, and augments the state to include empty components so as to have a total of $N$ represented components, i.e. $K^\dagger = N$. Augmenting the model replaces the integral evaluations with likelihood evaluations.

The state of the Markov chain consists of the indicator variables and the parameters of the $N$ components, $K^\ddagger$ of which have data assigned, the rest being empty. First, the indicator variable of each data point $i$ is updated as follows. Use $K^\ddagger_-$ to denote the number of distinct groups of data not including the $i$th data point. Label the groups from 1 to $K^\ddagger_-$. If $c_i$ is a singleton, with probability $K^\ddagger_- / (K^\ddagger_- + 1)$ leave $c_i$ unchanged; otherwise label $c_i$ as $K^\ddagger_- + 1$, consequently assigning $\phi_{K^\ddagger_- + 1}$ to be the existing value for $\phi_{c_i}$. Update $c_i$ with the following probabilities:
$$P(c_i = k \mid x_i, c_{-i}, \phi) \propto n_{-i,k}\, F(x_i \mid \phi_k) \quad \text{for } k = 1, \dots, K^\ddagger_-,$$
$$P(c_i = K^\ddagger_- + 1 \mid x_i, c_{-i}, \phi) \propto \frac{\alpha}{K^\ddagger_- + 1}\, F(x_i \mid \phi_{K^\ddagger_- + 1}). \tag{3.33}$$
After the indicator updates, update the component parameters by sampling from their conditional posterior, eq. (3.31).

Algorithm 4 The No-Gaps algorithm for non-conjugate DPM models.
The state of the Markov chain consists of the indicator variables $c = \{c_1, \dots, c_N\}$ and $N$ component parameters $\Phi = \{\phi_1, \dots, \phi_N\}$.
Of the $N$ parameters, only $K^\ddagger + 1$ of them are represented.
Repeatedly sample:
    for all $i = 1, \dots, N$ do {indicator updates}
        Let $K^\ddagger_-$ denote the number of active components without considering $x_i$
        if $c_i$ is a singleton then
            With probability $K^\ddagger_- / (K^\ddagger_- + 1)$ do not update $c_i$; otherwise, label $c_i$ as $K^\ddagger_- + 1$
        end if
        Label the components of all data points other than $i$ from 1 to $K^\ddagger_-$
        Update $c_i$ using eq. (3.33)
    end for
    for all $k = 1, \dots, N$ do {parameter updates}
        Update $\phi_k$ by sampling from its posterior, eq. (3.31)
    end for
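The indicator update of the no-gaps scheme, eq. (3.33), can be sketched for a single data point as follows. This is an illustration under stated assumptions rather than the thesis's implementation: labels are 0-based (active components are `0..K-1`), `phi` is a list of length $N$ whose trailing entries hold the parameters of the empty components, and the Gaussian likelihood `normal_lik` is a placeholder for any $F(x \mid \phi)$ that can be evaluated pointwise. The function names are hypothetical, and the parameter update of eq. (3.31) is not shown.

```python
import math
import random


def normal_lik(x, phi, sigma_sq=1.0):
    """F(x | phi): Gaussian likelihood with mean phi (an illustrative choice)."""
    return math.exp(-0.5 * (x - phi) ** 2 / sigma_sq) / math.sqrt(2.0 * math.pi * sigma_sq)


def no_gaps_indicator_update(i, x, c, phi, alpha, lik=normal_lik, rng=random):
    """Update c[i] in place following eq. (3.33).

    Assumes the labels in c already cover 0..K-1 without gaps and that
    len(phi) == len(x), so phi[K:] belongs to the empty components.
    """
    K = max(c) + 1
    counts = [0] * K                      # n_{-i,k}: counts without x[i]
    for j, cj in enumerate(c):
        if j != i:
            counts[cj] += 1

    if counts[c[i]] == 0:                 # c[i] is a singleton
        k_minus = K - 1
        if rng.random() < k_minus / (k_minus + 1):
            return                        # leave c[i] unchanged
        # Swap the singleton with the last active label so that the other
        # groups are labelled 0..k_minus-1; its parameter moves with it.
        old, last = c[i], K - 1
        if old != last:
            phi[old], phi[last] = phi[last], phi[old]
            for j in range(len(c)):
                if c[j] == last:
                    c[j] = old
            c[i] = last
            counts[old], counts[last] = counts[last], 0
    else:
        k_minus = K                       # phi[K] is the first empty component

    weights = [counts[k] * lik(x[i], phi[k]) for k in range(k_minus)]
    weights.append(alpha / (k_minus + 1) * lik(x[i], phi[k_minus]))
    c[i] = rng.choices(range(k_minus + 1), weights=weights)[0]
```

In this sketch the relabelling step and the "label $c_i$ as $K^\ddagger_- + 1$" step coincide: swapping a singleton into the last active slot both keeps the other groups contiguous and leaves its existing parameter in position $K^\ddagger_- + 1$. Only likelihood evaluations appear in the weights, with no integrals over $\phi$.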