
3 Dirichlet Process Mixture Models

Thus, the problem of sampling from an arbitrary distribution reduces to sampling from uniform distributions (Neal, 2003).
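To make this reduction concrete, the following minimal Python sketch shows a univariate slice sampler in the spirit of Neal (2003): given the current point x, a height u is drawn uniformly under f(x), and the next point is drawn uniformly from the horizontal slice {x : f(x) > u}. The density f, the interval bounds, and the rejection step for the horizontal draw are our own illustrative choices, not taken from the source.

```python
import random

def slice_sample(f, x0, lo, hi, n_samples=1000):
    """Minimal slice sampler for an (unnormalised) density f on [lo, hi]:
    alternates two uniform draws, one vertical and one horizontal."""
    samples, x = [], x0
    for _ in range(n_samples):
        u = random.uniform(0.0, f(x))        # vertical: u ~ U(0, f(x))
        while True:                          # horizontal: x ~ U over the slice
            x_new = random.uniform(lo, hi)
            if f(x_new) > u:                 # accept only points inside the slice
                x = x_new
                break
        samples.append(x)
    return samples

# Example: a triangular density on [0, 1], known only up to a constant.
draws = slice_sample(lambda x: min(x, 1.0 - x), x0=0.5, lo=0.0, hi=1.0)
```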

In Section 3.2.2 we described an algorithm by Walker and Damien (1998) that uses auxiliary variables to limit the sampling space in the Pólya urn representation. The auxiliary variable in that algorithm is chosen such that it has a uniform distribution defined by the likelihood value. Therefore, given the auxiliary variable, sampling from the posterior reduces to sampling from a truncated version of the prior.

In this section, we describe a similar idea, applied by Walker (2006) to the stick-breaking construction of the DP, which results in an elegant and widely applicable algorithm.

The parameters, the mixing proportions, and the indicator variables are repeatedly updated. We introduce the temporary slice variable s when updating the indicators, and discard it after the indicator update.

The distribution of the auxiliary variable s is defined such that the joint prior of s and c_i is a two-dimensional uniform distribution. Conditioned on s, c_i is uniformly distributed on a limited part of the prior space. Combining this with the likelihood gives the conditional posterior of c_i.

Recall that the prior probability of assigning an observation to one of the components is given by the mixing proportions π = {π_1, . . . , π_∞},

\[
P(c_i \mid \pi) = \pi_{c_i}. \tag{3.48}
\]

Multiplying by the likelihood, the posterior is

\[
P(c_i \mid \pi, x_i, \theta) \propto \pi_{c_i} \, F(x_i \mid \theta_{c_i}). \tag{3.49}
\]

We introduce the auxiliary slice variable s such that the joint posterior of the indicator variable and s is

\[
P(c_i, s \mid \pi, x_i, \theta) \propto \mathbb{I}\{s < \pi_{c_i}\} \, F(x_i \mid \theta_{c_i}). \tag{3.50}
\]

Thus, the distribution of s given π and c_i is uniform,

\[
(s \mid \pi, c_i) \sim U(0, \pi_{c_i}) = \mathbb{I}\{s < \pi_{c_i}\} \, \pi_{c_i}^{-1}, \tag{3.51}
\]

and the distribution of c_i conditioned also on s is

\[
P(c_i \mid s, \pi, x_i, \theta) \propto
\begin{cases}
F(x_i \mid \theta_{c_i}) & \text{if } s < \pi_{c_i}, \\
0 & \text{otherwise.}
\end{cases} \tag{3.52}
\]
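Integrating s out of (3.50) recovers (3.49), since ∫ I{s < π_{c_i}} ds = π_{c_i}, so the auxiliary variable leaves the target posterior intact. As an illustration of the indicator update (3.52), here is a minimal Python sketch; the function name, the `log_f` likelihood handle, and the array layout are our own assumptions, not from the source.

```python
import numpy as np

def sample_indicator(x_i, s_i, pi, theta, log_f, rng=np.random.default_rng()):
    """Draw c_i from eq. (3.52): uniform over the prior region {k : pi_k > s_i},
    reweighted by the likelihood F(x_i | theta_k)."""
    # Finite candidate set; always contains the current c_i when s_i ~ U(0, pi[c_i]).
    candidates = np.flatnonzero(pi > s_i)
    log_w = np.array([log_f(x_i, theta[k]) for k in candidates])
    w = np.exp(log_w - log_w.max())            # stabilise before normalising
    return int(rng.choice(candidates, p=w / w.sum()))
```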

That is, the probability of assigning c_i to a component whose mixing proportion is less than the slice variable is 0. Therefore, we only need to consider assignment to components whose stick lengths are larger than the slice variable s. These are clearly finite in number, rather than the infinitely many components of the full model.

Using slice sampling, we only need to represent the mixing proportions and the parameters of the K† represented components, and we allocate new components only when needed. The slice value is sampled uniformly between 0 and π_{c_i}. Note that the stick lengths are not ordered by size, so we must keep breaking sticks until the mass remaining beyond the represented components falls below the slice value.
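Because the stick lengths are not monotone in k, finding every component with π_k > s requires a stopping rule. A sufficient one, sketched below under our own naming and the usual Beta(1, α) stick-breaking prior for a DP with concentration α, is to break new sticks until the leftover mass, which upper-bounds every unrepresented π_k, falls below the smallest slice value among the observations.

```python
import numpy as np

def extend_sticks(v, alpha, s_min, rng=np.random.default_rng()):
    """Grow the stick-breaking representation until the unassigned mass
    prod_k (1 - v_k) falls below s_min, so that no unrepresented component
    can have a mixing proportion exceeding the smallest slice value."""
    v = list(v)
    leftover = float(np.prod(1.0 - np.asarray(v))) if v else 1.0
    while leftover >= s_min:
        v_k = rng.beta(1.0, alpha)             # v_k ~ Beta(1, alpha) under the DP prior
        v.append(v_k)
        leftover *= 1.0 - v_k
    stick, pi = 1.0, []
    for v_k in v:                              # pi_k = v_k * prod_{j<k} (1 - v_j)
        pi.append(v_k * stick)
        stick *= 1.0 - v_k
    return np.array(v), np.array(pi)
```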

