
4.2 MCMC Sampling algorithms for IBLF models

Algorithm 13 Metropolis-Hastings sampling for IBP

The state of the Markov chain consists of the infinite feature matrix Z and the set of infinitely many parameters Θ = {θ_k}_{k=1}^∞. Only the K‡ active columns of Z and the corresponding parameters are represented.

Repeatedly sample:
for all rows i = 1, . . . , N do {Feature updates}
    for all columns k = 1, . . . , K‡ do
        if m_{−i,k} > 0 then
            Update z_ik by sampling from its conditional posterior, eq. (4.36).
        end if
    end for
    Propose a number K̂ of unique features from the prior Poisson(α/N).
    Sample K̂ parameters Θ_K̂ for the unique features.
    Evaluate the proposal using the acceptance ratio, eq. (4.40).
end for
for all active columns k = 1, . . . , K‡ do {Parameter updates}
    Update θ_k by sampling from its conditional posterior, eq. (4.35).
end for
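To make the control flow of Algorithm 13 concrete, the following is a minimal Python sketch of one sweep. The helpers cond_prob_z, sample_theta_prior, accept_ratio, and sample_theta_posterior are hypothetical stand-ins for eq. (4.36), the parameter prior, eq. (4.40), and eq. (4.35), which depend on the particular likelihood model; the sketch also assumes the common proposal scheme in which row i's current unique features are replaced by the K̂ newly proposed ones.

```python
import numpy as np

def mh_sweep(Z, theta, alpha, X, rng):
    """One sweep of Algorithm 13 (sketch, under the assumptions above).

    Z     : (N, K_active) binary matrix; only active columns are stored.
    theta : list of K_active parameters, one per active column.
    """
    N = Z.shape[0]
    for i in range(N):
        # Feature updates: only columns used by at least one other row.
        for k in range(Z.shape[1]):
            m_minus_ik = Z[:, k].sum() - Z[i, k]            # m_{-i,k}
            if m_minus_ik > 0:
                p = cond_prob_z(i, k, Z, theta, X)          # eq. (4.36)
                Z[i, k] = int(rng.random() < p)
        # Columns currently used by row i alone (its unique features).
        unique = [k for k in range(Z.shape[1])
                  if Z[i, k] == 1 and Z[:, k].sum() == 1]
        # Propose replacing them with K_hat fresh features from the prior.
        K_hat = rng.poisson(alpha / N)
        theta_hat = [sample_theta_prior(rng) for _ in range(K_hat)]
        a = accept_ratio(i, unique, theta_hat, Z, theta, X)  # eq. (4.40)
        if rng.random() < min(1.0, a):
            keep = [k for k in range(Z.shape[1]) if k not in unique]
            Z, theta = Z[:, keep], [theta[k] for k in keep]
            new_cols = np.zeros((N, K_hat), dtype=int)
            new_cols[i, :] = 1              # unique features belong to row i
            Z = np.hstack([Z, new_cols])
            theta = theta + theta_hat
    # Parameter updates for every active column.
    for k in range(Z.shape[1]):
        theta[k] = sample_theta_posterior(k, Z, X, rng)      # eq. (4.35)
    return Z, theta
```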

The stick weights of the IBP decrease at an exponential rate, which suggests adapting the truncation used for approximating the DP to the IBP. The bound on the error introduced by the truncation for the DP stick-breaking construction has been calculated by Ishwaran and James (2001). Noting the correspondences between the stick weights of the IBP and the DP, a similar approach can be used in this case.

Let M be the truncation level. Setting µ_(M+1) = 0 constrains all µ_(k) = 0 for k > M, while the joint density for µ_(1:M) is given as

$$
p(\mu_{(1:M)}) = \prod_{k=1}^{M} p(\mu_{(k)} \mid \mu_{(k-1)})
= \alpha^{M} \mu_{(M)}^{\alpha} \prod_{k=1}^{M} \mu_{(k)}^{-1}\,
\mathbb{I}\{0 \leq \mu_{(M)} \leq \cdots \leq \mu_{(1)} \leq 1\}.
\tag{4.42}
$$
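Equation (4.42) is exactly the joint density induced by the recursion µ_(k) = ν_k µ_(k−1) with ν_k ~ Beta(α, 1) i.i.d. and µ_(0) = 1, so the truncated stick weights can be drawn with a single cumulative product. A minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def truncated_ibp_sticks(alpha, M, rng=None):
    """Draw mu_(1) >= ... >= mu_(M) from the density in eq. (4.42):
    mu_(k) = nu_1 * ... * nu_k with nu_j ~ Beta(alpha, 1) i.i.d."""
    rng = rng or np.random.default_rng()
    nu = rng.beta(alpha, 1.0, size=M)
    return np.cumprod(nu)   # decreasing weights; mu_(M+1) is set to 0

mu = truncated_ibp_sticks(alpha=2.0, M=20)
```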

Inference using Gibbs sampling is straightforward to implement on the truncated model. The entries of Z are independent given µ_(1:M), thus

$$
p(Z \mid \mu_{(1:M)}) = \prod_{i=1}^{N} \prod_{k=1}^{M}
\mu_{(k)}^{z_{ik}} \, (1 - \mu_{(k)})^{1 - z_{ik}}.
\tag{4.43}
$$

Since the entries in a column are independent given the feature presence probabilities, we do not need to worry about whether the other data points have the feature being updated or not. That is, we do not need separate update rules for separate cases in this case.
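For illustration, here is a sketch of the resulting Gibbs step for a single entry: the prior contribution is simply µ_(k) against 1 − µ_(k), regardless of what the other rows do, and only a model-specific likelihood term distinguishes the two values. The log_lik callback is a hypothetical stand-in for the model's log-likelihood.

```python
import numpy as np

def gibbs_update_zik(i, k, Z, mu, log_lik, rng):
    """Resample z_ik given mu_(1:M) and the data.
    `log_lik(Z)` is a hypothetical model log-likelihood; the prior
    factor follows eq. (4.43) and needs no case distinction."""
    logp = np.empty(2)
    for v in (0, 1):
        Z[i, k] = v
        prior = mu[k] if v == 1 else 1.0 - mu[k]
        logp[v] = np.log(prior) + log_lik(Z)
    p_one = 1.0 / (1.0 + np.exp(logp[0] - logp[1]))
    Z[i, k] = int(rng.random() < p_one)
    return Z
```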

