Online Model Selection Based on the Variational Bayes

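Masa-aki Sato

The free energy of the MEF model after the VB M-step is expressed as

\[
\begin{aligned}
F = \sum_{i=1}^{M} \Big[ \,
& T \langle z_i \log P(x \mid \bar{\omega}, \bar{\theta}) \rangle_{\bar{\theta}}
 - T \langle z_i \log P(x, z \mid \bar{\omega}, \bar{\theta}) \rangle_{\bar{\theta}} \\
& + \log \Gamma(\gamma \nu_i + 1) - \nu_i \log \Gamma(\gamma + M) \\
& - \log \Gamma(\gamma(0) \nu_i(0) + 1) + \nu_i(0) \log \Gamma(\gamma(0) + M) \\
& + \Phi_i(\alpha_i, \gamma \nu_i) - \Phi_i(\alpha_i(0), \gamma(0) \nu_i(0)) \Big].
\end{aligned}
\tag{C.13}
\]

The mixture of Gaussian (MG) model is obtained when the component distribution $P(x \mid \theta_i, i)$ is the normal distribution:

\[
\begin{aligned}
P(x \mid m_i, S_i, i)
&= (2\pi)^{-N/2} |S_i|^{1/2} \exp\left[ -\tfrac{1}{2} (x - m_i)^T S_i (x - m_i) \right] \\
&= \exp\left[ -\tfrac{1}{2} x^T S_i x + x^T S_i m_i - \psi_i(m_i, S_i) \right],
\end{aligned}
\tag{C.14}
\]

\[
\psi_i(m_i, S_i) = \tfrac{1}{2} m_i^T S_i m_i - \tfrac{1}{2} \log |S_i| + \tfrac{N}{2} \log(2\pi),
\tag{C.15}
\]

where $m_i$ and $S_i$ denote the center and the inverse covariance matrix of the $i$th Gaussian. The natural parameter of the normal distribution is given by $\theta_i = (S_i, S_i m_i)$.

The conjugate distribution for the normal distribution (see equation C.14) is given by the normal Wishart distribution (Gelman et al., 1995),

\[
\begin{aligned}
P_\alpha(m_i, S_i \mid c_i, D_i, \gamma_i) = \exp\Big[
& -\tfrac{1}{2} \gamma_i (m_i - c_i)^T S_i (m_i - c_i)
 - \tfrac{1}{2} \gamma_i \operatorname{Tr}(S_i D_i^{-1}) \\
& + \tfrac{1}{2} (\gamma_i - N) \log |S_i| - \Phi_i(D_i, \gamma_i) \Big],
\end{aligned}
\tag{C.16}
\]

\[
\begin{aligned}
\Phi_i(D_i, \gamma_i) ={}
& -\tfrac{1}{2} \gamma_i \log |D_i^{-1}|
 + \sum_{n=1}^{N} \log \Gamma\!\left( \frac{\gamma_i + 1 - n}{2} \right)
 - \tfrac{1}{2} \gamma_i N \log\!\left( \frac{\gamma_i}{2} \right) \\
& - \tfrac{N}{2} \log\!\left( \frac{\gamma_i}{2\pi} \right)
 + \tfrac{1}{4} N (N - 1) \log \pi.
\end{aligned}
\tag{C.17}
\]

The natural parameter of the conjugate distribution, equation C.16, is given by

\[
(\gamma_i \alpha_i, \gamma_i) = \left( \gamma_i (D_i^{-1} + c_i c_i^T),\; \gamma_i c_i,\; \gamma_i \right).
\tag{C.18}
\]

The VB algorithm for the MG model can be derived by using the above equations.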
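The two forms of the Gaussian density in equation C.14 can be checked against each other numerically. The following is a minimal sketch rather than part of the original derivation: the function names, the random test values, and the use of NumPy are assumptions made for illustration. It evaluates the log density once in the familiar form and once in the exponential-family form, with the log normalizer taken as one half of the quadratic form in the center, minus one half of the log determinant of the inverse covariance, plus N/2 times log(2 pi).

```python
import numpy as np


def psi(m, S):
    """Log normalizer psi_i(m_i, S_i); S is the inverse covariance matrix."""
    N = m.shape[0]
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * m @ S @ m - 0.5 * logdet + 0.5 * N * np.log(2.0 * np.pi)


def log_density_direct(x, m, S):
    """Log of the first form of equation C.14 (standard Gaussian density)."""
    N = m.shape[0]
    _, logdet = np.linalg.slogdet(S)
    d = x - m
    return -0.5 * N * np.log(2.0 * np.pi) + 0.5 * logdet - 0.5 * d @ S @ d


def log_density_natural(x, m, S):
    """Log of the second form of equation C.14 (exponential-family form)."""
    return -0.5 * x @ S @ x + x @ S @ m - psi(m, S)


# Hypothetical test values: a random symmetric positive-definite
# inverse covariance, a random center, and a random data point.
rng = np.random.default_rng(0)
N = 3
A = rng.normal(size=(N, N))
S = A @ A.T + N * np.eye(N)
m = rng.normal(size=N)
x = rng.normal(size=N)

direct = log_density_direct(x, m, S)
natural = log_density_natural(x, m, S)
print(direct, natural)
```

The two printed values should agree to floating-point precision. Working with the inverse covariance directly, as the appendix does, avoids a matrix inversion when evaluating the density.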
Appendix D

The VB method can be easily extended to the hierarchical Bayes model. Let us consider the EFH model (see equation 2.1) with the prior distribution $P_\alpha(\theta \mid \alpha_0, \gamma_0)$. The evidence for the hierarchical Bayes model is given by the marginal likelihood with respect to the model parameter $\theta$ and the prior hyperparameter $\alpha_0$,

\[
P(X\{T\}) = \int d\mu(\theta)\, d\mu(\alpha_0)\, P(X\{T\} \mid \theta)\, P_\alpha(\theta \mid \alpha_0, \gamma_0)\, P_0(\alpha_0),
\tag{D.1}
\]

where $P_0(\alpha_0)$ is the prior distribution for the prior hyperparameter $\alpha_0$. The free energy is defined by

\[
\begin{aligned}
F(X\{T\}, Q) = \int & d\mu(\theta)\, d\mu(\alpha_0)\, d\mu(Z\{T\})\, Q(\theta, \alpha_0, Z\{T\}) \\
& \times \log \left( \frac{P(X\{T\}, Z\{T\} \mid \theta)\, P_\alpha(\theta \mid \alpha_0, \gamma_0)\, P_0(\alpha_0)}{Q(\theta, \alpha_0, Z\{T\})} \right).
\end{aligned}
\tag{D.2}
\]

The hierarchical VB method can be obtained by assuming the conjugate prior for $P_\alpha(\theta \mid \alpha_0, \gamma_0)$,

\[
P_0(\alpha_0) = \exp\left[ b_0 \left( a_0 \cdot \alpha_0 \gamma_0 - \Phi_\alpha(\alpha_0, \gamma_0) \right) - \Phi_a(a_0, b_0) \right],
\tag{D.3}
\]

and the factorization for the trial posterior distribution,

\[
Q(\theta, \alpha_0, Z\{T\}) = Q_\theta(\theta)\, Q_\alpha(\alpha_0)\, Q_z(Z\{T\}).
\tag{D.4}
\]

The remaining calculations can be done in the same way as in the VB method. The VB algorithm in this case consists of three steps. The posterior probability for the hidden variable, $P(z(t) \mid x(t), \bar{\theta})$, is calculated in the VB E-step by using the ensemble average of the parameters

\[
\bar{\theta} = \langle \theta \rangle_\alpha.
\tag{D.5}
\]

The posterior hyperparameter $\alpha$ is calculated in the VB M-step,

\[
\gamma \alpha = T \langle r(x, z) \rangle_{\bar{\theta}} + \gamma_0 \langle \alpha_0 \rangle_a,
\tag{D.6}
\]

\[
\gamma_0 \langle \alpha_0 \rangle_a = \gamma_0 \int d\mu(\alpha_0)\, Q_\alpha(\alpha_0)\, \alpha_0 = \frac{1}{b} \frac{\partial \Phi_a}{\partial a}(a, b),
\tag{D.7}
\]

together with $\gamma = T + \gamma_0$. The posterior hyper-hyperparameter $(a, b)$ is then calculated:

\[
a = \langle \theta \rangle_\alpha + a_0,
\tag{D.8}
\]
LETTER. Communicated by Hagai Attias