Online Model Selection Based on the Variational Bayes


1660 Masa-aki Sato

ordinary EM algorithm for the EFH model.

In a large sample limit, the data term is dominant over the model complexity term. Consequently, the free energy maximization becomes equivalent to the likelihood maximization. Using equations 2.40 and 2.41, the free energy becomes

$$F \approx \sum_{t=1}^{T} \log P(x(t)|\bar{\theta}) - \frac{K}{2}\log\gamma - \frac{1}{2}\log\left|\frac{\partial^{2}\psi}{\partial\theta\,\partial\theta}(\bar{\theta})\right| + \gamma_0\left(\alpha_0\cdot\bar{\theta} - \psi(\bar{\theta})\right) - \Phi(\alpha_0,\gamma_0) + O(1/\gamma). \tag{2.42}$$

This expression coincides with the BIC/MDL criteria (Rissanen, 1987; Schwarz, 1978). The predictive distribution $P(x|X\{T\})$ in this limit coincides with the model distribution using the ML estimator, $P(x|\bar{\theta})$. This can be shown by using the following relations:

$$\hat{\alpha}(x,z) \approx \alpha + \frac{1}{\gamma}\left(r(x,z) - \alpha\right) + O(1/\gamma^{2}),$$

$$\Phi(\hat{\alpha}(x,z), \gamma+1) \approx \Phi(\alpha,\gamma) + \frac{1}{\gamma}\left(r(x,z) - \alpha\right)\cdot\frac{\partial\Phi}{\partial\alpha}(\alpha,\gamma) + \frac{\partial\Phi}{\partial\gamma}(\alpha,\gamma) \approx \Phi(\alpha,\gamma) + r(x,z)\cdot\bar{\theta} - \psi(\bar{\theta}).$$

3 Online Variational Bayes Method

3.1 Expectation Value of the Free Energy. In this section, we derive an online version of the VB algorithm. The amount of data increases over time in the online learning. Therefore, it is desirable to calculate the free energy corresponding to a fixed amount of data. For this purpose, let us define an expectation value of the log evidence for a finite amount of data:

$$E\left[\log P(X\{T\})\right]_{\rho} = \int d\mu(X\{T\})\,\rho(X\{T\}) \times \log\left(\int d\mu(\theta)\,P(X\{T\}|\theta)\,P_0(\theta)\right), \tag{3.1}$$

where $\rho$ represents an unknown probability distribution for observed data. The corresponding VB free energy is given by

$$E\left[F(X\{T\}, Q_\theta, Q_z)\right]_{\rho} = T\,E\left[\int d\mu(\theta)\,Q_\theta(\theta)\int d\mu(z)\,Q_z(z)\,\log\left(P(x,z|\theta)/Q_z(z)\right)\right]_{\rho} + \int d\mu(\theta)\,Q_\theta(\theta)\,\log\left(P_0(\theta)/Q_\theta(\theta)\right). \tag{3.2}$$

The ratio $(\gamma_0/T)$ determines the relative reliability between the observed data and the prior belief for the parameter distribution. The expected free energy, equation 3.2, can be estimated by

$$F(X\{\tau\}, Q_z\{\tau\}, Q_\theta, T) = \frac{T}{\tau}\sum_{t=1}^{\tau}\int d\mu(\theta)\,Q_\theta(\theta)\int d\mu(z(t))\,Q_z(z(t))\,\log\left(P(x(t),z(t)|\theta)/Q_z(z(t))\right) + \int d\mu(\theta)\,Q_\theta(\theta)\,\log\left(P_0(\theta)/Q_\theta(\theta)\right), \tag{3.3}$$

where $Q_z\{\tau\} = \{Q_z(z(t)) \mid t = 1,\ldots,\tau\}$. Note that $\tau$ represents the actual amount of observed data, and it increases over time while $T$ is fixed.

The estimation of the posterior distribution $Q_z(z(t))$ is inaccurate in the early stage of the online learning and gradually becomes accurate as learning proceeds. However, the early inaccurate estimations and the later accurate estimations contribute to the free energy (see equation 3.3) with equal weight. This might cause slow convergence of the learning process. Therefore, we introduce a time-dependent discount factor $\lambda(t)$ $(0 \le \lambda(t) \le 1,\ t = 2, 3, \ldots)$ for forgetting the earlier inaccurate estimation effects. Accordingly, a discounted free energy is defined by

$$F_\lambda(X\{\tau\}, Q_z\{\tau\}, Q_\theta, T) = T\,\eta(\tau)\sum_{t=1}^{\tau}\left(\prod_{s=t+1}^{\tau}\lambda(s)\right)\int d\mu(\theta)\,Q_\theta(\theta)\int d\mu(z(t))\,Q_z(z(t))\,\log\left(P(x(t),z(t)|\theta)/Q_z(z(t))\right) + \int d\mu(\theta)\,Q_\theta(\theta)\,\log\left(P_0(\theta)/Q_\theta(\theta)\right), \tag{3.4}$$

where $\eta(\tau)$ represents a normalization constant:

$$\eta(\tau) = \left[\sum_{t=1}^{\tau}\left(\prod_{s=t+1}^{\tau}\lambda(s)\right)\right]^{-1}. \tag{3.5}$$

3.2 Online Variational Bayes Algorithm. The online VB algorithm can be derived from the successive maximization of the discounted free energy (see equation 3.4). Let us assume that $Q_z\{\tau-1\} = \{Q_z(z(t)) \mid t = 1,\ldots,\tau-1\}$ and $Q_\theta^{(\tau-1)}(\theta)$ have been determined for an observed data set $X\{\tau-1\} =$
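The normalization constant of equation 3.5 can be checked numerically. The following is a minimal sketch, not taken from the paper: the function names `eta_direct` and `eta_recursive` and the callable `lam` (standing for the discount factor λ(s), s ≥ 2) are illustrative assumptions. The recursive form relies on a relation obtained here by factoring λ(τ) out of the sum in equation 3.5, namely 1/η(τ) = 1 + λ(τ)/η(τ−1) with η(1) = 1.

```python
from math import prod


def eta_direct(lam, tau):
    """Evaluate equation 3.5 literally.

    lam: callable giving the discount factor lam(s) for s >= 2 (hypothetical name).
    tau: number of data points observed so far (tau >= 1).
    """
    total = 0.0
    for t in range(1, tau + 1):
        # Weight of sample t: product of lam(s) for s = t+1, ..., tau.
        # For t == tau the product is empty and equals 1.
        total += prod(lam(s) for s in range(t + 1, tau + 1))
    return 1.0 / total


def eta_recursive(lam, tau):
    """Same quantity via 1/eta(tau) = 1 + lam(tau)/eta(tau-1), with eta(1) = 1.

    This recursion is derived from equation 3.5 for this sketch.
    """
    inv_eta = 1.0  # 1/eta(1)
    for s in range(2, tau + 1):
        inv_eta = 1.0 + lam(s) * inv_eta
    return 1.0 / inv_eta


if __name__ == "__main__":
    no_discount = lambda s: 1.0       # lam(t) = 1: nothing is forgotten
    constant_discount = lambda s: 0.9  # illustrative constant discount

    for tau in (1, 5, 50):
        print(tau,
              eta_direct(no_discount, tau),
              eta_direct(constant_discount, tau),
              eta_recursive(constant_discount, tau))
```

With λ(t) = 1 for all t, every product in equation 3.5 equals 1, so η(τ) = 1/τ and the prefactor Tη(τ) in equation 3.4 reduces to the T/τ of equation 3.3. With a constant discount below 1, η(τ) stays bounded away from zero, so recent data keep a fixed share of the weight.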
Page 1 and 2: LETTER Communicated by Hagai Attias

