Online Model Selection Based on the Variational Bayes
By interchanging the integration with respect to $\theta$ and $z$, one can get
$$P(x \mid X\{T\}) = \int d\mu(z)\, \exp\left[ r_0(x, z) + \Phi(\hat{\alpha}(x, z), \gamma + 1) - \Phi(\alpha, \gamma) \right], \tag{2.37}$$
$$\hat{\alpha}(x, z) = (\gamma \alpha + r(x, z))/(1 + \gamma).$$
For a finite $T$, this predictive distribution has a different functional form
from the model distribution $P(x \mid \theta)$, equation 2.1.
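As an illustration of equation 2.37, the following minimal sketch evaluates the predictive density for a toy case that is an assumption of this example, not taken from the text: a unit-variance gaussian with unknown mean and no hidden variable $z$, so that $r(x) = x$, $r_0(x) = -x^2/2 - \frac{1}{2}\log 2\pi$, and $\psi(\theta) = \theta^2/2$, with a flat measure $d\mu(\theta) = d\theta$. The function names are hypothetical. Under these assumptions the predictive density is a gaussian with variance $1 + 1/\gamma$, which is not of the form $P(x \mid \theta)$ for any $\theta$.

```python
import math


def log_phi(alpha, gamma):
    """Phi(alpha, gamma) = log of the integral of exp[gamma*(alpha*t - t^2/2)] dt.

    Closed form for the assumed toy model psi(theta) = theta^2 / 2.
    """
    return 0.5 * gamma * alpha ** 2 + 0.5 * math.log(2.0 * math.pi / gamma)


def predictive_pdf(x, alpha, gamma):
    """Equation 2.37 with no hidden variable, r(x) = x (toy assumption)."""
    r0 = -0.5 * x ** 2 - 0.5 * math.log(2.0 * math.pi)
    alpha_hat = (gamma * alpha + x) / (1.0 + gamma)
    return math.exp(r0 + log_phi(alpha_hat, gamma + 1.0) - log_phi(alpha, gamma))


def normal_pdf(x, mean, var):
    """Density of a gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


alpha, gamma = 0.3, 4.0
xs = [-3.0 + 0.25 * i for i in range(25)]

# Gap to the gaussian with inflated variance 1 + 1/gamma (expected to vanish).
gap_to_predictive_form = max(
    abs(predictive_pdf(x, alpha, gamma) - normal_pdf(x, alpha, 1.0 + 1.0 / gamma))
    for x in xs
)
# Gap to the unit-variance model density centered at alpha (does not vanish).
gap_to_model_form = max(
    abs(predictive_pdf(x, alpha, gamma) - normal_pdf(x, alpha, 1.0)) for x in xs
)
print(gap_to_predictive_form, gap_to_model_form)
```

In this toy case the mismatch with the model density shrinks as $\gamma$ grows, since the extra variance $1/\gamma$ tends to zero.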
2.7 Large Sample Limit. When the amount of observed data becomes
large ($T \gg 1 : \gamma \gg 1$), the solution of the VB algorithm becomes the ML
estimator (Attias, 1999). In this limit, the integration over the parameters
with respect to the posterior parameter distribution can be approximated
by using a stationary point approximation:
$$\exp\left[\Phi(\alpha, \gamma)\right] = \int d\mu(\theta)\, \exp\left[\gamma\left(\alpha \cdot \theta - \psi(\theta)\right)\right]
\simeq \exp\left[ \gamma\left(\alpha \cdot \hat{\theta} - \psi(\hat{\theta})\right) - \frac{1}{2} \log \left| \gamma \frac{\partial^2 \psi}{\partial\theta\,\partial\theta}(\hat{\theta}) \right| + O(1/\gamma) \right], \tag{2.38}$$
where $\hat{\theta}$ is the maximum of the exponent $\left(\alpha \cdot \theta - \psi(\theta)\right)$, that is,
$$\frac{\partial \psi}{\partial \theta}(\hat{\theta}) = \alpha. \tag{2.39}$$
Therefore, $\Phi$ can be approximated as
$$\Phi(\alpha, \gamma) \simeq \gamma\left(\alpha \cdot \hat{\theta} - \psi(\hat{\theta})\right) - \frac{1}{2} \log \left| \gamma \frac{\partial^2 \psi}{\partial\theta\,\partial\theta}(\hat{\theta}) \right| + O(1/\gamma). \tag{2.40}$$
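The stationary point approximation of $\Phi$ can be checked numerically on a case where the integral is known in closed form. The sketch below assumes, purely for illustration, a one-dimensional model with $\psi(\theta) = e^{\theta}$ and a flat measure $d\mu(\theta) = d\theta$, for which $\Phi(\alpha, \gamma) = \log\Gamma(\gamma\alpha) - \gamma\alpha\log\gamma$. It also includes the $\alpha$-independent constant $\frac{1}{2}\log 2\pi$ of the standard one-dimensional Laplace formula, which is an assumption about how the measure is normalized. The function names are hypothetical.

```python
import math


def phi_exact(alpha, gamma):
    """Closed form of log integral of exp[gamma*(alpha*t - exp(t))] dt.

    Valid for the assumed toy model psi(theta) = exp(theta), flat measure.
    """
    return math.lgamma(gamma * alpha) - gamma * alpha * math.log(gamma)


def phi_stationary(alpha, gamma):
    """Stationary point approximation of Phi for the same toy model."""
    theta_hat = math.log(alpha)        # solves psi'(theta) = exp(theta) = alpha
    curvature = math.exp(theta_hat)    # psi''(theta_hat)
    leading = gamma * (alpha * theta_hat - math.exp(theta_hat))
    # 0.5*log(2*pi) is the standard 1-D Laplace constant (assumed here).
    return leading - 0.5 * math.log(gamma * curvature) + 0.5 * math.log(2.0 * math.pi)


alpha = 2.0
errors = {
    gamma: abs(phi_exact(alpha, gamma) - phi_stationary(alpha, gamma))
    for gamma in (1.0, 10.0, 100.0)
}
for gamma, err in errors.items():
    # err * gamma staying roughly constant is what an O(1/gamma) remainder predicts.
    print(gamma, err, err * gamma)
```

For this toy model the remainder is the Stirling correction to $\log\Gamma$, so the error is expected to fall roughly in proportion to $1/\gamma$.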
Consequently, the ensemble average of the parameter $\bar{\theta}$ can be approximated
as
$$\bar{\theta} = \frac{1}{\gamma} \frac{\partial \Phi}{\partial \alpha}(\alpha, \gamma)
\simeq \frac{1}{\gamma} \frac{\partial}{\partial \alpha} \left( \gamma \left( \alpha \cdot \hat{\theta} - \psi(\hat{\theta}) \right) \right) = \hat{\theta}. \tag{2.41}$$
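The convergence of the ensemble average to the stationary point can be illustrated with a small numerical sketch. It assumes a toy one-dimensional model that is not taken from the text, $\psi(\theta) = e^{\theta}$ with a flat measure, for which $\Phi(\alpha, \gamma) = \log\Gamma(\gamma\alpha) - \gamma\alpha\log\gamma$ and $\hat{\theta} = \log\alpha$. The derivative $\partial\Phi/\partial\alpha$ is taken by a central finite difference, and the function names are hypothetical.

```python
import math


def phi_exact(alpha, gamma):
    """Phi for the assumed toy model psi(theta) = exp(theta), flat measure."""
    return math.lgamma(gamma * alpha) - gamma * alpha * math.log(gamma)


def theta_bar(alpha, gamma, step=1e-5):
    """Ensemble average (1/gamma) dPhi/dalpha via a central finite difference."""
    slope = (phi_exact(alpha + step, gamma) - phi_exact(alpha - step, gamma)) / (2.0 * step)
    return slope / gamma


alpha = 2.0
theta_hat = math.log(alpha)  # stationary point: exp(theta_hat) = alpha

gaps = {gamma: theta_bar(alpha, gamma) - theta_hat for gamma in (1.0, 10.0, 100.0)}
for gamma, gap in gaps.items():
    print(gamma, gap)
```

In this toy case the gap behaves like $-1/(2\gamma\alpha)$ for large $\gamma$, so the two estimates agree in the large sample limit.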
The relations 2.39 and 2.41 imply that the posterior hyperparameter $\alpha$ is
equal to the expectation parameter of the EFH model, $\phi$ (see equation 2.3)
in this limit. Furthermore, equations 2.18, 2.39, and 2.41 are equivalent to the