
Online Model Selection Based on the Variational Bayes




2.3 Evidence and Free Energy. The evidence for a data set $X\{T\}$ is defined by

\[
P(X\{T\}) = \int d\mu(\theta)\, P(X\{T\} \mid \theta)\, P_0(\theta), \tag{2.4}
\]

where $d\mu(\theta)$ denotes a measure on the model parameter space and $P_0(\theta)$ denotes a prior distribution for the model parameters. The integration over model parameters in equation 2.4 penalizes complex models with more degrees of freedom (Bishop, 1995). This integration, however, is often difficult to perform. In order to evaluate this integration, the VB method introduces a trial probability distribution $Q(\theta, Z\{T\})$, which approximates the posterior probability distribution over the model parameter $\theta$ and the hidden variables $Z\{T\} = \{z(t) \mid t = 1, \ldots, T\}$:

\[
P(\theta, Z\{T\} \mid X\{T\}) = \frac{P(X\{T\}, Z\{T\} \mid \theta)\, P_0(\theta)}{P(X\{T\})}, \tag{2.5}
\]

where the probability distribution for a complete data set $(X\{T\}, Z\{T\})$ is given by

\[
P(X\{T\}, Z\{T\} \mid \theta) = \prod_{t=1}^{T} P(x(t), z(t) \mid \theta). \tag{2.6}
\]
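As a rough illustration of how the evidence integral in equation 2.4 penalizes extra degrees of freedom, the following sketch compares two models of coin-flip data. Neither model comes from the paper: the fixed-probability model, the uniform prior, and all function names are assumptions chosen because the integral then has a closed form (a Beta function). The data are treated as independent given $\theta$, so the likelihood is a product over observations in the same spirit as equation 2.6, here without hidden variables.

```python
import math

def log_evidence_fixed(heads, tails, p=0.5):
    """Log evidence of a model with no free parameter: P(head) is fixed at p."""
    return heads * math.log(p) + tails * math.log(1.0 - p)

def log_evidence_uniform_prior(heads, tails):
    """Log evidence of a Bernoulli model with a uniform prior on theta.

    Integrating theta**heads * (1 - theta)**tails over [0, 1] gives the
    Beta function B(heads + 1, tails + 1).
    """
    return (math.lgamma(heads + 1) + math.lgamma(tails + 1)
            - math.lgamma(heads + tails + 2))

def log_evidence_grid(heads, tails, n_grid=20000):
    """The same integral by a midpoint rule, as a check on the closed form."""
    total = 0.0
    for i in range(n_grid):
        theta = (i + 0.5) / n_grid
        total += theta ** heads * (1.0 - theta) ** tails
    return math.log(total / n_grid)

if __name__ == "__main__":
    for heads, tails in [(5, 5), (9, 1)]:
        print(heads, tails,
              log_evidence_fixed(heads, tails),
              log_evidence_uniform_prior(heads, tails))
```

For balanced data the parameter-free model attains the higher evidence, while for strongly unbalanced data the one-parameter model does; the extra degree of freedom pays off only when the data call for it.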

The Kullback-Leibler (KL) divergence between the trial distribution $Q(\theta, Z\{T\})$ and the true posterior distribution $P(\theta, Z\{T\} \mid X\{T\})$ is given by

\[
\begin{aligned}
\mathrm{KL}(Q \| P) &= \int d\mu(\theta)\, d\mu(Z\{T\})\, Q(\theta, Z\{T\}) \\
&\quad \times \log\left( \frac{Q(\theta, Z\{T\})}{P(\theta, Z\{T\} \mid X\{T\})} \right) \\
&= \log P(X\{T\}) - F(X\{T\}, Q),
\end{aligned} \tag{2.7}
\]

where the free energy $F(X\{T\}, Q)$ is defined by

\[
\begin{aligned}
F(X\{T\}, Q) &= \int d\mu(\theta)\, d\mu(Z\{T\})\, Q(\theta, Z\{T\}) \\
&\quad \times \log\left( \frac{P(X\{T\}, Z\{T\} \mid \theta)\, P_0(\theta)}{Q(\theta, Z\{T\})} \right).
\end{aligned} \tag{2.8}
\]
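The second equality in equation 2.7 is stated without intermediate steps. A short derivation, assuming only that the trial distribution Q is normalized and writing $\theta$ for the model parameter and $\mu$ for the measure, might read:

```latex
\begin{align*}
% Step 1: take the logarithm of Bayes' rule, equation 2.5.
\log P(\theta, Z\{T\} \mid X\{T\})
  &= \log\bigl[ P(X\{T\}, Z\{T\} \mid \theta)\, P_0(\theta) \bigr]
     - \log P(X\{T\}) \\
% Step 2: the log ratio inside equation 2.7 therefore splits into two terms.
\log \frac{Q(\theta, Z\{T\})}{P(\theta, Z\{T\} \mid X\{T\})}
  &= \log P(X\{T\})
     - \log \frac{P(X\{T\}, Z\{T\} \mid \theta)\, P_0(\theta)}{Q(\theta, Z\{T\})} \\
% Step 3: average over Q; log P(X{T}) is constant and Q integrates to one.
\mathrm{KL}(Q \| P)
  &= \log P(X\{T\}) \int d\mu(\theta)\, d\mu(Z\{T\})\, Q(\theta, Z\{T\})
     - F(X\{T\}, Q) \\
  &= \log P(X\{T\}) - F(X\{T\}, Q)
\end{align*}
```

The only property of Q used here is normalization, so the identity holds for every trial distribution, not only for the optimal one.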

Because the KL divergence is nonnegative and vanishes only when the two distributions coincide, the true posterior distribution is obtained by maximizing the free energy with respect to the trial distribution $Q(\theta, Z\{T\})$. The maximum of the free energy is equal to the log evidence:

\[
\log P(X\{T\}) = \max_{Q} F(X\{T\}, Q) \geq F(X\{T\}, Q). \tag{2.9}
\]

Equation 2.9 implies that a lower bound for the log evidence can be evaluated by using any trial posterior distribution $Q(\theta, Z\{T\})$.
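The relations in equations 2.7 to 2.9 can be checked numerically on a model small enough to enumerate. The sketch below is not the paper's model: the three-point parameter grid, the binary hidden variables, the emission probabilities, and the data are all hypothetical, and sums over a finite set stand in for the integrals with respect to the measure. It evaluates the free energy at an arbitrary (uniform) trial distribution and at the exact posterior.

```python
import itertools
import math

# All numbers below are hypothetical and chosen only to keep the example small.
THETAS = (0.2, 0.5, 0.8)                           # parameter grid; theta = P(z = 1)
PRIOR = {th: 1.0 / len(THETAS) for th in THETAS}   # uniform prior P0(theta)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.9}                    # emission probabilities P(x = 1 | z)

def complete_likelihood(xs, zs, theta):
    """P(X{T}, Z{T} | theta) as the product over t in equation 2.6."""
    prob = 1.0
    for x, z in zip(xs, zs):
        p_z = theta if z == 1 else 1.0 - theta
        p_x = P_X1_GIVEN_Z[z] if x == 1 else 1.0 - P_X1_GIVEN_Z[z]
        prob *= p_z * p_x
    return prob

def joint_table(xs):
    """Map every state (theta, Z{T}) to P(X{T}, Z{T} | theta) * P0(theta)."""
    return {
        (th, zs): complete_likelihood(xs, zs, th) * PRIOR[th]
        for th in THETAS
        for zs in itertools.product((0, 1), repeat=len(xs))
    }

def free_energy(joint, q):
    """F(X{T}, Q) of equation 2.8, with sums in place of integrals."""
    return sum(q[s] * math.log(joint[s] / q[s]) for s in joint if q[s] > 0.0)

def kl_divergence(q, p):
    """KL(Q || P) of equation 2.7, with sums in place of integrals."""
    return sum(q[s] * math.log(q[s] / p[s]) for s in q if q[s] > 0.0)

xs = (1, 1, 0, 1)                                        # observed data X{T}, T = 4
joint = joint_table(xs)
evidence = sum(joint.values())                           # equation 2.4
posterior = {s: v / evidence for s, v in joint.items()}  # equation 2.5
uniform_q = {s: 1.0 / len(joint) for s in joint}         # an arbitrary trial Q

log_evidence = math.log(evidence)
f_uniform = free_energy(joint, uniform_q)
f_posterior = free_energy(joint, posterior)
kl_uniform = kl_divergence(uniform_q, posterior)

print(f"log evidence     = {log_evidence:.6f}")
print(f"F at uniform Q   = {f_uniform:.6f}")
print(f"KL at uniform Q  = {kl_uniform:.6f}")
print(f"F at posterior Q = {f_posterior:.6f}")
```

In this toy setting the free energy at the uniform trial distribution plus the corresponding KL divergence should reproduce the log evidence up to rounding, and the free energy at the exact posterior should equal the log evidence. In realistic models the posterior cannot be enumerated, which is why the bound is evaluated for a restricted family of trial distributions instead.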
