24.12.2012 Views

Online Model Selection Based on the Variational Bayes



Figure 2: Maximum free energies (FE) obtained by three learning methods, and their convergence times measured in epochs, plotted for various models. The three methods are batch VB (dash-dotted line with squares), online VB with discount factor (solid line with circles), and online VB without discount factor (dashed line with triangles). The abscissa denotes the number of Gaussian units in the trained models. (A) Results for data set A. (B) Results for data set B.

The online VB method without the discount factor showed poor performance and slow convergence in all cases. This result implies that the introduction of the discount factor is crucial for good performance of the online VB method, as pointed out in section 3. If there is no discount factor, the early inaccurate estimations contribute to the sufficient statistics average even in the later stages of the learning process and degrade the quality of estimations.
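The effect described above can be illustrated with a small numerical sketch. It assumes that the discounted average takes the usual recursive form, with a discount factor λ(t) and an effective learning constant η(t) obeying η(t) = 1/(1 + λ(t)/η(t−1)). This recursion, the function name `running_average`, and the toy data stream are illustrative assumptions, not taken from the text above.

```python
def running_average(values, discount):
    """Discounted running average of a stream of per-sample statistics.

    Assumed recursion (an illustration, not quoted from the paper):
        eta(1) = 1,  eta(t) = 1 / (1 + discount / eta(t-1))
        avg(t) = (1 - eta(t)) * avg(t-1) + eta(t) * values[t]
    With discount = 1.0 this gives eta(t) = 1/t, i.e. the plain mean.
    """
    avg = 0.0
    eta = None
    for v in values:
        eta = 1.0 if eta is None else 1.0 / (1.0 + discount / eta)
        avg = (1.0 - eta) * avg + eta * v
    return avg


# Toy stream (hypothetical): a statistic whose early values are badly
# biased and which settles at the true value 1.0 as learning proceeds.
stream = [1.0 + 5.0 * (0.9 ** t) for t in range(500)]

plain = running_average(stream, discount=1.0)        # no discount factor
discounted = running_average(stream, discount=0.99)  # 1 - discount = 0.01

print(f"without discount: {plain:.4f}")
print(f"with discount:    {discounted:.4f}")
```

With `discount=1.0` every early, biased value keeps the same weight as a recent one, so the average is still off by roughly 0.1 after 500 steps. With `discount=0.99` the weight of old values decays geometrically and the average ends up within roughly 0.004 of the true value.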

In the second experiment, the sequential model selection procedure using a trial model (see section 4) was tested. When the free energy converged, the structure of the trial model was changed based on the base model, which was the best model to date. We tested two initial model configurations consisting of 2 units and 10 units. When the model structure was changed, the discount factor and the effective learning constant in the online VB method were reset to $(1 - \lambda(t)) = 0.01$ and $\eta(t) = 0.01$. The online VB method was able to find the best model in all cases (see Figures 3 and 4). It should be noted that the VB method sometimes increased the free energy while decreasing the data likelihood (see Figures 3–6). This was achieved as a result of the decrease
