07.08.2013 Views

beamer - Vrije Universiteit Amsterdam


§3.3.3 Markov Chains with Rewards

▶ Let $f : I \to \mathbb{R}$ be a reward or cost function;

▶ $\sum_{k=1}^{n} f(X_k)$ is the total reward up to time $n$;

▶ $\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} f(X_k)$ is the long-run average reward per unit of time;

▶ We wish to have an ergodic (or Markov-reward) property

$$\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} f(X_k) = \sum_{j\in I} \pi_j f(j) \quad \text{(w.p. 1)}$$
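The ergodic property can be checked numerically on a small example. The sketch below is illustrative only: the two-state transition matrix `P`, the reward vector `f`, and the function names are assumptions that do not come from the slides. It also assumes that π denotes the stationary distribution of a finite, irreducible, aperiodic chain. For this chain π = (5/6, 1/6), so the right-hand side equals 5/6 · 1 + 1/6 · 4 = 1.5, and the simulated time average should settle near that value.

```python
import random

# Hypothetical two-state chain and reward function (not from the slides).
P = [[0.9, 0.1],
     [0.5, 0.5]]
f = [1.0, 4.0]


def stationary_distribution(P, iterations=1000):
    """Approximate pi with pi = pi P by repeated multiplication.

    Assumes a finite, irreducible, aperiodic chain so the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def long_run_average(P, f, n_steps, start=0, seed=0):
    """Simulate X_1, ..., X_n from X_0 = start and return (1/n) * sum f(X_k)."""
    rng = random.Random(seed)
    states = list(range(len(P)))
    state = start
    total = 0.0
    for _ in range(n_steps):
        # Draw the next state from row `state` of the transition matrix.
        state = rng.choices(states, weights=P[state])[0]
        total += f[state]
    return total / n_steps


def expected_reward(pi, f):
    """Right-hand side of the ergodic property: sum over j of pi_j * f(j)."""
    return sum(p * r for p, r in zip(pi, f))


if __name__ == "__main__":
    pi = stationary_distribution(P)
    print("stationary distribution:", pi)
    print("sum_j pi_j f(j):        ", expected_reward(pi, f))
    print("simulated time average: ", long_run_average(P, f, 200_000))
```

The simulation uses a fixed seed so that runs are repeatable. Changing `start` should not affect the limit, which is what the "w.p. 1" statement suggests for a chain of this kind.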

© Ad Ridder (VU) SOR – Fall 2012 28 / 36
