beamer - Vrije Universiteit Amsterdam
§3.3.3 Markov Chains with Rewards
▶ Let $f : I \to \mathbb{R}$ be a reward or cost function;
▶ $\sum_{k=1}^{n} f(X_k)$ is the total reward up to time $n$;
▶ $\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} f(X_k)$ is the long-run average reward per unit of time;
▶ We wish to have an ergodic (or Markov-reward) property
$$\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} f(X_k) = \sum_{j\in I} \pi_j f(j) \quad \text{(w.p. 1)}$$
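The ergodic property can be checked numerically on a small example. The sketch below is illustrative only: the two-state transition matrix `P`, the reward values `f`, and the helper names `stationary_distribution` and `time_average_reward` are assumptions that do not come from the slides. It assumes an irreducible, aperiodic chain on a finite state space, so that power iteration converges to the stationary distribution $\pi$.

```python
import random


def stationary_distribution(P, iterations=1000):
    """Approximate pi satisfying pi = pi P by power iteration.

    Assumes P is the transition matrix of an irreducible, aperiodic chain.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def time_average_reward(P, f, n_steps, start=0, seed=0):
    """Simulate X_1, ..., X_n from X_0 = start and return (1/n) * sum_k f(X_k)."""
    rng = random.Random(seed)
    states = list(range(len(P)))
    x = start
    total = 0.0
    for _ in range(n_steps):
        # Draw the next state from row x of the transition matrix.
        x = rng.choices(states, weights=P[x])[0]
        total += f[x]
    return total / n_steps


# Hypothetical example chain and reward function (not from the slides).
P = [[0.9, 0.1],
     [0.5, 0.5]]
f = [1.0, 4.0]

pi = stationary_distribution(P)
stationary_reward = sum(pi[j] * f[j] for j in range(len(P)))
empirical_reward = time_average_reward(P, f, n_steps=200_000)

print("sum_j pi_j f(j)     =", stationary_reward)
print("(1/n) sum_k f(X_k)  =", empirical_reward)
```

For this chain $\pi = (5/6, 1/6)$, so the right-hand side equals $5/6 \cdot 1 + 1/6 \cdot 4 = 1.5$; the simulated time average should come close to that value for large $n$, whatever the starting state.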
© Ad Ridder (VU) SOR – Fall 2012 28 / 36