Infinite-Horizon Average Reward Markov Decision Processes
The Average Reward Optimality Equation: Unichain Models

Theorem. Suppose $S$ is countable.
(i) If there exist a scalar $g$ and an $h \in V$ that satisfy $B(g, h) \le 0$, then $g e \ge g^*_+$;
(ii) If there exist a scalar $g$ and an $h \in V$ that satisfy $B(g, h) \ge 0$, then $g e \le \sup_{d \in D^{MD}} g_-^{d^\infty} \le g^*_-$;
(iii) If there exist a scalar $g$ and an $h \in V$ that satisfy $B(g, h) = 0$, then $g e = g^* = g^*_+ = g^*_-$.

Dan Zhang, Spring 2012 · Infinite Horizon Average Reward MDP 8
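A small numerical sketch of how parts (i)–(iii) are used in practice. It assumes that $B(g,h)$ is the usual componentwise operator $B(g,h)(s) = \max_a \{ r(s,a) - g + \sum_j p(j \mid s,a) h(j) - h(s) \}$ (the slide does not restate its definition), and it uses a hypothetical two-state, two-action MDP with made-up numbers, where $S$ and the action sets are finite so $g^*_+ = g^*_- = g^*$. For any $h$, choosing $g$ as the largest (smallest) component of $B(0,h)$ makes $B(g,h) \le 0$ ($\ge 0$), so parts (i) and (ii) bracket $g^*$. Relative value iteration (a standard method, not necessarily the one used in these notes) then drives both bounds together until $B(g,h) = 0$, as in part (iii). The names `q`, `gain_bounds`, `exact_gain`, and `relative_value_iteration` are illustrative only.

```python
from itertools import product

# Hypothetical finite MDP (illustrative numbers, not from the notes).
# P[s][a][j] = p(j | s, a); every entry is positive, so each stationary
# policy induces an irreducible, aperiodic (hence unichain) Markov chain.
P = [
    [[0.5, 0.5], [0.8, 0.2]],
    [[0.4, 0.6], [0.1, 0.9]],
]
R = [[5.0, 10.0], [-1.0, 1.0]]  # R[s][a] = r(s, a)
STATES = range(len(P))


def q(s, a, h):
    """r(s, a) + sum_j p(j | s, a) h(j)."""
    return R[s][a] + sum(P[s][a][j] * h[j] for j in STATES)


def B(g, h):
    """Assumed form: B(g, h)(s) = max_a {r(s,a) - g + sum_j p(j|s,a) h(j) - h(s)}."""
    return [max(q(s, a, h) for a in range(len(P[s]))) - g - h[s] for s in STATES]


def gain_bounds(h):
    """For ANY h: g_lo gives B(g_lo, h) >= 0 and g_hi gives B(g_hi, h) <= 0,
    so by parts (ii) and (i) of the theorem, g_lo <= g* <= g_hi."""
    diffs = B(0.0, h)  # max_a {r + P h} - h, componentwise
    return min(diffs), max(diffs)


def exact_gain():
    """Brute force: best long-run average reward over deterministic stationary policies."""
    best = float("-inf")
    for d in product(*(range(len(P[s])) for s in STATES)):
        pi = [1.0 / len(P)] * len(P)
        for _ in range(1000):  # power iteration for the stationary distribution of P_d
            pi = [sum(pi[s] * P[s][d[s]][j] for s in STATES) for j in STATES]
        best = max(best, sum(pi[s] * R[s][d[s]] for s in STATES))
    return best


def relative_value_iteration(iters=200, ref=0):
    """Drive B(g, h) toward 0 by iterating h <- T h - (T h)(ref)."""
    h = [0.0] * len(P)
    for _ in range(iters):
        th = [max(q(s, a, h) for a in range(len(P[s]))) for s in STATES]
        h = [v - th[ref] for v in th]
    return h


g_star = exact_gain()
print("g* =", g_star)
print("bounds from h = 0:", gain_bounds([0.0, 0.0]))
h = relative_value_iteration()
lo, hi = gain_bounds(h)
print("bounds after RVI:", lo, hi)
print("B(g, h) at the fixed point:", B(lo, h))
```

The crude choice $h = 0$ already gives a valid (if loose) bracket on $g^*$; the point of solving the optimality equation $B(g,h) = 0$ is that the upper and lower bounds then coincide.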