Infinite-Horizon Average Reward Markov Decision Processes
Existence of Optimal Policies – Unichain Models

A decision $d_h$ is $h$-improving if $d_h \in \arg\max_{d \in D} \{ r_d + P_d h \}$.

Theorem. Suppose there exists a scalar $g^*$ and an $h^* \in V$ for which $B(g^*, h^*) = 0$. Then if $d^*$ is $h^*$-improving, $(d^*)^\infty$ is average optimal.

Dan Zhang, Spring 2012, Infinite Horizon Average Reward MDP
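The h-improving definition and the theorem can be illustrated numerically. The sketch below is not from the slides: it assumes a small two-state unichain model with made-up rewards and transition probabilities, and it assumes that B(g, h) is the usual optimality-equation residual, B(g, h)(s) = max_a { r(s, a) - g + sum_j p(j | s, a) h(j) - h(s) }, which this slide does not define. The function names `q_values`, `h_improving`, and `bellman_residual` are hypothetical.

```python
# Hypothetical two-state unichain MDP (values chosen for illustration only).
# r[s][a] is the one-step reward; P[s][a] is the next-state distribution.
r = [[5.0, 10.0], [-1.0]]
P = [
    [[0.5, 0.5], [0.0, 1.0]],  # state 0: two actions
    [[0.0, 1.0]],              # state 1: one action, absorbing
]


def q_values(r, P, h):
    """Return q[s][a] = r(s, a) + sum_j p(j | s, a) * h(j)."""
    return [
        [r[s][a] + sum(p * hj for p, hj in zip(P[s][a], h))
         for a in range(len(r[s]))]
        for s in range(len(r))
    ]


def h_improving(r, P, h):
    """Pick, in each state, an action attaining the max of r_d + P_d h.

    Ties are broken in favor of the lowest action index.
    """
    return [max(range(len(qs)), key=qs.__getitem__) for qs in q_values(r, P, h)]


def bellman_residual(r, P, g, h):
    """Assumed form of B(g, h): max_a {r - g + P h - h}, one entry per state."""
    return [max(qs) - g - h[s] for s, qs in enumerate(q_values(r, P, h))]


# Candidate solution for this toy model: gain g* = -1, bias h* = (12, 0).
g_star, h_star = -1.0, [12.0, 0.0]
residual = bellman_residual(r, P, g_star, h_star)
d_star = h_improving(r, P, h_star)
```

In this toy model the residual vanishes in both states, so under the stated assumption about B the theorem says the stationary policy that uses `d_star` in every period is average optimal. Here that policy picks action 0 in state 0, even though action 1 has the larger immediate reward.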