Infinite-Horizon Average Reward Markov Decision Processes
Existence of Solutions to the Optimality Equation – Unichain Models

Theorem
Suppose $S$ and $A_s$ are finite, $|r(s, a)| \le M < \infty$ for all $s$ and $a$, and the model is unichain. Then:
(i) There exist $g \in \mathbb{R}^1$ and $h \in V$ for which
$$0 = \max_{d \in D} \{ r_d - g e + (P_d - I) h \};$$
(ii) If $(g', h')$ is any other solution of the average reward optimality equation, then $g = g'$.

Dan Zhang, Spring 2012, Infinite Horizon Average Reward MDP
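As an illustration of part (i), the sketch below computes a solution $(g, h)$ numerically with relative value iteration, a standard method that the theorem itself does not prescribe. Everything in it is a hypothetical assumption for demonstration: the 2-state, 2-action MDP (`R`, `P`) and the helper names `bellman`, `relative_value_iteration`, `residual`, and `policy_gain`. All transition probabilities are strictly positive, so every deterministic policy induces an irreducible, aperiodic chain; the model is therefore unichain and the iteration converges. The script checks that the optimality-equation residual vanishes, that $g$ matches the best gain over all deterministic stationary policies, and that $h + c e$ is also a solution, so $h$ is determined only up to an additive constant, while part (ii) says $g$ is unique.

```python
from itertools import product

# Hypothetical 2-state, 2-action unichain MDP (illustrative numbers, not from the slides).
# R[s][a] = r(s, a);  P[s][a][j] = p(j | s, a).  All entries > 0, so every policy is unichain.
R = [[1.0, 2.0],
     [0.0, 3.0]]
P = [[[0.5, 0.5], [0.9, 0.1]],
     [[0.3, 0.7], [0.8, 0.2]]]


def bellman(h):
    """Apply (T h)(s) = max_a { r(s, a) + sum_j p(j | s, a) h(j) }."""
    return [max(R[s][a] + sum(P[s][a][j] * h[j] for j in range(len(h)))
                for a in range(len(R[s])))
            for s in range(len(h))]


def relative_value_iteration(ref=0, tol=1e-12, max_iter=10_000):
    """Iterate h <- T h - (T h)(ref) e; at convergence (T h)(ref) is the gain g."""
    h = [0.0] * len(R)
    for _ in range(max_iter):
        w = bellman(h)
        g = w[ref]                      # current gain estimate
        h_new = [w_s - g for w_s in w]  # normalize so that h(ref) = 0
        if max(abs(x - y) for x, y in zip(h_new, h)) < tol:
            return g, h_new
        h = h_new
    raise RuntimeError("relative value iteration did not converge")


def residual(g, h):
    """Sup norm of max_d { r_d - g e + (P_d - I) h }; it is 0 for an exact solution."""
    th = bellman(h)
    return max(abs(th[s] - g - h[s]) for s in range(len(h)))


def policy_gain(d):
    """Gain pi_d r_d of the deterministic stationary policy d (2-state closed form)."""
    p01 = P[0][d[0]][1]
    p10 = P[1][d[1]][0]
    pi0, pi1 = p10 / (p01 + p10), p01 / (p01 + p10)  # stationary distribution
    return pi0 * R[0][d[0]] + pi1 * R[1][d[1]]


g, h = relative_value_iteration()
best = max(policy_gain(d) for d in product(range(2), repeat=2))
print(f"g = {g:.6f}, h = {h}, residual = {residual(g, h):.2e}")
print(f"best stationary-policy gain = {best:.6f}")
# Shifting h by a constant leaves the residual at (numerically) zero.
print(f"residual for h + 5e = {residual(g, [x + 5.0 for x in h]):.2e}")
```

For these made-up numbers the optimal policy chooses action 1 in both states, and both printed gains equal $19/9 \approx 2.1111$, with $h = (0, 10/9)$ under the normalization $h(0) = 0$.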