Infinite-Horizon Average Reward Markov Decision Processes


The Average Reward Optimality Equation – Unichain Models

Theorem. Suppose $S$ is countable.
(i) If there exists a scalar $g$ and an $h \in V$ which satisfy $B(g, h) \le 0$, then $ge \ge g^*_+$;
(ii) If there exists a scalar $g$ and an $h \in V$ which satisfy $B(g, h) \ge 0$, then $ge \le \sup_{d \in D^{MD}} g_-^{d^\infty} \le g^*_-$;
(iii) If there exists a scalar $g$ and an $h \in V$ which satisfy $B(g, h) = 0$, then $ge = g^* = g^*_+ = g^*_-$.

Dan Zhang, Spring 2012, Infinite Horizon Average Reward MDP (page 8)
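Parts (i) and (ii) can be read as a recipe for bracketing the optimal gain from any guess $h$. The sketch below assumes $B(g, h) = \max_{d \in D}\{r_d - ge + (P_d - I)h\}$, which matches the optimality equation in the existence theorem but is not defined on the slides shown here. The two-state, two-action model, and the names `bellman`, `B`, `gain` and `gain_bounds`, are hypothetical and chosen only for illustration. Writing $Lh$ for the one-step maximization, $g = \max_s (Lh - h)(s)$ gives $B(g, h) \le 0$ and $g = \min_s (Lh - h)(s)$ gives $B(g, h) \ge 0$, so the optimal gain should lie between the two.

```python
from itertools import product

# Hypothetical 2-state, 2-action MDP (not taken from the slides).
# P[s][a] is the next-state distribution, r[s][a] the one-step reward.
# All transition probabilities are positive, so every policy is unichain.
P = [[[0.5, 0.5], [0.9, 0.1]],
     [[0.4, 0.6], [0.2, 0.8]]]
r = [[5.0, 10.0],
     [-1.0, 2.0]]
S, A = range(2), range(2)

def bellman(h):
    """(Lh)(s) = max_a { r(s,a) + sum_j p(j|s,a) h(j) }."""
    return [max(r[s][a] + sum(P[s][a][j] * h[j] for j in S) for a in A)
            for s in S]

def B(g, h):
    """Assumed form: B(g,h)(s) = (Lh)(s) - g - h(s)."""
    return [Lh_s - g - h[s] for s, Lh_s in zip(S, bellman(h))]

def gain(d):
    """Gain of stationary policy d, via the 2-state stationary distribution."""
    a = P[0][d[0]][1]                    # probability of moving 0 -> 1
    b = P[1][d[1]][0]                    # probability of moving 1 -> 0
    pi = (b / (a + b), a / (a + b))      # stationary distribution
    return pi[0] * r[0][d[0]] + pi[1] * r[1][d[1]]

# Brute-force optimal gain over the four deterministic stationary policies.
g_star = max(gain(d) for d in product(A, repeat=2))

def gain_bounds(h):
    """Lower and upper bounds on the optimal gain implied by (ii) and (i)."""
    diff = [Lh_s - h[s] for s, Lh_s in zip(S, bellman(h))]
    return min(diff), max(diff)

for h in ([0.0, 0.0], [3.0, -1.0], [-7.0, 2.5]):
    lo, hi = gain_bounds(h)
    assert all(x <= 1e-12 for x in B(hi, h))    # hypothesis of part (i)
    assert all(x >= -1e-12 for x in B(lo, h))   # hypothesis of part (ii)
    print(h, lo, g_star, hi)
```

The brute-force `gain` helper relies on the closed-form stationary distribution of a two-state chain, so it would need replacing for larger models.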
Existence of Solutions to the Optimality Equation – Unichain Models

Theorem. Suppose $S$ and $A_s$ are finite, $|r(s, a)| \le M < \infty$ for all $s, a$, and the model is unichain.
(i) There exists a $g \in \mathbb{R}^1$ and $h \in V$ for which $0 = \max_{d \in D} \{ r_d - ge + (P_d - I)h \}$;
(ii) If $(g', h')$ is any other solution of the average reward optimality equation, then $g = g'$.
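To make the existence claim concrete, the following sketch computes a pair $(g, h)$ numerically for a small model and checks that the optimality equation residual vanishes. It uses a common textbook form of relative value iteration, which may differ in detail from the version on the later slides. The toy model, the reference state `ref`, and the function names are assumptions made for illustration.

```python
# Hypothetical 2-state, 2-action MDP (not taken from the slides).
# P[s][a] is the next-state distribution, r[s][a] the one-step reward.
# Positive transition probabilities make the model unichain and aperiodic.
P = [[[0.5, 0.5], [0.9, 0.1]],
     [[0.4, 0.6], [0.2, 0.8]]]
r = [[5.0, 10.0],
     [-1.0, 2.0]]
S, A = range(2), range(2)

def bellman(h):
    """(Lh)(s) = max_a { r(s,a) + sum_j p(j|s,a) h(j) }."""
    return [max(r[s][a] + sum(P[s][a][j] * h[j] for j in S) for a in A)
            for s in S]

def relative_value_iteration(tol=1e-12, max_iter=10_000, ref=0):
    """Iterate h <- Lh - (Lh)(ref) e until h stops changing."""
    h = [0.0 for _ in S]
    for _ in range(max_iter):
        Lh = bellman(h)
        g = Lh[ref]                      # current estimate of the gain
        h_new = [x - g for x in Lh]      # renormalize so h_new[ref] == 0
        if max(abs(x - y) for x, y in zip(h_new, h)) < tol:
            return g, h_new
        h = h_new
    raise RuntimeError("relative value iteration did not converge")

g, h = relative_value_iteration()

# Residual of 0 = max_d { r_d - g e + (P_d - I) h }, evaluated state by state.
residual = [Lh_s - g - h[s] for s, Lh_s in zip(S, bellman(h))]
print(g, h, residual)
```

For this toy model the iteration should settle near $g = 7.8$ and $h = (0, -22)$. Choosing a different reference state shifts $h$ by a constant but leaves $g$ unchanged, which is consistent with part (ii).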
Page 2: Outline. The average reward. Classifica…
Page 7: The Average Reward Optimality Equat…
Pages 11 and 12: Existence of Optimal Policies - Uni…
Pages 13 and 14: Relative Value Iteration. 1. Select u…
Pages 15 and 16: Policy Iteration. 1. Set n = 0 and sel…
