Infinite-Horizon Average Reward Markov Decision Processes

Existence of Optimal Policies – Unichain Models

A decision rule $d_h$ is $h$-improving if $d_h \in \arg\max_{d \in D} \{ r_d + P_d h \}$.

Theorem. Suppose there exist a scalar $g^*$ and an $h^* \in V$ for which $B(g^*, h^*) = 0$. Then if $d^*$ is $h^*$-improving, $(d^*)^\infty$ is average optimal.

Dan Zhang, Spring 2012 Infinite Horizon Average Reward MDP 10
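The definition of an h-improving decision rule can be illustrated with a minimal sketch. The nested-list layout (`r[s][a]`, `P[s][a][j]`), the function name `h_improving_rule`, and the toy numbers are all assumptions made for illustration; they do not come from the slides.

```python
def h_improving_rule(r, P, h):
    """Return a rule d with d[s] in argmax_a { r[s][a] + sum_j P[s][a][j] * h[j] }.

    r[s][a]    : one-step reward (hypothetical nested-list layout)
    P[s][a][j] : probability of moving from state s to state j under action a
    h          : one value per state
    Ties are broken in favor of the lowest action index.
    """
    rule = []
    for s in range(len(r)):
        scores = [
            r[s][a] + sum(p * hj for p, hj in zip(P[s][a], h))
            for a in range(len(r[s]))
        ]
        rule.append(max(range(len(scores)), key=scores.__getitem__))
    return rule


# Toy two-state, two-action model (numbers invented for illustration).
r = [[5.0, 10.0], [-1.0, 0.0]]
P = [
    [[0.5, 0.5], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
h = [0.0, -12.0]

rule = h_improving_rule(r, P, h)
print(rule)
```

For this particular h, state 0 scores -1 for action 0 against -2 for action 1, and state 1 scores -13 against 0, so the maximizing actions are 0 and 1 respectively. The theorem says more only when h is an h* that solves the optimality equation together with some g*.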
Existence of Optimal Policies – Unichain Models

Theorem. Suppose $S$ and $A_s$ are finite, $r(s, a)$ is bounded, and the model is unichain. Then
(i) there exists a stationary average optimal policy;
(ii) there exists a scalar $g^*$ and an $h^* \in V$ for which $B(g^*, h^*) = 0$;
(iii) any stationary policy derived from an $h^*$-improving decision rule is average optimal;
(iv) $g^* e = g^*_+ = g^*_-$.
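Parts (ii) and (iii) can be checked numerically on a small model. The sketch below assumes the operator has the componentwise form $B(g, h)(s) = \max_a \{ r(s, a) - g + \sum_j p(j \mid s, a) h(j) - h(s) \}$, which is not defined in this section, and uses a plain relative value iteration with $h$ pinned to 0 at a reference state. The toy model, the function names, and the tolerance are illustrative assumptions; every transition probability is positive, so each stationary policy is unichain and aperiodic.

```python
def bellman_backup(r, P, u):
    """Return (v, rule) with v[s] = max_a { r[s][a] + sum_j P[s][a][j] * u[j] }."""
    v, rule = [], []
    for s in range(len(r)):
        scores = [
            r[s][a] + sum(p * uj for p, uj in zip(P[s][a], u))
            for a in range(len(r[s]))
        ]
        best = max(range(len(scores)), key=scores.__getitem__)
        v.append(scores[best])
        rule.append(best)
    return v, rule


def relative_value_iteration(r, P, ref=0, tol=1e-10, max_iter=10_000):
    """Approximate (g, h, rule) with h[ref] = 0; stops when span(v - u) < tol."""
    u = [0.0] * len(r)
    g = 0.0
    for _ in range(max_iter):
        v, _ = bellman_backup(r, P, u)
        diff = [vs - us for vs, us in zip(v, u)]
        g = diff[ref]                  # gain estimate read off the reference state
        u = [vs - v[ref] for vs in v]  # renormalize so u[ref] stays 0
        if max(diff) - min(diff) < tol:
            break
    _, rule = bellman_backup(r, P, u)  # h-improving rule for the final h
    return g, u, rule


def optimality_residual(r, P, g, h):
    """Componentwise B(g, h) under the assumed form of the operator."""
    v, _ = bellman_backup(r, P, h)
    return [vs - g - hs for vs, hs in zip(v, h)]


# Toy model (numbers invented for illustration).
r = [[5.0, 10.0], [-1.0, 0.0]]
P = [
    [[0.5, 0.5], [0.1, 0.9]],
    [[0.2, 0.8], [0.7, 0.3]],
]

g, h, rule = relative_value_iteration(r, P)
residual = optimality_residual(r, P, g, h)
print(g, h, rule, residual)
```

Solving the four stationary policies of this toy model by hand gives gains 5/7, 35/12, 1, and 35/8; the largest, 35/8 = 4.375, belongs to the rule that picks action 1 in both states, with h = (0, -6.25). The iteration should recover these values up to the tolerance, with a residual close to zero.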
