11.07.2015

Infinite-Horizon Average Reward Markov Decision Processes



Existence of Optimal Policies – Unichain Models

Theorem. Suppose $S$ and $A_s$ are finite, $r(s, a)$ is bounded, and the model is unichain. Then
(i) there exists a stationary average optimal policy;
(ii) there exists a scalar $g^*$ and an $h^* \in V$ for which $B(g^*, h^*) = 0$;
(iii) any stationary policy derived from an $h^*$-improving decision rule is average optimal;
(iv) $g^* e = g^*_+ = g^*_-$.

Dan Zhang, Spring 2012, Infinite Horizon Average Reward MDP
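Parts (ii) and (iii) of the theorem can be checked numerically on a small example. The sketch below is illustrative only: the two-state, two-action MDP (`P`, `R`) is hypothetical, and relative value iteration is a technique chosen here, not one the slide names. It also assumes the usual form of the operator, B(g, h)(s) = max_a { r(s, a) - g + Σ_j p(j | s, a) h(j) - h(s) }, which the slide does not restate. Every transition probability is positive, so the model is unichain and aperiodic, which is what lets this simple iteration converge.

```python
from itertools import product

# Hypothetical two-state, two-action MDP; all numbers are illustrative.
# P[s][a][j] = p(j | s, a), R[s][a] = r(s, a).
P = [
    [[0.5, 0.5], [0.9, 0.1]],
    [[0.4, 0.6], [0.2, 0.8]],
]
R = [
    [5.0, 10.0],
    [-1.0, 2.0],
]
STATES = range(2)
ACTIONS = range(2)


def q_value(s, a, h):
    """One-step reward plus expected bias: r(s,a) + sum_j p(j|s,a) h(j)."""
    return R[s][a] + sum(P[s][a][j] * h[j] for j in STATES)


def relative_value_iteration(tol=1e-10, max_iter=10_000):
    """Approximate (g*, h*) with h*(0) = 0, using state 0 as reference."""
    h = [0.0 for _ in STATES]
    for _ in range(max_iter):
        v = [max(q_value(s, a, h) for a in ACTIONS) for s in STATES]
        new_h = [v[s] - v[0] for s in STATES]  # subtract reference value
        done = max(abs(new_h[s] - h[s]) for s in STATES) < tol
        h = new_h
        if done:
            break
    g = max(q_value(0, a, h) for a in ACTIONS)  # gain, since h[0] == 0
    return g, h


def bellman_residual(g, h):
    """B(g, h)(s) under the assumed definition; near zero at (g*, h*)."""
    return [max(q_value(s, a, h) for a in ACTIONS) - g - h[s] for s in STATES]


def improving_rule(h):
    """An h-improving decision rule: a maximizing action in each state."""
    return [max(ACTIONS, key=lambda a: q_value(s, a, h)) for s in STATES]


def stationary_gain(rule):
    """Average reward of a stationary policy; two-state chains only."""
    p01 = P[0][rule[0]][1]
    p10 = P[1][rule[1]][0]
    pi0 = p10 / (p01 + p10)  # stationary probability of state 0
    return pi0 * R[0][rule[0]] + (1.0 - pi0) * R[1][rule[1]]


g, h = relative_value_iteration()
rule = improving_rule(h)
# Brute force over all stationary deterministic policies for comparison.
best_gain = max(stationary_gain(list(d)) for d in product(ACTIONS, repeat=2))

print("g =", g, "h =", h)
print("B(g, h) =", bellman_residual(g, h))
print("improving rule =", rule, "its gain =", stationary_gain(rule))
print("best stationary gain =", best_gain)
```

In this example the residual B(g, h) is zero up to the iteration tolerance, and the gain of the h-improving rule matches the best gain found by enumerating all four stationary deterministic policies, which is what (ii) and (iii) predict. The closed-form stationary distribution in `stationary_gain` is specific to two-state chains; a larger model would need a linear solve instead.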
