11.07.2015 Views

Infinite-Horizon Average Reward Markov Decision Processes




Existence of Solutions to the Optimality Equation: Unichain Models

Theorem. Suppose $S$ and $A_s$ are finite, $|r(s, a)| \le M < \infty$ for all $s, a$, and the model is unichain.

(i) There exists a $g \in \mathbb{R}^1$ and $h \in V$ for which
$$0 = \max_{d \in D} \{ r_d - g e + (P_d - I) h \};$$

(ii) If $(g', h')$ is any other solution of the average reward optimality equation, then $g = g'$.

Dan Zhang, Spring 2012, Infinite Horizon Average Reward MDP
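To make the theorem concrete, the following minimal sketch computes a pair $(g, h)$ for a small example and checks that the residual of the optimality equation vanishes. Everything in it is an assumption for illustration and does not come from the slides: the two-state, two-action rewards `R` and transitions `P` are hypothetical, and the solver is relative value iteration, one standard way to find a solution. All transition probabilities are chosen positive so that every stationary policy is unichain and aperiodic, which is what lets this simple iteration converge.

```python
# Hypothetical 2-state, 2-action MDP (not from the slides).
# R[s][a] is the reward, P[s][a][j] the probability of moving from s to j under a.
# Every entry of P is positive, so each stationary policy has a single
# recurrent class (unichain) and is aperiodic.
R = [[5.0, 10.0],
     [-1.0, 2.0]]
P = [[[0.5, 0.5], [0.2, 0.8]],
     [[0.4, 0.6], [0.7, 0.3]]]


def bellman(h):
    """Return, for each state s, max_a { r(s,a) + sum_j p(j|s,a) h(j) }."""
    n = len(h)
    return [
        max(R[s][a] + sum(P[s][a][j] * h[j] for j in range(n))
            for a in range(len(R[s])))
        for s in range(n)
    ]


def relative_value_iteration(tol=1e-12, max_iter=10_000):
    """Iterate h <- T h - (T h)(0) e; at a fixed point g = (T h)(0)."""
    h = [0.0] * len(R)
    g = 0.0
    for _ in range(max_iter):
        Th = bellman(h)
        g = Th[0]                       # normalise so that h(0) = 0
        new_h = [v - g for v in Th]
        if max(abs(x - y) for x, y in zip(new_h, h)) < tol:
            return g, new_h
        h = new_h
    return g, h


def residual(g, h):
    """Componentwise value of max_d { r_d - g e + (P_d - I) h }."""
    Th = bellman(h)
    return [Th[s] - g - h[s] for s in range(len(h))]


if __name__ == "__main__":
    g, h = relative_value_iteration()
    print("g =", g)
    print("h =", h)
    print("residual =", residual(g, h))
```

Part (ii) can be seen in the same sketch: adding a constant to every component of `h` gives a different solution of the equation, but the residual is still zero only for the same `g`, because each row of `P` sums to one.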
