Infinite-Horizon Average Reward Markov Decision Processes
Linear Programming

Primal linear program is given by

$$
\begin{aligned}
\min_{g,\,h}\quad & g\\
\text{s.t.}\quad & g + h(s) - \sum_{j\in S} p(j\mid s,a)\,h(j) \ge r(s,a), && \forall s\in S,\ a\in A_s.
\end{aligned}
$$

Dual linear program is given by

$$
\begin{aligned}
\max_{x}\quad & \sum_{s\in S}\sum_{a\in A_s} r(s,a)\,x(s,a)\\
\text{s.t.}\quad & \sum_{a\in A_j} x(j,a) - \sum_{s\in S}\sum_{a\in A_s} p(j\mid s,a)\,x(s,a) = 0, && \forall j\in S,\\
& \sum_{s\in S}\sum_{a\in A_s} x(s,a) = 1,\\
& x(s,a) \ge 0, && \forall s\in S,\ a\in A_s.
\end{aligned}
$$

Dan Zhang, Spring 2012, Infinite Horizon Average Reward MDP
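The Python sketch below illustrates the primal/dual pair on a tiny example. It assumes a hypothetical two-state, two-action unichain MDP. The rewards and transition probabilities are invented for illustration and are not taken from the slides. Instead of calling an LP solver, it enumerates the deterministic stationary policies. For each policy d it builds a dual point x(s, a) that equals the stationary probability of s under d when a = d(s) and 0 otherwise, which gives the long-run state-action frequencies. For the best such point, it solves the policy-evaluation equations g + h(s) - sum_j p(j|s, d(s)) h(j) = r(s, d(s)) with the normalization h(0) = 0 and checks that (g, h) is primal feasible. A primal-feasible point and a dual-feasible point with equal objective values are both optimal by weak duality. The helper names (`solve`, `dual_point`, `primal_point`, and so on) are made up for this sketch. The brute-force enumeration is only sensible for very small MDPs.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical 2-state MDP with exact rational data (illustrative only).
# MDP[s][a] = (r(s, a), [p(0 | s, a), p(1 | s, a)])
MDP = {
    0: {0: (F(1), [F(1, 2), F(1, 2)]), 1: (F(0), [F(0), F(1)])},
    1: {0: (F(2), [F(1, 2), F(1, 2)]), 1: (F(3), [F(1), F(0)])},
}
S = sorted(MDP)


def solve(A, b):
    """Solve the square, nonsingular system A z = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def dual_point(d):
    """x(s, a) = stationary probability of s under policy d if a == d[s], else 0."""
    n = len(S)
    # Row j: pi(j) - sum_s p(j | s, d[s]) pi(s) = 0 (the dual balance equation).
    A = [[(1 if s == j else 0) - MDP[s][d[s]][1][j] for s in S] for j in S]
    b = [F(0)] * n
    # One balance row is redundant; replace it with sum pi = 1 (assumes unichain).
    A[-1], b[-1] = [F(1)] * n, F(1)
    pi = solve(A, b)
    return {(s, a): (pi[s] if a == d[s] else F(0)) for s in S for a in MDP[s]}


def dual_value(x):
    """Dual objective: sum_{s, a} r(s, a) x(s, a)."""
    return sum(MDP[s][a][0] * x[s, a] for s in S for a in MDP[s])


def dual_feasible(x):
    """Check the balance equations, the normalization, and x >= 0."""
    balance = all(
        sum(x[j, a] for a in MDP[j])
        - sum(MDP[s][a][1][j] * x[s, a] for s in S for a in MDP[s]) == 0
        for j in S
    )
    return balance and sum(x.values()) == 1 and min(x.values()) >= 0


def primal_point(d):
    """Solve g + h(s) - sum_j p(j | s, d[s]) h(j) = r(s, d[s]) with h(S[0]) = 0."""
    # Unknowns z = [g, h(S[1]), ..., h(S[-1])]; h(S[0]) is pinned to 0.
    A, b = [], []
    for s in S:
        r, p = MDP[s][d[s]]
        A.append([F(1)] + [(1 if s == j else 0) - p[j] for j in S[1:]])
        b.append(r)
    z = solve(A, b)
    return z[0], [F(0)] + z[1:]


def primal_feasible(g, h):
    """Check g + h(s) - sum_j p(j | s, a) h(j) >= r(s, a) for every (s, a)."""
    return all(
        g + h[s] - sum(p[j] * h[j] for j in S) >= r
        for s in S
        for r, p in MDP[s].values()
    )


# Enumerate deterministic stationary policies d: S -> A_s (tiny MDPs only).
policies = [dict(zip(S, acts)) for acts in product(*(sorted(MDP[s]) for s in S))]
best = max(policies, key=lambda d: dual_value(dual_point(d)))

x = dual_point(best)
g, h = primal_point(best)

# Primal-feasible and dual-feasible with equal objectives => both optimal.
assert dual_feasible(x) and primal_feasible(g, h)
assert dual_value(x) == g
print("policy:", best, "gain:", g, "bias:", [str(v) for v in h])
```

For these numbers, the four deterministic policies have gains 3/2, 5/3, 4/3 and 3/2. The selected policy is d = {0: 0, 1: 1}, with g = 5/3 and h = (0, 4/3). Exact `Fraction` arithmetic is used so the feasibility and equality checks need no floating-point tolerance.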