Infinite-Horizon Average Reward Markov Decision Processes

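Value Iteration

1. Select $v^0 \in V$, specify $\varepsilon > 0$, and set $n = 0$.
2. For each $s \in S$, compute $v^{n+1}(s)$ by
$$v^{n+1}(s) = \max_{a \in A_s} \Big[ r(s,a) + \sum_{j \in S} p(j \mid s,a)\, v^n(j) \Big].$$
3. If $\operatorname{sp}(v^{n+1} - v^n) < \varepsilon$, go to step 4. Otherwise, increment $n$ by 1 and return to step 2.
4. For each $s \in S$, choose
$$d_\varepsilon(s) \in \arg\max_{a \in A_s} \Big[ r(s,a) + \sum_{j \in S} p(j \mid s,a)\, v^{n+1}(j) \Big]$$
and stop.

Here $\operatorname{sp}(v) = \max_{s \in S} v(s) - \min_{s \in S} v(s)$ denotes the span seminorm.

Dan Zhang, Spring 2012, Infinite Horizon Average Reward MDP
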
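The four steps of value iteration can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the lecture's own implementation: the encoding `rewards[s][a]` for $r(s,a)$ and `trans[s][a][j]` for $p(j \mid s,a)$, the helper names `span` and `q_value`, the default starting vector $v^0 = 0$, and the `max_iter` cap are all hypothetical choices. The cap is there because the span test in step 3 is only guaranteed to be met under additional conditions on the MDP (for example, aperiodicity of the transition structure).

```python
from typing import List, Optional, Tuple

# Hypothetical MDP encoding used only for this sketch:
#   rewards[s][a]  = r(s, a)
#   trans[s][a][j] = p(j | s, a); actions in state s are indexed 0..len(rewards[s]) - 1
Rewards = List[List[float]]
Trans = List[List[List[float]]]


def span(x: List[float]) -> float:
    """Span seminorm: sp(x) = max_s x(s) - min_s x(s)."""
    return max(x) - min(x)


def q_value(rewards: Rewards, trans: Trans, v: List[float], s: int, a: int) -> float:
    """One-step lookahead: r(s, a) + sum_j p(j | s, a) v(j)."""
    return rewards[s][a] + sum(p * v[j] for j, p in enumerate(trans[s][a]))


def value_iteration(rewards: Rewards, trans: Trans, eps: float = 1e-8,
                    v0: Optional[List[float]] = None,
                    max_iter: int = 100_000) -> Tuple[List[int], List[float], List[float]]:
    n_states = len(rewards)
    # Step 1: starting vector v^0 (zero unless supplied).
    v = list(v0) if v0 is not None else [0.0] * n_states
    for _ in range(max_iter):
        # Step 2: v^{n+1}(s) = max_a [ r(s, a) + sum_j p(j | s, a) v^n(j) ].
        v_next = [max(q_value(rewards, trans, v, s, a) for a in range(len(rewards[s])))
                  for s in range(n_states)]
        diff = [v_next[s] - v[s] for s in range(n_states)]
        # Step 3: stop once sp(v^{n+1} - v^n) < eps.
        if span(diff) < eps:
            # Step 4: decision rule greedy with respect to v^{n+1}.
            policy = [max(range(len(rewards[s])),
                          key=lambda a: q_value(rewards, trans, v_next, s, a))
                      for s in range(n_states)]
            return policy, v_next, diff
        v = v_next
    raise RuntimeError("span test not met within max_iter iterations")


# Example: in state 0, action 0 moves to state 1 with reward 0 and action 1 stays
# with reward 1; state 1 is absorbing with reward 2.
rewards = [[0.0, 1.0], [2.0]]
trans = [[[0.0, 1.0], [1.0, 0.0]], [[0.0, 1.0]]]
policy, v, diff = value_iteration(rewards, trans)
print(policy, diff)  # [0, 0] [2.0, 2.0]
```

In this example $v^{n+1} - v^n$ becomes the constant vector $(2, 2)$ after three iterations, so the span test fires and the rule that moves from state 0 to the absorbing state 1 is chosen; the constant difference 2 equals the average reward of that policy.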
Relative Value Iteration

1. Select $u^0 \in V$, choose $s^* \in S$, specify $\varepsilon > 0$, set $w^0 = u^0 - u^0(s^*)e$, and set $n = 0$.
2. For each $s \in S$, compute $u^{n+1}(s)$ by
$$u^{n+1}(s) = \max_{a \in A_s} \Big[ r(s,a) + \sum_{j \in S} p(j \mid s,a)\, w^n(j) \Big].$$
Let $w^{n+1} = u^{n+1} - u^{n+1}(s^*)e$.
3. If $\operatorname{sp}(u^{n+1} - u^n) < \varepsilon$, go to step 4. Otherwise, increment $n$ by 1 and return to step 2.
4. For each $s \in S$, choose
$$d_\varepsilon(s) \in \arg\max_{a \in A_s} \Big[ r(s,a) + \sum_{j \in S} p(j \mid s,a)\, u^n(j) \Big]$$
and stop.

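Relative value iteration differs from value iteration only in renormalizing at the reference state $s^*$ after every update, which keeps the iterates from growing linearly in $n$. The sketch below follows the four steps above under the same hypothetical assumptions as a plain value-iteration sketch would need: `rewards[s][a]` encodes $r(s,a)$, `trans[s][a][j]` encodes $p(j \mid s,a)$, $u^0 = 0$ by default, and `max_iter` is an illustrative safeguard. Returning $u^{n+1}(s^*)$ as a gain estimate is an additional assumption of this sketch; it relies on $w^n(s^*) = 0$, so that $u^{n+1}(s^*)$ is one entry of $u^{n+1} - w^n$.

```python
from typing import List, Optional, Tuple

# Hypothetical MDP encoding used only for this sketch:
#   rewards[s][a] = r(s, a), trans[s][a][j] = p(j | s, a)
Rewards = List[List[float]]
Trans = List[List[List[float]]]


def span(x: List[float]) -> float:
    """Span seminorm: sp(x) = max_s x(s) - min_s x(s)."""
    return max(x) - min(x)


def q_value(rewards: Rewards, trans: Trans, v: List[float], s: int, a: int) -> float:
    """One-step lookahead: r(s, a) + sum_j p(j | s, a) v(j)."""
    return rewards[s][a] + sum(p * v[j] for j, p in enumerate(trans[s][a]))


def relative_value_iteration(rewards: Rewards, trans: Trans, s_star: int = 0,
                             eps: float = 1e-8, u0: Optional[List[float]] = None,
                             max_iter: int = 100_000) -> Tuple[List[int], float, List[float]]:
    n_states = len(rewards)
    # Step 1: u^0 (zero unless supplied) and w^0 = u^0 - u^0(s*) e.
    u = list(u0) if u0 is not None else [0.0] * n_states
    w = [u[s] - u[s_star] for s in range(n_states)]
    for _ in range(max_iter):
        # Step 2: u^{n+1}(s) = max_a [ r(s, a) + sum_j p(j | s, a) w^n(j) ],
        # then w^{n+1} = u^{n+1} - u^{n+1}(s*) e.
        u_next = [max(q_value(rewards, trans, w, s, a) for a in range(len(rewards[s])))
                  for s in range(n_states)]
        w_next = [u_next[s] - u_next[s_star] for s in range(n_states)]
        # Step 3: stop once sp(u^{n+1} - u^n) < eps.
        if span([u_next[s] - u[s] for s in range(n_states)]) < eps:
            # Step 4: decision rule greedy with respect to u^n, as in the statement above.
            policy = [max(range(len(rewards[s])),
                          key=lambda a: q_value(rewards, trans, u, s, a))
                      for s in range(n_states)]
            # Gain estimate (sketch assumption): u^{n+1}(s*), since w^n(s*) = 0.
            return policy, u_next[s_star], w_next
        u, w = u_next, w_next
    raise RuntimeError("span test not met within max_iter iterations")


# Same two-state example: state 0 can move to absorbing state 1 (reward 2 per step).
rewards = [[0.0, 1.0], [2.0]]
trans = [[[0.0, 1.0], [1.0, 0.0]], [[0.0, 1.0]]]
policy, gain, w = relative_value_iteration(rewards, trans, s_star=0)
print(policy, gain, w)  # [0, 0] 2.0 [0.0, 2.0]
```

Because the greedy choice in step 4 is unchanged when a constant is added to every component, using $u^n$ or $w^n$ there selects the same actions; the sketch uses $u^n$ to match the statement.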