11.07.2015 Views

Infinite-Horizon Average Reward Markov Decision Processes

Infinite-Horizon Average Reward Markov Decision Processes

Infinite-Horizon Average Reward Markov Decision Processes

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Relative Value Iteration1 Select u 0 ∈ V , choose s ∗ ∈ S, specify ɛ > 0, setw 0 = u 0 − u 0 (s ∗ )e, and set n = 0.2 For each s ∈ S, compute u n+1 (s) by⎡⎤u n+1 (s) = max ⎣r(s, a) + ∑ p(j|s, a)w n (j) ⎦ .a∈A sj∈SLet w n+1 = u n+1 − u n+1 (s ∗ )e.3 If sp(u n+1 − u n ) < ɛ, go to step 4. Otherwise, increment n by1 and return to step 2.4 For each s ∈ S, choose⎡⎤and stop.d ɛ (s) ∈ argmaxa∈A s⎣r(s, a) + ∑ j∈Sp(j|s, a)u n (j) ⎦Dan Zhang, Spring 2012 <strong>Infinite</strong> <strong>Horizon</strong> <strong>Average</strong> <strong>Reward</strong> MDP 13

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!