13. Acting under Uncertainty: Maximizing Expected Utility

Choosing Actions using the Maximum Expected Utility Principle

The agent simply chooses the action that maximizes the expected utility of the subsequent state:

    \pi(s) = \operatorname{argmax}_a \sum_{s'} T(s, a, s') U(s')

The utility of a state is the immediate reward for that state plus the expected discounted utility of the next state, assuming that the agent chooses the optimal action:

    U(s) = R(s) + \gamma \max_a \sum_{s'} T(s, a, s') U(s')

Bellman Equation

The equation

    U(s) = R(s) + \gamma \max_a \sum_{s'} T(s, a, s') U(s')

is also called the Bellman equation.

Bellman Equation: Example

In our 4 × 3 world the equation for the state (1,1) is

    U(1,1) = -0.04 + \gamma \max\{ 0.8 U(1,2) + 0.1 U(2,1) + 0.1 U(1,1),     (Up)
                                   0.9 U(1,1) + 0.1 U(1,2),                  (Left)
                                   0.9 U(1,1) + 0.1 U(2,1),                  (Down)
                                   0.8 U(2,1) + 0.1 U(1,2) + 0.1 U(1,1) \}   (Right)

           = -0.04 + \gamma \max\{ 0.8 \cdot 0.762 + 0.1 \cdot 0.655 + 0.1 \cdot 0.705,     (Up)
                                   0.9 \cdot 0.705 + 0.1 \cdot 0.762,                       (Left)
                                   0.9 \cdot 0.705 + 0.1 \cdot 0.655,                       (Down)
                                   0.8 \cdot 0.655 + 0.1 \cdot 0.762 + 0.1 \cdot 0.705 \}   (Right)

           = -0.04 + 1.0 \cdot (0.6096 + 0.0655 + 0.0705)   (Up)

           = -0.04 + 0.7456 = 0.7056

→ Up is the optimal action in (1,1).

Utilities of the states in the 4 × 3 world (the state (2,2) is a wall):

    y = 3:   0.812   0.868   0.918    +1
    y = 2:   0.762    ---    0.660    -1
    y = 1:   0.705   0.655   0.611   0.388
             x = 1   x = 2   x = 3   x = 4

Value Iteration (1)

An algorithm to calculate an optimal strategy.

Basic idea: Calculate the utility of each state. Then use the state utilities to select an optimal action for each state.

A sequence of actions generates a branch in the tree of possible states (histories). A utility function on histories U_h is separable iff there exists a function f such that

    U_h([s_0, s_1, \ldots, s_n]) = f(s_0, U_h([s_1, \ldots, s_n]))

The simplest form is an additive reward function R:

    U_h([s_0, s_1, \ldots, s_n]) = R(s_0) + U_h([s_1, \ldots, s_n])

In the example, R((4,3)) = +1, R((4,2)) = -1, R(other) = -1/25.
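As a concrete illustration, here is a minimal Python sketch of the Bellman backup and of value iteration on this 4 × 3 world. The transition model (0.8 in the intended direction, 0.1 to each side, staying in place when bumping into the wall or the grid boundary), the rewards R = -0.04 / +1 / -1, and γ = 1.0 are taken from the slides above; the function and variable names (value_iteration, q_value, best_action, ...) are illustrative choices, not code from the lecture.

```python
# Bellman backup and value iteration for the 4 x 3 grid world from the slides.
# Moves succeed with probability 0.8 and slip to each perpendicular direction
# with probability 0.1; bumping into the wall at (2,2) or the grid boundary
# leaves the state unchanged. R = -0.04 for non-terminal states, +1 at (4,3),
# -1 at (4,2), gamma = 1.0. Names below are illustrative, not the lecture's code.

GAMMA = 1.0
ACTIONS = {'Up': (0, 1), 'Down': (0, -1), 'Left': (-1, 0), 'Right': (1, 0)}
STATES = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) != (2, 2)]
TERMINALS = {(4, 3): +1.0, (4, 2): -1.0}


def reward(s):
    """R(s): -0.04 everywhere except the two terminal states."""
    return TERMINALS.get(s, -0.04)


def move(s, delta):
    """Apply a direction; stay in place if the result is off-grid or the wall."""
    target = (s[0] + delta[0], s[1] + delta[1])
    return target if target in STATES else s


def transitions(s, a):
    """T(s, a, s') as a list of (probability, successor) pairs."""
    dx, dy = ACTIONS[a]
    return [(0.8, move(s, (dx, dy))),      # intended direction
            (0.1, move(s, (dy, dx))),      # slip to one side
            (0.1, move(s, (-dy, -dx)))]    # slip to the other side


def q_value(U, s, a):
    """Expected utility of the successor: sum_{s'} T(s, a, s') U(s')."""
    return sum(p * U[s2] for p, s2 in transitions(s, a))


def value_iteration(eps=1e-4):
    """Iterate U(s) = R(s) + gamma * max_a Q(s, a) until the values converge."""
    U = {s: 0.0 for s in STATES}
    while True:
        U_new, delta = {}, 0.0
        for s in STATES:
            if s in TERMINALS:
                U_new[s] = TERMINALS[s]
            else:
                U_new[s] = reward(s) + GAMMA * max(q_value(U, s, a) for a in ACTIONS)
            delta = max(delta, abs(U_new[s] - U[s]))
        U = U_new
        if delta < eps:
            return U


def best_action(U, s):
    """pi(s) = argmax_a sum_{s'} T(s, a, s') U(s')."""
    return max(ACTIONS, key=lambda a: q_value(U, s, a))


if __name__ == '__main__':
    U = value_iteration()
    print(round(U[(1, 1)], 3), best_action(U, (1, 1)))  # roughly: 0.705 Up
```

Running the sketch should reproduce the utilities shown in the grid above, e.g. U(1,1) ≈ 0.705 with Up as the optimal action in (1,1).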
