13. Acting under Uncertainty: Maximizing Expected Utility

Sequential Decision Problems (2)

Deterministic version: All actions always lead to the next square in the selected direction, except that moving into a wall results in no change in position.

Stochastic version: Each action achieves the intended effect with probability 0.8, but the rest of the time, the agent moves at right angles to the intended direction.

(University of Freiburg) Foundations of AI July 18, 2012 13 / 32

Markov Decision Problem (MDP)

Given a set of states in an accessible, stochastic environment, an MDP is defined by
  Initial state S_0
  Transition model T(s, a, s′)
  Reward function R(s)

Transition model: T(s, a, s′) is the probability that state s′ is reached if action a is executed in state s.

Policy: Complete mapping π that specifies for each state s which action π(s) to take.

Wanted: The optimal policy π* is the policy that maximizes the expected utility.

Optimal Policies (1)

Given the optimal policy, the agent uses its current percept, which tells it its current state. It then executes the action π*(s).

We obtain a simple reflex agent that is computed from the information used for a utility-based agent.

Optimal policy for stochastic MDP with R(s) = −0.04:

[Figure: grid world with columns 1 to 4 and rows 1 to 3; terminal reward +1 in row 3 and −1 in row 2. The policy arrows were not preserved.]

Optimal Policies (2)

Optimal policy changes with choice of transition costs R(s).

How to compute optimal policies?
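The stochastic transition model T(s, a, s′) described above can be sketched in a few lines of Python. This is only an illustration under assumptions the slides do not state: states are written as (column, row) pairs on a 4 × 3 grid, the remaining probability of 0.2 is split evenly (0.1 each) between the two right-angle directions, and interior obstacles are ignored. The names `step` and `transition` are hypothetical helpers chosen for this example.

```python
# Sketch of the stochastic transition model T(s, a, s') for a 4 x 3 grid world.
# Assumptions (not given in the slides): (column, row) states, an even 0.1/0.1
# split of the "right angle" probability, and no interior obstacles.

COLS, ROWS = 4, 3
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
RIGHT_ANGLES = {
    "up": ("left", "right"),
    "down": ("left", "right"),
    "left": ("up", "down"),
    "right": ("up", "down"),
}


def step(state, direction):
    """Deterministic move; moving into the outer wall leaves the position unchanged."""
    col, row = state
    d_col, d_row = MOVES[direction]
    new_col, new_row = col + d_col, row + d_row
    if 1 <= new_col <= COLS and 1 <= new_row <= ROWS:
        return (new_col, new_row)
    return state


def transition(state, action):
    """Return a dict mapping each successor s' to T(state, action, s')."""
    outcomes = [(action, 0.8)] + [(d, 0.1) for d in RIGHT_ANGLES[action]]
    probs = {}
    for direction, p in outcomes:
        successor = step(state, direction)
        # Several directions can end in the same square (e.g. after a wall bump).
        probs[successor] = probs.get(successor, 0.0) + p
    return probs
```

For example, `transition((1, 1), "up")` assigns probability 0.8 to (1, 2), 0.1 to (2, 1), and 0.1 to staying in (1, 1), because the leftward slip runs into the wall.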
