
13. Acting under Uncertainty: Maximizing Expected Utility


Sequential Decision Problems (2)

Markov Decision Problem (MDP)

Deterministic version: All actions always lead to the next square in the selected direction, except that moving into a wall results in no change in position.

Stochastic version: Each action achieves the intended effect with probability 0.8, but the rest of the time the agent moves at right angles to the intended direction (with probability 0.1 to each side).

Given a set of states in an accessible, stochastic environment, an MDP is defined by
- an initial state S_0,
- a transition model T(s, a, s'), and
- a reward function R(s).

Transition model: T(s, a, s') is the probability that state s' is reached if action a is executed in state s.

Policy: a complete mapping π that specifies for each state s which action π(s) to take.

Wanted: the optimal policy π* is the policy that maximizes the expected utility.

Optimal Policies (1)

Given the optimal policy, the agent uses its current percept, which tells it its current state, and then executes the action π*(s). We obtain a simple reflex agent that is computed from the information used for a utility-based agent.

Optimal Policies (2)

[Figure: optimal policy for the stochastic MDP with R(s) = -0.04 on the 4x3 grid world, with terminal states +1 and -1.]

The optimal policy changes with the choice of the transition costs R(s).

How to compute optimal policies?
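Before turning to that question, the following Python sketch makes the MDP definition above concrete for the stochastic 4x3 grid world. It is a minimal illustration, not code from the lecture: the state encoding, the blocked square at (2, 2), and the terminal positions (4, 3) and (4, 2) are assumptions based on the standard example.

```python
# Minimal sketch of the stochastic 4x3 grid world (assumed layout, not lecture code).
# States are (column, row) pairs; (2, 2) is assumed blocked; (4, 3) and (4, 2)
# are assumed to be the +1 and -1 terminal states.

ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
PERP = {"up": ["left", "right"], "down": ["left", "right"],
        "left": ["up", "down"], "right": ["up", "down"]}
STATES = {(c, r) for c in range(1, 5) for r in range(1, 4)} - {(2, 2)}
TERMINALS = {(4, 3): +1.0, (4, 2): -1.0}

def move(s, a):
    """Deterministic effect of action a in state s: moving into a wall or the
    blocked square results in no change in position."""
    dc, dr = ACTIONS[a]
    s2 = (s[0] + dc, s[1] + dr)
    return s2 if s2 in STATES else s

def T(s, a, s2):
    """Transition model T(s, a, s'): probability that s2 is reached when a is
    executed in s -- 0.8 for the intended direction, 0.1 for each of the two
    directions at right angles."""
    if s in TERMINALS:
        return 0.0
    p = 0.8 if move(s, a) == s2 else 0.0
    for b in PERP[a]:
        if move(s, b) == s2:
            p += 0.1
    return p

def R(s):
    """Reward function: -0.04 per non-terminal state, +1/-1 in the terminals."""
    return TERMINALS.get(s, -0.04)
```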

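To show what a policy buys us, the sketch below (building on the definitions above) runs a fixed policy π exactly as the text describes a simple reflex agent: perceive the current state, execute π(s). It estimates the expected utility of π by Monte Carlo simulation, assuming additive rewards (the utility of a run is the sum of R(s) along the way), which is the usual setting for this grid world. The policy shown is a hand-written illustration, not the optimal policy.

```python
import random

def sample_next(s, a):
    """Sample a successor state according to T(s, a, .)."""
    r, acc = random.random(), 0.0
    for s2 in STATES:
        acc += T(s, a, s2)
        if r <= acc:
            return s2
    return s  # numerical fallback

def run_policy(pi, s0=(1, 1), max_steps=100):
    """Simple reflex behaviour: perceive the state, execute pi(s).
    Returns the accumulated (additive) reward of one run."""
    s, total = s0, 0.0
    for _ in range(max_steps):
        total += R(s)
        if s in TERMINALS:
            break
        s = sample_next(s, pi[s])
    return total

def estimate_utility(pi, n=10_000):
    """Monte Carlo estimate of the expected utility of policy pi from (1, 1)."""
    return sum(run_policy(pi) for _ in range(n)) / n

# Illustrative (hand-written, not optimal) policy: go up, then right along the top row.
pi = {s: "up" if s[1] < 3 else "right" for s in STATES if s not in TERMINALS}
print(estimate_utility(pi))
```

Comparing such estimates for different policies (and different values of R(s)) illustrates why the optimal policy depends on the transition costs; computing π* directly is the topic of the following slides.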