Actor Critic Method: Maze Example - FIAS
Solving the Maze Problem
Assumptions:
• the state is fully observable (in contrast to only partially observable), i.e. the rat knows exactly where it is at any time
• actions have deterministic consequences (in contrast to probabilistic)
Idea: maintain and improve a stochastic policy which determines the action at each decision point (A, B, C) using action values and a softmax decision rule
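The softmax decision rule mentioned above can be sketched as follows. This is a minimal illustration, not the slides' own code: the function names and the inverse-temperature parameter beta are assumptions introduced here.

```python
import math
import random


def softmax_probs(action_values, beta=1.0):
    """Turn action values into choice probabilities.

    beta is an assumed inverse-temperature parameter: larger values
    make the choice greedier, beta = 0 makes it uniform.
    """
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    top = max(action_values)
    exps = [math.exp(beta * (v - top)) for v in action_values]
    total = sum(exps)
    return [e / total for e in exps]


def choose_action(action_values, beta=1.0, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probs(action_values, beta)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With two actions of equal value, both are chosen with probability 0.5; as one action value grows relative to the other, its probability approaches 1 but never reaches it, so the policy stays stochastic and keeps exploring.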
Actor Critic Learning:
• critic: use temporal difference learning to predict future rewards from A, B, C if the current policy is followed
• actor: maintain and improve the policy
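The interplay of critic and actor can be sketched in a few lines. Everything specific here is an assumption made for illustration: the maze layout (A leads to B or C, and B and C lead to exits), the reward values, the learning rates, the absence of discounting, and the particular actor update, which is one common choice (raise the preference of the chosen action in proportion to the TD error).

```python
import math
import random

# Hypothetical maze: state -> list over actions (0 = left, 1 = right)
# of (next_state, reward). next_state None means the trial ends.
# Layout and reward values are illustrative assumptions.
MAZE = {
    "A": [("B", 0.0), ("C", 0.0)],
    "B": [(None, 0.0), (None, 5.0)],
    "C": [(None, 2.0), (None, 0.0)],
}


def softmax(prefs, beta=1.0):
    top = max(prefs)
    exps = [math.exp(beta * (p - top)) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_actor_critic(episodes=2000, alpha_critic=0.1, alpha_actor=0.1,
                     beta=1.0, seed=0):
    rng = random.Random(seed)
    v = {s: 0.0 for s in MAZE}         # critic: predicted future reward
    m = {s: [0.0, 0.0] for s in MAZE}  # actor: action preferences
    for _ in range(episodes):
        state = "A"
        while state is not None:
            probs = softmax(m[state], beta)
            action = rng.choices([0, 1], weights=probs, k=1)[0]
            next_state, reward = MAZE[state][action]
            v_next = v[next_state] if next_state is not None else 0.0
            # TD error (no discounting assumed): how much better or worse
            # the outcome was than the critic predicted.
            delta = reward + v_next - v[state]
            # Critic update: move the prediction toward the observed outcome.
            v[state] += alpha_critic * delta
            # Actor update: shift preferences toward the chosen action
            # when delta > 0 and away from it when delta < 0.
            for a in (0, 1):
                chosen = 1.0 if a == action else 0.0
                m[state][a] += alpha_actor * (chosen - probs[a]) * delta
            state = next_state
    return v, m
```

Calling `run_actor_critic()` returns the learned predictions and preferences. Under these assumed rewards, the policy should come to favor the larger reward at B and at C, and to favor the turn at A that leads toward B.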