04.02.2015

Actor Critic Method: Maze Example - FIAS

Solving the Maze Problem

Assumptions:
• state is fully observable (in contrast to only partially observable), i.e. the rat knows exactly where it is at any time
• actions have deterministic consequences (in contrast to probabilistic ones)
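Under these two assumptions the maze can be written as a lookup table. The current location fully identifies the state, and each (state, action) pair leads to exactly one successor and reward. The following is a minimal Python sketch. The layout (A branching to B and C, each of which has two exits) and the reward values are hypothetical placeholders, not taken from the slide.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout (an assumption, not given on the slide): the rat starts
# at A, "left" leads to B and "right" leads to C; both exits of B and C end
# the trial. The reward values are illustrative placeholders.
# Each entry maps (state, action) -> (next_state, reward); None means "exit".
TRANSITIONS: Dict[Tuple[str, str], Tuple[Optional[str], float]] = {
    ("A", "left"): ("B", 0.0),
    ("A", "right"): ("C", 0.0),
    ("B", "left"): (None, 0.0),
    ("B", "right"): (None, 5.0),
    ("C", "left"): (None, 2.0),
    ("C", "right"): (None, 0.0),
}


def step(state: str, action: str) -> Tuple[Optional[str], float]:
    """Return (next_state, reward) for taking `action` in `state`.

    Full observability: `state` is the rat's exact location.
    Deterministic consequences: a plain table lookup, with no randomness.
    """
    return TRANSITIONS[(state, action)]
```

For example, `step("A", "left")` returns `("B", 0.0)` every time it is called. Under partial observability or probabilistic consequences, a single table lookup like this would no longer be enough.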

Idea: maintain and improve a stochastic policy that determines the action at each decision point (A, B, C) using action values and a softmax decision rule
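The softmax decision rule converts the action values at a decision point into choice probabilities, P(a) = exp(β m_a) / Σ_b exp(β m_b). The following is a minimal Python sketch. The inverse temperature `beta` and the function names are assumptions for illustration, because the slide does not specify them.

```python
import math
import random
from typing import Dict, Optional


def softmax_policy(action_values: Dict[str, float], beta: float = 1.0) -> Dict[str, float]:
    """Softmax decision rule: P(a) = exp(beta * m_a) / sum_b exp(beta * m_b).

    beta (inverse temperature, an assumed parameter): larger values make the
    choice more deterministic; beta = 0 makes all actions equally likely.
    """
    # Subtracting the maximum before exponentiating avoids overflow and
    # leaves the resulting probabilities unchanged.
    m_max = max(action_values.values())
    weights = {a: math.exp(beta * (m - m_max)) for a, m in action_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def choose_action(action_values: Dict[str, float], beta: float = 1.0,
                  rng: Optional[random.Random] = None) -> str:
    """Sample one action according to the softmax probabilities."""
    probs = softmax_policy(action_values, beta)
    sampler = rng if rng is not None else random
    actions = list(probs)
    return sampler.choices(actions, weights=[probs[a] for a in actions], k=1)[0]
```

If all action values start at zero, each decision point begins as a 50/50 choice. The policy therefore stays stochastic and keeps exploring both branches while the values are learned.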

Actor Critic Learning:
• critic: use temporal difference (TD) learning to predict future rewards from A, B, C if the current policy is followed
• actor: maintain and improve the policy
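The two roles can be combined into a short simulation. The following sketch uses one common form of actor critic learning, which is not necessarily the exact rule used in these slides. The critic keeps a value v(u) for each decision point and computes the TD error δ = r + v(u') − v(u), taking v = 0 at an exit and applying no discounting. The actor keeps action values m_a(u), chooses actions with a softmax rule, and changes m_a(u) by ε(1[a was taken] − P(a; u))δ. The maze layout, reward values, learning rate `eps`, and inverse temperature `beta` are all illustrative assumptions.

```python
import math
import random
from typing import Dict, Optional, Tuple

# Hypothetical maze (layout and rewards are illustrative assumptions):
# A -> B ("left") or C ("right"); every action at B or C ends the trial.
TRANSITIONS: Dict[Tuple[str, str], Tuple[Optional[str], float]] = {
    ("A", "left"): ("B", 0.0), ("A", "right"): ("C", 0.0),
    ("B", "left"): (None, 0.0), ("B", "right"): (None, 5.0),
    ("C", "left"): (None, 2.0), ("C", "right"): (None, 0.0),
}
STATES = ["A", "B", "C"]
ACTIONS = ["left", "right"]


def softmax(values: Dict[str, float], beta: float) -> Dict[str, float]:
    """Softmax decision rule over the action values of one decision point."""
    m_max = max(values.values())
    w = {a: math.exp(beta * (v - m_max)) for a, v in values.items()}
    total = sum(w.values())
    return {a: x / total for a, x in w.items()}


def train(n_trials: int = 1000, eps: float = 0.2, beta: float = 1.0, seed: int = 0
          ) -> Tuple[Dict[str, float], Dict[str, Dict[str, float]]]:
    rng = random.Random(seed)
    v = {s: 0.0 for s in STATES}                          # critic: state values
    m = {s: {a: 0.0 for a in ACTIONS} for s in STATES}   # actor: action values
    for _ in range(n_trials):
        state: Optional[str] = "A"
        while state is not None:
            probs = softmax(m[state], beta)
            action = rng.choices(ACTIONS, weights=[probs[a] for a in ACTIONS])[0]
            next_state, reward = TRANSITIONS[(state, action)]
            # TD error: an exit has value 0 and no discounting is assumed.
            next_value = v[next_state] if next_state is not None else 0.0
            delta = reward + next_value - v[state]
            # Critic: move v(state) toward the one-step TD target.
            v[state] += eps * delta
            # Actor: push the taken action up when delta > 0 (down when
            # delta < 0); the other action moves the opposite way.
            for a in ACTIONS:
                taken = 1.0 if a == action else 0.0
                m[state][a] += eps * (taken - probs[a]) * delta
            state = next_state
    return v, m


values, action_values = train()
```

With these hypothetical rewards, the route through B (left at A, then right at B) pays most, so one would expect the actor's preferences to drift toward it. The exact numbers, however, depend on `seed`, `eps`, and `beta`. Because the critic's prediction δ is shared by both updates, the critic needs no knowledge of the policy's internals, and the actor needs no model of the maze.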
