Actor Critic Method: Maze Example - FIAS


bccn2009.org


Solving the Maze Problem

Assumptions:

• state is fully observable (in contrast to only partially observable), i.e. the rat knows exactly where it is at any time

• actions have deterministic consequences (in contrast to probabilistic)

Idea: maintain and improve a stochastic policy which determines the action at each decision point (A, B, C) using action values and a softmax decision rule
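A softmax decision rule over action values could be sketched as follows. This is only an illustration: the function names, the inverse-temperature parameter `beta`, and the action labels "left" and "right" are assumptions, not taken from the slides.

```python
import math
import random


def softmax_probs(action_values, beta=1.0):
    """Map action values to choice probabilities.

    action_values: dict from action label to its current value.
    beta: assumed inverse temperature; larger values make the
          choice more strongly favour the highest-valued action.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(action_values.values())
    exps = {a: math.exp(beta * (v - top)) for a, v in action_values.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def choose_action(action_values, beta=1.0, rng=random):
    """Sample one action according to the softmax probabilities."""
    probs = softmax_probs(action_values, beta)
    actions = list(probs)
    weights = [probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

With equal action values both actions are chosen with probability 0.5; as one value grows relative to the other, the policy stays stochastic but increasingly favours that action.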

Actor Critic Learning:

• critic: use temporal difference learning to predict future rewards from A, B, C if the current policy is followed

• actor: maintain and improve the policy
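The interplay of critic and actor can be sketched in a few lines. Everything specific here is an assumption made for illustration: the maze layout (A leads to B or C, and B and C end the trial), the reward values, the learning rates, the absence of discounting, and the particular actor update (one commonly used form, in which the chosen action's value moves with the TD error and the others move against it).

```python
import math
import random

# Hypothetical maze: (state, action) -> (next state, reward).
# A next state of None means the trial ends.
TRANSITIONS = {
    ("A", "left"): ("B", 0.0),
    ("A", "right"): ("C", 0.0),
    ("B", "left"): (None, 0.0),
    ("B", "right"): (None, 5.0),
    ("C", "left"): (None, 2.0),
    ("C", "right"): (None, 0.0),
}
STATES = ("A", "B", "C")
ACTIONS = ("left", "right")


def softmax(values, beta):
    """Softmax probabilities over a dict of action values."""
    top = max(values.values())
    exps = {a: math.exp(beta * (x - top)) for a, x in values.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def run_actor_critic(n_trials=2000, eps_critic=0.1, eps_actor=0.1,
                     beta=1.0, seed=0):
    rng = random.Random(seed)
    v = {s: 0.0 for s in STATES}                          # critic: state values
    m = {s: {a: 0.0 for a in ACTIONS} for s in STATES}    # actor: action values
    for _ in range(n_trials):
        state = "A"
        while state is not None:
            probs = softmax(m[state], beta)
            weights = [probs[a] for a in ACTIONS]
            action = rng.choices(ACTIONS, weights=weights, k=1)[0]
            next_state, reward = TRANSITIONS[(state, action)]
            v_next = v[next_state] if next_state is not None else 0.0
            # TD error (no discounting assumed).
            delta = reward + v_next - v[state]
            # Critic: move the prediction for this state toward the target.
            v[state] += eps_critic * delta
            # Actor: raise the chosen action's value if delta > 0,
            # lower it if delta < 0; the other actions move the opposite way.
            for a in ACTIONS:
                chosen = 1.0 if a == action else 0.0
                m[state][a] += eps_actor * (chosen - probs[a]) * delta
            state = next_state
    return v, m
```

Calling `v, m = run_actor_critic()` returns the learned state values and action values. Under the assumed rewards, the action values at B come to favour "right" and those at C come to favour "left", and one would expect the policy at A to shift toward B as the critic's prediction for B exceeds that for C.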


[Figure: Actor-Critic Method. Diagram labels: Agent (Actor with Policy, Critic with Value Function), TD error, state, action, reward, Environment.]

