12.07.2015

Policy Gradient Algorithms

Special case – Generalized L_R-I
• Consider binary bandit problems with arbitrary rewards

Reinforcement Comparison
• Set baseline to average of observed rewards
• Softmax action selection

Reinforcement Comparison contd.
Computation of characteristic eligibility for softmax action selection

Continuous Actions
• Use a Gaussian distribution to select actions
• For suitable choice of parameters:
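The reinforcement comparison slides pair a baseline equal to the average of observed rewards with softmax action selection. The Python sketch below shows one way these pieces can fit together in a REINFORCE-style update, theta_j <- theta_j + alpha (r - b) d ln pi(a) / d theta_j, using the softmax characteristic eligibility 1[j = a] - pi(j). It assumes a hypothetical three-armed Bernoulli bandit; the function names, the step size alpha, and the arm probabilities are illustrative choices, not taken from the slides.

```python
import math
import random


def softmax(prefs):
    """Softmax action probabilities; subtracting the max avoids overflow."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def eligibility(probs, action):
    """Characteristic eligibility d ln pi(action) / d theta_j for softmax:
    1 if j == action, minus pi(j)."""
    return [(1.0 if j == action else 0.0) - p for j, p in enumerate(probs)]


def reinforcement_comparison(success_probs, steps=5000, alpha=0.1, seed=0):
    """Hypothetical bandit learner: softmax selection plus a baseline equal
    to the mean of all rewards observed so far. success_probs gives the
    (assumed) Bernoulli reward probability of each arm."""
    rng = random.Random(seed)
    theta = [0.0] * len(success_probs)  # action preferences
    baseline = 0.0                       # running mean of observed rewards
    for t in range(1, steps + 1):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = 1.0 if rng.random() < success_probs[action] else 0.0
        # Policy-gradient step: theta += alpha * (r - b) * d ln pi / d theta
        elig = eligibility(probs, action)
        for j in range(len(theta)):
            theta[j] += alpha * (reward - baseline) * elig[j]
        # Update the baseline only after the step, so b excludes this reward
        baseline += (reward - baseline) / t
    return softmax(theta)


if __name__ == "__main__":
    print(reinforcement_comparison([0.2, 0.5, 0.8]))
```

Because the baseline used at each step depends only on earlier rewards, not on the action just taken, subtracting it leaves the expected update unchanged while typically reducing its variance.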
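For continuous actions, the sketch below draws a scalar action from N(mu, sigma^2) and applies the same reward-minus-baseline update with the Gaussian log-density eligibilities d ln pi / d mu = (a - mu) / sigma^2 and d ln pi / d(ln sigma) = ((a - mu)^2 - sigma^2) / sigma^2. It is a minimal illustration under stated assumptions: the quadratic reward -(a - target)^2, the exponential-moving-average baseline, the ln sigma parameterization, the sigma floor, and all parameter values are choices made here, not the slides' own parameter settings.

```python
import math
import random


def gaussian_policy_gradient(target=2.0, steps=10000, alpha=0.005,
                             beta=0.1, min_sigma=0.05, seed=0):
    """REINFORCE-style learner for a one-dimensional continuous action.

    Actions are drawn from N(mu, sigma^2). The reward -(a - target)^2 is a
    hypothetical task chosen for illustration. sigma is stored as
    log_sigma so that it stays positive.
    """
    rng = random.Random(seed)
    mu, log_sigma = 0.0, 0.0   # start at mu = 0, sigma = 1
    baseline = 0.0             # exponential moving average of rewards
    for _ in range(steps):
        sigma = math.exp(log_sigma)
        action = rng.gauss(mu, sigma)
        reward = -(action - target) ** 2
        z = (action - mu) / sigma
        # Characteristic eligibilities of the Gaussian log-density:
        #   d ln pi / d mu        = (a - mu) / sigma^2 = z / sigma
        #   d ln pi / d log_sigma = z^2 - 1
        advantage = reward - baseline
        mu += alpha * advantage * z / sigma
        log_sigma += alpha * advantage * (z * z - 1.0)
        # Assumed safeguard, not part of the basic method: stop sigma from
        # collapsing, which would make the mu update very noisy.
        log_sigma = max(log_sigma, math.log(min_sigma))
        baseline += beta * (reward - baseline)
    return mu, math.exp(log_sigma)


if __name__ == "__main__":
    print(gaussian_policy_gradient())
```

Storing ln sigma instead of sigma keeps the standard deviation positive without a separate constraint. The moving-average baseline follows the reward level as the policy improves; a plain average over all past rewards would stay dominated by the poor early rewards in this task.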
