Policy Gradient Algorithms
Special case – Generalized L_R-I
• Consider binary bandit problems with arbitrary rewards

Reinforcement Comparison
• Set baseline to average of observed rewards
• Softmax action selection

Reinforcement Comparison contd.
Computation of characteristic eligibility for softmax action selection

Continuous Actions
• Use a Gaussian distribution to select actions
• For suitable choice of parameters:
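
The formulas from the reinforcement comparison and continuous action slides did not survive extraction, so the following is only a minimal sketch of the ideas they name, not the slides' own derivation. It assumes the standard softmax result that the characteristic eligibility with respect to preference k is 1{k = a} - pi(k), uses the sample average of observed rewards as the baseline, and adds the usual Gaussian eligibilities for a mean and standard deviation. The function names, the step size `alpha`, and the toy two-armed bandit are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(prefs):
    """Action probabilities from preferences (shifted by the max for stability)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def characteristic_eligibility(prefs, action):
    """d ln pi(action) / d prefs[k] under softmax: 1{k == action} - pi(k)."""
    probs = softmax(prefs)
    return [(1.0 if k == action else 0.0) - probs[k] for k in range(len(prefs))]


def gaussian_eligibility(action, mu, sigma):
    """Eligibilities of a Gaussian policy N(mu, sigma^2) w.r.t. mu and sigma."""
    d_mu = (action - mu) / sigma ** 2
    d_sigma = ((action - mu) ** 2 - sigma ** 2) / sigma ** 3
    return d_mu, d_sigma


def run_bandit(true_means, steps=5000, alpha=0.1, noise=0.1, seed=0):
    """Reinforcement comparison on a toy bandit (all settings are assumptions)."""
    rng = random.Random(seed)
    prefs = [0.0] * len(true_means)
    baseline = 0.0  # running average of observed rewards
    for n in range(1, steps + 1):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = rng.gauss(true_means[action], noise)
        elig = characteristic_eligibility(prefs, action)
        # Policy gradient step: (reward - baseline) * characteristic eligibility.
        for k in range(len(prefs)):
            prefs[k] += alpha * (reward - baseline) * elig[k]
        # Incremental sample average of all rewards seen so far.
        baseline += (reward - baseline) / n
    return softmax(prefs), baseline


if __name__ == "__main__":
    final_probs, final_baseline = run_bandit([0.0, 1.0])
    print("action probabilities:", final_probs)
    print("baseline:", final_baseline)
```

Subtracting the baseline leaves the expected update direction unchanged, because the softmax eligibilities sum to zero over actions in expectation; it only reduces the variance of the individual updates.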