
CURRICULUM VITAE

Richard S. Sutton

September, 2011

Professor, Department of Computing Science, University of Alberta

address: Athabasca Hall 2-21, University of Alberta, Edmonton, AB T6G 2E8

telephone: 780-492-4584, fax: 780-492-1111, cell: 780-270-8139

email: rich@richsutton.com

web: http://richsutton.com

Degrees

Doctor of Philosophy in Computer Science, University of Massachusetts at Amherst, 1984. Dissertation: “Temporal credit assignment in reinforcement learning.”

Master of Science in Computer Science, University of Massachusetts at Amherst, 1980. Thesis: “An adaptive network that constructs and uses an internal model of its world.”

Bachelor of Arts in Psychology, Stanford University, with honors and distinction, 1978. Senior thesis: “A unified theory of expectation in classical and instrumental conditioning.”

Professional Experience

Full Professor and iCORE Chair, Department of Computing Science, University of Alberta, 2003–. Leads the Reinforcement Learning and Artificial Intelligence Laboratory of about 40 graduate students, staff, postdocs, and faculty co-PIs.

Principal Technical Staff Member, Artificial Intelligence Department, AT&T Shannon Laboratory, Florham Park, New Jersey, 1998–2002.

Senior Research Scientist, Department of Computer Science, University of Massachusetts at Amherst, 1995–1998. Co-led (with A. Barto) the Adaptive Networks Laboratory comprising approximately 12 graduate students and other researchers.

Principal Member of Technical Staff, Computer and Intelligent Systems Laboratory, GTE Laboratories, Waltham, Massachusetts, 1985–1994. From 1986 led the Connectionist Machine Learning Project, a group of 3–6 conducting basic and applied research.

Senior Postdoctoral Research Associate, Department of Computer Science, University of Massachusetts at Amherst, 1984.

Research

Pioneered and made repeated contributions to reinforcement learning, an approach to artificial and natural intelligence that emphasizes learning and planning from sample experience. Currently seeking to extend reinforcement learning ideas to an empirically grounded approach to knowledge representation.

Most significant contributions:

• The theory of temporal-difference learning and the TD(λ) algorithm (1988)
• The standard textbook for reinforcement learning (with Barto, 1998)
• The actor-critic (policy gradient) class of algorithms (1984, 2000)
• The Dyna architecture integrating learning, planning, and reacting (1990)
• Temporal-difference models of animal learning (with Barto, 1981, 1990)
• The “options” framework for temporal abstraction (with Precup, Singh, 1999)
• Predictive state representations (with Littman, Singh, 2002)
• Algorithms for online step-size adaptation (1981, 1992)
• Temporal-difference networks for grounded knowledge representation (2005, 2006)
• Gradient temporal-difference algorithms (with Maei, Szepesvari, 2008–)

Selected Grants

NSERC Collaborative Research and Development Grant, with Nortel Networks and Bell Canada, “Learning and Prediction in High-dimensional Stochastic Domains,” September 2006 – August 2009, funded at CAD$186,523. One of five principal investigators.

iCORE Chair and Professorship Establishment Grant, “Reinforcement Learning and Artificial Intelligence,” September 1, 2003 – August 31, 2008, funded at CAD$3,000,000. Principal investigator. Renewed until August 2013 at an additional CAD$2,750,000.

Alberta Ingenuity Centre Grant, “Alberta Ingenuity Centre for Machine Learning,” April 2003 – March 2008, funded at CAD$9,887,600. One of eight principal investigators. Renewed until March 2009 at CAD$2,000,000. Renewed again in 2009 for another five years at CAD$10,000,000.

NSERC Discovery Grant, “Reinforcement Learning and Artificial Intelligence,” April 2004 – March 2009, funded at CAD$250,000. Principal investigator. Renewed in 2009 for a second five years at an additional CAD$190,000.

Air Force Office of Scientific Research to the University of Massachusetts, “Stochastic Scheduling and Planning Using Reinforcement Learning,” AFOSR Grant Number F49620-96-1-0254, June 1, 1996 – May 31, 2000, funded at USD$446,570. Co-principal investigator with A. Barto.

National Science Foundation to the University of Massachusetts, “Multiple Time Scale Reinforcement Learning,” Communications & Computational Systems/Neuroengineering Grant ECS-9511805, September 15, 1995 – August 31, 1998, funded at USD$157,261. Primary senior personnel (A. Barto, PI).

Teaching Experience

Courses

Intelligent Systems, CMPUT 366, an introduction to artificial intelligence for undergraduates at the University of Alberta, 2008–2010

Reinforcement Learning for Artificial Intelligence, CMPUT 499/609, cross-listed undergraduate/graduate course at the University of Alberta, 2003–2007, 2009–10

Non-procedural Programming Languages, CMPUT 325, undergraduate course at the University of Alberta, Fall 2006

Reinforcement Learning in Practice, CMPUT 607, graduate course at the University of Alberta, Spring 2005

Reinforcement Learning, CMPSCI 791O, graduate course at the University of Massachusetts, with A. Barto, Fall 1996 and 1997

Reinforcement Learning, special intensive course at the University of Uppsala, Sweden, April–May 1996

Reinforcement Learning, CMPSCI 791N, graduate seminar at the University of Massachusetts, with A. Barto, Fall 1995

Cybernetics of Adaptation and Learning, COINS 791M, graduate seminar, teaching assistant including some lecturing, Fall 1981

Summer Schools

Instructor at Machine Learning Summer School, Île de Ré, France, 2008

Instructor at Cambridge University Neural Networks Summer School (three lectures each summer), 1993–1997

Instructor at Cold Spring Harbor Summer School on Computational Neuroscience: Learning and Memory, July 1990

Tutorials on Reinforcement Learning at Conferences

Conference of the Association for the Advancement of Artificial Intelligence, 2010

International Conference on Machine Learning, 1999

Genetic Programming Conference, 1998

Second Asia-Pacific Conference on Simulated Evolution and Learning, Australia, 1998

National Conference on Artificial Intelligence, with L. Kaelbling, 1997

National Conference on Computer Science – Mexico, 1997

Conference on Neural Information Processing Systems, 1996

Scholarly Service

Meetings Organized

Co-chair, Multidisciplinary Symposium on Reinforcement Learning, 2009

Program co-chair, National Conference on Artificial Intelligence, 2002

Chair, Workshop on Reinforcement Learning, International Conference on Machine Learning, 1993

Co-chair, Bellairs Workshop on Reinforcement Learning, 2005–06, 2008–11

Co-chair, Workshop on Reinforcement Learning: Benchmarks and Bakeoffs, International Conference on Neural Information Processing Systems, 2004

Co-chair, Workshop on Predictive Representation of World Knowledge, International Conference on Machine Learning, 2004

Co-chair, NSF Workshop on Neural Networks for Robotics and Control, 1988

Editorial

Foundations and Trends in Machine Learning, founding editorial board, 2007–

Journal of Machine Learning Research: advisory board, 1999–

Journal of Artificial Intelligence Research: action editor 1993–1996, advisory board 1996–2000

Machine Learning: action editor 1989–94, editorial board 1994–2001

Connection Science: action editor 1989–90, editorial board 1990–1995

Adaptive Behavior: editorial board 1991–2000

Guest Editor, special double issue of Machine Learning on Reinforcement Learning, 1992

Conference Program Committees

International Conference on Machine Learning, 1988–2001, 2003, 2004, 2010; area chair 2005–06, 2008

Conference on Neural Information Processing Systems, 1989–2000, 2003–2010; area chair 1997

Conference on Uncertainty in Artificial Intelligence, 2005, 2009

International Joint Conference on Autonomous Agents and Multi-Agent Systems, 2005

AAAI, 1993–1996, 2007, 2008; senior program committee 1998–2000

International Joint Conference on Artificial Intelligence, 1995, 1997, 2001, 2005

International Conference on the Simulation of Adaptive Behavior, 1990, 1992

Cognitive Science Conference, 1986, 1987

Conference on Computational Learning Theory, 1996

Canadian Conference on Artificial Intelligence, 2006

Honors

iCORE Chair, 2003; renewed 2008

Fellow of the American Association for Artificial Intelligence, 2001

Elected member of the executive council of the American Association for Artificial Intelligence, 2007–2010

President’s Award, International Neural Network Society (INNS), 2003

Warner Award, GTE’s highest award for technical achievement contributing to business success, 1994, for leading a team that developed a method for adaptive channel assignment that improved call-carrying capacity in a detailed simulation of the GTE Airfone network by over 40%.

Outstanding Leadership in Alberta Technology Award from the Alberta Science and Technology Leadership Foundation, awarded in 2006 to the Alberta Ingenuity Centre for Machine Learning. (One of eight principal investigators.)

Students

Currently Under Supervision at the University of Alberta

Adam White, PhD

Brian Tanner, PhD

Marc Gendron-Bellemare, PhD

Hamid Reza Maei, PhD

Anna Koop, PhD

Hengshuai Yao, MSc

Leah Hackman, MSc

Ashique Mahmood, PhD

Neda Mirian HosseinAbadi, MSc

Doctoral Graduates Supervised and Graduated

Doina Precup, “Temporal abstraction in reinforcement learning,” University of Massachusetts, 2000. Currently an associate professor of Computer Science at McGill University.

David Silver, “Reinforcement Learning and Simulation-Based Search in Computer Go,” University of Alberta, 2009. Currently a postdoctoral fellow at MIT.

Masters Graduates Supervised and Graduated

Michael Delp, MSc

Ashique Mahmood, MSc

Masoud Shahamiri, University of Alberta, 2008

Anna Koop, University of Alberta, 2007

Eddie Rafols, University of Alberta, 2006

Adam White, University of Alberta, 2006

Alborz Geramifard, University of Alberta, 2006

Cosmin Paduraru, University of Alberta, 2006

Brian Tanner, University of Alberta, 2005

Amy McGovern, University of Massachusetts, 1998

Doina Precup, University of Massachusetts, 1996

Doctoral Examination Committees

Hado van Hasselt, “Insights in Reinforcement Learning,” SIKS, Dutch Research School for Information and Knowledge Systems, 2011

Tao Wang, “New Representations and Approximations for Sequential Decision Making under Uncertainty,” University of Alberta, 2007

Mohammad Al-Ansari, Northeastern University, 2001

Manfred Huber, “A hybrid architecture for adaptive robot control,” University of Massachusetts, 2000

Gavin Rummery, “Problem solving with reinforcement learning,” Cambridge University, 1995

Chen Tham, “Online function approximation for scaling up reinforcement learning,” Cambridge University, 1994

Jing Peng, “Efficient dynamic programming based learning for control,” Northeastern University, 1994

Satinder Singh, “Learning to solve Markovian decision processes,” University of Massachusetts, 1993

Long-Ji Lin, “Reinforcement learning for robots using neural networks,” Carnegie-Mellon University, 1992

Steve Whitehead, “Reinforcement learning for the adaptive control of perception and action,” University of Rochester, 1992

Leslie Kaelbling, “Learning in embedded systems,” Stanford University, 1990

Selected Invited Presentations

“Learning about Sensorimotor Data,” Posner (invited plenary) talk at the Twenty-Fifth Annual Conference on Neural Information Processing Systems, Granada, Spain, 2011, and keynote at the 21st Annual Conference of the Japanese Neural Network Society, Okinawa, Japan

“Beyond Reward: The Problem of Knowledge and Data,” keynote at the 21st International Conference on Inductive Logic Programming, Windsor Great Park, United Kingdom, 2011

“Toward Learning Human-level Predictive Knowledge,” keynote at the Third Conference on Artificial General Intelligence, Lugano, Switzerland, 2010

“Mind and Time: A View of Constructivist Reinforcement Learning,” invited plenary talk at the European Workshop on Reinforcement Learning, Lille, France, 2008

“Stimulus Representation in Temporal-difference Models of the Dopamine System,” Computation and Neural Systems seminar series at the California Institute of Technology, 2007

“Reinforcement Learning’s Computational Theory of Mind,” plenary invited talk at a special Workshop on Dopamine and its Role in Learning, Motivation, and Psychiatric Disorders, McGill University, 2005

“Experience-Oriented Artificial Intelligence,” Distinguished Lecture Series of the Mathematics and Computer Science Department of the University of Lethbridge, 2005

“Experience-Oriented Artificial Intelligence,” Distinguished Lecture Series of the Cognitive Science Department of the University of California, San Diego, 2005

“Grounding Knowledge in Subjective Experience,” plenary provocateur talk, Second Cognitive Systems Conference, 2005

“Reinforcement Learning: How Far Can It Go?” joint plenary at the (co-located) International Conference on Machine Learning, Conference on Computational Learning Theory, and Conference on Uncertainty in Artificial Intelligence, 1998

“Reinforcement Learning: Past, Present, and Future,” plenary at the Second Asia-Pacific Conference on Simulated Evolution and Learning, Australia, 1998

“Between MDPs and Semi-MDPs: Learning, Planning and Representing Knowledge at Multiple Temporal Scales,” Universidad Simón Bolívar, Caracas, Venezuela, 1998

“Computational Reinforcement Learning,” plenary at the Annual Meeting of the Society for the Quantitative Analysis of Behavior, 1998

“On the Significance of Markov Decision Processes,” plenary at the European Conference on Computational Learning Theory, Germany, 1997

“Reinforcement Learning: Lessons for AI,” International Joint Conference on Artificial Intelligence, Japan, 1997

“Markov Decision Processes: Lessons from Reinforcement Learning,” plenary at the International Conference on Artificial Neural Networks, Switzerland, 1997

“Learning from Interaction,” keynote address, National Conference on Computer Science – Mexico, 1997

“The Value of Reinforcement Learning in Neurobiology,” Göttingen Neurobiology Conference, Germany, 1995

“Temporal-Difference Learning,” National Conference on Artificial Intelligence, 1993

“Temporal-Difference Learning,” plenary at the IEEE International Conference on Neural Networks, 1993

“Reinforcement Learning Architectures,” International Symposium on Neural Information Processing, Kyushu, Japan, 1992

Publications

Books

1. Sutton, R. S., Barto, A. G., Reinforcement Learning: An Introduction. MIT Press, 1998. Also translated into Japanese.

2. Miller, W. T., Sutton, R. S., Werbos, P. J. (Eds.), Neural Networks for Control. MIT Press, 1991.

3. Sutton, R. S. (Ed.), Reinforcement Learning. Reprinting of a special issue of Machine Learning Journal. Kluwer Academic Press, 1992.

Journal Articles

4. Kehoe, E. J., Ludvig, E. A., Sutton, R. S., “Timing in trace conditioning of the nictitating membrane response of the rabbit (Oryctolagus cuniculus): Scalar, nonscalar, and adaptive features,” Learning and Memory 17:600–604, 2010.

5. Kehoe, E. J., Ludvig, E. A., Sutton, R. S., “Magnitude and timing of conditioned responses in delay and trace classical conditioning of the nictitating membrane response of the rabbit (Oryctolagus cuniculus),” Behavioral Neuroscience 123(5):1095–1101, 2009.

6. Kehoe, E. J., Olsen, K. N., Ludvig, E. A., Sutton, R. S., “Scalar timing varies with response magnitude in classical conditioning of the rabbit nictitating membrane response (Oryctolagus cuniculus),” Behavioral Neuroscience 123:212–217, 2009.

7. Bhatnagar, S., Sutton, R. S., Ghavamzadeh, M., Lee, M., “Natural actor–critic algorithms,” Automatica 45(11):2471–2482, 2009.

8. Ludvig, E. A., Sutton, R. S., Kehoe, E. J., “Stimulus representation and the timing of reward-prediction errors in models of the dopamine system,” Neural Computation 20:3034–3054, 2008.

9. Kehoe, E. J., Ludvig, E. A., Dudeney, J. E., Neufeld, J., Sutton, R. S., “Magnitude and timing of nictitating membrane movements during classical conditioning of the rabbit (Oryctolagus cuniculus),” Behavioral Neuroscience 122(2):471–476, 2008.

10. Stone, P., Sutton, R. S., Kuhlmann, G., “Reinforcement Learning for RoboCup-Soccer Keepaway,” Adaptive Behavior 13(3):165–188, 2005.

11. Sutton, R. S., Precup, D., Singh, S., “Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning,” Artificial Intelligence, 112, 1999, pp. 181–211.

12. Santamaria, J. C., Sutton, R. S., Ram, A., “Experiments with reinforcement learning in problems with continuous state and action spaces,” Adaptive Behavior, 6, No. 2, 1998, pp. 163–218.

13. Singh, S., Sutton, R. S., “Reinforcement learning with replacing eligibility traces,” Machine Learning, 22, 1996, pp. 123–158.

14. Sutton, R. S., “Learning to predict by the methods of temporal differences,” Machine Learning, 3, No. 1, 1988, pp. 9–44.

15. Moore, J., Desmond, J., Berthier, N., Blazis, D., Sutton, R. S., Barto, A. G., “Simulation of the classically conditioned nictitating membrane response by a neuron-like adaptive element: response topography, neuronal firing, and interstimulus intervals,” Behavioural Brain Research, 21, 1986, pp. 143–154.

16. Barto, A. G., Sutton, R. S., Anderson, C., “Neuron-like adaptive elements that can solve difficult learning control problems,” IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, No. 5, 1983, pp. 834–846.

17. Barto, A. G., Sutton, R. S., “Simulation of anticipatory responses in classical conditioning by a neuron-like adaptive element,” Behavioural Brain Research, 4, 1982, pp. 221–235.

18. Barto, A. G., Anderson, C., Sutton, R. S., “Synthesis of nonlinear control surfaces by a layered associative network,” Biological Cybernetics, 43, 1982, pp. 175–185.

19. Sutton, R. S., Barto, A. G., “Toward a modern theory of adaptive networks: expectation and prediction,” Psychological Review, 88, No. 2, 1981, pp. 135–140. Translated into Spanish by G. Ruiz to appear in the journal Estudios de Psicología.

20. Sutton, R. S., Barto, A. G., “An adaptive network that constructs and uses an internal model of its world,” Cognition and Brain Theory, 4, 1981, pp. 217–246.

21. Barto, A. G., Sutton, R. S., Brouwer, P., “Associative search network: a reinforcement learning associative memory,” Biological Cybernetics, 40, 1981, pp. 201–211.

22. Barto, A., Sutton, R. S., “Landmark learning: an illustration of associative search,” Biological Cybernetics, 42, 1981, pp. 1–8.

Book Chapters

23. Stone, P., Sutton, R. S., “Keepaway soccer: A machine learning testbed.” In Lecture Notes in Computer Science 2377, pp. 207–237, Springer, 2002.

24. Stone, P., Sutton, R. S., Singh, S., “Reinforcement learning for 3 vs. 2 keepaway.” In RoboCup-2000: Robot Soccer World Cup IV, P. Stone, T. Balch, G. Kraetzschmar, Eds. Springer Verlag, 2001.

25. Sutton, R. S., “Reinforcement learning.” In The MIT Encyclopedia of the Cognitive Sciences, R. Wilson, F. Keil, Eds., pp. 715–717, MIT Press, 1999.

26. Sutton, R. S., “On the significance of Markov decision processes.” In Artificial Neural Networks – ICANN'97, W. Gerstner, A. Germond, M. Hasler, J.-D. Nicoud, Eds., pp. 273–282, Springer, 1997.

27. Barto, A. G., Sutton, R. S., “Reinforcement learning in artificial intelligence.” In Neural Network Models of Cognition, Donahoe, J. W., Packard Dorsel, V., Eds., pp. 358–386, Elsevier, 1997.

28. Sutton, R. S., “First results with Dyna: an integrated architecture for learning, planning, and reacting.” In Neural Networks for Control, Miller, T., Sutton, R. S., Werbos, P., Eds., pp. 179–189, MIT Press, 1990.

29. Sutton, R. S., Barto, A. G., “Time-derivative models of Pavlovian reinforcement.” In Learning and Computational Neuroscience, M. Gabriel, J. W. Moore, Eds., pp. 497–537, MIT Press, 1990.

30. Barto, A. G., Sutton, R. S., Watkins, C., “Learning and sequential decision making.” In Learning and Computational Neuroscience, M. Gabriel, J. W. Moore, Eds., pp. 539–602, MIT Press, 1990.

31. Selfridge, O., Sutton, R. S., Anderson, C., “Selected bibliography on connectionism.” In Evolution, Learning, and Cognition, Y. C. Lee, Ed., pp. 391–403, World Scientific Publishing, 1988.

32. Barto, A., Sutton, R. S., “Neural problem solving.” In Synaptic Modification, Neuron Selectivity, and Nervous System Organization, W. B. Levy, J. A. Anderson, Eds., pp. 123–152, Lawrence Erlbaum, 1985.

Refereed Articles in Conference Proceedings

33. Pilarski, P. M., Dawson, M. R., Degris, T., Fahimi, F., Carey, J. P., Sutton, R. S., “Human training for online myoelectric prosthesis control using actor-critic reinforcement learning.” In Proceedings of the 2011 IEEE International Conference on Rehabilitation Robotics, Zurich, Switzerland, 7 pages, 2011.

34. Sutton, R. S., Modayil, J., Delp, M., Degris, T., Pilarski, P. M., White, A., Precup, D., “Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction.” In Proceedings of the Tenth International Conference on Autonomous Agents and Multiagent Systems, pp. 761–768, Taipei, Taiwan, 2011.

35. Maei, H. R., Szepesvari, Cs., Bhatnagar, S., Sutton, R. S., “Toward off-policy learning control with function approximation.” In Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, 8 pages, 2010. Omnipress.

36. Maei, H. R., Sutton, R. S., “GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces.” In Proceedings of the Third Conference on Artificial General Intelligence, Lugano, Switzerland, 2010.

37. Maei, H. R., Szepesvari, Cs., Bhatnagar, S., Precup, D., Silver, D., Sutton, R. S., “Convergent temporal-difference learning with arbitrary smooth function approximation.” Advances in Neural Information Processing Systems 22, 9 pages, 2009. MIT Press.

38. Sutton, R. S., Maei, H. R., Precup, D., Bhatnagar, S., Silver, D., Szepesvari, Cs., Wiewiora, E., “Fast gradient-descent methods for temporal-difference learning with linear function approximation.” Proceedings of the 26th International Conference on Machine Learning, 2009.

39. Sutton, R. S., Szepesvari, Cs., Maei, H. R., “A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation.” Advances in Neural Information Processing Systems 21, 8 pages, 2009. MIT Press.

40. Ludvig, E., Sutton, R. S., Verbeek, E., Kehoe, E. J., “A computational model of hippocampal function in trace conditioning,” Advances in Neural Information Processing Systems 21, 8 pages, 2009. MIT Press.

41. Cutumisu, M., Szafron, D., Bowling, M., Sutton, R. S., “Agent learning using action-dependent learning rates in computer role-playing games,” Proceedings of the 4th Artificial Intelligence and Interactive Digital Entertainment Conference, pp. 22–29, 2008.

42. Sutton, R. S., Szepesvari, Cs., Geramifard, A., Bowling, M., “Dyna-style planning with linear function approximation and prioritized sweeping,” Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, pp. 528–536, 2008.

43. Silver, D., Sutton, R. S., Müller, M., “Sample-based learning and search with permanent and transient memories,” Proceedings of the 25th International Conference on Machine Learning, 2008. (27% acceptance)

44. Bhatnagar, S., Sutton, R. S., Ghavamzadeh, M., Lee, M., “Incremental natural actor-critic algorithms,” Advances in Neural Information Processing Systems 20, 2008.

45. Sutton, R. S., Koop, A., Silver, D., “On the role of tracking in stationary environments,” Proceedings of the 24th International Conference on Machine Learning, 2007.

46. Silver, D., Sutton, R. S., Müller, M., “Reinforcement learning of local shape in the game of Go,” Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007.

47. Geramifard, A., Bowling, M., Zinkevich, M., Sutton, R. S., “iLSTD: Eligibility traces and convergence analysis,” Advances in Neural Information Processing Systems 19, 2007.

48. Geramifard, A., Bowling, M., Sutton, R. S., “Incremental least-squares temporal difference learning,” Proceedings of the Twenty-First National Conference on Artificial Intelligence, pp. 356–361, 2006.

49. Precup, D., Sutton, R. S., Paduraru, C., Koop, A., Singh, S., “Off-policy learning with options and recognizers,” Advances in Neural Information Processing Systems 18, 2006.

50. Sutton, R. S., Rafols, E. J., Koop, A., “Temporal abstraction in temporal-difference networks,” Advances in Neural Information Processing Systems 18, 2006.

51. Tanner, B., Sutton, R. S., “TD(λ) networks: Temporal-difference networks with eligibility traces,” Proceedings of the 22nd International Conference on Machine Learning (ICML), Bonn, Germany, 2005. 8 large typeset pages. 27% acceptance rate.

52. Rafols, E. J., Ring, M. B., Sutton, R. S., Tanner, B., “Using predictive representations to improve generalization in reinforcement learning,” Proceedings of the 19th International Joint Conference on Artificial Intelligence, Edinburgh, Scotland, 2005. 6 large typeset pages. 18% acceptance rate.

53. Tanner, B., Sutton, R. S., “Temporal-difference networks with history,” Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. 6 large typeset pages. 18% acceptance rate.

54. Sutton, R. S., Tanner, B., “Temporal-difference networks,” Advances in Neural Information Processing Systems 17. MIT Press, 2005.

55. Littman, M. L., Sutton, R. S., Singh, S., “Predictive representations of state,” Advances in Neural Information Processing Systems 14. MIT Press, 2002.

56. Stone, P., Sutton, R. S., “Scaling reinforcement learning toward RoboCup soccer,” Proceedings of the 18th International Conference on Machine Learning, pp. 537–544. Morgan Kaufmann, 2001.

57. Precup, D., Sutton, R. S., Dasgupta, S., “Off-policy temporal-difference learning with function approximation,” Proceedings of the 18th International Conference on Machine Learning, pp. 417–424. Morgan Kaufmann, 2001.

58. Sutton, R. S., McAllester, D., Singh, S., Mansour, Y., “Policy gradient methods for reinforcement learning with function approximation,” Advances in Neural Information Processing Systems 12 (Proceedings of the 1999 conference), pp. 1057–1063. MIT Press, 2000.

59. Precup, D., Sutton, R. S., Singh, S., “Eligibility traces for off-policy policy evaluation,” Proceedings of the 17th International Conference on Machine Learning, pp. 759–766. Morgan Kaufmann, 2000.

60. Sutton, R. S., Singh, S., Precup, D., Ravindran, B., “Improved switching among temporally abstract actions,” Advances in Neural Information Processing Systems 11 (Proceedings of the 1998 conference), pp. 1066–1072. MIT Press, 1999.

61. Moll, R., Barto, A. G., Perkins, T. J., Sutton, R. S., “Learning instance-independent value functions to enhance local search,” Advances in Neural Information Processing Systems 11 (Proceedings of the 1998 conference), pp. 1017–1023. MIT Press, 1999.

62. Sutton, R. S., Precup, D., Singh, S., “Intra-option learning about temporally abstract actions,” Proceedings of the 15th International Conference on Machine Learning, pp. 556–564. Morgan Kaufmann, 1998.

63. Precup, D., Sutton, R. S., Singh, S., “Theoretical results on reinforcement learning with temporally abstract options,” Machine Learning: ECML-98, Proceedings of the 10th European Conference on Machine Learning, pp. 382–393. Springer Verlag, 1998.

64. Precup, D., Sutton, R. S., “Multi-time models for temporally abstract planning,” Advances in Neural Information Processing Systems 10 (Proceedings of the 1997 conference), pp. 1050–1056. MIT Press, 1998.

65. Precup, D., Sutton, R. S., “Exponentiated gradient methods for reinforcement learning,” Proceedings of the 14th International Conference on Machine Learning, pp. 272–277. Morgan Kaufmann, 1997.

66. Sutton, R. S., “Generalization in reinforcement learning: successful examples using sparse coarse coding,” Advances in Neural Information Processing Systems 8 (Proceedings of the 1995 conference), pp. 1038–1044. MIT Press, 1996.

67. Sutton, R. S., “TD models: modeling the world at a mixture of time scales,” Proceedings of the Twelfth International Conference on Machine Learning, pp. 531–539. Morgan Kaufmann, 1995.

68. Sutton, R. S., Whitehead, S., “Online learning with random representations,” Proceedings of the Tenth International Conference on Machine Learning, pp. 314–321. Morgan Kaufmann, 1993.

69. Gluck, M., Glauthier, P., Sutton, R. S., “Adaptation of cue-specific learning rates in network models of human category learning,” Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, pp. 540–545. Erlbaum, 1992.

70. Sutton, R. S., “Adapting bias by gradient descent: an incremental version of delta-bar-delta,” Proceedings of the Tenth National Conference on Artificial Intelligence, pp. 171–176. MIT/AAAI Press, 1992.

71. Sanger, T., Sutton, R. S., Matheus, C., “Iterative construction of sparse polynomial approximations,” Advances in Neural Information Processing Systems 4, pp. 1064–1071. Morgan Kaufmann, 1992.

72. Sutton, R. S., “Planning by incremental dynamic programming,” Proceedings of the Eighth International Workshop on Machine Learning, pp. 353–357. Morgan Kaufmann, 1991.

73. Sutton, R. S., Matheus, C., “Learning polynomial functions by feature construction,” Proceedings of the Eighth International Workshop on Machine Learning, pp. 208–212. Morgan Kaufmann, 1991.

74. Sutton, R. S., Barto, A. G., Williams, R., “Reinforcement learning is direct adaptive optimal control,” Proceedings of the American Control Conference, pp. 2143–2146, 1991. Also published in IEEE Control Systems Magazine 12, No. 2, pp. 19–22.

75. Sutton, R. S., “Integrated modeling and control based on reinforcement learning and dynamic programming,” Advances in Neural Information Processing Systems 3, pp. 471–478. MIT Press, 1991.

76. Sutton, R. S., “Reinforcement learning architectures for animats,” Proceedings of the First International Conference on Simulation of Adaptive Behavior: From Animals to Animats, pp. 288–296, 1991.

77. Sutton, R. S., “Integrated architectures for learning, planning, and reacting based on approximating dynamic programming,” Proceedings of the Seventh International Conference on Machine Learning, pp. 216–224, 1990. Reprinted in Artificial Neural Networks: Concepts and Theory, edited by P. Mehra and B. Wah, IEEE Computer Society Press, 1992.

78. Barto, A. G., Sutton, R. S., Watkins, C., “Sequential decision problems and neural networks,” Advances in Neural Information Processing Systems 2, pp. 686–693. MIT Press, 1990.

79. Sutton, R. S., Barto, A. G., “A temporal-difference model of classical conditioning,” Proceedings of the Ninth Annual Conference of the Cognitive Science Society, pp. 355–378, 1987.

80. Sutton, R. S., “Two problems with backpropagation and other steepest-descent learning procedures for networks,” Proceedings of the Eighth Annual Conference of the Cognitive Science Society, pp. 823–831, 1986. Reprinted in Artificial Neural Networks: Concepts and Theory, edited by P. Mehra and B. Wah, IEEE Computer Society Press, 1992.

81. Selfridge, O., Sutton, R. S., Barto, A. G., “Training and tracking in robotics,” Proceedings of the Ninth International Joint Conference on Artificial Intelligence, pp. 670–672, 1985.

82. Sutton, R. S., Pinette, B., “The learning of world models by connectionist networks,” Proceedings of the Seventh Annual Conference of the Cognitive Science Society, pp. 54–64, 1985.

83. Moore, J., Desmond, J., Berthier, N., Blazis, D., Sutton, R. S., Barto, A. G., “Connectionist learning in real time: Sutton-Barto adaptive element and classical conditioning of the nictitating membrane response,” Proceedings of the Seventh Annual Conference of the Cognitive Science Society, pp. 318–322, 1985.

Other Articles in Proceedings

84. Modayil, J., Pilarski, P. M., White, A., Degris, T., Sutton, R. S., “Off-policy knowledge maintenance for robots.” Robotics Science and Systems Workshop (Towards Closing the Loop: Active Learning for Robotics). Extended abstract and poster, 2010.

85. Sutton, R. S., “The grand challenge of predictive empirical abstract knowledge.” Working Notes of the IJCAI-09 Workshop on Grand Challenges for Reasoning from Experiences, 2009.

86. Sutton, R. S., “Open theoretical questions in reinforcement learning,” Proceedings of the Fourth European Conference on Computational Learning Theory, pp. 11–17. Springer-Verlag, 1999.

87. McGovern, A., Precup, D., Ravindran, B., Singh, S., Sutton, R. S., “Hierarchical Optimal Control of MDPs,” Proceedings of the Tenth Yale Workshop on Adaptive and Learning Systems, pp. 186–191. Yale University, New Haven, CT, 1998.

88. Precup, D., Sutton, R. S., “Multi-time models for reinforcement learning,” Proceedings of the ICML'97 Workshop on Modeling in Reinforcement Learning, 1997.

89. McGovern, A., Sutton, R. S., Fagg, A. H., “Roles of macro-actions in accelerating reinforcement learning,” Proceedings of the 1997 Grace Hopper Celebration of Women in Computing, pp. 13–17, 1997.

90. Precup, D., Sutton, R. S., Singh, S. P., “Planning with closed-loop macro actions,” Working Notes of the 1997 AAAI Fall Symposium on Model-directed Autonomous Systems, pp. 70–76, 1997.

91. Kuvayev, L., Sutton, R. S., “Model-based reinforcement learning with an approximate, learned model,” Proceedings of the Ninth Yale Workshop on Adaptive and Learning Systems, pp. 101–105. Yale University, New Haven, CT, 1996.

92. Mehra, R. K., Ravichandran, B., Sutton, R. S., “Adaptive intelligent scheduling for ATM networks,” Proceedings of the Ninth Yale Workshop on Adaptive and Learning Systems, pp. 106–111. Yale University, New Haven, CT, 1996.

93. Sutton, R. S., Singh, S., “On step size and bias in temporal-difference learning,” Proceedings of the Eighth Yale Workshop on Adaptive and Learning Systems, pp. 91–96. Yale University, New Haven, CT, 1994.

94. Sutton, R. S., “Gain adaptation beats least squares,” Proceedings of the Seventh Yale Workshop on Adaptive and Learning Systems, pp. 161–166. Yale University, New Haven, CT, 1992.

95. Sutton, R. S., “Reinforcement learning architectures,” Proceedings ISKIT'92 International Symposium on Neural Information Processing, pp. 211–216. Fukuoka, Japan, 1992.

96. Sutton, R. S., “Dyna, an integrated architecture for learning, planning and reacting,” Working Notes of the 1991 AAAI Spring Symposium on Integrated Intelligent Architectures and SIGART Bulletin 2, pp. 160–163, 1991.

97. Whitehead, S., Sutton, R. S., Ballard, D., “Advances in reinforcement learning and their implications for intelligent control,” Proceedings of the Fifth IEEE International Symposium on Intelligent Control, pp. 1289–1297, 1990.

98. Anderson, C., Franklin, J., Sutton, R. S., “Learning a nonlinear model of a manufacturing process using multilayer connectionist networks,” Proceedings of the Fifth IEEE International Symposium on Intelligent Control, pp. 404–409, 1990.

99. Sutton, R. S., “Artificial intelligence by dynamic programming,” Proceedings of the Sixth Yale Workshop on Adaptive and Learning Systems, pp. 89–95. Yale University, New Haven, CT, 1990.

100. Franklin, J., Sutton, R. S., Anderson, C., Selfridge, O., Schwartz, D., “Connectionist learning control at GTE Laboratories,” Intelligent Control and Adaptive Systems, G. Rodriguez, Ed., Proc. SPIE 1196, pp. 242–253, 1990.

101. Sutton, R. S., “Artificial intelligence as a control problem: comments on the relationship between machine learning and intelligent control,” Proceedings of the IEEE International Symposium on Intelligent Control 1988, pp. 500–507, 1989.

102. Franklin, J., Sutton, R. S., Anderson, C., “Application of connectionist learning methods to manufacturing process monitoring,” Proceedings of the IEEE International Symposium on Intelligent Control 1988, pp. 709–712, 1989.

103. Sutton, R. S., “Convergence theory for a new kind of prediction learning,” Proceedings of the 1988 Workshop on Computational Learning Theory, pp. 421–422, 1988.

104. Sutton, R. S., “Learning distributed, searchable, internal models,” Proceedings of the Distributed Artificial Intelligence Workshop, pp. 287–289, 1985.

105. Barto, A. G., Sutton, R. S., Anderson, C., “Spatial learning simulation systems,” Proceedings of the 10th IMACS World Congress on Systems Simulation and Scientific Computation, pp. 204–206, 1982.

Other Publications

106. Sutton, R. S., “Beyond Reward: The Problem of Knowledge and Data.” Extended abstract in the Proceedings of the 21st International Conference on Inductive Logic Programming, Muggleton, S., Watanabe, H., Tamaddoni-Nezhad, A., Chen, J., Eds. Lecture Notes in Computer Science, Springer, 2011.

107. “Interview with Richard S. Sutton,” by Verena Heidrich-Meisner. Künstliche Intelligenz, pp. 41–43, 2009.

108. Sutton, R. S., “Reinforcement learning: Past, present, and future.” Extended abstract in Simulated Evolution and Learning, McKay, B., Yao, X., Newton, C. S., Kim, J.-H., Furuhashi, T., Eds. Published as Lecture Notes in Computer Science 1585, pp. 195–197, Springer, 1999.

109. McGovern, A., Sutton, R. S., “Macro-actions in reinforcement learning: an empirical analysis.” Technical Report 98-70, Department of Computer Science, University of Massachusetts, Amherst, MA 01003, 1998.

110. Precup, D., Sutton, R. S., “Empirical comparison of gradient descent and exponentiated gradient descent in supervised and reinforcement learning.” Technical Report UM-CS-1996-070, Department of Computer Science, University of Massachusetts, Amherst, MA 01003, 1996.

111. Sutton, R. S., “On the virtues of linear learning and trajectory distributions.” Abstract in Proceedings of the Workshop on Value Function Approximation, Conference on Neural Information Processing Systems, 1995.

112. Sutton, R. S., “Machines that learn and mimic the brain.” In ACCESS, GTE's Journal of Science and Technology, 1992. Reprinted in Stethoscope Quarterly, Spring 1993.

113. Sutton, R. S., “The challenge of reinforcement learning.” Introduction to a special issue on reinforcement learning, Machine Learning 8, No. 3/4, pp. 225–227, 1992.

114. Sutton, R. S., “Implementation details of the TD(lambda) procedure for the case of vector predictions and backpropagation.” GTE Laboratories Technical Report TR87-509.1, GTE Laboratories, 40 Sylvan Road, Waltham, MA 02254, 1989.

115. Sutton, R. S., “NADALINE: a normalized adaptive linear element that learns efficiently.” GTE Laboratories Technical Report TR88-509.4, GTE Laboratories, 40 Sylvan Road, Waltham, MA 02254, 1988.

116. Sutton, R. S., “Temporal credit assignment in reinforcement learning.” Ph.D. dissertation, University of Massachusetts, Amherst, MA. Published as COINS Technical Report 84-2, 1984.

117. Barto, A. G., Sutton, R. S., “Goal seeking components for adaptive intelligence: an initial assessment.” Air Force Wright Aeronautical Laboratories Technical Report AFWAL-TR-81-1070, Wright-Patterson Air Force Base, Ohio, 1981.

118. Sutton, R. S., “Single channel theory: a neuronal theory of learning,” Brain Theory Newsletter 3, No. 3/4, pp. 72–75, 1978. (Earliest publication.)

Patent

119. Sutton, R. S., “Apparatus for Machine Learning.” United States Patent No. 5,946,675, August 31, 1999. Assigned to GTE Laboratories Incorporated, Waltham, Mass.
