Richard S. Sutton - Webdocs Cs Ualberta - University of Alberta

CURRICULUM VITAE 

Richard S. Sutton 

September, 2011 

Professor, Department of Computing Science, University of Alberta 

address: Athabasca Hall 2-21, University of Alberta, Edmonton, AB T6G 2E8 

telephone: 780-492-4584, fax: 780-492-1111, cell: 780-270-8139 

email: rich@richsutton.com 

web: http://richsutton.com 

Degrees 

Doctor of Philosophy in Computer Science, University of Massachusetts at Amherst, 1984. 

Dissertation: “Temporal credit assignment in reinforcement learning.” 

Master of Science in Computer Science, University of Massachusetts at Amherst, 1980. 

Thesis: “An adaptive network that constructs and uses an internal model of its world.” 

Bachelor of Arts in Psychology, Stanford University, with honors and distinction, 1978. 

Senior thesis: “A unified theory of expectation in classical and instrumental conditioning.” 

Professional Experience 

Full Professor and iCORE Chair, Department of Computing Science, University of Alberta, 

2003–. Leads the Reinforcement Learning and Artificial Intelligence Laboratory of about 40 

graduate students, staff, postdocs, and faculty co-PIs. 

Principal Technical Staff Member, Artificial Intelligence Department, AT&T Shannon 

Laboratory, Florham Park, New Jersey, 1998–2002. 

Senior Research Scientist, Department of Computer Science, University of Massachusetts 

at Amherst, 1995–1998. Co-led (with A. Barto) the Adaptive Networks Laboratory 

comprising approximately 12 graduate students and other researchers. 

Principal Member of Technical Staff, Computer and Intelligent Systems Laboratory, GTE 

Laboratories, Waltham, Massachusetts, 1985–1994. From 1986 led the Connectionist 

Machine Learning Project, a group of 3-6 conducting basic and applied research. 

Senior Postdoctoral Research Associate, Department of Computer Science, University of 

Massachusetts at Amherst, 1984. 

1

Research 

Pioneered and made repeated contributions to reinforcement learning, an approach to 

artificial and natural intelligence that emphasizes learning and planning from sample 

experience. Currently seeking to extend reinforcement learning ideas to an empirically 

grounded approach to knowledge representation. 

Most significant contributions: 

• The theory of temporal-difference learning and the TD(λ) algorithm (1988). 

• The standard textbook for reinforcement learning (with Barto, 1998) 

• The actor-critic (policy gradient) class of algorithms (1984, 2000). 

• The Dyna architecture integrating learning, planning and reacting (1990) 

• Temporal-difference models of animal learning (with Barto, 1981, 1990) 

• The “options” framework for temporal abstraction (with Precup, Singh, 1999) 

• Predictive state representations (with Littman, Singh, 2002) 

• Algorithms for online step-size adaptation (1981, 1992) 

• Temporal-difference networks for grounded knowledge representation (2005, 2006) 

• Gradient temporal-difference algorithms (with Maei, Szepesvari, 2008–) 

Selected Grants 

NSERC Collaborative Research and Development Grant, with Nortel Networks and Bell 

Canada, “Learning and Prediction in High-dimensional Stochastic Domains,” September 

2006 – August 2009, funded at CAD$186,523. One of five principal investigators. 

iCORE Chair and Professorship Establishment Grant, “Reinforcement Learning and 

Artificial Intelligence,” September 1, 2003 – August 31, 2008, funded at CAD$3,000,000. 

Principal investigator. Renewed until August 2013 at an additional CAD$2,750,000. 

Alberta Ingenuity Centre Grant, “Alberta Ingenuity Centre for Machine Learning,” April 

2003 – March 2008, funded at CAD$9,887,600. One of eight principal investigators. 

Renewed until March 2009 at CAD$2,000,000. Renewed again in 2009 for another five 

years at CAD$10,000,000. 

NSERC Discovery Grant, “Reinforcement Learning and Artificial Intelligence,” April, 2004 

– March 2009, funded at CAD$250,000. Principal investigator. Renewed in 2009 for a 

second five years at an additional CAD$190,000. 

Air Force Office of Scientific Research to the University of Massachusetts, “Stochastic 

Scheduling and Planning Using Reinforcement Learning,” AFOSR Grant Number 

F49620-96-1-0254, June 1, 1996 – May 31, 2000, funded at USD$446,570. Coprincipal 

investigator with A. Barto. 

2

National Science Foundation to the University of Massachusetts, “Multiple Time Scale 

Reinforcement Learning,” Communications & Computational Systems/Neuroengineering 

Grant ECS-9511805, September 15, 1995 – August 31, 1998, funded at USD$157,261. 

Primary senior personnel (A. Barto, PI). 

Teaching Experience 

Courses 

Intelligent Systems, CMPUT 366, an introduction to artificial intelligence for 

undergraduates at the University of Alberta, 2008–2010 

Reinforcement Learning for Artificial Intelligence, CMPUT 499/609, cross-listed 

undergraduate/graduate course at the University of Alberta, 2003–2007, 2009-10 

Non-procedural Programming Languages, CMPUT 325, undergraduate course at the 

University of Alberta, Fall 2006 

Reinforcement Learning in Practice, CMPUT 607, graduate course at the University of 

Alberta, Spring 2005 

Reinforcement Learning, CMPSCI 791O, graduate course at the University of 

Massachusetts, with A. Barto, Fall 1996 and 1997 

Reinforcement Learning, special intensive course at the University of Uppsala, Sweden, 

April–May 1996 

Reinforcement Learning, CMPSCI 791N, graduate seminar at the University of 

Massachusetts, with A. Barto, Fall 1995 

Cybernetics of Adaptation and Learning, COINS 791M, graduate seminar, teaching 

assistant including some lecturing, Fall 1981 

Summer Schools 

Instructor at Machine Learning Summer School, Ile de Rey, France, 2008 

Instructor at Cambridge University Neural Networks Summer School (three lectures each 

summer), 1993–1997 

Instructor at Cold Spring Harbor Summer School on Computational Neuroscience: 

Learning and Memory, July 1990 

3

Tutorials on Reinforcement Learning at Conferences 

Conference of the Association for the Advancement of Artificial Intelligence, 2010 

International Conference on Machine Learning, 1999 

Genetic Programming Conference, 1998 

Second Asia-Pacific Conference on Simulated Evolution and Learning, Australia, 1998 

National Conference on Artificial Intelligence, with L. Kaelbling, 1997 

National Conference on Computer Science – Mexico, 1997 

Conference on Neural Information Processing Systems, 1996 

Scholarly Service 

Meetings Organized 

Co-chair, Multidisciplinary Symposium on Reinforcement Learning, 2009 

Program co-chair, National Conference on Artificial Intelligence, 2002 

Chair, Workshop on Reinforcement Learning, Int’l. Conference on Machine Learning, 1993 

Co-chair, Bellairs Workshop on Reinforcement Learning, 2005-6, 2008-11 

Co-chair, Workshop on Reinforcement Learning–Benchmarks and Bakeoffs, International 

Conference on Neural Information Processing Systems, 2004 

Co-chair, Workshop on Predictive Representation of World Knowledge, International 

Conference on Machine Learning, 2004 

Co-chair, NSF Workshop on Neural Networks for Robotics and Control, 1988 

Editorial 

Foundations and Trends in Machine Learning, founding editorial board, 2007– 

Journal of Machine Learning Research: advisory board 1999– 

J. of Artificial Intelligence Research: action editor 1993-1996, advisory board 1996–2000 

Machine Learning: action editor 1989–94, editorial board 1994–2001 

Connection Science: action editor 1989–90, editorial board 1990–1995 

Adaptive Behavior: editorial board 1991–2000 

Guest Editor, special double issue of Machine Learning on Reinforcement Learning, 1992 

Conference Program Committees 

International Conference on Machine Learning, 1988–01,03,04,10, area chair 2005–6,8 

Conf. on Neural Information Processing Systems, 1989–2000, 2003–2010, area chair 1997 

Conference on Uncertainty in Artificial Intelligence, 2005, 2009 

International Joint Conference on Autonomous Agents and Multi-Agent Systems, 2005 

AAAI, 1993–1996, 2007, 2008, senior program committee 1998–2000 

International Joint Conference on Artificial Intelligence, 1995, 1997, 2001, 2005 

International Conference on the Simulation of Adaptive Behavior, 1990, 1992 

Cognitive Science Conference, 1986, 1987 

Conference on Computational Learning Theory, 1996 

Canadian Conference on Artificial Intelligence, 2006 

4

Honors 

iCORE chair, 2003; renewed 2008 

Fellow of the American Association for Artificial Intelligence, 2001 

Elected member of the executive council of the American Association for Artificial 

Intelligence, 2007–2010 

President's Award, International Neural Network Society (INNS), 2003 

Warner award, GTE’s highest award for technical achievement contributing to business 

success, 1994, for leading a team that developed a method for adaptive channel assignment 

that improved call-carrying capacity in a detailed simulation of the GTE Airfone network 

by over 40%. 

Outstanding Leadership in Alberta Technology Award from the Alberta Science and 

Technology Leadership Foundation, awarded in 2006 to the Alberta Ingenuity Center for 

Machine Learning. (One of eight principal investigators.) 

Students 

Currently Under Supervision at the University of Alberta 

Adam White, PhD 

Brian Tanner, PhD 

Marc Gendron-Bellemare, PhD 

Hamid Reza Maei, PhD 

Anna Koop, PhD 

Hengshuai Yao, MSc 

Leah Hackman, MSc 

Ashique Mahmood, PhD 

Neda Mirian HosseinAbadi, MSc 

Doctoral Graduates Supervised and Graduated 

Doina Precup, “Temporal abstraction in reinforcement learning,” University of 

Massachusetts, 2000. Currently an associate professor of Computer Science at McGill 

University. 

David Silver, “Reinforcement Learning and Simulation-Based Search in Computer Go,” 

University of Alberta, 2009. Currently a postdoctoral fellow at MIT. 

5

Masters Graduates Supervised and Graduated 

Michael Delp, MSc 

Ashique Mahmood, MSc 

Masoud Shahamiri, University of Alberta, 2008 

Anna Koop, University of Alberta 2007 

Eddie Rafols, University of Alberta 2006 

Adam White, University of Alberta 2006 

Alborz Geramifard, University of Alberta 2006 

Cosmin Paduraru, University of Alberta 2006 

Brian Tanner, University of Alberta 2005 

Amy McGovern, University of Massachusetts 1998 

Doina Precup, University of Massachusetts 1996 

Doctoral Examination Committees 

Hado van Hasselt, “Insights in Reinforcement Learning,” SIKS, Dutch Research School for 

Information and Knowledge Systems, 2011. 

Tao Wang, “New Representations and Approximations for Sequential Decision 

Making under Uncertainty,” University of Alberta, 2007 

Mohammad Al-Ansari, Northeastern University, 2001 

Manfred Huber, “A hybrid architecture for adaptive robot control,” University of 

Massachusetts, 2000 

Gavin Rummery, “Problem solving with reinforcement learning,” Cambridge University, 

1995 

Chen Tham, “Online function approximation for scaling up reinforcement learning,” 

Cambridge University, 1994 

Jing Peng, “Efficient dynamic programming based learning for control,” Northeastern 

University, 1994 

Satinder Singh, “Learning to solve Markovian decision processes,” University of 

Massachusetts, 1993 

Long-Ji Lin, “Reinforcement learning for robots using neural networks,” Carnegie-Mellon 

University, 1992 

Steve Whitehead, “Reinforcement learning for the adaptive control of perception and 

action,” University of Rochester, 1992 

Leslie Kaelbling, “Learning in embedded systems,” Stanford University, 1990 

6

Selected Invited Presentations 

“Learning about Sensorimotor data,” Posner (invited plenary) talk at the Twenty-Fifth 

Annual Conference on Neural Information Processing Systems, Granada, Spain, 2011, and 

keynote at the 21st Annual Conference of the Japanese Neural Network Society, Okinawa, 

Japan. 

“Beyond Reward: The Problem of Knowledge and Data,” Keynote at the 21st International 

Conference on Inductive Logic Programming, Windsor Great Park, United Kingdom, 2011 

“Toward Learning Human-level Predictive Knowledge,” Keynote at the Third Conference 

on Artificial General Intelligence, Lugano, Switzerland, 2010 

“Mind and Time: A View of Constructivist Reinforcement Learning,” invited plenary talk at 

the European Workshop on Reinforcement Learning, Lille, France, 2008 

“Stimulus Representation in Temporal-difference Models of the Dopamine System,” 

Computation and Neural Systems seminar series on the California Institute of Technology, 

2007 

“Reinforcement Learning’s Computational Theory of Mind,” plenary invited talk at a special 

Workshop on Dopamine and its Role in Learning, Motivation, and Psychiatric Disorders, 

McGill University, 2005 

“Experience-Oriented Artificial Intelligence,” Distinguished Lecture Series of the 

Mathematics and Computer Science Department of the University of Lethbridge, 2005 

“Experience-Oriented Artificial Intelligence,” Distinguished Lecture Series of the Cognitive 

Science Department of the University of California, San Diego, 2005 

“Grounding Knowledge in Subjective Experience” plenary provocateur talk, Second 

Cognitive Systems Conference, 2005 

“Reinforcement Learning: How Far Can It Go” joint plenary at the (co-located) 

International Conference on Machine Learning, Conference on Computational Learning 

Theory, and Conference on Uncertainty in Artificial Intelligence, 1998 

“Reinforcement Learning: Past, Present, and Future,” plenary at the Second Asia-Pacific 

Conference on Simulated Evolution and Learning, 1998, Australia 

“Between MDPs and Semi-MDPs: Learning, Planning and Representing Knowledge at 

Multiple Temporal Scales,” 1998, University Simon Bolivar, Caracas, Venezuela 

7

“Computational Reinforcement Learning,” plenary at the Annual Meeting of the Society for 

the Quantitative Analysis of Behavior, 1998 

“On the Significance of Markov Decision Processes,” plenary at the European Conference 

on Computational Learning Theory, 1997, Germany 

“Reinforcement Learning: Lessons for AI,” International Joint Conference on Artificial 

Intelligence, 1997, Japan 

“Markov Decision Processes: Lessons from Reinforcement Learning,” plenary at the 

International Conference on Artificial Neural Networks, 1997, Switzerland 

“Learning from Interaction,” keynote address, National Conference on Computer Science – 

Mexico, 1997, Mexico 

“The Value of Reinforcement Learning in Neurobiology,” Gottingen Neurobiology 

Conference, 1995, Germany 

“Temporal-Difference Learning,” National Conference on Artificial Intelligence, 1993 

“Temporal-Difference Learning,” plenary at the IEEE International Conference on Neural 

Networks, 1993 

“Reinforcement Learning Architectures,” International Symposium on Neural Information 

Processing, 1992, Kyushu, Japan 

Publications 

Books 

1. Sutton, R. S., Barto, A. G., Reinforcement Learning: An Introduction. MIT Press, 1998. 

Also translated into Japanese. 

2. Miller, W. T., Sutton, R. S., Werbos, P. J. (Eds.), Neural Networks for Control. MIT 

Press, 1991. 

3. Sutton, R. S. (Ed.), Reinforcement Learning. Reprinting of a special issue of Machine 

Learning Journal. Kluwer Academic Press, 1992 

Journal Articles 

4. Kehoe, E. J., Ludvig, E. A., Sutton, R. S., “Timing in trace conditioning of the 

nictitating membrane response of the rabbit (Oryctolagus cuniculus): Scalar, nonscalar, 

and adaptive features,” Learning and Memory 17:600-604, 2010. 

8

5. Kehoe, E. J., Ludvig, E. A., Sutton, R. S., “Magnitude and timing of conditioned 

responses in delay and trace classical conditioning of the nictitating membrane response 

of the rabbit (Oryctolagus cuniculus),” Behavioral Neuroscience 123(5):1095–1101, 

2009. 

6. Kehoe, E. J., Olsen, K. N., Ludvig, E. A., Sutton, R. S., “Scalar timing varies with 

response magnitude in classical conditioning of the rabbit nictitating membrane 

response (Oryctolagus cuniculus),” Behavioral Neuroscience 123, 212–217, 2009. 

7. Bhatnagar, S., Sutton, R. S., Ghavamzadeh, M., Lee, M., “Natural actor–critic 

algorithms,” Automatica 45(11):2471–2482, 2009. 

8. Ludvig, E. A., Sutton, R. S., Kehoe, E. J., “Stimulus representation and the timing of 

reward-prediction errors in models of the dopamine system,” Neural Computation 

20:3034–3054, 2008. 

9. Kehoe, E. J., Ludvig, E. A., Dudeney, J. E., Neufeld, J., Sutton, R. S., “Magnitude and 

timing of nictitating membrane movements during classical conditioning of the rabbit 

(Oryctolagus cuniculus),” Behavioral Neuroscience 122(2):471–476, 2008. 

10. Stone, P., Sutton, R. S., Kuhlmann, G., “Reinforcement Learning for RoboCup-Soccer 

Keepaway.” Adaptive Behavior 13(3):165–188, 2005. 

11. Sutton, R. S., Precup, D., Singh, S., “Between MDPs and semi-MDPs: a framework for 

temporal abstraction in reinforcement learning,” Artificial Intelligence, 112, 1999. pp. 

181–211. 

12. Santamaria, J. C., Sutton, R. S., Ram, A., “Experiments with reinforcement learning in 

problems with continuous state and action spaces,” Adaptive Behavior, 6, No. 2, 1998, 

pp. 163–218. 

13. Singh, S., Sutton, R. S., “Reinforcement learning with replacing eligibility traces,” 

Machine Learning, 22, 1996, pp. 123–158. 

14. Sutton, R. S., “Learning to predict by the methods of temporal differences,” Machine 

Learning, 3, 1988, No. 1, pp. 9–44. 

15. Moore, J., Desmond, J, Berthier, N., Blazis, D., Sutton, R. S., Barto, A. G., “Simulation 

of the classically conditioned nictitating membrane response by a neuron-like adaptive 

element: response topography, neuronal firing, and interstimulus intervals,” 

Behavioural Brain Research, 21, 1986, pp. 143–154. 

16. Barto, A. G., Sutton, R. S., Anderson, C., “Neuron-like adaptive elements that can solve 

difficult learning control problems,” IEEE Transactions on Systems, Man, and 

Cybernetics, SMC-13, No. 5, 1983, pp. 834–846. 

17. Barto, A. G., Sutton, R. S., “Simulation of anticipatory responses in classical 

conditioning by a neuron-like adaptive element,” Behavioral Brain Research, 4, 1982, 

pp. 221–235. 

9

18. Barto, A. G., Anderson, C., Sutton, R. S., “Synthesis of nonlinear control surfaces by a 

layered associative network,” Biological Cybernetics, 43, 1982, pp. 175–185. 

19. Sutton, R. S., Barto, A. G., “Toward a modern theory of adaptive networks: expectation 

and prediction,” Psychological Review, 88, No. 2, 1981, pp. 135–140. Translated into 

Spanish by G. Ruiz to appear in the journal Estudios de Psicologia. 

20. Sutton, R. S., Barto, A. G., “An adaptive network that constructs and uses an internal 

model of its world,” Cognition and Brain Theory, 4, 1981, pp. 217–246. 

21. Barto, A. G., Sutton, R. S., Brouwer, P., “Associative search network: a reinforcement 

learning associative memory,” Biological Cybernetics, 40, 1981, pp. 201–211. 

22. Barto, A., Sutton, R. S., “Landmark learning: an illustration of associative search,” 

Biological Cybernetics, 42, 1981, pp. 1–8. 

Book Chapters 

23. Stone, P., Sutton, R. S., “Keepaway soccer: A machine learning testbed.” In Lecture Notes 

in Computer Science 2377, pp. 207–237, Springer, 2002. 

24. Stone, P., Sutton, R. S., Singh, S., “Reinforcement learning for 3 vs. 2 keepaway.” In 

RoboCup-2000: Robot Soccer World Cup IV, P. Stone, T. Balch, G. Kraetszchmar, Eds. 

Springer Verlag, 2001. 

25. Sutton, R. S. “Reinforcement learning.” In The MIT Encyclopedia of the Cognitive 

Sciences, R. Wilson, F. Keil, Eds., pp. 715–717, MIT Press, 1999. 

26. Sutton, R. S., “On the significance of Markov decision processes.” In Artificial Neural 

Networks – ICANN'97, W. Gerstner, A. Germond, M. Hasler, J.-D. Nicoud, Eds., pp. 

273–282. Springer, 1997. 

27. Barto, A. G., Sutton, R. S., “Reinforcement learning in artificial intelligence.” In Neural 

Network Models of Cognition, Donahoe, J. W., Packard Dorsel, V., Eds., pp. 358–386, 

Elsevier, 1997. 

28. Sutton, R. S., “First results with Dyna: an integrated architecture for learning, planning, 

and reacting.” In Neural Networks for Control, Miller, T., Sutton, R. S., Werbos, P., Eds., 

pp. 179–189, MIT Press, 1990. 

29. Sutton, R. S., Barto, A. G., “Time-derivative models of pavlovian reinforcement.” In 

Learning and Computational Neuroscience, M. Gabriel, J.W. Moore, Eds., pages 497– 

537, MIT Press, 1990. 

10

30. Barto, A. G., Sutton, R. S., Watkins, C., “Learning and sequential decision making.” In 

Learning and Computational Neuroscience, M. Gabriel, J.W. Moore, Eds., pages 539– 

602, MIT Press, 1990. 

31. Selfridge, O., Sutton, R. S., Anderson, C., “Selected bibliography on connectionism,” In: 

Evolution, Learning, and Cognition, Y.C. Lee, Ed., pp. 391–403, World Scientific 

Publishing, 1988. 

32. Barto, A., Sutton, R. S., “Neural problem solving.” In: Synaptic Modification, Neuron 

Selectivity, and Nervous System Organization, W.B. Levy, J.A. Anderson, Eds., pp. 123– 

152, Lawrence Erlbaum, 1985. 

Refereed Articles in Conference Proceedings 

33. Pilarski, P. M., Dawson, M. R., Degris, T.,Fahimi, F., Carey, J. P., Sutton, R. S., “Human 

training for online myoelectric prosthesis control using actor-critic reinforcement 

learning.” In Proceedings of the 2011 IEEE International Conference on Rehabilitation 

Robotics, Zurich, Switzerland, 7 pages, 2011. 

34. Sutton, R. S., Modayil, J., Delp, M., Degris, T., Pilarski, P. M., White, A., Precup, D., 

“Horde: A scalable real-time architecture for learning knowledge from unsupervised 

sensorimotor interaction.” In Proceedings of the Tenth International Conference on 

Autonomous Agents and Multiagent Systems, pp. 761–768, Taipei, Taiwan, 2011. 

35. Maei, H. R., Szepesvari, Cs., Bhatnagar, S., Sutton, R. S., “Toward off-policy learning 

control with function approximation.” In Proceedings of the 27th International 

Conference on Machine Learning, Haifa, Israel, 8 pages, 2010. Omnipress. 

36. Maei, H. R., Sutton, R. S., “GQ(λ): A general gradient algorithm for temporal-difference 

prediction learning with eligibility traces.” In Proceedings of the Third Conference on 

Artificial General Intelligence, Lugano, Switzerland, 2010. 

37. Maei, H. R., Szepesvari, Cs., Bhatnagar, S., Precup, D., Silver, D., Sutton, R. S., 

“Convergent temporal-difference learning with arbitrary smooth function 

approximation.” Advances in Neural Information Processing Systems 22, 9 pages, 2009. 

MIT Press. 

38. Sutton, R. S., Maei, H. R., Precup, D., Bhatnagar, S., Silver, D., Szepesvari, Cs., 

Wiewiora, E., “Fast gradient-descent methods for temporal-difference learning with 

linear function approximation.” Proceedings of the 26th International Conference on 

Machine Learning, 2009. 

39. Sutton, R. S., Szepesvari, Cs., Maei, H. R., “A convergent O(n) algorithm for off-policy 

temporal-difference learning with linear function approximation,” Advances in Neural 

Information Processing Systems 21, 8 pages, 2009. MIT Press. 

11

40. Ludvig, E., Sutton, R. S., Verbeek, E., Kehoe, E. J., “A computational model of 

hippocampal function in trace conditioning,” Advances in Neural Information Processing 

Systems 21, 8 pages, 2009. MIT Press. 

41. Cutumisu, M., Szafron, D., Bowling, M., Sutton R. S., “Agent learning using actiondependent 

learning rates in computer role-playing games,” Proceedings of the 4th 

Artificial Intelligence and Interactive Digital Entertainment Conference, pp. 22-29, 2008. 

42. Sutton, R. S., Szepesvari, Cs., Geramifard, A., Bowling, M., “Dyna-style planning with 

linear function approximation and prioritized sweeping,” Proceedings of the 24th 

Conference on Uncertainty in Artificial Intelligence, pp. 528–536, 2008. 

43. Silver, D., Sutton, R. S., Müller, M., “Sample-based learning and search with permanent 

and transient memories,” Proceedings of the 25th International Conference on Machine 

Learning, 2008. (27% Acceptance) 

44. Bhatnagar, S., Sutton, R. S., Ghavamzadeh, M., Lee, M., “Incremental natural actor-critic 

algorithms,” Advances in Neural Information Processing Systems 20, 2008. 

45. Sutton, R. S., Koop, A., Silver, D., “On the role of tracking in stationary environments,” 

Proceedings of the 24th International Conference on Machine Learning, 2007. 

46. Silver, D., Sutton, R. S., Müller, M., “Reinforcement learning of local shape in the game 

of Go,” Proceedings of the 20th International Joint Conference on Artificial Intelligence, 

2007. 

47. Geramifard, A., Bowling, M., Zinkevich, M., Sutton, R. S., “iLSTD: Eligibility traces and 

convergence analysis,” Advances in Neural Information Processing Systems 19, 2007. 

48. Geramifard, A., Bowling, M., Sutton, R. S., “Incremental least-squares temporal 

difference learning,” Proceedings of the Twenty-First National Conference on Artificial 

Intelligence, pages 356-361, 2006. 

49. Precup, D., Sutton, R. S., Paduraru, C., Koop, A., Singh, S., “Off-policy learning with 

options and recognizers,” Advances in Neural Information Processing Systems 18, 2006. 

50. Sutton, R. S., Rafols, E. J., Koop, A., “Temporal abstraction in temporal-difference 

networks,” Advances in Neural Information Processing Systems 18, 2006. 

51. Tanner, B., Sutton, R. S., “TD(λ) networks: Temporal-difference networks with eligibility 

traces,” Proceedings of the 22nd International Conference on Machine Learning 

(ICML), Bonn, Germany, 2005. 8 large typeset pages. 27% acceptance rate. 

52. Rafols, E. J., Ring, M. B., Sutton, R. S., Tanner, B., “Using predictive representations to 

improve generalization in reinforcement learning,” Proceedings of the 19th International 

Joint Conference on Artificial Intelligence, Edinburgh, Scotland, 2005. 6 large typeset 

pages. 18% acceptance rate. 

12

53. Tanner, B., Sutton, R. S., “Temporal-difference networks with history,” Proceedings of 

the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI), 

Edinburgh, Scotland, 2005. 6 large typeset pages. 18% acceptance rate. 

54. Sutton, R. S., Tanner, B., “Temporal-difference networks,” Advances in Neural 

Information Processing Systems 17. MIT Press, 2005. 

55. Littman, M. L., Sutton, R. S., Singh, S., “Predictive representations of state,” Advances in 

Neural Information Processing Systems 14. MIT Press, 2002. 

56. Stone, P., Sutton, R. S., “Scaling reinforcement learning toward RoboCup soccer,” 

Proceedings of the 18th International Conference on Machine Learning, pp. 537–544. 

Morgan Kaufmann, 2001. 

57. Precup, D., Sutton, R. S., Dasgupta, S., “Off-policy temporal-difference learning with 

function approximation,” Proceedings of the 18th International Conference on Machine 

Learning, pp. 417–424. Morgan Kaufmann, 2001. 

58. Sutton, R. S., McAllester, D., Singh, S., Mansour, Y., “Policy gradient methods for 

reinforcement learning with function approximation,” Advances in Neural Information 

Processing Systems 12 (Proceedings of the 1999 conference), pp. 1057–1063. MIT Press, 

2000. 

59. Precup, D., Sutton, R. S., Singh, S., “Eligibility traces for off-policy policy evaluation,” 

Proceedings of the 17th International Conference on Machine Learning, pp. 759–766. 


60. Sutton, R. S., Singh, S., Precup, D., Ravindran, B., “Improved switching among 

temporally abstract actions,” Advances in Neural Information Processing Systems 11 

(Proceedings of the 1998 conference), pp. 1066–1072. MIT Press, 1999. 

61. Moll, R., Barto, A. G., Perkins, T. J., Sutton, R. S., “Learning instance-independent value 

functions to enhance local search,” Advances in Neural Information Processing Systems 

11 (Proceedings of the 1998 conference), pp. 1017–1023. MIT Press, 1999. 

62. Sutton, R. S., Precup, D., Singh, S., “Intra-option learning about temporally abstract 

actions,” Proceedings of the 15th International Conference on Machine Learning, pp. 

556–564. Morgan Kaufmann, 1998. 

63. Precup, D., Sutton, R. S., Singh, S., “Theoretical results on reinforcement learning with 

temporally abstract options,” Machine Learning: ECML-98, Proceedings of the 10th 

European Conference on Machine Learning, pp. 382–393. Springer Verlag, 1998. 

64. Precup, D., Sutton, R. S., “Multi-time models for temporally abstract planning,” 

Advances in Neural Information Processing Systems 10 (Proceedings of the 1997 

conference), pp. 1050–1056. MIT Press, 1998. 

13

65. Precup, D., Sutton, R. S., “Exponentiated gradient methods for reinforcement learning,” 

Proceedings of the 14th International Conference on Machine Learning, pp. 272-277. 


66. Sutton, R. S., “Generalization in reinforcement learning: successful examples using 

sparse coarse coding,” Advances in Neural Information Processing Systems 8 

(Proceedings of the 1995 conference), pp. 1038–1044. MIT Press, 1996. 

67. Sutton, R. S. “TD models: modeling the world at a mixture of time scales,” Proceedings 

of the Twelfth International Conference on Machine Learning, pp. 531–539. Morgan 

Kaufmann, 1995. 

68. Sutton, R. S., Whitehead, S. “Online learning with random representations,” Proceedings 

of the Tenth International Conference on Machine Learning, pp. 314–321. Morgan 

Kaufmann, 1993. 

69. Gluck, M., Glauthier, P., Sutton, R. S. “Adaptation of cue-specific learning rates in 

network models of human category learning,” Proceedings of the Fourteenth Annual 

Conference of the Cognitive Science Society, pp. 540–545. Erlbaum, 1992. 

70. Sutton, R. S. “Adapting bias by gradient descent: an incremental version of delta-bardelta,” 

Proceedings of the Tenth National Conference on Artificial Intelligence, pp. 171– 

176. MIT/AAAI Press, 1992. 

71. Sanger, T., Sutton, R. S., Matheus, C., “Iterative construction of sparse polynomial 

approximations,” Advances in Neural Information Processing Systems 4, pp. 1064–1071. 


72. Sutton, R. S., “Planning by incremental dynamic programming,” Proceedings of the 

Eighth International Workshop on Machine Learning, pp. 353–357. Morgan Kaufmann, 

1991. 

73. Sutton, R. S., Matheus, C., “Learning polynomial functions by feature construction,” 

Proceedings of the Eighth International Workshop on Machine Learning, pp. 208–212. 


74. Sutton, R. S., Barto, A. G., Williams, R., “Reinforcement learning is direct adaptive 

optimal control,” Proceedings of the American Control Conference, pp. 2143–2146, 

1991. Also published in IEEE Control Systems Magazine 12, No. 2, 19-22. 

75. Sutton, R. S., “Integrated modeling and control based on reinforcement learning and 

dynamic programming,” Advances in Neural Information Processing Systems 3, pp. 

471–478. MIT Press, 1991. 

76. Sutton, R. S., “Reinforcement learning architectures for animats,” Proceedings of the 

First International Conference on Simulation of Adaptive Behavior: From Animals to 

Animats, pp. 288–296, 1991. 

14

77. Sutton, R. S., “Integrated architectures for learning, planning, and reacting based on 

approximating dynamic programming,” Proceedings of the Seventh International 

Conference on Machine Learning, pp. 216–224, 1990. Reprinted in Artificial Neural 

Networks: Concepts and Theory, edited by P. Mehra and B. Wah, IEEE Computer Society 

Press, 1992. 

78. Barto, A. G., Sutton, R. S., Watkins, C., “Sequential decision problems and neural 

networks,” Advances in Neural Information Processing Systems 2, pp. 686–693. MIT 

Press, 1990. 

79. Sutton, R. S., Barto, A. G., “A temporal-difference model of classical conditioning,” 

Proceedings of the Ninth Annual Conference of the Cognitive Science Society, pp. 355– 

378, 1987. 

80. Sutton, R. S., “Two problems with backpropagation and other steepest-descent learning 

procedures for networks,” Proceedings of the Eighth Annual Conference of the Cognitive 

Science Society, pp. 823–831, 1986. Reprinted in Artificial Neural Networks: Concepts 

and Theory, edited by P. Mehra and B. Wah, IEEE Computer Society Press, 1992. 

81. Selfridge, O., Sutton, R. S., Barto, A. G., “Training and tracking in robotics,” 

Proceedings of the Ninth International Joint Conference on Artificial Intelligence, pp. 

670–672, 1985. 

82. Sutton, R. S., Pinette, B., “The learning of world models by connectionist networks,” 

Proceedings of the Seventh Annual Conference of the Cognitive Science Society, pp. 54– 

64, 1985. 

83. Moore, J., Desmond, J., Berthier, N., Blazis, D., Sutton, R. S., Barto, A. G., 

“Connectionist learning in real time: Sutton-Barto adaptive element and classical 

conditioning of the nictitating membrane response,” Proceedings of the Seventh Annual 

Conference of the Cognitive Science Society, pp. 318–322, 1985. 

Other Articles in Proceedings 

84. Modayil, J., Pilarski, P. M., White, A., Degris, T., Sutton, R. S., “Off-policy knowledge 

maintenance for robots.” Robotics Science and Systems Workshop (Towards Closing the 

Loop: Active Learning for Robotics). Extended abstract and poster, 2010. 

85. Sutton, R. S., “The grand challenge of predictive empirical abstract knowledge.” Working 

Notes of the IJCAI-09 Workshop on Grand Challenges for Reasoning from Experiences, 

2009. 

86. Sutton, R. S., “Open theoretical questions in reinforcement learning,” Proceedings of the 

Fourth European Conference on Computational Learning Theory, pp. 11–17. Springer- 

Verlag, 1999. 

15

87. McGovern, A., Precup, D., Ravindran, B., Singh, S., Sutton, R. S., “Hierarchical Optimal 

Control of MDPs,” Proceedings of the Tenth Yale Workshop on Adaptive and Learning 

Systems, pp. 186–191. Yale University, New Haven, CT, 1998. 

88. Precup, D., Sutton, R. S., “Multi-time models for reinforcement learning,” Proceedings of 

the ICML'97 Workshop on Modeling in Reinforcement Learning, 1997. 

89. McGovern, A., Sutton, R. S., Fagg, A. H., “Roles of macro-actions in accelerating 

reinforcement learning,” Proceedings of the 1997 Grace Hopper Celebration of Women 

in Computing, pp. 13–17, 1997. 

90. Precup, D., Sutton, R. S., Singh, S. P., “Planning with closed-loop macro actions,” 

Working notes of the 1997 AAAI Fall Symposium on Model-directed Autonomous 

Systems, pp. 70–76, 1997. 

91. Kuvayev, L., Sutton, R. S., “Model-based reinforcement learning with an approximate, 

learned model,” Proceedings of the Ninth Yale Workshop on Adaptive and Learning 

Systems, pp. 101–105. Yale University, New Haven, CT, 1996. 

92. Mehra, R. K., Ravichandran, B., Sutton, R. S., “Adaptive intelligent scheduling for ATM 

networks,” Proceedings of the Ninth Yale Workshop on Adaptive and Learning Systems, 

pp. 106–111. Yale University, New Haven, CT, 1996. 

93. Sutton, R. S., Singh, S. “On step size and bias in temporal-difference learning,” 

Proceedings of the Eighth Yale Workshop on Adaptive and Learning Systems, pp. 91–96. 

Yale University, New Haven, CT, 1994. 

94. Sutton, R. S., “Gain adaptation beats least squares,” Proceedings of the Seventh Yale 

Workshop on Adaptive and Learning Systems, pp. 161–166. Yale University, New 

Haven, CT, 1992. 

95. Sutton, R. S., “Reinforcement learning architectures,” Proceedings ISKIT'92 

International Symposium on Neural Information Processing, pp. 211–216. Fukuoka, 

Japan, 1992. 

96. Sutton, R. S., “Dyna, an integrated architecture for learning, planning and reacting,” 

Working Notes of the 1991 AAAI Spring Symposium on Integrated Intelligent 

Architectures and SIGART Bulletin 2, pp. 160-163, 1991. 

97. Whitehead, S., Sutton, R. S., Ballard, D., “Advances in reinforcement learning and their 

implications for intelligent control,” Proceedings of the Fifth IEEE International 

Symposium on Intelligent Control, pp. 1289–1297, 1990. 

98. Anderson, C., Franklin, J., Sutton, R. S., “Learning a nonlinear model of a manufacturing 

process using multilayer connectionist networks,” Proceedings of the Fifth IEEE 

International Symposium on Intelligent Control, pp. 404–409, 1990. 

16

99. Sutton, R. S., “Artificial intelligence by dynamic programming,” Proceedings of the Sixth 

Yale Workshop on Adaptive and Learning Systems, pp. 89–95. Yale University, New 

Haven, CT, 1990. 

100.Franklin, J., Sutton, R. S., Anderson, C., Selfridge, O., Schwartz, D., “Connectionist 

learning control at GTE laboratories,” Intelligent Control and Adaptive Systems, G. 

Rodriguez, Ed., Proc. SPIE 1196, pp. 242–253, 1990. 

101.Sutton, R. S., “Artificial intelligence as a control problem: comments on the relationship 

between machine learning and intelligent control,” Proceedings of the IEEE International 

Symposium on Intelligent Control 1988, pp. 500–507, 1989. 

102.Franklin, J., Sutton, R. S., Anderson, C. “Application of connectionist learning methods 

to manufacturing process monitoring,” Proceedings of the IEEE International 

Symposium on Intelligent Control 1988, pp. 709–712, 1989. 

103.Sutton, R. S., “Convergence theory for a new kind of prediction learning,” Proceedings 

of the 1988 Workshop on Computational Learning Theory, pp. 421–422 1988. 

104.Sutton, R. S., “Learning distributed, searchable, internal models,” Proceedings of the 

Distributed Artificial Intelligence Workshop, pp. 287–289, 1985. 

105.Barto, A. G., Sutton, R. S., Anderson, C., “Spatial learning simulation systems,” 

Proceedings of the 10th IMACS World Congress on Systems Simulation and Scientific 

Computation, pp. 204–206, 1982. 

Other Publications 

106.“Beyond Reward: The Problem of Knowledge and Data.” Extended abstract in the 

Proceedings of the 21st International Conference on Inductive Logic Programming, 

Muggleton, S., Watanabe, H., Tamaddoni-Nezhad, A., and Chen, J. Eds. Lecture Notes in 

Computer Science, Springer, 2011 

107.“Interview with Richard S. Sutton,” by Verena Heidrich-Meisner. Kuenstliche 

Intelligenz. (2009). pp. 41-43. 

108.Sutton, R. S., “Reinforcement learning: Past, present, and future.” Extended abstract in 

Simulated Evolution and Learning, McKay, B., Yao, X., Newton, C. S., Kim, J.-H., 

Furuhashi, T., Eds. Published as Lecture Notes in Computer Science 1585, pp. 195–197, 

Springer, 1999. 

109.McGovern, A., Sutton, R. S., “Macro-actions in reinforcement learning: an empirical 

analysis.” Technical Report 98-70, Department of Computer Science, University of 

Massachusetts, Amherst, MA 01003, 1998. 

17

110.Precup, D., Sutton, R. S., “Empirical comparison of gradient descent and exponentiated 

gradient descent in supervised and reinforcement learning.” Technical Report UM- 

CS-1996-070, Department of Computer Science, University of Massachusetts, Amherst, 

MA 01003, 1996. 

111.Sutton, R. S., “On the virtues of linear learning and trajectory distributions.” Abstract in 

Proceedings of the Workshop on Value Function Approximation, Conference on Neural 

Information Processing Systems, 1995. 

112.Sutton, R. S., “Machines that learn and mimic the brain.” In ACCESS, GTE's Journal of 

Science and Technology, 1992. Reprinted in Stethoscope Quarterly, Spring 1993. 

113.Sutton, R. S., “The challenge of reinforcement learning.” Introduction to a special issue 

on reinforcement learning, Machine Learning 8, No 3/4, pp. 225–227, 1992. 

114.Sutton, R. S., “Implementation details of the TD(lambda) procedure for the case of 

vector predictions and backpropagation.” GTE Laboratories Technical Report 

TR87-509.1, GTE Laboratories, 40 Sylvan Road, Waltham, MA 02254, 1989. 

115.Sutton, R. S., “NADALINE: a normalized adaptive linear element that learns 

efficiently.” GTE Laboratories Technical Report TR88-509.4, GTE Laboratories, 40 

Sylvan Road, Waltham, MA 02254, 1988. 

116.Sutton, R. S., “Temporal credit assignment in reinforcement learning,” Ph.D. 

dissertation, University of Massachusetts, Amherst, MA. Published as COINS Technical 

Report 84-2, 1984. 

117.Barto, A. G., Sutton, R. S., “Goal seeking components for adaptive intelligence: an initial 

assessment.” Air Force Wright Aeronautical Laboratories Technical Report, AFWAL- 

TR-81-1070, Wright-Patterson Air Force Base, Ohio, 1981. 

118.Sutton, R. S., “Single channel theory: a neuronal theory of learning,” Brain Theory 

Newsletter 3, No. 3/4, pp. 72–75, 1978. (earliest publication) 

Patent 

119.Sutton, R. S., “Apparatus for Machine Learning.” United States Patent No. 5,946,675, 

August 31, 1999. Assigned to GTE Laboratories Incorporated, Waltham, Mass. 

18

Richard S. Sutton - Webdocs Cs Ualberta - University of Alberta

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?