Kenji Doya 2001

was instructed to change the movement sequence [39]. These results suggest the possibility that modular organization of internal models like MOSAIC is realized in a circuit including the cerebellum and the cerebral cortex.

Figure 12. The activity in the cerebellum for two different kinds of computer mouse: a rotating mouse (red), in which the direction of the cursor movement is rotated, and an integrating mouse (yellow), in which the mouse position specifies the velocity of the cursor movement [38]. The subjects were asked to track a complex cursor movement trajectory on the screen, alternately using the two different mouse settings. Large areas in the cerebellum were activated initially. After several hours of training, activities were seen in limited spots in the lateral cerebellum, which were different for different types of mouse. (Panel labels: Rotating Mouse, Integrating Mouse.)

Conclusion

We reviewed three topics in motor control and learning: prediction of future rewards, the use of internal models, and modular decomposition of a complex task using multiple models. We have seen that these three aspects of motor control are related to the functions of the basal ganglia, the cerebellum, and the cerebral cortex, which are specialized for reinforcement, supervised, and unsupervised learning, respectively [1], [2].

Our brain undoubtedly implements the most efficient and robust control system available to date. However, how it really works cannot be understood just by watching its activity or by breaking it down piece by piece. It was not until the development of reinforcement learning theory that a clear light was shed on the function of the basal ganglia. Theories of adaptive control and studies of artificial neural networks were essential in understanding the function of the cerebellum. Such understanding provided new insights for the design of efficient learning and control systems. The theory of adaptive systems and the understanding of brain function are highly complementary developments.

Acknowledgments

We thank Raju Bapi, Hiroaki Gomi, Okihide Hikosaka, Hiroshi Imamizu, Jun Morimoto, Hiroyuki Nakahara, and Kazuyuki Samejima for their collaboration on this article. Studies reported here were supported by the ERATO and CREST programs of Japan Science and Technology (JST) Corp.

References

[1] K. Doya, “What are the computations of the cerebellum, the basal ganglia, and the cerebral cortex,” Neural Net., vol. 12, pp. 961-974, 1999.
[2] K. Doya, “Complementary roles of basal ganglia and cerebellum in learning and motor control,” Current Opinion in Neurobiology, vol. 10, pp. 732-739, 2000.
[3] R.S. Sutton and A.G. Barto, Reinforcement Learning. Cambridge, MA: MIT Press, 1998.
[4] G. Tesauro, “TD-Gammon, a self-teaching backgammon program, achieves master-level play,” Neural Comput., vol. 6, pp. 215-219, 1994.
[5] D.P. Bertsekas and J.N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996.
[6] K. Doya, “Reinforcement learning in continuous time and space,” Neural Comput., vol. 12, pp. 219-245, 2000.
[7] W. Schultz, P. Dayan, and P.R. Montague, “A neural substrate of prediction and reward,” Science, vol. 275, pp. 1593-1599, 1997.
[8] J.C. Houk, J.L. Adams, and A.G. Barto, “A model of how the basal ganglia generate and use neural signals that predict reinforcement,” in Models of Information Processing in the Basal Ganglia, J.C. Houk, J.L. Davis, and D.G. Beiser, Eds. Cambridge, MA: MIT Press, 1995, pp. 249-270.
[9] P.R. Montague, P. Dayan, and T.J. Sejnowski, “A framework for mesencephalic dopamine systems based on predictive Hebbian learning,” J. Neurosci., vol. 16, pp. 1936-1947, 1996.
[10] J.N. Kerr and J.R. Wickens, “Dopamine D-1/D-5 receptor activation is required for long-term potentiation in the rat neostriatum in vitro,” J. Neurophysiol., vol. 85, pp. 117-124, 2001.
[11] H. Nakahara, K. Doya, and O. Hikosaka, “Parallel cortico-basal ganglia mechanisms for acquisition and execution of visuo-motor sequences: A computational approach,” J. Cognitive Neurosci., vol. 13, no. 5, 2001.
[12] R.E. Suri and W. Schultz, “Temporal difference model reproduces anticipatory neural activity,” Neural Comput., vol. 13, pp. 841-862, 2001.
[13] J. Brown, D. Bullock, and S. Grossberg, “How the basal ganglia use parallel excitatory and inhibitory learning pathways to selectively respond to unexpected rewarding cues,” J. Neurosci., vol. 19, pp. 10502-10511, 1999.
[14] H. Gomi and M. Kawato, “Equilibrium-point control hypothesis examined by measured arm stiffness during multijoint movement,” Science, vol. 272, pp. 117-120, 1996.
[15] D.M. Wolpert, R.C. Miall, and M. Kawato, “Internal models in the cerebellum,” Trends in Cognitive Sciences, vol. 2, pp. 338-347, 1998.
[16] M. Kawato, “Internal models for motor control and trajectory planning,” Current Opinion in Neurobiology, vol. 9, pp. 718-727, 1999.
[17] D. Marr, “A theory of cerebellar cortex,” J. Physiol., vol. 202, pp. 437-470, 1969.
[18] J.S. Albus, “A theory of cerebellar function,” Math. Biosci., vol. 10, pp. 25-61, 1971.
[19] M. Ito, M. Sakurai, and P. Tongroach, “Climbing fibre induced depression of both mossy fibre responsiveness and glutamate sensitivity of cerebellar Purkinje cells,” J. Physiol., vol. 324, pp. 113-134, 1982.
[20] Y. Kobayashi, K. Kawano, A. Takemura, Y. Inoue, T. Kitama, H. Gomi, and M. Kawato, “Temporal firing patterns of Purkinje cells in the cerebellar ventral paraflocculus during ocular following responses in monkeys. II. Complex spikes,” J. Neurophysiol., vol. 80, pp. 832-848, 1998.
[21] S. Kitazawa, T. Kimura, and P.-B. Yin, “Cerebellar complex spikes encode both destinations and errors in arm movements,” Nature, vol. 392, pp. 494-497, 1998.
[22] M. Kawato, K. Furukawa, and R. Suzuki, “A hierarchical neural network model for control and learning of voluntary movement,” Biol. Cybern., vol. 57, pp. 169-185, 1987.
[23] M. Kawato, “The feedback-error-learning neural network for supervised motor learning,” in Neural Network for Sensory and Motor Systems, R. Eckmiller, Ed. Amsterdam: Elsevier, 1990, pp. 365-372.
[24] K. Doya, H. Kimura, and A. Miyamura, “Motor control: Neural models and system theory,” Appl. Math. Comput. Sci., vol. 11, pp. 101-128, 2001.
[25] A. Miyamura and H. Kimura, “Stability of feedback error learning scheme,” submitted for publication.
[26] M. Lotze, P. Montoya, M. Erb, E. Hulsmann, H. Flor, U. Klose, N. Birbaumer, and W. Grodd, “Activation of cortical and cerebellar motor areas during executed and imagined hand movements: An fMRI study,” J. Cognitive Neurosci., vol. 11, pp. 491-501, 1999.
[27] O. Hikosaka, H. Nakahara, M.K. Rand, K. Sakai, X. Lu, K. Nakamura, S. Miyachi, and K. Doya, “Parallel neural networks for learning sequential procedures,” Trends Neurosci., vol. 22, pp. 464-471, 1999.
[28] J.H. Gao, L.M. Parsons, J.M. Bower, J. Xiong, J. Li, and P.T. Fox, “Cerebellum implicated in sensory acquisition and discrimination rather than motor control,” Science, vol. 272, pp. 545-547, 1996.
[29] S.J. Blakemore, D.M. Wolpert, and C.D. Frith, “Central cancellation of self-produced tickle sensation,” Nature Neurosci., vol. 1, pp. 635-640, 1998.
[30] M. Ito, “Movement and thought: Identical control mechanisms by the cerebellum,” Trends Neurosci., vol. 16, pp. 448-450, 1993.
[31] Y.P. Shimansky, “Spinal motor control system incorporates an internal model of limb dynamics,” Biol. Cybern., vol. 83, pp. 379-389, 2000.
[32] E. Nakano, H. Imamizu, R. Osu, Y. Uno, H. Gomi, T. Yoshioka, and M. Kawato, “Quantitative examinations of internal representations for arm trajectory planning: Minimum commanded torque change model,” J. Neurophysiol., vol. 81, pp. 2140-2155, 1999.
[33] R.S. Bapi, K. Doya, and A.M. Harner, “Evidence for effector independent and dependent representations and their differential time course of acquisition during motor sequence learning,” Experimental Brain Res., vol. 132, pp. 149-162, 2000.
[34] J. Morimoto and K. Doya, “Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning,” in 17th Int. Conf. Machine Learning, 2000, pp. 623-630.
[35] D.M. Wolpert and M. Kawato, “Multiple paired forward and inverse models for motor control,” Neural Net., vol. 11, pp. 1317-1329, 1998.
[36] K. Doya, K. Samejima, K. Katagiri, and M. Kawato, “Multiple model-based reinforcement learning,” Japan Sci. and Technol. Corp., Kawato Dynamic Brain Project Tech. Rep. KDB-TR-08, 2000.
[37] H. Imamizu, S. Miyauchi, T. Tamada, Y. Sasaki, R. Takino, B. Pütz, T. Yoshioka, and M. Kawato, “Human cerebellar activity reflecting an acquired internal model of a new tool,” Nature, vol. 403, pp. 192-195, 2000.
[38] H. Imamizu, S. Miyauchi, Y. Sasaki, R. Takino, B. Pütz, and M. Kawato, “Separated modules for visuomotor control and learning in the cerebellum: A functional MRI study,” in NeuroImage: Third International Conference on Functional Mapping of the Human Brain, vol. 5, A.W. Toga, R.S.J. Frackowiak, and J.C. Mazziotta, Eds. Copenhagen, Denmark, 1997, pp. S598.
[39] K. Shima, H. Mushiake, N. Saito, and J. Tanji, “Role for cells in the presupplementary motor area in updating motor plans,” in Proc. Nat. Academy of Sciences, vol. 93, pp. 8694-8698, 1996.

Kenji Doya received the Ph.D. in engineering from the University of Tokyo in 1991. He was a Research Associate at the University of Tokyo in 1986, at the University of California, San Diego, in 1991, and at Salk Institute in 1993. He has been a Senior Researcher at ATR International since 1994, and the Director of Metalearning, Neuromodulation, and Emotion Research, CREST, at JST, since 1999. He serves as an action editor of Neural Networks and Neural Computation and as a board member of the Japanese Neural Network Society. His research interests include reinforcement learning, the functions of the basal ganglia and the cerebellum, and the roles of neuromodulators in metalearning.

Hidenori Kimura received the Ph.D. in engineering from the University of Tokyo in 1970. He was appointed a faculty member at Osaka University in 1970, a Professor with the Department of Mechanical Engineering for Computer-Controlled Machinery, Osaka University, in 1987, and a Professor in the Department of Mathematical Engineering and Information Physics, University of Tokyo, in 1995. He has been working on the theory and application of robust control and system identification. He received the IEEE-CSS outstanding paper award in 1985 and the distinguished member award of the IEEE Control Systems Society in 1996. He is an IEEE Fellow.

Mitsuo Kawato received the Ph.D. in engineering from Osaka University in 1981. He became a faculty member at Osaka University in 1981, a Senior Researcher at ATR Auditory and Visual Processing Research laboratories in 1988, a department head at ATR Human Information Processing Research Laboratories in 1992, and the leader of the Computational Neuroscience Project at Information Sciences Division, ATR International, in 2001. He has been the project leader of the Kawato Dynamic Brain Project, ERATO, JST, since 1996. He received an outstanding research award from the International Neural Network Society in 1992 and an award from the Ministry of Science and Technology in 1993. He serves as a co-editor-in-chief of Neural Networks and a board member of the Japanese Neural Network Society. His research interests include the functions of the cerebellum and the roles of internal models in motor control and cognitive functions.

IEEE Control Systems Magazine, August 2001, pp. 53-54