Kenji Doya 2001


one by one, which takes a lot of time. On the other hand, a set of predictors can be evaluated simultaneously by running them in parallel and comparing their prediction outputs with the output of the real system. This simple fact motivated a modular control architecture in which each controller is paired with a predictor [35].

Fig. 10 shows the module selection and identification control (MOSAIC) architecture, which is based on the prediction error of each module

$$E_i(t) = \hat{x}_i(t) - x(t),$$

where $\hat{x}_i(t)$ is the output of the $i$th predictor. The responsibility signal is given by the soft-max function

$$\lambda_i(t) = \frac{\exp\left(-E_i(t)^2/\sigma^2\right)}{\sum_j \exp\left(-E_j(t)^2/\sigma^2\right)}.$$

This is used for weighting the outputs of multiple controllers, i.e.,

$$u(t) = \sum_i \lambda_i(t)\, u_i(t),$$

where $u_i(t)$ is the output of the $i$th controller. The responsibility signal is also used to weight the learning rates of the predictors and the controllers of the modules, which causes modules to be specialized for different situations. The parameter $\sigma$, which controls the sharpness of module selection, is initially set large to avoid suboptimal specialization of modules.

Figure 11. A result of learning to swing up an underpowered pendulum using a reinforcement learning version of the MOSAIC architecture [36]. (a) The sinusoidal nonlinearity of the gravity term in the angular acceleration was approximated by two linear models, shown in different colors. In each module, a local quadratic reward model was also learned and a linear feedback policy was derived by solving a Riccati equation. (b) The resulting policy for the first module made the downward position unstable. As the pendulum moved upward, based on the relative prediction errors of the two prediction models, the second module was selected and stabilized the upward position. [Plot panels: "Before Learning" and "After 200 Trials"; axes: $\theta$ [rad], $\ddot{\theta}$ [rad/s$^2$], and time.]

Fig. 11 shows an example of using this scheme in a simple nonlinear control task of swinging up a pendulum [36].
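The responsibility weighting described above can be illustrated with a minimal sketch. It assumes squared Euclidean prediction errors and uses only the Python standard library; the function names (`responsibilities`, `blend_controls`) and all numerical values are hypothetical and are not taken from the article. It shows how a large σ gives nearly uniform responsibilities, while a small σ approaches winner-take-all selection of the module with the smallest prediction error.

```python
import math


def responsibilities(errors, sigma):
    """Soft-max responsibility over squared prediction errors.

    errors: per-module prediction errors E_i(t), each a list of floats.
    sigma:  sharpness parameter (large sigma gives nearly uniform weights).
    """
    # One score per module: negative squared error scaled by sigma^2.
    scores = [-sum(e * e for e in err) / sigma ** 2 for err in errors]
    # Subtract the largest score before exponentiating for numerical
    # stability; the normalized result is unchanged.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def blend_controls(lams, controls):
    """Responsibility-weighted sum u(t) = sum_i lambda_i(t) * u_i(t)."""
    dim = len(controls[0])
    return [sum(l * u[k] for l, u in zip(lams, controls)) for k in range(dim)]


# Hypothetical numbers for illustration only (not from the article).
x = [1.0, 0.0]                      # observed state x(t)
x_hat = [[1.1, 0.0], [2.0, 0.5]]    # predictions from two modules
errors = [[p - q for p, q in zip(pred, x)] for pred in x_hat]
u_modules = [[0.5], [-1.0]]         # outputs of the two controllers

lam_soft = responsibilities(errors, sigma=10.0)   # large sigma: soft selection
lam_sharp = responsibilities(errors, sigma=0.1)   # small sigma: sharp selection
u = blend_controls(lam_sharp, u_modules)
print(lam_soft, lam_sharp, u)
```

In this sketch the same responsibilities could also scale each module's learning rate, which is the mechanism the text credits for module specialization; that part is left out to keep the example short.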
Each module learns a locally linear dynamic model and a locally quadratic reward model. Based on these models, the value function and the corresponding control policy for each module are derived by solving a Riccati equation, which makes learning much faster than iterative estimation of the value function. After about 100 trials, each module successfully approximated the sinusoidal nonlinearity in the dynamics in either the bottom or top half of the state space. Accordingly, a controller that destabilizes the stable equilibrium at the bottom and another controller that stabilizes the unstable equilibrium at the top were derived. The two controllers were successfully switched based on the responsibility signal.

The scheme has also been shown to be applicable to a nonstationary control task. The results suggest the usefulness of this biologically motivated modular control architecture in decomposing nonlinear and/or nonstationary control tasks in space and time based on the predictability of the system dynamics. A careful theoretical study assessing the conditions in which modular learning and control methods work reliably is still required.

Imamizu and colleagues performed a series of visuomotor adaptation experiments using a computer mouse with rotated pointing direction (e.g., the cursor moves to the right when the mouse is moved upward). Early in learning, a large part of the cerebellum was activated. However, as the subject became proficient in the rotated mouse movement, small spots of activation were found in the lateral cerebellum, which can be interpreted as the neural correlate of the internal model of the new tool [37]. Furthermore, when the subject was asked to use two different kinds of unusual computer mouse, two different sets of activation spots were found in the lateral cerebellum (Fig. 12) [38].

Experiments on multiple sequential movements in monkeys have shown that neurons in the supplementary motor area (SMA) are selectively activated during movements in particular sequences.
Furthermore, in an adjacent area called pre-SMA, some neurons were activated when the monkey

52 IEEE Control Systems Magazine August 2001
