13.07.2015

TESIS DOCTORAL - Robotics Lab - Universidad Carlos III de Madrid

Chapter 3. State of the Art

the adaptive system. The goal system evaluates the selected behaviors and signals when a behavior should be interrupted. In other words, it determines the reinforcement and when behavior switching should occur. The performance of a behavior is measured in terms of the state of the homeostatic variables, which must be maintained within a certain range. In order to reflect the hedonic state of the agent, a wellbeing value is computed, which depends mainly on the values of the homeostatic variables, their states, their transitions, and their predictions. This wellbeing value is used as the reinforcement function.

The adaptive system is in charge of the learning process. It implements the Q-learning algorithm, so it learns a utility value (Q-value) for each action. These values are stored in neural networks whose inputs are the homeostatic variables and other sensory data. As a result, the agent tries to maximize the reinforcement it receives by selecting among all available actions.

Finally, the cognitive system is based on a set of rules, extracted from the agent-environment interaction, that represent particular successful behavior selections. These rules can be updated, deleted, or even merged. When one of these rules fits the current state, the suggested behavior is promoted by adding a constant value to the respective Q-value.

As mentioned before, following Tomkins’ idea that human decision making consists of maximizing positive emotions and minimizing negative ones, emotions in the ALEC architecture are related to pleasant/unpleasant feelings that work as reinforcement. The wellbeing value plays this role, and it can also be seen as an emotional feeling about the overall state of the agent.
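The interplay of the goal, adaptive, and cognitive systems described above can be illustrated with a minimal Python sketch. It is not the architecture's actual implementation and rests on several assumptions: a lookup table replaces the neural networks; a hypothetical `wellbeing` function uses only the current values of two invented homeostatic variables (`energy`, `welfare`) and ignores their transitions and predictions; and a hypothetical constant `rule_bonus` is added by a matching rule to the Q-value of the behavior it suggests, at selection time only. Action selection is purely greedy, with no exploration.

```python
from dataclasses import dataclass, field

# Hypothetical homeostatic variables and the ranges they should stay within.
RANGES = {"energy": (0.4, 1.0), "welfare": (0.5, 1.0)}


def wellbeing(state: dict[str, float]) -> float:
    """Hypothetical wellbeing: 1.0 when every homeostatic variable lies in its
    range, reduced by how far each variable falls outside it. Transitions and
    predictions of the variables are ignored in this sketch."""
    penalty = 0.0
    for name, (low, high) in RANGES.items():
        value = state[name]
        if value < low:
            penalty += low - value
        elif value > high:
            penalty += value - high
    return 1.0 - penalty


@dataclass
class Agent:
    actions: list[str]
    alpha: float = 0.1       # learning rate
    gamma: float = 0.9       # discount factor
    rule_bonus: float = 0.5  # hypothetical constant added by a matching rule
    # Lookup table standing in for the neural networks: (state, action) -> Q.
    q: dict[tuple[str, str], float] = field(default_factory=dict)
    # Cognitive-system rules: state -> suggested behavior.
    rules: dict[str, str] = field(default_factory=dict)

    def q_value(self, state: str, action: str) -> float:
        return self.q.get((state, action), 0.0)

    def select(self, state: str) -> str:
        """Greedy selection; a rule that fits the state promotes its behavior."""
        def score(action: str) -> float:
            bonus = self.rule_bonus if self.rules.get(state) == action else 0.0
            return self.q_value(state, action) + bonus
        return max(self.actions, key=score)

    def update(self, state: str, action: str, reward: float, next_state: str) -> None:
        """One-step Q-learning update, using the wellbeing value as the reward."""
        best_next = max(self.q_value(next_state, a) for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q_value(state, action)
        self.q[(state, action)] = self.q_value(state, action) + self.alpha * td_error
```

For example, with `energy` at 0.2 (0.2 below its range) and `welfare` at 0.8, the wellbeing is 0.8; one call to `agent.update("hungry", "eat", 0.8, "fed")` sets Q(hungry, eat) to 0.08, so `agent.select("hungry")` returns `"eat"`. Adding the rule `agent.rules["hungry"] = "wander"` makes it return `"wander"`, because the bonus of 0.5 outweighs the learned value. Keeping the bonus out of the stored table means a deleted or merged rule leaves the learned Q-values untouched.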
In ALEC, the learning process also results in associating behavior-state pairs with an expected long-term wellbeing value, which indicates the goodness of the available options, similar to the somatic markers proposed by Damasio [132].

3.3.4 Breazeal’s model (2000)

One of the most influential works in this area is probably Cynthia Breazeal’s thesis [4]. She continued Velásquez’s work and, to the best of the author’s knowledge, presented Kismet (Figure 3.3(a)), the first social robot endowed with a motivational system with emotions and drives. Later, the system was also implemented in the robot Leonardo (Figure 3.3(b)). She proposes a rather complex network of intertwined systems (Figure 3.10): the Emotion System, where the robot’s affective state is determined; the drives, which correspond to innate needs; the Behavior System, which arbitrates among the available behaviors; and other modules directly connected to the hardware.

Breazeal regards emotions and drives as two related motivational systems. Drives are involved in the homeostatic regulation processes that maintain critical parameters within a bounded range. Emotions are models of basic emotions, which serve particular functions. They arise under particular circumstances and motivate the robot to react in an adaptive