TESIS DOCTORAL - Robotics Lab - Universidad Carlos III de Madrid

Although the amplified one (Figure 8.4(a)) has higher values, the two policies seem to be equal. However, focusing on the go to the player action, this is not the case. This action is required in order to satisfy the need for entertainment. In Figure 8.4(a), the Q value associated with this action is the fourth highest positive value. In contrast, in Figure 8.4(b), this Q value is negative, and actions unrelated to the fun motivation rank above it.

Using the Amplified Reward, the learned values are higher; therefore, the back-propagation along all successive needed actions is stronger and reaches more distant actions faster. Longer experiments would probably end with a positive value for the go to the player action, but by means of the Amplified Reward this is achieved in a shorter period of time (an illustrative sketch of this effect is given after the summary).

Well-balanced exploration

As explained in Section 6.3.1, an exhaustive exploration of all situations is needed in order to correctly learn the proper behaviors. Next, a situation where exploration is poorly achieved is shown. Figure 8.5 presents a learning session of four hundred iterations where the Well-balanced Exploration has not been considered. It corresponds to the dominant motivation relax, whose associated drive is the slowest one (as explained in Section 5.4.1).

The most remarkable issue in Figure 8.5 is the long periods during which none of the values are updated. Roughly, these periods correspond to the iteration ranges from 0 to 160 and from 250 to 390, which amount to about an hour and a half. Such long-lasting periods of stable values during a learning session mean that this motivation is not explored in those periods; in other words, relax does not frequently become the dominant motivation. These circumstances lead to a set of state-action pairs that are not sufficiently explored and therefore will not be properly learned in an acceptable amount of time.

The effects of the Well-balanced Exploration when relax is the dominant motivation can be observed in Figure 8.3(b). During the whole learning session, the state-action pairs related to the relax motivation are updated frequently, and the long periods of undesired stability in a particular motivation no longer appear (a sketch of this balancing idea is also given after the summary).

8.5 Summary

At the beginning, this chapter introduced the structure of the experiments, with two different phases: the exploring phase, where learning is achieved, and the exploiting phase, where the learned policy is employed. Moreover, the available active objects were introduced, among them the users: two people share the robot's environment during the experiments, Perico (who always interacts positively) and Alvaro (who sporadically harms the robot).

This chapter has demonstrated the correct working of the DMS. Initially, how the intensities of the motivations are formed through their interconnections with internal and external stimuli was clarified and examined in a fragment of a real experiment.
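As referenced above, the following is a minimal, hypothetical sketch of the amplification effect: tabular Q-learning on a three-step chain of required actions in which only the final step is rewarded. The chain length, learning rate, discount factor, reward magnitudes, and the fixed positive margin are all illustrative assumptions, not the parameters used in the thesis; the sketch only shows why a larger terminal reward back-propagates to the earliest action of the chain, such as go to the player, in fewer iterations.

```python
# Illustrative sketch (assumed parameters, not the thesis's setup):
# a 3-step chain of required actions where only the last step pays off.
# We count how many episodes it takes the Q value of the FIRST action
# in the chain to exceed a fixed positive margin, standing in for
# "go to the player reaching a positive value among competing actions".

ALPHA, GAMMA = 0.3, 0.8   # learning rate and discount factor (assumed)
CHAIN_LEN = 3             # e.g. go to player -> approach -> play

def episodes_until_margin(reward, margin=0.5):
    """Episodes of tabular Q-learning until Q of the first action > margin."""
    q = [0.0] * CHAIN_LEN
    for episode in range(1, 1001):
        for s in range(CHAIN_LEN):
            r = reward if s == CHAIN_LEN - 1 else 0.0        # only last step rewarded
            next_q = q[s + 1] if s + 1 < CHAIN_LEN else 0.0  # terminal after last step
            q[s] += ALPHA * (r + GAMMA * next_q - q[s])      # standard Q-learning update
        if q[0] > margin:
            return episode
    return None

print("plain reward:    ", episodes_until_margin(reward=1.0))  # more episodes needed
print("amplified reward:", episodes_until_margin(reward=5.0))  # crosses the margin sooner
```

Because every update is linear in the reward, amplifying it scales all Q values up, so the value of the most distant action crosses any fixed positive margin in fewer iterations; this is the stronger, faster back-propagation observed in Figure 8.4(a).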
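The actual Well-balanced Exploration mechanism is the one defined in Section 6.3.1; the sketch below only illustrates the general idea with a hypothetical minimum-share rule: if a motivation's share of the learning iterations falls below a floor, it is forced to become dominant, so a slow drive such as relax never goes unexplored for long stretches. All names, weights, and the 15% floor are assumptions for illustration.

```python
import random

# Hypothetical balancing rule (an assumption, not the mechanism of
# Section 6.3.1): whenever a motivation's share of the learning
# iterations drops below a floor, it is forced to become dominant,
# so its state-action pairs keep being updated.

MOTIVATIONS = ["fun", "relax", "energy", "social"]
update_counts = {m: 0 for m in MOTIVATIONS}

def drive_intensity(m):
    # Stand-in for the real drives; relax grows far more slowly (Section 5.4.1).
    return random.random() * (0.1 if m == "relax" else 1.0)

def dominant_motivation(min_share=0.15):
    total = sum(update_counts.values()) + 1
    starved = [m for m in MOTIVATIONS if update_counts[m] / total < min_share]
    if starved:
        # Force the least-explored starved motivation to become dominant.
        return min(starved, key=update_counts.get)
    # Otherwise, the strongest drive wins as usual.
    return max(MOTIVATIONS, key=drive_intensity)

for iteration in range(400):   # one simulated 400-iteration learning session
    m = dominant_motivation()
    update_counts[m] += 1      # the Q values tied to m are updated here

print(update_counts)  # relax keeps a minimum share instead of long idle stretches
```

In the unbalanced session of Figure 8.5, relax behaves like a motivation whose count stays flat for iterations 0 to 160 and 250 to 390; a floor of this kind prevents exactly those idle stretches, which is the qualitative effect visible in Figure 8.3(b).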
