DOCTORAL THESIS - Robotics Lab - Universidad Carlos III de Madrid

Chapter 6. Learning to make decisions

Algorithm 6.1 Object Q-Learning algorithm

procedure COMPUTE OBJECT Q-LEARNING
    Initialize all Q values to 1
    repeat for each iteration
        Require: s ← current state
        Require: a ← executed action
        Require: object i ← object the action is executed with
        Require: s′ ← new state
        Require: r ← reward
        collateral_effects ← 0
        for all object j do
            if object j ≠ object i then    ⊲ The collateral effects do not consider the object that the action was executed with
                max_q_s ← max_a [Q^{obj j}(s_{obj j}, a)]
                max_q_new_s ← max_a [Q^{obj j}(s′_{obj j}, a)]
                collateral_effects ← collateral_effects + (max_q_new_s − max_q_s)
            end if
        end for
        value_obj_i_new_s ← max_a [Q(s′_{obj i}, a)] + collateral_effects
        q ← Q(s_{obj i}, a_{s_{obj i}})
        new_q ← (1 − α) · q + α · (r + γ · value_obj_i_new_s)
        Q(s_{obj i}, a_{s_{obj i}}) ← new_q
    until learning ends
end procedure

Scenario 1

In this first scenario, the robot needs calm (i.e. relax is the dominant motivation), it is unplugged from the docking station, it is listening to music, it is close to the player, and there are no users around. The robot then decides to stop the music player. The state transitions are shown in Table 6.1. This action affects three elements. First, the dominant motivation changes: after the player is turned off, there is no new dominant motivation, because the need for calm has been satisfied and the intensity of the other motivations is not high enough. The states of the music player and of the music also change. The action is executed with the object music player, but the object music is affected as well. The value of the collateral effects is calculated in Table 6.2.

In this particular case, the corresponding Q value, $Q^{player}_{relax}(\text{near-on}, \text{stop})$, is updated as shown in Equations (6.12) and (6.13).
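The update in Algorithm 6.1 can be sketched in Python as follows. This is a minimal illustration, not the thesis implementation: the class name `ObjectQLearner`, the dictionary-based data layout, and the default values of `alpha` and `gamma` are assumptions made for the example. It also assumes that every action of an object is available in every state of that object, and that an object with no actions contributes a value of 0.

```python
from collections import defaultdict


class ObjectQLearner:
    """Sketch of the Object Q-Learning update (names and layout are hypothetical)."""

    def __init__(self, actions, alpha=0.3, gamma=0.8, initial_q=1.0):
        # actions: dict mapping each object name to the list of actions
        # that can be executed with it (assumed state-independent here).
        self.actions = actions
        self.alpha = alpha  # learning rate (illustrative default)
        self.gamma = gamma  # discount factor (illustrative default)
        # One Q table per object, keyed by (state of that object, action).
        # Every entry starts at initial_q, as in "Initialize all Q values to 1".
        self.q = {obj: defaultdict(lambda: initial_q) for obj in actions}

    def max_q(self, obj, state):
        # Best Q value reachable with `obj` in `state`.
        # Assumption: an object without actions contributes 0.
        values = [self.q[obj][(state, a)] for a in self.actions[obj]]
        return max(values) if values else 0.0

    def update(self, obj_i, action, states, new_states, reward):
        # states / new_states: dicts mapping each object to its state
        # before and after the action.
        collateral_effects = 0.0
        for obj_j in self.actions:
            if obj_j != obj_i:  # skip the object the action was executed with
                max_q_s = self.max_q(obj_j, states[obj_j])
                max_q_new_s = self.max_q(obj_j, new_states[obj_j])
                collateral_effects += max_q_new_s - max_q_s
        value_new_s = self.max_q(obj_i, new_states[obj_i]) + collateral_effects
        key = (states[obj_i], action)
        q_old = self.q[obj_i][key]
        new_q = (1 - self.alpha) * q_old + self.alpha * (reward + self.gamma * value_new_s)
        self.q[obj_i][key] = new_q
        return new_q


# Toy usage with two objects; the reward of 10 is made up for the example.
learner = ObjectQLearner({"player": ["play", "stop"], "station": ["charge"]})
before = {"player": "on", "station": "unplugged"}
after = {"player": "off", "station": "unplugged"}
print(learner.update("player", "stop", before, after, reward=10.0))  # approximately 3.94
```

In the toy call the docking station does not change state, so its collateral effect is zero and the update reduces to standard Q-learning on the player's table. To condition on the dominant motivation as well, the per-object state could be a tuple such as `("relax", "near-on")`.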
Table 6.1: State transitions due to the action stop music in Scenario 1

| | previous state ($s_t$) | new state ($s_{t+1}$) |
|---|---|---|
| dominant motivation | relax | none |
| docking station | unplugged | unplugged |
| music player * | near-on | near-off |
| music | listening | non-listening |
| user | absent | absent |

\[
Q^{player}_{relax}(\text{near-on}, \text{stop}) = (1 - \alpha) \cdot Q^{player}_{relax}(\text{near-on}, \text{stop}) + \alpha \cdot \left( r + \gamma \cdot V^{player}_{none}(\text{near-off}) \right) \tag{6.12}
\]

\[
V^{player}_{none}(\text{near-off}) = \max_{a \in A_{player}} Q^{player}_{none}(\text{near-off}, a) + \sum_{obj_m \neq player} \Delta Q^{obj_m}_{max} \tag{6.13}
\]

Table 6.2: Collateral effects due to the action stop music in Scenario 1

| Object $m$ | $\max_{a \in A_{obj_m}} Q^{obj_m}(s_{t+1}, a)$ | $\max_{a \in A_{obj_m}} Q^{obj_m}(s_t, a)$ | $\Delta Q^{obj_m}_{max}$ |
|---|---|---|---|
| docking station | $Q^{station}_{none}(\text{unplugged}, \text{charge}) = -1.17895$ | $Q^{station}_{relax}(\text{unplugged}, \text{charge}) = 1$ | $-2.17895$ |
| music | $Q^{music}_{none}(\text{non-listening}, -) = -$ | $Q^{music}_{relax}(\text{listening}, \text{dance}) = 1$ | $-1$ |
| user | $Q^{user}_{none}(\text{absent}, -) = -$ | $Q^{user}_{relax}(\text{absent}, -) = -$ | $-$ |
| $\sum_{obj_m \neq player} \Delta Q^{obj_m}_{max}$ | | | $-3.17895$ |

The reward and the rest of the parameters required to update the Q value, as well as the new Q value, are presented in Table 6.3. Since this is the first time this action is executed in state $s_t$, its Q value is the initial value of 1. From the new state ($s_{t+1}$), the best thing to do with the music player is to turn it on, which has a calculated value of 1.154.

Table 6.3: New Q value for Scenario 1

| $Q^{player}_{relax}(\text{near-on}, \text{stop})$ | reward | $\max_{a \in A_{player}} Q^{player}_{none}(\text{near-off}, a)$ | Coll. effects | new $Q^{player}_{relax}(\text{near-on}, \text{stop})$ |
|---|---|---|---|---|
| 1 | 52.5399 | $Q^{player}_{none}(\text{near-off}, \text{play}) = 1.154$ | $-3.17895$ | 15.975982 |

The third and fourth columns together make up $V^{player}(s_{t+1}) = V^{player}_{none}(\text{near-off})$.

In this scenario, the most influential parameter is the reward. Stopping the music player satisfies the drive calm. Therefore, the relax motivation is considerably reduced and ceases to be the dominant one. This is the reason for the high value
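
The arithmetic behind Tables 6.2 and 6.3 can be checked with a few lines of Python. The numbers below are copied from the tables. The learning rate and discount factor are not stated in this excerpt, so $\alpha = 0.3$ and $\gamma = 0.8$ are assumptions, chosen because they reproduce the reported new Q value. Treating the "−" entries (no action available) as 0 is also an assumption, consistent with the music row giving $-1$.

```python
# Values from Table 6.2: max Q in the new state minus max Q in the previous state.
# "-" entries (no action available) are treated as 0.0 (assumption).
delta_q_max = {
    "docking station": -1.17895 - 1.0,
    "music": 0.0 - 1.0,
    "user": 0.0 - 0.0,
}
collateral_effects = sum(delta_q_max.values())

ALPHA = 0.3  # assumed learning rate, not given in this excerpt
GAMMA = 0.8  # assumed discount factor, not given in this excerpt

q_old = 1.0               # initial Q value of (near-on, stop)
reward = 52.5399          # reward from Table 6.3
best_q_new_state = 1.154  # Q of (near-off, play), the best action in the new state

# Equation (6.13): value of the new state for the player, including collateral effects.
v_new_state = best_q_new_state + collateral_effects

# Equation (6.12): updated Q value.
new_q = (1 - ALPHA) * q_old + ALPHA * (reward + GAMMA * v_new_state)

print(round(collateral_effects, 5), round(new_q, 6))
```

Under these assumptions the script yields the collateral effects of −3.17895 and the new Q value of 15.975982 reported in Tables 6.2 and 6.3. The reward term contributes about 15.76 of that total, which is why the reward dominates the update.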