Considering the example presented at the beginning of this section, if the objects the robot can interact with are limited to a music player and the docking station, the current states related to these objects are "far from the player" and "plugged to the charger". Once the action "go to the player" is executed, the new states are "close to the player" and "unplugged from the docking station". Therefore, Object Q-Learning is applied as follows¹. From Equations (6.10) and (6.11), the Q value is computed according to the following equation:

    Q_{player}(far, go\ to) = (1 - \alpha) \cdot Q_{player}(far, go\ to) + \alpha \cdot \big( r + \gamma \cdot V_{player}(close) \big)

where V_{player}(close) is:

    V_{player}(close) = \max_{a \in A_{player}} Q_{player}(close, a) + \sum_{obj_m \neq player} \Delta Q^{obj_m}_{max}

and a can be any action with the player. The collateral effects are:

    \sum_{obj_m \neq player} \Delta Q^{obj_m}_{max} = \Delta Q^{charger}_{max} = \max_{a \in A_{charger}} Q_{charger}(unplugged, a) - \max_{a \in A_{charger}} Q_{charger}(plugged, a)

where a is any action related to the charger.

6.2.4 The algorithm

Once the ideas behind the algorithm have been stated, the algorithm itself has to be analyzed. In an RL framework, an agent in a state executes an action, transitions to a new state, and obtains a reward. In an Object Q-Learning framework, the state is determined in relation to the objects and the potential actions are restricted by the state: the agent is in a state related to a particular object i (s_{obj_i}) and executes an action with this object (a_{obj_i}); this action can provoke a change in the state related to this object (s'_{obj_i}) and a reward (r); in addition, it can also provoke changes in the states related to other objects (s_{obj_j}, ∀j ≠ i), which have been called the collateral effects. All these elements are presented in Figure 6.1; the collateral effects are represented by dashed arrows.

The algorithm updates the Q values after an action is executed. These values are refreshed according to the reward obtained, the previous and new states, and the prior Q values.

¹ In order to keep the example simple, the state is formed just by the external state; the internal state is not considered.
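To make the update concrete, the sketch below applies the Object Q-Learning update of Equations (6.10) and (6.11) to the player/charger example, including the collateral-effect term. It is a minimal illustration, not the thesis implementation: the Python representation of the Q-tables, the extra action names ("play", "unplug", "charge"), and the numerical values of the learning rate, discount factor and reward are assumptions introduced here.

GAMMA = 0.9   # discount factor (illustrative value)
ALPHA = 0.3   # learning rate (illustrative value)

# One Q-table per object: Q[object][state][action]
Q = {
    "player": {
        "far":   {"go to": 0.0, "play": 0.0},
        "close": {"go to": 0.0, "play": 0.0},
    },
    "charger": {
        "plugged":   {"unplug": 0.0, "charge": 0.0},
        "unplugged": {"unplug": 0.0, "charge": 0.0},
    },
}

def collateral_effect(obj, old_state, new_state):
    # Delta Q_max for an object whose state changed although it was not acted upon:
    # max_a Q_obj(new_state, a) - max_a Q_obj(old_state, a)
    return max(Q[obj][new_state].values()) - max(Q[obj][old_state].values())

def object_q_update(obj, state, action, reward, new_state, collateral):
    # collateral: list of (other_object, old_state, new_state) tuples describing
    # the side effects of the executed action on the other objects.
    v = max(Q[obj][new_state].values())
    v += sum(collateral_effect(o, s_old, s_new) for o, s_old, s_new in collateral)
    Q[obj][state][action] = (1 - ALPHA) * Q[obj][state][action] + ALPHA * (reward + GAMMA * v)

# Worked example from the text: "go to the player" moves the player state from
# far to close and, as a collateral effect, unplugs the robot from the charger.
object_q_update("player", "far", "go to", reward=1.0, new_state="close",
                collateral=[("charger", "plugged", "unplugged")])

The dictionary-of-dictionaries layout simply mirrors the per-object Q functions of the text; any tabular representation indexed by object, state and action would serve equally well.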
