…the robot. The set of the reduced external states, $S^{red}_{external}$, is represented in Equation (6.5):

\[
S^{red}_{external} = \{ S_{obj_1}, S_{obj_2}, S_{obj_3}, \ldots \} \tag{6.5}
\]

For example, following the example presented at the end of the previous section, the 10 objects present in the world result in $10 \times 16 = 160$ external states, those related to the objects. Therefore, the total number of utility values $Q(s,a)$ is greatly reduced.

Finally, the total state of the robot in relation to each object $i$ is defined as follows:

\[
s \in S_i = S_{inner} \times S_{obj_i} \tag{6.6}
\]

where $S_i$ is the set of the reduced states in relation to the object $i$.

Recalling the example presented in Section 6.2.1, where a robot is running out of battery, and considering the reduced state space just presented, the state of the robot is expressed in Equation (6.7):

\[
\begin{aligned}
S &= S_{inner} \times S_{external} = S_{inner} \times \{ S_{obj_1}, S_{obj_2}, \ldots \} \\
  &= S_{dominant\;mot} \times \{ S_{person}, S_{player}, S_{charger}, S_{music} \} \\
  &= \textit{survival} \text{ and } \{ \textit{alone} \text{ or } \textit{far} \text{ or } \textit{plugged} \text{ or } \textit{listening} \}
\end{aligned} \tag{6.7}
\]

Using this simplification, the robot learns what to do with every object for every inner state. For example, the robot would learn what to do with the docking station when it needs to recharge, or what to do with the player when it is bored, and so on, without considering its relation to the rest of the objects.

Considering this simplification, Equation (4.4) is adapted to update the $Q^{obj_i}(s,a)$ value of the state-action pairs for an inner state and an object $i$:

\[
Q^{obj_i}(s,a) = (1-\alpha) \cdot Q^{obj_i}(s,a) + \alpha \cdot \left( r + \gamma \cdot V^{obj_i}(s') \right) \tag{6.8}
\]

where:

\[
V^{obj_i}(s') = \max_{a \in A_{obj_i}} \left( Q^{obj_i}(s',a) \right) \tag{6.9}
\]

The superindex $obj_i$ indicates that the learning process is made in relation to the object $i$; therefore, $s \in S_i$ is the state of the robot in relation to the object $i$, $A_{obj_i}$ is the set of the actions related to the object $i$, and $s' \in S_i$ is the new state in relation to the object $i$. The parameter $r$ is the reinforcement received, $\gamma$ is the discount factor, and $\alpha$ is the learning rate.

As a consequence of this simplification, the learned $Q$ values, instead of being stored in a single table of dimension $\{\text{total number of states} \times \text{total number of actions}\}$, are stored, for a certain inner state and for every object, in a table of dimension $\{\text{number of states related to that object} \times \text{number of actions related to that object}\}$.
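To make the per-object decomposition concrete, the following is a minimal Python sketch of the update in Equations (6.8) and (6.9): one small Q-table per object, indexed by the reduced state of Equation (6.6). The class name, the table layout, and the charger example values are illustrative assumptions, not the implementation used in the thesis.

```python
from collections import defaultdict

class PerObjectQLearner:
    """One Q-table per object, following Eqs. (6.8)-(6.9).

    Each table is keyed by (state, action), where the state is the
    reduced state of Eq. (6.6): (inner state, state related to the object).
    """

    def __init__(self, actions_per_object, alpha=0.3, gamma=0.9):
        self.alpha = alpha                 # learning rate
        self.gamma = gamma                 # discount factor
        self.actions = actions_per_object  # {object: actions related to it}
        # Unvisited state-action pairs default to a utility of 0.0.
        self.q = {obj: defaultdict(float) for obj in actions_per_object}

    def value(self, obj, s):
        # Eq. (6.9): V^{obj_i}(s') = max over the actions related to object i.
        return max(self.q[obj][(s, a)] for a in self.actions[obj])

    def update(self, obj, s, a, r, s_next):
        # Eq. (6.8): blend the old estimate with the reinforcement r plus
        # the discounted value of the new state s'.
        old = self.q[obj][(s, a)]
        self.q[obj][(s, a)] = (1 - self.alpha) * old + \
            self.alpha * (r + self.gamma * self.value(obj, s_next))

# Hypothetical battery example: inner state 'survival', object 'charger'.
learner = PerObjectQLearner({"charger": ["approach", "plug", "ignore"]})
learner.update("charger",
               s=("survival", "far"), a="approach",
               r=0.5, s_next=("survival", "plugged"))
```

Because each table is keyed only by the states and actions related to its object, the memory cost grows with the number of objects rather than with the product of all state variables, which is the storage reduction described above.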
