13.07.2015 Views

DOCTORAL THESIS - Robotics Lab - Universidad Carlos III de Madrid



Chapter 6. Learning to make decisions

Algorithm 6.1 Object Q-Learning algorithm
 1: procedure COMPUTE OBJECT Q-LEARNING
 2:   Initialize all Q values to 1
 3:   repeat for each iteration
        Require: s ← current state
        Require: a ← executed action
        Require: object_i ← object the action is executed with
        Require: s′ ← new state
        Require: r ← reward
 4:     collateral_effects ← 0
 5:     for all object_j do
 6:       if object_j ≠ object_i then    ⊲ The collateral effects do not consider the object that the action was executed with
 7:         max_q_s ← max[Q^{obj_j}(s_{obj_j}, a)]
 8:         max_q_new_s ← max[Q^{obj_j}(s′_{obj_j}, a)]
 9:         collateral_effects ← collateral_effects + (max_q_new_s − max_q_s)
10:       end if
11:     end for
12:     value_obj_i_new_s ← max[Q(s′_{obj_i}, a)] + collateral_effects
13:     q ← Q(s_{obj_i}, a_{s_{obj_i}})
14:     new_q ← (1 − α) · q + α · (r + δ · value_obj_i_new_s)
15:     Q(s_{obj_i}, a_{s_{obj_i}}) ← new_q
16:   until learning ends
17: end procedure

Scenario 1

In this first scenario, the robot needs calm (i.e. relax is the dominant motivation), it is unplugged from the docking station, it is listening to music, it is close to the player, and there are no users around. The robot then decides to stop the music player. The state transitions are shown in Table 6.1. This action affects three elements. First, the dominant motivation changes: once the player is turned off there is no new dominant motivation, because the need for calm has been satisfied and the intensity of the other motivations is not high enough. In addition, the states of the music player and of the music both change. The action is executed with the object music player, but the object music is affected as well. The value of the collateral effects is calculated in Table 6.2.

In this particular case, the corresponding Q value, Q^{player}_{relax}(near-on, stop), is updated as shown in Equations (6.12) and (6.13).
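As a reading aid for Algorithm 6.1 (not for Equations (6.12) and (6.13), which are not reproduced here), one update step can be sketched in Python. This is a minimal sketch under stated assumptions: the per-object dictionary layout, the helper names `make_q_tables`, `max_q` and `object_q_update`, and the values chosen for α and δ are illustrative and do not come from the thesis. The `max` in lines 7, 8 and 12 is assumed to range over the actions available with each object. The states, the action list for the music object, the reward and the pre-set Q value in the usage lines are made up and are not the values of Tables 6.1 and 6.2.

```python
from collections import defaultdict

ALPHA = 0.3      # learning rate α (assumed value, not from the thesis)
DELTA = 0.8      # discount factor, written δ in Algorithm 6.1 (assumed value)
INITIAL_Q = 1.0  # Algorithm 6.1 initializes all Q values to 1


def make_q_tables(objects):
    """Build one Q table per object: q[obj][(state, action)] -> value."""
    return {obj: defaultdict(lambda: INITIAL_Q) for obj in objects}


def max_q(q_obj, state, actions):
    """Highest Q value in `state` over the given actions (assumed max over a)."""
    return max(q_obj[(state, a)] for a in actions)


def object_q_update(q, actions, obj_i, s, a, s_new, r,
                    alpha=ALPHA, delta=DELTA):
    """Apply one iteration of Algorithm 6.1 and return the new Q value.

    q       : dict mapping object -> Q table (hypothetical layout)
    actions : dict mapping object -> list of actions for that object
    obj_i   : object the action `a` was executed with
    s, s_new: dicts mapping object -> state relative to that object,
              before and after the action
    r       : reward received
    """
    # Lines 4-11: sum the change in best value for every other object.
    collateral_effects = 0.0
    for obj_j in q:
        if obj_j != obj_i:
            max_q_s = max_q(q[obj_j], s[obj_j], actions[obj_j])
            max_q_new_s = max_q(q[obj_j], s_new[obj_j], actions[obj_j])
            collateral_effects += max_q_new_s - max_q_s

    # Line 12: value of the new state for obj_i plus collateral effects.
    value_new_s = max_q(q[obj_i], s_new[obj_i], actions[obj_i]) + collateral_effects

    # Lines 13-15: blend the old estimate with the new target.
    old_q = q[obj_i][(s[obj_i], a)]
    new_q = (1 - alpha) * old_q + alpha * (r + delta * value_new_s)
    q[obj_i][(s[obj_i], a)] = new_q
    return new_q


# Illustrative usage loosely shaped like Scenario 1 (all values made up).
q = make_q_tables(["player", "music"])
actions = {"player": ["play", "stop"], "music": ["listen"]}
q["music"][("listening", "listen")] = 3.0  # pretend this was learned earlier
s = {"player": "near-on", "music": "listening"}
s_new = {"player": "near-off", "music": "non-listening"}
updated = object_q_update(q, actions, "player", s, "stop", s_new, r=2.0)
print(updated)
```

Keeping one table per object means the collateral term only has to compare each other object's best value before and after the action, which is how the effect of stopping the player on the object music enters the update.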
