Considering the example presented at the beginning of this section, if the objects the robot can interact with are limited to a music player and the docking station, the current states related to these objects are "far from the player" and "plugged to the charger". Once the action "go to the player" is executed, the new states are "close to the player" and "unplugged from the docking station". Therefore, Object Q-Learning is applied as follows¹. From Equations (6.10) and (6.11), the Q value is computed according to the following equation:

    Q_{player}(far, go\ to) = (1 - \alpha) \cdot Q_{player}(far, go\ to) + \alpha \cdot \big( r + \gamma \cdot V_{player}(close) \big)

where V_{player}(close) is:

    V_{player}(close) = \max_{a \in A_{player}} Q_{player}(close, a) + \sum_{obj_m \neq player} \Delta Q^{obj_m}_{max}

and a can be any action with the player. The collateral effects are:

    \sum_{obj_m \neq player} \Delta Q^{obj_m}_{max} = \Delta Q^{charger}_{max} = \max_{a \in A_{charger}} Q_{charger}(unplugged, a) - \max_{a \in A_{charger}} Q_{charger}(plugged, a)

where a is any action related to the charger.

6.2.4 The algorithm

Once the ideas behind the algorithm have been stated, the algorithm itself has to be analyzed. In an RL framework, an agent in a state executes an action, transitions to a new state, and obtains a reward. In an Object Q-Learning framework, the state is determined in relation to the objects and the potential actions are restricted by the state: the agent is in a state related to a particular object i (s_{obj_i}) and executes an action with this object (a_{obj_i}); this action can provoke a change in the state related to this object (s'_{obj_i}) and a reward (r); in addition, it can also provoke changes in the states related to other objects (s_{obj_j}, ∀j ≠ i), which have been called the collateral effects. All these elements are presented in Figure 6.1; the collateral effects are represented by dashed arrows.

The algorithm updates the Q values after an action is executed. These values are refreshed according to the reward obtained, the previous and new states, and the prior Q values.

¹ In order to keep the example simple, the state is formed just by the external state; the internal state is not considered.
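To make the update concrete, the sketch below applies the Object Q-Learning update of Equations (6.10) and (6.11) to the player/charger example, including the collateral-effect term. It is a minimal illustration, not the thesis implementation: the Python representation of the Q-tables, the extra action names ("play", "unplug", "charge"), and the numerical values of the learning rate, discount factor and reward are assumptions introduced here.

GAMMA = 0.9   # discount factor (illustrative value)
ALPHA = 0.3   # learning rate (illustrative value)

# One Q-table per object: Q[object][state][action]
Q = {
    "player": {
        "far":   {"go to": 0.0, "play": 0.0},
        "close": {"go to": 0.0, "play": 0.0},
    },
    "charger": {
        "plugged":   {"unplug": 0.0, "charge": 0.0},
        "unplugged": {"unplug": 0.0, "charge": 0.0},
    },
}

def collateral_effect(obj, old_state, new_state):
    # Delta Q_max for an object whose state changed although it was not acted upon:
    # max_a Q_obj(new_state, a) - max_a Q_obj(old_state, a)
    return max(Q[obj][new_state].values()) - max(Q[obj][old_state].values())

def object_q_update(obj, state, action, reward, new_state, collateral):
    # collateral: list of (other_object, old_state, new_state) tuples describing
    # the side effects of the executed action on the other objects.
    v = max(Q[obj][new_state].values())
    v += sum(collateral_effect(o, s_old, s_new) for o, s_old, s_new in collateral)
    Q[obj][state][action] = (1 - ALPHA) * Q[obj][state][action] + ALPHA * (reward + GAMMA * v)

# Worked example from the text: "go to the player" moves the player state from
# far to close and, as a collateral effect, unplugs the robot from the charger.
object_q_update("player", "far", "go to", reward=1.0, new_state="close",
                collateral=[("charger", "plugged", "unplugged")])

The dictionary-of-dictionaries layout simply mirrors the per-object Q functions of the text; any tabular representation indexed by object, state and action would serve equally well.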
