13.07.2015 Views

TESIS DOCTORAL - Robotics Lab - Universidad Carlos III de Madrid



6.2. Object Q-Learning

Table 6.1: State transitions due to the action stop music in Scenario 1

                        previous state ($s_t$)   new state ($s_{t+1}$)
  dominant motivation   relax                    none
  docking station       unplugged                unplugged
  music player *        near-on                  near-off
  music                 listening                non-listening
  user                  absent                   absent

$$Q^{player}_{relax}(\text{near-on}, \text{stop}) = (1 - \alpha) \cdot Q^{player}_{relax}(\text{near-on}, \text{stop}) + \alpha \cdot \left( r + \gamma \cdot V^{player}_{none}(\text{near-off}) \right) \tag{6.12}$$

$$V^{player}_{none}(\text{near-off}) = \max_{a \in A_{player}} Q^{player}_{none}(\text{near-off}, a) + \sum_{obj_m \neq player} \Delta Q^{obj_m}_{max} \tag{6.13}$$

Table 6.2: Collateral effects due to the action stop music in Scenario 1

  Object m          $\max_{a \in A_{obj_m}} Q^{obj_m}(s_{t+1}, a)$            $\max_{a \in A_{obj_m}} Q^{obj_m}(s_t, a)$          $\Delta Q^{obj_m}_{max}$
  docking station   $Q^{station}_{none}$(unplugged, charge) = −1.17895   $Q^{station}_{relax}$(unplugged, charge) = 1   −2.17895
  music             $Q^{music}_{none}$(non-listening, −) = −             $Q^{music}_{relax}$(listening, dance) = 1      −1
  user              $Q^{user}_{none}$(absent, −) = −                     $Q^{user}_{relax}$(absent, −) = −              −
  $\sum_{obj_m \neq player} \Delta Q^{obj_m}_{max}$                                                                          −3.17895

The reward and the other parameters required for updating the Q value, as well as the new Q value, are presented in Table 6.3. Since this is the first time this action is executed in the state $s_t$, its Q value corresponds to the initial value of 1. From the new state ($s_{t+1}$), the best thing to do with the music player is to turn it on (action play), which has a calculated value of 1.154.

Table 6.3: New Q value for Scenario 1

  $Q^{player}_{relax}$(near-on, stop)                                    1
  reward                                                            52.5399
  $V^{player}(s_{t+1}) = V^{player}_{none}$(near-off), made up of:
    $\max_{a \in A_{player}} Q^{player}_{none}$(near-off, a)             $Q^{player}_{none}$(near-off, play) = 1.154
    Coll. effects                                                   −3.17895
  new $Q^{player}_{relax}$(near-on, stop)                                15.975982

In this scenario, the most influential parameter is the reward. Stopping the music player results in the satisfaction of the drive calm. Therefore, the relax motivation is considerably reduced and it ceases to be the dominant one. This is the reason for the high value
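
The update in Equations 6.12 and 6.13 can be checked numerically against the values in Tables 6.2 and 6.3. The following is a minimal Python sketch, not the thesis implementation: the function names `state_value` and `q_update` are hypothetical, and the learning rate and discount factor are not given in this excerpt. The values `ALPHA = 0.3` and `GAMMA = 0.8` are assumptions, chosen only because they reproduce the tabulated new Q value.

```python
# Worked check of Eqs. 6.12 and 6.13 with the Scenario 1 numbers.
# ASSUMPTION: alpha and gamma are not stated in this excerpt; 0.3 and 0.8
# are illustrative values that reproduce the new Q value in Table 6.3.
ALPHA = 0.3  # learning rate (assumed)
GAMMA = 0.8  # discount factor (assumed)


def state_value(best_q_next, collateral_deltas):
    """Eq. 6.13: best Q value of the acted-on object in the new state,
    plus the summed collateral effects on all the other objects."""
    return best_q_next + sum(collateral_deltas)


def q_update(q_old, reward, v_next, alpha=ALPHA, gamma=GAMMA):
    """Eq. 6.12: blend the old Q value with the reward plus the
    discounted value of the new state."""
    return (1 - alpha) * q_old + alpha * (reward + gamma * v_next)


# Collateral effects from Table 6.2. The user row has no value ("−"),
# so it contributes nothing to the sum and is left out.
collateral = {"docking station": -2.17895, "music": -1.0}

# Table 6.3: best action from near-off is play, with Q = 1.154.
v_next = state_value(1.154, collateral.values())

# Table 6.3: initial Q value of 1 and reward 52.5399.
new_q = q_update(1.0, 52.5399, v_next)

print(round(v_next, 5))  # value of the new state, Eq. 6.13
print(round(new_q, 6))   # new Q value, Eq. 6.12
```

With these assumed parameters the state value comes out negative (about −2.02), so the large new Q value comes almost entirely from the reward term, which matches the observation that the reward is the dominant parameter in this scenario.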
