…the robot. The set of the reduced external states, $S^{red}_{external}$, is represented in Equation (6.5):

\[
S^{red}_{external} = \{ S_{obj_1}, S_{obj_2}, S_{obj_3}, \ldots \} \tag{6.5}
\]

For example, following the example presented at the end of the previous section, the 10 objects present in the world result in $10 \times 16 = 160$ external states, those related to the objects. Therefore, the total number of utility values $Q(s,a)$ is greatly reduced.

Finally, the total state of the robot in relation to each object $i$ is defined as follows:

\[
s \in S_i = S_{inner} \times S_{obj_i} \tag{6.6}
\]

where $S_i$ is the set of the reduced states in relation to the object $i$.

Recalling the example presented in Section 6.2.1, where a robot is running out of battery, and considering the reduced state space just presented, the state of the robot is expressed in Equation (6.7):

\[
\begin{aligned}
S &= S_{inner} \times S_{external} = S_{inner} \times \{ S_{obj_1}, S_{obj_2}, \ldots \} \\
  &= S_{dominant\;mot} \times \{ S_{person}, S_{player}, S_{charger}, S_{music} \} \\
  &= \textit{survival} \text{ and } \{ \textit{alone} \text{ or } \textit{far} \text{ or } \textit{plugged} \text{ or } \textit{listening} \}
\end{aligned} \tag{6.7}
\]

Using this simplification, the robot learns what to do with every object for every inner state. For example, the robot would learn what to do with the docking station when it needs to recharge, or what to do with the player when it is bored, and so on, without considering its relation to the rest of the objects.

Considering this simplification, Equation (4.4) is adapted to update the $Q^{obj_i}(s,a)$ value of the state-action pairs for an inner state and an object $i$:

\[
Q^{obj_i}(s,a) = (1-\alpha) \cdot Q^{obj_i}(s,a) + \alpha \cdot \left( r + \gamma \cdot V^{obj_i}(s') \right) \tag{6.8}
\]

where:

\[
V^{obj_i}(s') = \max_{a \in A_{obj_i}} \left( Q^{obj_i}(s',a) \right) \tag{6.9}
\]

The superindex $obj_i$ indicates that the learning process is made in relation to the object $i$; therefore, $s \in S_i$ is the state of the robot in relation to the object $i$, $A_{obj_i}$ is the set of the actions related to the object $i$, and $s' \in S_i$ is the new state in relation to the object $i$. The parameter $r$ is the reinforcement received, $\gamma$ is the discount factor, and $\alpha$ is the learning rate.

As a consequence of this simplification, the learned $Q$ values, instead of being stored in a single table of dimension $\{\text{total number of states} \times \text{total number of actions}\}$, are stored, for a certain inner state and for every object, in a table of dimension $\{\text{number of states related to that object} \times \text{number of actions related to that object}\}$.
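To make the per-object decomposition concrete, the following is a minimal Python sketch of the update in Equations (6.8) and (6.9): one small Q-table per object, indexed by the reduced state of Equation (6.6). The class name, the table layout, and the charger example values are illustrative assumptions, not the implementation used in the thesis.

```python
from collections import defaultdict

class PerObjectQLearner:
    """One Q-table per object, following Eqs. (6.8)-(6.9).

    Each table is keyed by (state, action), where the state is the
    reduced state of Eq. (6.6): (inner state, state related to the object).
    """

    def __init__(self, actions_per_object, alpha=0.3, gamma=0.9):
        self.alpha = alpha                 # learning rate
        self.gamma = gamma                 # discount factor
        self.actions = actions_per_object  # {object: actions related to it}
        # Unvisited state-action pairs default to a utility of 0.0.
        self.q = {obj: defaultdict(float) for obj in actions_per_object}

    def value(self, obj, s):
        # Eq. (6.9): V^{obj_i}(s') = max over the actions related to object i.
        return max(self.q[obj][(s, a)] for a in self.actions[obj])

    def update(self, obj, s, a, r, s_next):
        # Eq. (6.8): blend the old estimate with the reinforcement r plus
        # the discounted value of the new state s'.
        old = self.q[obj][(s, a)]
        self.q[obj][(s, a)] = (1 - self.alpha) * old + \
            self.alpha * (r + self.gamma * self.value(obj, s_next))

# Hypothetical battery example: inner state 'survival', object 'charger'.
learner = PerObjectQLearner({"charger": ["approach", "plug", "ignore"]})
learner.update("charger",
               s=("survival", "far"), a="approach",
               r=0.5, s_next=("survival", "plugged"))
```

Because each table is keyed only by the states and actions related to its object, the memory cost grows with the number of objects rather than with the product of all state variables, which is the storage reduction described above.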
