Cartesian product of the finite state space and action space of the MDP:

$$ H : (S \times A) \times (S \times A) \to \mathcal{S}, \qquad (10) $$

where $S = \{ s_i \mid i = 1, 2, \ldots, N \}$, $N \in \mathbb{N}$, denotes the Markov state space, and $A = \bigcup_{s_i \in S} A(s_i)$, $\forall s_i \in S$, stands for a finite action space. This mapping generates an indexed family of subsets, $\mathcal{S}_{s_i}$, for each state $s_i \in S$, defined as Predictive Representation Nodes (PRNs). Each PRN is constituted by a set of POD states, $s_m^i \in \mathcal{S}$,

$$ \mathcal{S}_{s_i} = \{ s_m^i \mid m = 1, 2, \ldots, N,\ N = |S| \in \mathbb{N} \}. \qquad (11) $$

Each POD state $s_m^i \in \mathcal{S}$ essentially represents a Markov state transition from $s_i \in S$ to $s_m \in S$. PRNs partition the POD domain insofar as the POD underlying structure captures the state transitions in the Markov domain. Consequently, a PRN is defined as

$$ \mathcal{S}_{s_i} = \Big\{ s_m^i \ \Big|\ s_m^i \equiv s_i \to s_m,\ \sum_{m=1}^{N} p\big(s_m \mid s_i, \mu(s_i)\big) = 1,\ N = |S| \Big\}, \quad \forall s_i, s_m \in S,\ \forall \mu(s_i) \in A(s_i), \qquad (12) $$

the union of which defines the POD domain

$$ \mathcal{S} = \bigcup_{s_i \in S} \mathcal{S}_{s_i}, \qquad (13) $$

with

$$ \mathcal{S}_{s_i} \cap \mathcal{S}_{s_m} = \varnothing. \qquad (14) $$

Each PRN, $\mathcal{S}_{s_i}$, corresponds to a Markov state, $s_i \in S$, and portrays all possible transitions occurring from this state $s_i$ to the other states $s_m \in S$. PRNs, constituting the fundamental aspect of the POD state representation, provide an assessment of the Markov state transitions along with the actions executed at each state. This assessment aims to establish a necessary embedded property of the new state representation so as to consider the potential transitions that can occur in subsequent decision epochs. The assessment is expressed by means of the PRN value, $R_{\mathcal{S}_{s_i}}(s_m \mid \mu(s_i))$, which accounts for the maximum average expected reward that can be achieved by transitions occurring inside a PRN. Consequently, the PRN value is defined as

$$ R_{\mathcal{S}_{s_i}}\big(s_m \mid \mu(s_i)\big) = \max_{\mu(s_i) \in A} \left( \frac{\sum_{m=1}^{N} p\big(s_m \mid s_i, \mu(s_i)\big) \cdot R\big(s_m \mid s_i, \mu(s_i)\big)}{N} \right), \quad \forall s_m^i \in \mathcal{S},\ \forall s_i, s_m \in S,\ \forall \mu(s_i) \in A(s_i),\ \text{and}\ N = |S|. \qquad (15) $$

The PRN value is exploited by the POD state representation as an evaluation metric to estimate the subsequent Markov state transitions. The estimation property is founded on the assessment of POD states by means of an expected evaluation function, $R_{\mathrm{PRN}}(s_m^i, \mu(s_i))$, defined as

$$ R_{\mathrm{PRN}}\big(s_m^i, \mu(s_i)\big) = p\big(s_m \mid s_i, \mu(s_i)\big) \cdot R\big(s_m \mid s_i, \mu(s_i)\big) + R_{\mathcal{S}_{s_m}}\big(s_j \mid \mu(s_m)\big), \quad \forall s_m^i, s_j^m \in \mathcal{S},\ \forall s_i, s_m \in S,\ \forall \mu(s_i) \in A(s_i),\ \forall \mu(s_m) \in A(s_m). \qquad (16) $$

Consequently, employing the POD evaluation function in Eq. (16), each POD state, $s_m^i \in \mathcal{S}_{s_i}$, comprises an overall reward corresponding to: (a) the expected reward of transitioning from state $s_i$ to $s_m$ (implying also the transition from the PRN $\mathcal{S}_{s_i}$ to $\mathcal{S}_{s_m}$); and (b) the maximum average expected reward when transitioning from $s_m$ to any other Markov state (a transition occurring inside $\mathcal{S}_{s_m}$).

While the system interacts with its environment, the POD model learns the system dynamics in terms of the Markov state transitions. The POD state representation attempts to provide a process for realizing the sequences of state transitions that occurred in the Markov domain, as infused in the PRNs. The different sequences of Markov state transitions are captured by the POD states and evaluated through the expected evaluation function given in Eq. (16). Consequently, the highest value of the expected evaluation function at each POD state essentially estimates the subsequent Markov state transitions with respect to the actions taken. As the process is stochastic, however, the real-time learning method still needs a decision-making mechanism for selecting actions.
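To make the construction in Eqs. (11)-(16) concrete, the following sketch builds PRNs from a small finite MDP and evaluates its POD states. It is an illustrative reading of the equations, not the authors' implementation; the names (MDP, build_prns, prn_value, pod_evaluation) and the dictionary-based encoding of p and R are assumptions made for illustration only.

```python
# Illustrative sketch (not from the paper): Predictive Representation Nodes
# (PRNs) and POD-state evaluation for a small finite MDP.
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = int
Action = int

@dataclass
class MDP:
    states: List[State]
    actions: Dict[State, List[Action]]            # admissible actions A(s_i)
    p: Dict[Tuple[State, Action, State], float]   # p(s_m | s_i, a)
    r: Dict[Tuple[State, Action, State], float]   # R(s_m | s_i, a)

def build_prns(mdp: MDP) -> Dict[State, List[Tuple[State, State]]]:
    """Each PRN S_{s_i} collects the POD states s^i_m, i.e. the Markov
    transitions s_i -> s_m, as in Eqs. (11)-(12)."""
    return {s_i: [(s_i, s_m) for s_m in mdp.states] for s_i in mdp.states}

def prn_value(mdp: MDP, s_i: State) -> float:
    """PRN value (Eq. 15): maximum over actions of the average expected
    reward of all transitions leaving s_i inside the PRN S_{s_i}."""
    n = len(mdp.states)
    best = float("-inf")
    for a in mdp.actions[s_i]:
        avg = sum(mdp.p[(s_i, a, s_m)] * mdp.r[(s_i, a, s_m)]
                  for s_m in mdp.states) / n
        best = max(best, avg)
    return best

def pod_evaluation(mdp: MDP, s_i: State, s_m: State, a: Action) -> float:
    """Expected evaluation function (Eq. 16): expected reward of the
    transition s_i -> s_m under action a, plus the PRN value of the
    successor node S_{s_m}."""
    return mdp.p[(s_i, a, s_m)] * mdp.r[(s_i, a, s_m)] + prn_value(mdp, s_m)
```

Under this reading, the transition with the highest pod_evaluation at the current state corresponds to the estimate of the subsequent Markov state transition discussed above.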
The learning performance is closely related to the exploration-exploitation strategy over the action space. More precisely, the decision maker has to exploit what is already known regarding the correlation involving the admissible state-action pairs that maximize the rewards, and also to explore those actions that have not yet been tried for these pairs, to assess whether they may result in higher rewards. A balance between an exhaustive exploration of the environment and the exploitation of the learned policy is fundamental to reaching nearly optimal solutions in a few decision epochs and, thus, to enhancing the learning performance. This exploration-exploitation dilemma has been extensively reported in the literature. Iwata et al. [25] proposed a model-based learning method extending Q-learning and introducing two separate functions, based on statistics and on information, by applying exploration and exploitation strategies. Ishii et al. [26] developed a model-based reinforcement learning method utilizing a balance parameter, controlled through the variation of action rewards and the perception of environmental change. Chan-Geon et al. [27] proposed an exploration-exploitation policy in Q-learning consisting of an auxiliary Markov process and the original Markov process. Miyazaki et al. [28] developed a unified learning system realizing the tradeoff between exploration and exploitation. Hernandez-Aguirre et al. [29] analyzed the problem of exploration-exploitation in the context of the probably approximately correct framework and studied whether it is possible to give bounds on the complexity of the
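As a minimal illustration of such a decision-making mechanism, the sketch below uses a generic epsilon-greedy rule, one common way to balance exploration and exploitation. It is not the strategy proposed in this paper; the function name, its signature, and the use of the expected evaluation of Eq. (16) as the per-action estimate are assumptions made for illustration.

```python
# Illustrative sketch (not the paper's method): generic epsilon-greedy
# action selection over per-action estimates (e.g. Eq. (16) evaluations).
import random
from typing import Dict

def epsilon_greedy(evaluations: Dict[int, float], epsilon: float = 0.1) -> int:
    """With probability epsilon pick a random admissible action (exploration);
    otherwise pick the action with the highest current estimate (exploitation)."""
    actions = list(evaluations.keys())
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: evaluations[a])
```

Decaying epsilon over decision epochs is a common refinement that shifts from exploration toward exploitation as more transitions are observed.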
