very much tied to the problem at hand and does not necessarily generalise to other related problems. For instance, the learning from four-block problems does not generalise to six-block problems and the agent programmer should be aware that one cannot simply plug-and-play learnt values between problems. While this feature may be desirable in many domains, it is nevertheless a shortcoming that comes with the ease of use of the programming model that completely insulates the programmer from the knowledge representation used for learning.

6 Discussion and Conclusion

In this paper we have shown how the mental state representation of an agent program may be exploited to significantly increase the effectiveness of the program through ongoing learning. The novelty is that this performance improvement comes almost for free, since the programming model remains relatively unchanged. In particular, we presented an enhancement to the Goal agent programming language that allows adaptive behaviours to be easily programmed. The new language primitive is implemented using a Q-Learning mechanism under the hood, and allows action choices resulting from programmed rules to be improved over time based on the ongoing experience of the agent. A key feature of this enhancement is that it can be readily used by agent programmers who are non-experts in machine learning, since the learning feature has little impact on the programming model. We demonstrated the usability of the framework in the Blocks World domain and analysed the programmer's role in balancing between fixed and flexible behaviour using three sample solutions for the problem.

The results in Section 5, however, also indicate that scalability (i.e. managing the size of the state space) remains an important challenge. The main tool a programmer currently has in our approach to integrating learning into Goal to reduce the state space is to add and exploit knowledge about the environment in the agent program. Even though the use of domain knowledge may reduce the size of the state space, which corresponds one-to-one with the number of beliefs and goals of the agent, the state space still quickly becomes very large in the Blocks World environment with an increasing number of blocks [33].

We have used and integrated a standard Q-Learning approach to reinforcement learning. It is well-known that such an approach is unable to handle all but the smallest state spaces [14]. Our approach, however, does not depend on this particular choice of learning technique, which has been used here mainly to demonstrate the viability of the approach. In order to handle bigger state spaces it is clear that we need some abstraction technique.

The ease of use of the new adaptive functionality in Goal is appealing from a programming point of view, as shown in this study. The downside is that a programmer may waste valuable time trying to improve performance where it is simply not possible within the constraints of the learning framework and the mental state representation used. For example, in a maze world, the only way to distinguish between two T-junctions that "look" identical is to trace back the history of actions that led to the junctions.
Here the underlying reinforcement learning framework is inadequate for learning if the mental state only consists of beliefs about what the agent currently observes.
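The adaptive primitive is only described at this level of detail here, so the following is a minimal sketch, not the Goal implementation, of how a tabular Q-Learning update can be driven by a mental state. All names (state_key, choose_action, update) and the parameter values are illustrative assumptions: states are hashed from belief and goal literals, and the candidate actions are taken to be those enabled by the program's rules.

```python
import random
from collections import defaultdict

# Illustrative learning parameters (assumed, not taken from the paper).
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# Q-table keyed on (state, action); unseen pairs default to 0.0.
Q = defaultdict(float)

def state_key(beliefs, goals):
    """Hash a mental state into a table key (one possible abstraction)."""
    return (frozenset(beliefs), frozenset(goals))

def choose_action(state, enabled_actions):
    """Epsilon-greedy choice among the actions enabled by the program's rules."""
    if random.random() < EPSILON:
        return random.choice(enabled_actions)
    return max(enabled_actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, next_enabled):
    """One-step Q-Learning backup: Q(s,a) += alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))."""
    best_next = max((Q[(next_state, a)] for a in next_enabled), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

For the maze example, one workaround along these lines would be to fold a bounded history of recent actions into state_key, which would disambiguate the two T-junctions at the cost of a further enlarged state space, underlining why the abstraction techniques mentioned above matter.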
