section of the module indicates that rule evaluation occurs in random order (other options are to evaluate in linear order and to evaluate all rules in either linear or random order). In the example agent of Figure 1 there is only one rule that is always applicable, because the condition bel(true) always holds and there is always an action enabled in a Blocks World environment. Note that a rule is evaluated by evaluating its condition and the precondition of the (first, if there are more) action of the rule. A rule that is applicable generates a non-empty set of options, which are actions that may be performed by the agent. Only one of these actions is performed in a rule of the form if ... then ..., and the action is randomly selected in that case. One might say that the program underspecifies what the agent should do. This may be useful if a programmer does not know how to, or does not care to, select a unique action, and it may be exploited by a learning mechanism to optimise the behaviour of the agent, which is exactly the focus of this paper. Where more than one action is applicable, the agent must decide which one to choose and execute. In this work we are concerned with improving this action selection based on the ongoing experiences of the agent.

Implementing Adaptive Behaviours in GOAL

There are two key aspects of the adaptive framework in Goal that we describe now: (i) the programming model that describes how adaptive behaviour modules can be specified by the agent programmer using the language of Goal, and (ii) the underlying reinforcement learning implementation that makes it all possible.

Let us start with how adaptive behaviours can be specified in the programming model of Goal. An important consideration for us when deciding what the programming interface should consist of was to keep as much of the machine learning technology insulated from the programmer as possible. The motivation for this was to keep the programming model as simple and as close to the existing model as possible, in order to allow easy uptake by existing Goal programmers. Our stance here has been that agent programmers are not experts in machine learning, and so they will be reluctant to try a new feature of the language if it requires new expertise and significant changes to their programming style.

The first design choice for us was how the knowledge representation aspect of machine learning should be combined with the agent programming model without sacrificing programming flexibility or introducing significant overhead. We achieve this by making the knowledge representation for learning the same as the knowledge representation for the agent program. That is to say, the "state" in the reinforcement learning sense is the mental state of the agent, which comprises its beliefs and goals. All that is required is a translation function that automatically maps the mental state of the agent to a suitable state id (a number) used for reinforcement learning.
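Before turning to the second design decision, it may help to make the earlier discussion of rules and evaluation order concrete. The following is a minimal sketch in the style of the Figure 1 agent; the module layout and the Blocks World action specification are illustrative assumptions rather than the exact program from the paper.

main module {
	program [order=random] {
		% Single rule in the style of the Figure 1 agent: bel(true) always
		% holds, so every enabled instance of move/2 is an option and one
		% of them is selected at random each cycle.
		if bel(true) then move(X, Y).
	}
	actionspec {
		% Illustrative Blocks World action; the precondition determines
		% which instances of move/2 are enabled in the current state.
		move(X, Y) {
			pre { clear(X), clear(Y), on(X, Z), not(X == Y) }
			post { not(on(X, Z)), on(X, Y) }
		}
	}
}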
The second decision was to make learning a modular feature, in line with the Goal philosophy of modular programming, to allow regular and adaptive behaviours to be easily combined. The result is a very easy-to-use programming model where learning can simply be enabled as desired using a new order=adaptive option in the program section.
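As a rough illustration of the resulting programming interface (the module layout and the move/2 action are carried over from the sketch above and remain assumptions), enabling learning amounts to changing the order option of the program section:

main module {
	program [order=adaptive] {
		% Same rule as before; only the evaluation-order option changes.
		% With order=adaptive, the choice among the enabled move/2 options
		% is delegated to the underlying reinforcement learning mechanism
		% instead of being made at random.
		if bel(true) then move(X, Y).
	}
	% actionspec unchanged from the earlier sketch.
}

Regular (non-adaptive) modules in the same agent keep their usual order option, which is how adaptive and ordinary behaviours can be combined.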
