choosing actions is to find a good balance between exploiting current knowledge to get the best reward known so far, and exploring new actions in the hope of finding better rewards. A simple way to achieve this is to select the best known action most of the time, but every once in a while choose a random action with a small probability, say ε. This strategy is well known as ε-greedy and is the one we use in this work. In future work we plan to experiment with more advanced strategies. For instance, in the so-called Boltzmann selection strategy, instead of picking actions randomly, weights are assigned to the available actions based on their existing action-value estimates, so that actions that perform well have a higher chance of being selected in the exploration phase.

In this study we use a Q-Learning implementation where the precise action-value function is maintained in memory. It should be noted that this implementation does not scale well to large state spaces. Of course, we could use an approximation of the action-value function, such as a neural network, to store it in a compact manner. However, our focus here is not so much to use an efficient reinforcement learning technique as it is to see how such learning can be integrated into agent programming in a seamless manner. For this reason, in this version of the work, we have kept the basic Q-Learning implementation.

3 Related Work

In most languages for partial reinforcement learning programs, the programmer specifies a program containing choice points [21]. Because of the underspecification present in agent programming languages, there is no need to add such choice points, as multiple options are generated automatically by the agent program itself.

There is little existing work on integrating learning capabilities within agent programming languages. In PRS-like cognitive architectures [2, 4, 22, 3] that are based in the BDI tradition, standard operating knowledge is programmed as abstract recipes or plans, often in a hierarchical manner. Plans whose preconditions hold in a given runtime situation are considered applicable in that situation and may be chosen for execution. While such frameworks do not typically support learning, there has been recent work in this area. For instance, in [23] the learning process that decides when and how learning should proceed is itself described within plans that can be invoked in the usual manner. Our own previous investigations in this area include [24–26], where decision tree learning was used to improve hierarchical plan selection in the JACK [3] agent programming language. That work bears some resemblance to the present study in that the aim was to improve the choice of instantiated plans, as we do here for bound action options. In [17] we integrated Goal and reinforcement learning as we do in this paper, with the key differences that now (i) a learning primitive has been added to the Goal language to explicitly support adaptive behaviours, and (ii) a much richer state representation is used, i.e., the mental state of the agent.

Among other rule-based systems, ACT-R [27, 28] is a cognitive architecture primarily concerned with modelling human behaviour, where programming consists of writing production rules [29], condition-action pairs that describe possible responses to various situations. Learning in ACT-R consists of forming
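To make the selection and update scheme of Section 2 concrete, the following is a minimal, hypothetical Python sketch of ε-greedy action selection over an in-memory (tabular) action-value function, in the spirit of the basic Q-Learning setup described above. The class and parameter names (TabularQLearner, alpha, gamma, epsilon) are illustrative assumptions only and do not correspond to the actual Goal integration.

```python
# Hypothetical sketch of epsilon-greedy selection over a tabular
# action-value function; names and defaults are illustrative, not
# taken from the Goal/Q-Learning integration described in the paper.
import random
from collections import defaultdict

class TabularQLearner:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions          # available (bound) actions
        self.alpha = alpha              # learning rate
        self.gamma = gamma              # discount factor
        self.epsilon = epsilon          # exploration probability
        self.q = defaultdict(float)     # exact table: (state, action) -> value

    def select_action(self, state):
        """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Standard one-step Q-Learning update of the in-memory table."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

A Boltzmann selection rule, mentioned above as future work, would replace the uniform random choice in select_action with sampling weighted by the current action-value estimates.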
