very much tied to the problem at hand and does not necessarily generalise to other related problems. For instance, what is learnt on four-block problems does not generalise to six-block problems, and the agent programmer should be aware that one cannot simply plug-and-play learnt values between problems. While this feature may be desirable in many domains, it is nevertheless a shortcoming that comes with the ease of use of a programming model that completely insulates the programmer from the knowledge representation used for learning.

6 Discussion and Conclusion

In this paper we have shown how the mental state representation of an agent program may be exploited to significantly increase the effectiveness of the program through ongoing learning. The novelty is that this performance improvement comes almost for free, since the programming model remains relatively unchanged. In particular, we presented an enhancement to the GOAL agent programming language that allows adaptive behaviours to be easily programmed. The new language primitive is implemented using a Q-Learning mechanism under the hood, and allows action choices resulting from programmed rules to be improved over time based on the ongoing experience of the agent. A key feature of this enhancement is that it can readily be used by agent programmers who are not experts in machine learning, since the learning feature has little impact on the programming model. We demonstrated the usability of the framework in the Blocks World domain and analysed the programmer's role in balancing fixed and flexible behaviour using three sample solutions to the problem.

The results in Section 5, however, also indicate that scalability (i.e. managing the size of the state space) remains an important challenge. The main tool a programmer currently has in our approach to integrating learning into GOAL is to reduce the state space by adding and exploiting knowledge about the environment in the agent program. Even though the use of domain knowledge may reduce the size of the state space, which corresponds one-to-one with the combinations of beliefs and goals of the agent, the state space still quickly becomes very large in the Blocks World environment as the number of blocks increases [33].
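To make this concrete, the sketch below shows the kind of tabular Q-Learning scheme the new primitive relies on, with states keyed directly on the agent's belief and goal atoms. It is an illustrative reconstruction in Python rather than the actual implementation, and all identifiers in it are hypothetical:

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

def state_key(beliefs, goals):
    # One table entry per distinct mental state: every new combination of
    # belief and goal atoms creates a fresh state.
    return (frozenset(beliefs), frozenset(goals))

class QLearner:
    def __init__(self):
        self.q = defaultdict(float)  # (state, action) -> estimated value

    def choose(self, state, enabled_actions):
        # Epsilon-greedy selection among the actions the program rules enable.
        if random.random() < EPSILON:
            return random.choice(enabled_actions)
        return max(enabled_actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, reward, s_next, enabled_next):
        # Standard Q-Learning update:
        # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max((self.q[(s_next, a2)] for a2 in enabled_next), default=0.0)
        self.q[(s, a)] += ALPHA * (reward + GAMMA * best_next - self.q[(s, a)])
```

Because the table is indexed by whole mental states, each additional block in the Blocks World multiplies the number of reachable keys, which is the scaling problem observed above.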
We have used and integrated a standard Q-Learning approach to reinforcement learning. It is well known that such an approach is unable to handle all but the smallest state spaces [14]. Our approach, however, does not depend on this particular choice of learning technique, which has been used here mainly to demonstrate the viability of the approach. In order to handle bigger state spaces it is clear that we need some abstraction technique.

The ease of use of the new adaptive functionality in GOAL is appealing from a programming point of view, as this study shows. The downside is that a programmer may waste valuable time trying to improve performance where improvement is simply not possible within the constraints of the learning framework and the mental state representation used. For example, in a maze world, the only way to distinguish between two T-junctions that "look" identical is to trace back the history of actions that led to the junctions. Here the underlying reinforcement learning framework is inadequate for learning if the mental state consists only of the current percepts of the agent. Keeping the history in the mental state would help, but would make the learning impractical even for simple problems.
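The T-junction problem can be made precise with a small, purely hypothetical illustration (again in Python, not our implementation): two physically different junctions that produce identical percept sets collapse onto a single learning state, and appending the action history separates them only at an exponential cost in states.

```python
# Two different T-junctions that "look" identical: the percept sets match.
percepts_at_A = frozenset({"wall(north)", "open(east)", "open(west)"})
percepts_at_B = frozenset({"wall(north)", "open(east)", "open(west)"})
assert percepts_at_A == percepts_at_B
# Both junctions therefore index the same Q-table entry; if the correct
# turns differ, updates made at A overwrite those made at B and the
# learnt values never converge.

# Appending the action history to the state restores distinguishability...
state_A = (percepts_at_A, ("forward", "turn(left)", "forward"))
state_B = (percepts_at_B, ("forward", "turn(right)", "forward"))
assert state_A != state_B
# ...but with |actions|^k histories of length k, the table now grows
# exponentially, which makes tabular learning impractical even for
# simple mazes.
```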
This drawback also highlights the need for future work to better understand how aware a programmer needs to be of the learning model. It would be useful in this context to develop design patterns that serve as guidelines for implementing adaptive behaviours in typical scenarios. Another avenue for future work is deciding which mental state atoms are more relevant than others, in order to improve learning times in large state spaces. One option is to automatically learn such useful "features" of the agent's mental state using regularization techniques [35].
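As a rough sketch of this regularization idea (one possible realisation, not an implemented feature), an L1-penalised linear fit of observed returns against binary mental state atoms drives the weights of uninformative atoms to zero, marking them as candidates to exclude from the learnt state. The data and atom names below are fabricated for illustration:

```python
import numpy as np

def lasso_weights(X, y, lam=0.1, lr=0.01, steps=2000):
    """ISTA: a gradient step on the squared error, then soft-thresholding."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n  # gradient of 0.5/n * ||Xw - y||^2
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # L1 prox step
    return w

rng = np.random.default_rng(0)
atoms = ["on(a,b)", "clear(a)", "holding(c)", "weather(sunny)"]
X = rng.integers(0, 2, size=(500, len(atoms))).astype(float)  # sampled mental states
# Simulated returns depend only on the first and third atoms.
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + 0.1 * rng.standard_normal(500)

for atom, weight in zip(atoms, lasso_weights(X, y)):
    print(f"{atom:15s} weight = {weight:+.3f}")  # irrelevant atoms end up near 0
```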
Acknowledgments. This research is supported by the 2011 Endeavour Research Fellowship program of the Australian government.

References

1. Rao, A., Georgeff, M.: Modeling rational agents within a BDI-architecture. In: International Conference on Principles of Knowledge Representation and Reasoning (KR), Morgan Kaufmann (1991) 473–484
2. Rao, A.: AgentSpeak(L): BDI agents speak out in a logical computable language. In: Agents Breaking Away. Volume 1038 of Lecture Notes in Computer Science. Springer (1996) 42–55
3. Busetta, P., Rönnquist, R., Hodgson, A., Lucas, A.: JACK intelligent agents: Components for intelligent agents in Java. AgentLink Newsletter 2 (January 1999) 2–5. Agent Oriented Software Pty. Ltd.
4. Bordini, R., Hübner, J., Wooldridge, M.: Programming Multi-Agent Systems in AgentSpeak Using Jason. Wiley-Interscience (2007)
5. Pokahr, A., Braubach, L., Lamersdorf, W.: Jadex: Implementing a BDI-infrastructure for JADE agents. EXP - in search of innovation (Special Issue on JADE) 3(3) (September 2003) 76–85
6. Sardina, S., Padgham, L.: A BDI agent programming language with failure recovery, declarative goals, and planning. Autonomous Agents and Multi-Agent Systems 23(1) (2010) 18–70
7. Hindriks, K., de Boer, F., van der Hoek, W., Meyer, J.-J.: Agent programming in 3APL. Autonomous Agents and Multi-Agent Systems 2(4) (1999) 357–401
8. Dastani, M.: 2APL: A practical agent programming language. Autonomous Agents and Multi-Agent Systems 16(3) (June 2008) 214–248
9. Hindriks, K.: Programming rational agents in GOAL. In: Multi-Agent Programming: Languages, Tools and Applications. Springer (2009) 119–157
10. Hindriks, K.V., van Riemsdijk, B., Behrens, T.M., Korstanje, R., Kraayenbrink, N., Pasman, W., de Rijk, L.: UNREAL GOAL bots: Conceptual design of a reusable interface. In: Agents for Games and Simulations II (2011) 1–18
11. Hindriks, K.V., Neerincx, M.A., Vink, M.: The iCat as a natural interaction partner: Playing Go Fish with a robot. In: Autonomous Robots and Multi-Agent Systems Workshop, Taipei, Taiwan (2011)
12. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. The MIT Press (1998)
13. Rao, A., Georgeff, M.: BDI agents: From theory to practice. In: Proceedings of the First International Conference on Multi-Agent Systems (ICMAS), San Francisco (1995) 312–319