5 Experiments

Here we describe the Blocks World domain that we used as a testbed for our experiments, and then the three different programs to solve it. We analyse the results quantitatively in terms of the average number of steps taken by the agent to achieve its goal, as well as qualitatively in terms of how the design of the program impacts learning performance.

We have chosen the Blocks World domain for our experiments for several reasons. First, the domain is simple to understand, and programming strategies are easy to describe and compare at a conceptual level. Second, despite its simplicity, finding optimal solutions in this domain is known to be an NP-hard problem [34]. Finally, decisions in this domain often involve choosing between several options that could potentially be optimised using learning.

There are various ways of programming a strategy for solving the Blocks World. For example, one way would be to dismantle all blocks onto the table one by one, and then stack them into the desired configuration from there. This is in fact a reasonable "baseline" strategy, because it is easy to see that the upper bound for the number of steps needed to solve a problem with n blocks is 2(n − 1): the worst case occurs when one must dismantle a single tower (which takes n − 1 moves for a tower of height n) in order to construct a different single tower (which takes another n − 1 moves). The average number of steps for this algorithm is less intuitive, but has been shown to be 2(n − √n) [33]. For this work, we compare three other solutions to the problem, and see how they fare amongst themselves and against this baseline strategy.

Program A. A very simple strategy for solving the Blocks World is to randomly select some block that is clear and move it to some randomly chosen place on top of another object. Effectively, this strategy tries to achieve the final configuration by randomly changing the current configuration for as long as needed, until it eventually stumbles upon the solution. This strategy is given by the program listing in Figure 1, and is contained in the following code segment:

  main module {
    program[order=random] {
      if bel(true) then move(X,Y).
    }
  }

This is certainly not the most effective way to solve the problem: while it works reasonably well for small problems of two to four blocks, it quickly becomes unusable beyond six blocks. Nevertheless, it is useful for this study, since we are interested in improving action selection using learning, and one would imagine there is a lot of room for improvement in this strategy.
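To see more concretely why the random strategy degrades so quickly, it can be simulated outside of Goal. The sketch below is our own Python illustration, not part of any of the agent programs: the encoding of a configuration as a map from each block to its support, the example instance, and the step budget are assumptions made purely for the example.

  import random

  def clear_blocks(state):
      # a block is clear if no other block rests on it
      under = set(state.values())
      return [b for b in state if b not in under]

  def random_solve(start, goal, rng, max_steps=100_000):
      # Program A in simulation: repeatedly move a random clear block
      # to a random legal destination until the goal configuration appears
      state = dict(start)
      for step in range(max_steps):
          if state == goal:
              return step
          movable = clear_blocks(state)
          x = rng.choice(movable)
          # legal destinations: any other clear block, or the table
          dests = [b for b in movable if b != x and state[x] != b]
          if state[x] != 'table':
              dests.append('table')
          if dests:
              state[x] = rng.choice(dests)
      return max_steps  # goal not found within the step budget

  # hypothetical instance: invert the four-block tower a-b-c-d
  start = {'a': 'b', 'b': 'c', 'c': 'd', 'd': 'table'}
  goal = {'d': 'c', 'c': 'b', 'b': 'a', 'a': 'table'}
  print(random_solve(start, goal, random.Random(0)))

Scaling the instance up and averaging over many runs gives a feel for how quickly the expected number of moves grows with the number of blocks.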
Program B. An improvement on the random strategy is the following actual Blocks World program, written in Goal by an agent programmer:

  main module {
    program[order=random] {
      if bel(on(X,Y), clear(X), clear(Z)), a-goal(on(X,Z)) then move(X,Z).
      if bel(on(X,Y), not(clear(X))), a-goal(on(X,Z)) then adopt(clear(X)).
      if a-goal(on(X,Z)), bel(on(X,Y), not(clear(Z))) then adopt(clear(Z)).
      if bel(on(X,Y), clear(X)), a-goal(clear(Y)) then move(X,table).
      if bel(on(X,Y), not(clear(X))), a-goal(clear(Y)) then adopt(clear(X)).
    }
  }

This strategy uses the following line of thought: if the agent has a goal to have some block X on top of Z, then move X onto Z if possible. If this is not possible because X cannot be moved, then clear whatever block is obstructing X. If, on the other hand, it is Z that is blocked, then clear Z first. Finally, repeatedly clear blocks that are obstructing other blocks that are themselves to be cleared.

Program C. A more sophisticated solution, which comes bundled with the Goal distribution, uses the higher-level notion of misplaced blocks to decide whether a block should be moved. To do this, it provides a recursive definition of a tower. A block X is then considered misplaced if the agent still has a goal to have a tower with X on top. Given these definitions, the strategy is relatively simple and uses only two rules: either move a misplaced block onto the table, or move a block onto another block if the move is constructive, i.e., if it results in a desired tower configuration.

  knowledge{
    ...
    tower([X]) :- on(X, table).
    tower([X, Y| T]) :- on(X, Y), tower([Y| T]).
  }
  program[order=linear] {
    #define misplaced(X) a-goal(tower([X| T])).
    #define constructiveMove(X,Y) a-goal(tower([X, Y| T])), bel(tower([Y| T])).
    if constructiveMove(X, Y) then move(X, Y).
    if misplaced(X) then move(X, table).
  }

We conducted several experiments with the three example programs A, B, and C, for problems with up to 10 blocks. Each run of an experiment consisted of a series of randomly generated problems that were solved using the program first in its original form, and then using adaptive ordering (i.e., by substituting [order=adaptive] in the program module options). Since problems are randomly generated and the number of moves required to solve them can vary significantly, we used a moving average of 20 results over the series of generated problems to obtain the average number of steps for any problem of a given size. Finally, we ran 20 repeats of each experiment and report the average number of moves taken to achieve the fixed goal of building a given tower configuration.

For all of our experiments, we used the following parameter settings. The ε value for the action selection strategy was set so that the agent always explores 10% of the time. For Q-learning, we set the learning rate α to 1.0 and the discount factor γ to 0.9. It should be noted that these settings will obviously impact learning, and these default values may not work as well in other domains. An option for the future might be to provide learning "profiles" that the programmer can select between, based on some basic usage guidelines.
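To make these settings concrete, the sketch below shows the standard ε-greedy selection and one-step Q-learning update that they parameterise. This is a generic Python illustration rather than the implementation inside Goal: the representation of states and rules, the reward signal, and all function names are our own assumptions.

  import random
  from collections import defaultdict

  ALPHA, GAMMA, EPSILON = 1.0, 0.9, 0.1  # the settings listed above

  def select_rule(q, state, rules, rng):
      # epsilon-greedy: explore a random rule 10% of the time,
      # otherwise exploit the rule with the highest learned value
      if rng.random() < EPSILON:
          return rng.choice(rules)
      return max(rules, key=lambda r: q[(state, r)])

  def q_update(q, state, rule, reward, next_state, next_rules):
      # one-step Q-learning backup; with alpha = 1.0 the old estimate
      # is replaced entirely by the backed-up value
      best_next = max(q[(next_state, r)] for r in next_rules)
      q[(state, rule)] += ALPHA * (reward + GAMMA * best_next - q[(state, rule)])

  q = defaultdict(float)  # unseen (state, rule) pairs default to a value of 0

With α = 1.0, every update overwrites the previous estimate, which is reasonable in a deterministic domain such as the Blocks World but, as noted above, may be too aggressive elsewhere.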