    if bel(on(X,Y), clear(X)), a-goal(clear(Y)) then move(X,table).
    if bel(on(X,Y), not(clear(X))), a-goal(clear(Y)) then adopt(clear(X)).
  }
}

This strategy uses the following line of thought: if the agent has a goal to have some block X on top of Z, then move X onto Z if possible. If this is not possible because X cannot be moved, then clear whatever block is obstructing X. If, on the other hand, it is Z that is blocked, then clear Z first. Finally, repeatedly clear blocks that are obstructing other blocks that are to be cleared.

Program C.  A more sophisticated solution that comes bundled with the Goal distribution uses a higher-level notion of misplaced blocks to decide whether a block should be moved. To do this it provides a recursive definition of a tower. A block X is then considered misplaced if the agent still has a goal to have a tower with block X on top. Given these definitions, the strategy is relatively simple and uses only two rules. The idea is to either move a misplaced block onto the table, or move a block onto another block if the move is constructive, i.e., results in a desired tower configuration.

knowledge{
  ...
  tower([X]) :- on(X, table).
  tower([X, Y| T]) :- on(X, Y), tower([Y| T]).
}
program[order=linear] {
  #define misplaced(X) a-goal(tower([X| T])).
  #define constructiveMove(X,Y) a-goal(tower([X, Y| T])), bel(tower([Y| T])).
  if constructiveMove(X, Y) then move(X, Y).
  if misplaced(X) then move(X, table).
}

We conducted several experiments with the three example programs A, B, and C, for problems with up to 10 blocks. Each run of the experiment consisted of a series of randomly generated problems that were solved using the program first in its original form and then using adaptive ordering (i.e., by substituting [order=adaptive] in the program module options). Since problems are randomly generated and the number of moves required to solve them can vary significantly, we used a moving average of 20 results over the series of generated problems to obtain the average number of steps for any problem of a given size. Finally, we ran 20 repeats of each experiment and report the average number of moves taken to achieve the fixed goal of building a given tower configuration.

For all of our experiments, we used the following parameter settings. The ε value for the action selection strategy was set to always explore 10% of the time. For Q-learning we set the learning rate α to 1.0 and the discount factor γ to 0.9. It should be noted that these settings will obviously affect learning, and these default values may not work as well in other domains. An option for the future might be to set up learning "profiles" that the programmer can select between based on some basic usage guidelines.
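To make the role of these parameters concrete, the following is a minimal sketch (in Python rather than GOAL) of ε-greedy action selection and the one-step Q-learning update using the quoted values (ε = 0.1, α = 1.0, γ = 0.9). The state and action encodings and the reward signal here are placeholders introduced for illustration; they do not reflect how GOAL's adaptive rule ordering represents rule choices internally.

import random
from collections import defaultdict

# Parameter values as quoted in the text; these are the defaults discussed
# above, not necessarily good choices for other domains.
EPSILON = 0.1   # explore 10% of the time
ALPHA = 1.0     # learning rate
GAMMA = 0.9     # discount factor

Q = defaultdict(float)  # Q[(state, action)] -> estimated value

def choose_action(state, actions):
    """Pick a random applicable action with probability EPSILON,
    otherwise pick the action with the highest current Q-value."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, next_actions):
    """Standard one-step Q-learning backup."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

Note that with α = 1.0 each update simply overwrites the old estimate with the new target, which is why the text cautions that these defaults may not transfer well to other, more stochastic domains.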