mance. Figure 2d shows the same for A on problems with six blocks using adaptive ordering. We have not included results for the original program since it takes over 30000 moves on average per problem. For adaptive mode, this number improves to about 60 moves by the 100th episode, and progressively to around 12 moves by 4000 episodes. This gets us close to the baseline of 2(n − √n) = 7.1, but not quite there. It would be possible to improve further if the program were allowed to run for more episodes, but the improvement would occur very slowly. We also did not run this program on problems with more than six blocks, as solving larger problems becomes impractical with this strategy.

Program B: Figure 2b shows the performance of the original B for problems with four blocks, at around 11 moves. The performance is already reasonable to start with, as B is a more informed programmed strategy than A. With adaptive ordering, the performance improves to around five moves per problem by 100 episodes; this is on par with the performance of A at 2000 episodes. At the end of the experiment, the program performs slightly above 4.5 moves and is close to optimal. For six blocks, the original program averages around 28 moves per problem, as shown in Figure 2e. In adaptive mode, the move count improves to around 58 moves by 100 episodes, and to around 10 moves by the end of the experiment. This is higher than the baseline of 7.1 moves, but slightly better than the adaptive performance of A, which averages around 12 moves in the same timeframe. Overall, B performs far better than A due to its informed strategy, and this also translates into faster and better learning.

Program C: In contrast to the other programs, C is already known to perform close to optimal, and achieves around 4.5 moves on average per problem with four blocks, as shown in Figure 2c. With adaptive ordering, this does not seem to improve over the 2000 episodes for which we ran the experiment. This is expected, since the program already performs close to optimal. Interestingly, however, we know from previous studies that C does not perform optimally in certain “deadlock” cases. We had hoped to overcome this through learning, but this is not evident from the averaged results, as there is no significant difference in performance with and without learning. Importantly, for six blocks we see, for the first time in these experiments, that adaptive ordering actually performs worse than the original program (Figure 2f), albeit by only 0.25 moves per problem on average at its worst. On closer analysis this seems to be because we simply have not run the experiment long enough; the difference between the two modes of execution diminishes as the experiment progresses, as is evident in Figure 2f. We should note that, regardless, the performance of C with or without learning is significantly better than that of the other programs, at around 8 moves, and only slightly higher than the baseline case of 2(n − √n) = 7.1. Overall, we can conclude that C is already very informed about the domain, so learning is not very useful in this case.
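For concreteness, the 2(n − √n) baseline used above is easy to evaluate for the two problem sizes in these experiments (this is simply the arithmetic behind the reference values already quoted, with n the number of blocks):

For n = 4: 2(n − √n) = 2(4 − √4) = 4 moves.
For n = 6: 2(n − √n) = 2(6 − √6) ≈ 7.1 moves.

This is why roughly 4.5 moves per problem on the four-block problems is described as close to optimal, and why 7.1 moves is the reference point for the six-block problems.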
Interestingly, in all experiments, adaptive mode does not do any worse than the default behaviour. This is a useful insight for agent programmers who may otherwise feel reluctant to try a “black box” technology that directly impacts the performance of the agent but that they do not really understand. Another important point is that the performance improvement with adaptive mode is
