The MP-MIX algorithm: Dynamic Search Strategy Selection in Multi ...
The MP-MIX algorithm: Dynamic Search Strategy Selection in Multi ...
The MP-MIX algorithm: Dynamic Search Strategy Selection in Multi ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
12<br />
<strong>Search</strong> Strategies %<br />
100%<br />
90%<br />
80%<br />
70%<br />
60%<br />
50%<br />
40%<br />
30%<br />
20%<br />
10%<br />
0%<br />
1 2 3 4 5<br />
par max off<br />
<strong>Search</strong> Depth<br />
<strong>Search</strong> Strategies %<br />
100%<br />
90%<br />
80%<br />
70%<br />
60%<br />
50%<br />
40%<br />
30%<br />
20%<br />
10%<br />
0%<br />
1 2 3 4 5<br />
par max off<br />
<strong>Search</strong> Depth<br />
Fig. 17.<br />
Percentage of search strategies {<strong>MP</strong>-Mix vs. MaxN}<br />
Fig. 19.<br />
Percentage of search strategies {<strong>MP</strong>-Mix vs. 3 RND}<br />
<strong>Search</strong> Strategies %<br />
100%<br />
90%<br />
80%<br />
70%<br />
60%<br />
50%<br />
40%<br />
30%<br />
20%<br />
10%<br />
Fig. 18.<br />
0%<br />
1 2 3 4 5<br />
par max off<br />
<strong>Search</strong> Depth<br />
Percentage of search strategies {<strong>MP</strong>-Mix vs. Paranoid}<br />
evaluate and compare the <strong>in</strong>dividual performance of the different<br />
search <strong>algorithm</strong>s aga<strong>in</strong>st three RND opponents. Recall<br />
that a RND player is a player whose preferences were randomized<br />
to reflect a wider range of social behavior than the<br />
simple rational, self-maximiz<strong>in</strong>g players, as is often observed<br />
<strong>in</strong> human play<strong>in</strong>g strategies. Moreover, runn<strong>in</strong>g aga<strong>in</strong>st 3 RND<br />
players allow us to explore one depth deeper (that is to depth<br />
6), and observe the performances at this depth.<br />
<strong>The</strong> results are depicted <strong>in</strong> Figure 16. Each column represents<br />
the w<strong>in</strong>n<strong>in</strong>g percentage of the respective player aga<strong>in</strong>st<br />
3 RND players. We can see that the <strong>MIX</strong> player atta<strong>in</strong>s higher<br />
w<strong>in</strong>n<strong>in</strong>g percentage than the other two <strong>algorithm</strong>s (except for<br />
depth 3, <strong>in</strong> which MAXN’s with 45.2%, was slightly higher<br />
than <strong>MIX</strong>’s, with 44.8%. However, these improvements do<br />
not appear to be as significant as <strong>in</strong> the direct match-up<br />
experiments, probably due to the limited competence of the<br />
opponents. We see a significant improvement <strong>in</strong> depth 6, where<br />
the <strong>MIX</strong> player managed to atta<strong>in</strong> 51% w<strong>in</strong>n<strong>in</strong>g compared to<br />
the 37% and 35.2% of MAXN and PAR, respectively.<br />
Experiment 5: Analyz<strong>in</strong>g <strong>MP</strong>-Mix’s behavior<br />
Another <strong>in</strong>terest<strong>in</strong>g question is regard<strong>in</strong>g the quantity of<br />
the different searches that the <strong>MP</strong>-Mix <strong>algorithm</strong> performed.<br />
In other words, how do the amount of Paranoid, MaxN<br />
and Offensive search procedures change with different search<br />
depths and opponents’ types. Figures 17,18, and 19 show<br />
the percentages of each of the different node propagation<br />
strategies <strong>in</strong> the Quoridor game, for the experiments depicted<br />
<strong>in</strong> Figures 14, 15 and 16, respectively. <strong>The</strong>se figures show<br />
a few <strong>in</strong>terest<strong>in</strong>g <strong>in</strong>sights: First, it seems that the MaxN<br />
procedure was the most frequently used <strong>in</strong> all three sett<strong>in</strong>gs,<br />
as it accounts for about 45% of the searches (with some<br />
variance to depth and opponents’ types). This seems <strong>in</strong>tuitive<br />
as this is the “default” search procedure embedded <strong>in</strong> the<br />
meta-decision <strong>algorithm</strong>. Second, it also seem that beyond<br />
search depth of 2, the rema<strong>in</strong><strong>in</strong>g 55% are equally split between<br />
Paranoid and Offensive node propagation strategies. F<strong>in</strong>ally,<br />
all the figures demonstrate a small, but monotonic <strong>in</strong>crease<br />
<strong>in</strong> the percentage of Paranoid searches with the <strong>in</strong>crease <strong>in</strong><br />
depth search. Naturally, these graphs will change with different<br />
heuristic function and threshold values. However, they do give<br />
us some sense of the amount of different searches that were<br />
conducted by the <strong>MP</strong>-Mix agent <strong>in</strong> this doma<strong>in</strong>.<br />
V. OPPONENT I<strong>MP</strong>ACT<br />
<strong>The</strong> experimental results clearly <strong>in</strong>dicate that <strong>MP</strong>-Mix improved<br />
the players’ performances. However, we can see that<br />
the improvement <strong>in</strong> Risk and Quoridor were much more<br />
impressive than <strong>in</strong> the Hearts doma<strong>in</strong>. As such, an emerg<strong>in</strong>g<br />
question is under what conditions will the <strong>MP</strong>-Mix <strong>algorithm</strong><br />
be more effective and advantageous? To try and answer this<br />
question, we present the Opponent Impact (OI) factor, which<br />
measures the impact that a s<strong>in</strong>gle player has on the outcome<br />
of the other players. <strong>The</strong> Opponent Impact factor is a meta<br />
property which characterizes a game accord<strong>in</strong>g to its players<br />
ability to take actions that will reduce the evaluation of their<br />
opponents. This property is related both to the game and to<br />
the evaluation function that is used.<br />
A. Intuition and Def<strong>in</strong>ition<br />
<strong>The</strong> <strong>in</strong>tuition beh<strong>in</strong>d the opponent impact is that certa<strong>in</strong><br />
games were designed <strong>in</strong> a way that it is reasonably easy to<br />
impede the efforts of an opponent from w<strong>in</strong>n<strong>in</strong>g the game,<br />
while <strong>in</strong> other games those possibilities are fairly limited.<br />
For <strong>in</strong>stance, <strong>in</strong> Quoridor, as long as the player owns wall<br />
pieces, he can explicitly impede an opponent by plac<strong>in</strong>g them<br />
<strong>in</strong> his path. In contrast, <strong>in</strong> the game of Hearts, giv<strong>in</strong>g po<strong>in</strong>ts<br />
to a specific opponent often requires the fulfillment of several<br />
preconditions, that are often atta<strong>in</strong>able only through implicit<br />
cooperative strategic behavior among players.<br />
Before present<strong>in</strong>g our proposed def<strong>in</strong>ition of the Opponent<br />
Impact, let us first take a step back and discuss the notion