The MP-MIX algorithm: Dynamic Search Strategy Selection in Multi ...
The MP-MIX algorithm: Dynamic Search Strategy Selection in Multi ...
The MP-MIX algorithm: Dynamic Search Strategy Selection in Multi ...
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
8<br />
35.00%<br />
30.00%<br />
W<strong>in</strong>n<strong>in</strong>g percentage<br />
25.00%<br />
20.00%<br />
15.00%<br />
10.00%<br />
5.00%<br />
0.00%<br />
O<strong>MIX</strong> PAR MAXN D<strong>MIX</strong> <strong>MIX</strong><br />
Fig. 8.<br />
Experiment 2 - W<strong>in</strong>n<strong>in</strong>g percentage per player<br />
Fig. 9.<br />
A typical Risk Game board<br />
oriented <strong>MP</strong>-Mix player with T o = ∞, T d = 20. <strong>The</strong> <strong>MIX</strong><br />
player will be set with T o = 20, T d = 20. Overall, we used the<br />
follow<strong>in</strong>g set of players {MAXN, PAR, O<strong>MIX</strong>, D<strong>MIX</strong>, <strong>MIX</strong>}.<br />
<strong>The</strong> environment was fixed with 3 players of the MAXN type<br />
and for the fourth player we plugged <strong>in</strong> each of the <strong>MP</strong>-<br />
Mix players described above. In addition, we changed the<br />
fixed depth limitation to a 50K node limit, so the Paranoid<br />
search would be able to perform its deep prun<strong>in</strong>g procedure<br />
and search deeper under the 50K node limit constra<strong>in</strong>t.<br />
<strong>The</strong> results from runn<strong>in</strong>g 500 tournaments for each <strong>MIX</strong><br />
player are presented <strong>in</strong> Figure 8. <strong>The</strong> best player was the<br />
<strong>MIX</strong> player that won over 32% of the tournaments, which<br />
is significantly better (P < 0.05) than the MAXN or PAR<br />
results. <strong>The</strong> D<strong>MIX</strong> came <strong>in</strong> second with 28%. <strong>The</strong> PAR player<br />
won slightly over 20% of the tournaments. Surpris<strong>in</strong>gly, the<br />
O<strong>MIX</strong> player was the worst one, w<strong>in</strong>n<strong>in</strong>g only 16% of the<br />
tournaments. <strong>The</strong> reason for this was that the O<strong>MIX</strong> player<br />
took offensive moves aga<strong>in</strong>st 3 MAXN players. This was not<br />
the best option due to the fact that when he attacks the lead<strong>in</strong>g<br />
player he weakens his own score but at the same time the other<br />
players advance faster towards the w<strong>in</strong>n<strong>in</strong>g state. Thus, <strong>in</strong> this<br />
situation the O<strong>MIX</strong> player sacrifices himself for the benefit<br />
of the others. We assume that O<strong>MIX</strong> is probably better when<br />
other players are us<strong>in</strong>g the same strategy.<br />
B. Experiments us<strong>in</strong>g Risk<br />
Our next experimental doma<strong>in</strong> is a multilateral <strong>in</strong>teraction<br />
<strong>in</strong> the form of the Risk board game.<br />
1) Game description: <strong>The</strong> game is a perfect-<strong>in</strong>formation<br />
strategy board game that <strong>in</strong>corporates probabilistic elements<br />
and strategic reason<strong>in</strong>g <strong>in</strong> various forms. <strong>The</strong> game is a<br />
sequential turn-based game for two to six players, which is<br />
played on a world map divided <strong>in</strong>to 42 territories and 6<br />
cont<strong>in</strong>ents. Each player controls an army, and the goal is to<br />
conquer the world, which is equivalent to elim<strong>in</strong>at<strong>in</strong>g the other<br />
players. Each turn consists of three phases:<br />
1) Re<strong>in</strong>forcement Phase — the player gets a new set of<br />
troops and places them <strong>in</strong>to his territories. <strong>The</strong> number<br />
of bonus troops is (number of owned territories / 3)<br />
+ cont<strong>in</strong>ent bonuses + card bonus. A player gets a<br />
cont<strong>in</strong>ent bonus for each cont<strong>in</strong>ent he controls at the<br />
beg<strong>in</strong>n<strong>in</strong>g of his turn, and card bonus gives additional<br />
troops for turn<strong>in</strong>g <strong>in</strong> sets. <strong>The</strong> card bonus works as follows:<br />
each card has a picture {cavalry, <strong>in</strong>fantry, cannon}<br />
and a country name. At the end of each turn, if the player<br />
conquered at least one country, he draws a card from the<br />
ma<strong>in</strong> pile. Three cards with the same picture, or three<br />
cards with each of the possible pictures can be turned<br />
<strong>in</strong> at this phase to get additional bonus troops.<br />
2) Attack Phase — the player decides from which countries<br />
to attack an opponent’s country. <strong>The</strong> attack can be<br />
between any adjacent countries, but the attacker must<br />
have at least two troops <strong>in</strong> the attack<strong>in</strong>g country; the<br />
battle’s outcome is decided by roll<strong>in</strong>g dice. This phase<br />
ends when the player is no longer capable of attack<strong>in</strong>g<br />
(i.e. he does not have any opponent’s adjacent country<br />
with more than two troops <strong>in</strong> it), or until he declares<br />
so (this phase can also end with zero attacks). After<br />
an attack is won the player selects how to divide the<br />
attack<strong>in</strong>g force between the orig<strong>in</strong> and source countries.<br />
3) Fortification Phase — <strong>in</strong> which the player can move<br />
armies from one of his countries to an adjacent country<br />
which he owns. This rules has many variations on the<br />
number of troops one can move and on the allowable<br />
dest<strong>in</strong>ation countries.<br />
Risk is too complicated to formalize and solve us<strong>in</strong>g classical<br />
search methods. First, each turn has a different number of<br />
possible actions which changes dur<strong>in</strong>g the turn, as the player<br />
can decide at any time to cease his attack or to cont<strong>in</strong>ue if he<br />
has territory with at least two troops. Second, as shown <strong>in</strong> [5],<br />
the number of different open<strong>in</strong>g moves for six players game<br />
is huge (≈ 3.3 ∗ 10 24 ) when compared to a classic bilateral<br />
board games (400 <strong>in</strong> Chess and 144, 780 <strong>in</strong> Go). State-of-theart<br />
search <strong>algorithm</strong>s cannot provide any decent solution for a<br />
game of this complexity. Previous attempts to play Risk used<br />
either a heuristic-based multiagent architecture where players<br />
control countries and bid for offensive and defensive moves<br />
[5] or a genetic <strong>algorithm</strong> classifier system that was able to<br />
play only at an extremely basic level [10].<br />
In order to cope with the branch<strong>in</strong>g factor problem <strong>in</strong> this<br />
complex game, we artificially reduced the branch<strong>in</strong>g factor<br />
of the search tree as follows. At each node we expanded only