07.03.2014 Views

The MP-MIX algorithm: Dynamic Search Strategy Selection in Multi ...

The MP-MIX algorithm: Dynamic Search Strategy Selection in Multi ...

The MP-MIX algorithm: Dynamic Search Strategy Selection in Multi ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

8<br />

35.00%<br />

30.00%<br />

W<strong>in</strong>n<strong>in</strong>g percentage<br />

25.00%<br />

20.00%<br />

15.00%<br />

10.00%<br />

5.00%<br />

0.00%<br />

O<strong>MIX</strong> PAR MAXN D<strong>MIX</strong> <strong>MIX</strong><br />

Fig. 8.<br />

Experiment 2 - W<strong>in</strong>n<strong>in</strong>g percentage per player<br />

Fig. 9.<br />

A typical Risk Game board<br />

oriented <strong>MP</strong>-Mix player with T o = ∞, T d = 20. <strong>The</strong> <strong>MIX</strong><br />

player will be set with T o = 20, T d = 20. Overall, we used the<br />

follow<strong>in</strong>g set of players {MAXN, PAR, O<strong>MIX</strong>, D<strong>MIX</strong>, <strong>MIX</strong>}.<br />

<strong>The</strong> environment was fixed with 3 players of the MAXN type<br />

and for the fourth player we plugged <strong>in</strong> each of the <strong>MP</strong>-<br />

Mix players described above. In addition, we changed the<br />

fixed depth limitation to a 50K node limit, so the Paranoid<br />

search would be able to perform its deep prun<strong>in</strong>g procedure<br />

and search deeper under the 50K node limit constra<strong>in</strong>t.<br />

<strong>The</strong> results from runn<strong>in</strong>g 500 tournaments for each <strong>MIX</strong><br />

player are presented <strong>in</strong> Figure 8. <strong>The</strong> best player was the<br />

<strong>MIX</strong> player that won over 32% of the tournaments, which<br />

is significantly better (P < 0.05) than the MAXN or PAR<br />

results. <strong>The</strong> D<strong>MIX</strong> came <strong>in</strong> second with 28%. <strong>The</strong> PAR player<br />

won slightly over 20% of the tournaments. Surpris<strong>in</strong>gly, the<br />

O<strong>MIX</strong> player was the worst one, w<strong>in</strong>n<strong>in</strong>g only 16% of the<br />

tournaments. <strong>The</strong> reason for this was that the O<strong>MIX</strong> player<br />

took offensive moves aga<strong>in</strong>st 3 MAXN players. This was not<br />

the best option due to the fact that when he attacks the lead<strong>in</strong>g<br />

player he weakens his own score but at the same time the other<br />

players advance faster towards the w<strong>in</strong>n<strong>in</strong>g state. Thus, <strong>in</strong> this<br />

situation the O<strong>MIX</strong> player sacrifices himself for the benefit<br />

of the others. We assume that O<strong>MIX</strong> is probably better when<br />

other players are us<strong>in</strong>g the same strategy.<br />

B. Experiments us<strong>in</strong>g Risk<br />

Our next experimental doma<strong>in</strong> is a multilateral <strong>in</strong>teraction<br />

<strong>in</strong> the form of the Risk board game.<br />

1) Game description: <strong>The</strong> game is a perfect-<strong>in</strong>formation<br />

strategy board game that <strong>in</strong>corporates probabilistic elements<br />

and strategic reason<strong>in</strong>g <strong>in</strong> various forms. <strong>The</strong> game is a<br />

sequential turn-based game for two to six players, which is<br />

played on a world map divided <strong>in</strong>to 42 territories and 6<br />

cont<strong>in</strong>ents. Each player controls an army, and the goal is to<br />

conquer the world, which is equivalent to elim<strong>in</strong>at<strong>in</strong>g the other<br />

players. Each turn consists of three phases:<br />

1) Re<strong>in</strong>forcement Phase — the player gets a new set of<br />

troops and places them <strong>in</strong>to his territories. <strong>The</strong> number<br />

of bonus troops is (number of owned territories / 3)<br />

+ cont<strong>in</strong>ent bonuses + card bonus. A player gets a<br />

cont<strong>in</strong>ent bonus for each cont<strong>in</strong>ent he controls at the<br />

beg<strong>in</strong>n<strong>in</strong>g of his turn, and card bonus gives additional<br />

troops for turn<strong>in</strong>g <strong>in</strong> sets. <strong>The</strong> card bonus works as follows:<br />

each card has a picture {cavalry, <strong>in</strong>fantry, cannon}<br />

and a country name. At the end of each turn, if the player<br />

conquered at least one country, he draws a card from the<br />

ma<strong>in</strong> pile. Three cards with the same picture, or three<br />

cards with each of the possible pictures can be turned<br />

<strong>in</strong> at this phase to get additional bonus troops.<br />

2) Attack Phase — the player decides from which countries<br />

to attack an opponent’s country. <strong>The</strong> attack can be<br />

between any adjacent countries, but the attacker must<br />

have at least two troops <strong>in</strong> the attack<strong>in</strong>g country; the<br />

battle’s outcome is decided by roll<strong>in</strong>g dice. This phase<br />

ends when the player is no longer capable of attack<strong>in</strong>g<br />

(i.e. he does not have any opponent’s adjacent country<br />

with more than two troops <strong>in</strong> it), or until he declares<br />

so (this phase can also end with zero attacks). After<br />

an attack is won the player selects how to divide the<br />

attack<strong>in</strong>g force between the orig<strong>in</strong> and source countries.<br />

3) Fortification Phase — <strong>in</strong> which the player can move<br />

armies from one of his countries to an adjacent country<br />

which he owns. This rules has many variations on the<br />

number of troops one can move and on the allowable<br />

dest<strong>in</strong>ation countries.<br />

Risk is too complicated to formalize and solve us<strong>in</strong>g classical<br />

search methods. First, each turn has a different number of<br />

possible actions which changes dur<strong>in</strong>g the turn, as the player<br />

can decide at any time to cease his attack or to cont<strong>in</strong>ue if he<br />

has territory with at least two troops. Second, as shown <strong>in</strong> [5],<br />

the number of different open<strong>in</strong>g moves for six players game<br />

is huge (≈ 3.3 ∗ 10 24 ) when compared to a classic bilateral<br />

board games (400 <strong>in</strong> Chess and 144, 780 <strong>in</strong> Go). State-of-theart<br />

search <strong>algorithm</strong>s cannot provide any decent solution for a<br />

game of this complexity. Previous attempts to play Risk used<br />

either a heuristic-based multiagent architecture where players<br />

control countries and bid for offensive and defensive moves<br />

[5] or a genetic <strong>algorithm</strong> classifier system that was able to<br />

play only at an extremely basic level [10].<br />

In order to cope with the branch<strong>in</strong>g factor problem <strong>in</strong> this<br />

complex game, we artificially reduced the branch<strong>in</strong>g factor<br />

of the search tree as follows. At each node we expanded only

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!