07.03.2014 Views

The MP-MIX algorithm: Dynamic Search Strategy Selection in Multi ...

The MP-MIX algorithm: Dynamic Search Strategy Selection in Multi ...

The MP-MIX algorithm: Dynamic Search Strategy Selection in Multi ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

6<br />

Property 3.1 (<strong>MP</strong>-Mix on high T values):<br />

If for every time stamp t and every player i, vi<br />

t ≤ T then<br />

N(G, T ) = 0<br />

When sett<strong>in</strong>g the threshold too high (larger than the maximal<br />

possible value of the v i ), <strong>MP</strong>-Mix behaves as a pure MaxN<br />

player, as no change <strong>in</strong> strategy will ever occur.<br />

Property 3.2 (<strong>MP</strong>-Mix on low T values):<br />

Let x be the number of times lead<strong>in</strong>gEdge ≥ 0, then if T =<br />

0, N(G, T ) = x<br />

In the other extreme case, when the threshold is set to zero,<br />

a paranoid or offensive behavior will occur every time the<br />

<strong>MP</strong>-Mix player leads (i.e., MaxN will never run). <strong>The</strong> above<br />

properties will come <strong>in</strong>to play <strong>in</strong> our experimental section as<br />

we experiment with different threshold values that converge<br />

to the orig<strong>in</strong>al <strong>algorithm</strong>s at the two extreme values.<br />

IV. EXPERIMENTAL RESULTS<br />

In order to evaluate the performance of <strong>MP</strong>-Mix, we<br />

implemented players that use MaxN, Paranoid and <strong>MP</strong>-Mix<br />

<strong>algorithm</strong>s <strong>in</strong> three popular games: the Hearts card game,<br />

Risk the strategic board game of world dom<strong>in</strong>ation, and the<br />

Quoridor board game. <strong>The</strong>se three games were chosen as they<br />

allow us to evaluate the <strong>algorithm</strong> <strong>in</strong> three different types of<br />

doma<strong>in</strong>s, and as such <strong>in</strong>crease the robustness of the evaluation.<br />

1) Hearts — is a four-player, imperfect-<strong>in</strong>formation, determ<strong>in</strong>istic<br />

card game.<br />

2) Risk — is a six-player, perfect-<strong>in</strong>formation, nondeterm<strong>in</strong>istic<br />

board game.<br />

3) Quoridor — is a four-player, perfect-<strong>in</strong>formation, determ<strong>in</strong>istic<br />

board game.<br />

In order to evaluate the <strong>MP</strong>-Mix <strong>algorithm</strong>, we performed a<br />

series of experiments with different sett<strong>in</strong>gs and environment<br />

variables. We used two methods to bound the search tree.<br />

• Fixed depth <strong>The</strong> first method was to perform a full<br />

width search up to a given depth. This provided a<br />

fair comparison to the logical behavior of the different<br />

strategies.<br />

• Fixed number of nodes <strong>The</strong> Paranoid strategy can<br />

benefit from deep prun<strong>in</strong>g while MaxN and Offensive<br />

can not. <strong>The</strong>refore, to provide a fair comparison we<br />

fixed the number of nodes N that can be visited, which<br />

will naturally allow the Paranoid to enjoy its prun<strong>in</strong>g<br />

advantage. To do this, we used iterative deepen<strong>in</strong>g to<br />

search for game trees as described by [8]. <strong>The</strong> player<br />

builds the search tree to <strong>in</strong>creas<strong>in</strong>gly larger depths, where<br />

at the end of each iteration he saves the current best move.<br />

Dur<strong>in</strong>g the iterations he keeps track of the number of<br />

nodes it visited, and if this number exceeds the node<br />

limit N, he immediately stops the search and retruns<br />

the current best move (which was found <strong>in</strong> the previous<br />

iteration).<br />

A. Experiments us<strong>in</strong>g Hearts<br />

1) Game description: Hearts is a multi-player, imperfect<strong>in</strong>formation,<br />

trick-tak<strong>in</strong>g card game designed to be played by<br />

exactly four players. A standard 52 card deck is used, with<br />

the cards <strong>in</strong> each suit rank<strong>in</strong>g <strong>in</strong> decreas<strong>in</strong>g order from Ace<br />

(highest) down to Two (lowest). At the beg<strong>in</strong>n<strong>in</strong>g of a game the<br />

cards are distributed evenly between the players, face down.<br />

<strong>The</strong> game beg<strong>in</strong>s when the player hold<strong>in</strong>g the Two of clubs<br />

card starts the first trick. <strong>The</strong> next trick is started by the w<strong>in</strong>ner<br />

of the previous trick. <strong>The</strong> other players, <strong>in</strong> clockwise order,<br />

must play a card of the same suit that started the trick, if they<br />

have any. If they do not have a card of that suit, they may<br />

play any card. <strong>The</strong> player who played the highest card of the<br />

suit which started the trick, w<strong>in</strong>s the trick.<br />

Each player scores penalty po<strong>in</strong>ts for some of the cards <strong>in</strong><br />

the tricks they won, therefore players usually want to avoid<br />

tak<strong>in</strong>g tricks. Each heart card scores one po<strong>in</strong>t, and the queen<br />

of spades card scores 13 po<strong>in</strong>ts. Tricks which conta<strong>in</strong> po<strong>in</strong>ts<br />

are called “pa<strong>in</strong>ted” tricks. 2 Each s<strong>in</strong>gle round has 13 tricks<br />

and distributes 26 po<strong>in</strong>ts among the players. Hearts is usually<br />

played as a tournament and the game does not end after the<br />

deck has been fully played. <strong>The</strong> game cont<strong>in</strong>ues until one of<br />

the players has reached or exceeded 100 po<strong>in</strong>ts (a predef<strong>in</strong>ed<br />

limit) at the conclusion of a trick. <strong>The</strong> player with the lowest<br />

score is declared the w<strong>in</strong>ner.<br />

While there are no formal partnerships <strong>in</strong> Hearts it is a very<br />

<strong>in</strong>terest<strong>in</strong>g doma<strong>in</strong> due to the specific po<strong>in</strong>t-tak<strong>in</strong>g rules. When<br />

play<strong>in</strong>g Hearts <strong>in</strong> a tournament, players might f<strong>in</strong>d that their<br />

best <strong>in</strong>terest is to help each other and oppose the leader. For<br />

example, when one of the players is lead<strong>in</strong>g by a large marg<strong>in</strong>,<br />

it will be <strong>in</strong> the best <strong>in</strong>terest of his opponents to give him<br />

po<strong>in</strong>ts, as it will decrease its advantage. Similarly, when there<br />

is a weak player whose po<strong>in</strong>t status is close to the tournament<br />

limit, his opponents might sacrifice by tak<strong>in</strong>g pa<strong>in</strong>ted tricks<br />

themselves, as a way to assure that the tournament will not end<br />

(which keeps their hopes of w<strong>in</strong>n<strong>in</strong>g). This <strong>in</strong>ternal structure<br />

of the game calls for use of the <strong>MP</strong>-Mix <strong>algorithm</strong>.<br />

2) Experiments’ design: We implemented a Hearts play<strong>in</strong>g<br />

environment and experimented with the follow<strong>in</strong>g players:<br />

1) Random (RND) - This player selects the next move<br />

randomly from the set of allowable moves.<br />

2) Weak rational (WRT) - This player picks the lowest<br />

possible card if he is start<strong>in</strong>g or follow<strong>in</strong>g a trick, and<br />

picks the highest card if it does not need to follow suit.<br />

3) MaxN (MAXN) - Runs the MaxN <strong>algorithm</strong>.<br />

4) Paranoid (PAR) - Runs the Paranoid <strong>algorithm</strong>.<br />

5) <strong>MP</strong>-Mix (<strong>MIX</strong>) - Runs the <strong>MP</strong>-Mix <strong>algorithm</strong> (thresholds<br />

are given as <strong>in</strong>put).<br />

In Hearts, players cannot view their opponent’s hands. In<br />

order to deal with the imperfect nature of the game the <strong>algorithm</strong><br />

uses a Monte-Carlo sampl<strong>in</strong>g based technique (adopted<br />

from [2]) with a uniform distribution function on the cards.<br />

It randomly simulates the opponent’s cards a large number<br />

of times (fixed to 1000 <strong>in</strong> our experiments), runs the search<br />

on each of the simulated hands and selects a card to play.<br />

<strong>The</strong> card f<strong>in</strong>ally played is the one that was selected the most<br />

among all simulations. <strong>The</strong> sampl<strong>in</strong>g technique is crucial <strong>in</strong><br />

2 In our variation of the game we did not use the “shoot the moon” rule <strong>in</strong><br />

order to simplify the heuristic construction process.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!