The MP-MIX algorithm: Dynamic Search Strategy Selection in Multi ...
The MP-MIX algorithm: Dynamic Search Strategy Selection in Multi ...
The MP-MIX algorithm: Dynamic Search Strategy Selection in Multi ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
6<br />
Property 3.1 (<strong>MP</strong>-Mix on high T values):<br />
If for every time stamp t and every player i, vi<br />
t ≤ T then<br />
N(G, T ) = 0<br />
When sett<strong>in</strong>g the threshold too high (larger than the maximal<br />
possible value of the v i ), <strong>MP</strong>-Mix behaves as a pure MaxN<br />
player, as no change <strong>in</strong> strategy will ever occur.<br />
Property 3.2 (<strong>MP</strong>-Mix on low T values):<br />
Let x be the number of times lead<strong>in</strong>gEdge ≥ 0, then if T =<br />
0, N(G, T ) = x<br />
In the other extreme case, when the threshold is set to zero,<br />
a paranoid or offensive behavior will occur every time the<br />
<strong>MP</strong>-Mix player leads (i.e., MaxN will never run). <strong>The</strong> above<br />
properties will come <strong>in</strong>to play <strong>in</strong> our experimental section as<br />
we experiment with different threshold values that converge<br />
to the orig<strong>in</strong>al <strong>algorithm</strong>s at the two extreme values.<br />
IV. EXPERIMENTAL RESULTS<br />
In order to evaluate the performance of <strong>MP</strong>-Mix, we<br />
implemented players that use MaxN, Paranoid and <strong>MP</strong>-Mix<br />
<strong>algorithm</strong>s <strong>in</strong> three popular games: the Hearts card game,<br />
Risk the strategic board game of world dom<strong>in</strong>ation, and the<br />
Quoridor board game. <strong>The</strong>se three games were chosen as they<br />
allow us to evaluate the <strong>algorithm</strong> <strong>in</strong> three different types of<br />
doma<strong>in</strong>s, and as such <strong>in</strong>crease the robustness of the evaluation.<br />
1) Hearts — is a four-player, imperfect-<strong>in</strong>formation, determ<strong>in</strong>istic<br />
card game.<br />
2) Risk — is a six-player, perfect-<strong>in</strong>formation, nondeterm<strong>in</strong>istic<br />
board game.<br />
3) Quoridor — is a four-player, perfect-<strong>in</strong>formation, determ<strong>in</strong>istic<br />
board game.<br />
In order to evaluate the <strong>MP</strong>-Mix <strong>algorithm</strong>, we performed a<br />
series of experiments with different sett<strong>in</strong>gs and environment<br />
variables. We used two methods to bound the search tree.<br />
• Fixed depth <strong>The</strong> first method was to perform a full<br />
width search up to a given depth. This provided a<br />
fair comparison to the logical behavior of the different<br />
strategies.<br />
• Fixed number of nodes <strong>The</strong> Paranoid strategy can<br />
benefit from deep prun<strong>in</strong>g while MaxN and Offensive<br />
can not. <strong>The</strong>refore, to provide a fair comparison we<br />
fixed the number of nodes N that can be visited, which<br />
will naturally allow the Paranoid to enjoy its prun<strong>in</strong>g<br />
advantage. To do this, we used iterative deepen<strong>in</strong>g to<br />
search for game trees as described by [8]. <strong>The</strong> player<br />
builds the search tree to <strong>in</strong>creas<strong>in</strong>gly larger depths, where<br />
at the end of each iteration he saves the current best move.<br />
Dur<strong>in</strong>g the iterations he keeps track of the number of<br />
nodes it visited, and if this number exceeds the node<br />
limit N, he immediately stops the search and retruns<br />
the current best move (which was found <strong>in</strong> the previous<br />
iteration).<br />
A. Experiments us<strong>in</strong>g Hearts<br />
1) Game description: Hearts is a multi-player, imperfect<strong>in</strong>formation,<br />
trick-tak<strong>in</strong>g card game designed to be played by<br />
exactly four players. A standard 52 card deck is used, with<br />
the cards <strong>in</strong> each suit rank<strong>in</strong>g <strong>in</strong> decreas<strong>in</strong>g order from Ace<br />
(highest) down to Two (lowest). At the beg<strong>in</strong>n<strong>in</strong>g of a game the<br />
cards are distributed evenly between the players, face down.<br />
<strong>The</strong> game beg<strong>in</strong>s when the player hold<strong>in</strong>g the Two of clubs<br />
card starts the first trick. <strong>The</strong> next trick is started by the w<strong>in</strong>ner<br />
of the previous trick. <strong>The</strong> other players, <strong>in</strong> clockwise order,<br />
must play a card of the same suit that started the trick, if they<br />
have any. If they do not have a card of that suit, they may<br />
play any card. <strong>The</strong> player who played the highest card of the<br />
suit which started the trick, w<strong>in</strong>s the trick.<br />
Each player scores penalty po<strong>in</strong>ts for some of the cards <strong>in</strong><br />
the tricks they won, therefore players usually want to avoid<br />
tak<strong>in</strong>g tricks. Each heart card scores one po<strong>in</strong>t, and the queen<br />
of spades card scores 13 po<strong>in</strong>ts. Tricks which conta<strong>in</strong> po<strong>in</strong>ts<br />
are called “pa<strong>in</strong>ted” tricks. 2 Each s<strong>in</strong>gle round has 13 tricks<br />
and distributes 26 po<strong>in</strong>ts among the players. Hearts is usually<br />
played as a tournament and the game does not end after the<br />
deck has been fully played. <strong>The</strong> game cont<strong>in</strong>ues until one of<br />
the players has reached or exceeded 100 po<strong>in</strong>ts (a predef<strong>in</strong>ed<br />
limit) at the conclusion of a trick. <strong>The</strong> player with the lowest<br />
score is declared the w<strong>in</strong>ner.<br />
While there are no formal partnerships <strong>in</strong> Hearts it is a very<br />
<strong>in</strong>terest<strong>in</strong>g doma<strong>in</strong> due to the specific po<strong>in</strong>t-tak<strong>in</strong>g rules. When<br />
play<strong>in</strong>g Hearts <strong>in</strong> a tournament, players might f<strong>in</strong>d that their<br />
best <strong>in</strong>terest is to help each other and oppose the leader. For<br />
example, when one of the players is lead<strong>in</strong>g by a large marg<strong>in</strong>,<br />
it will be <strong>in</strong> the best <strong>in</strong>terest of his opponents to give him<br />
po<strong>in</strong>ts, as it will decrease its advantage. Similarly, when there<br />
is a weak player whose po<strong>in</strong>t status is close to the tournament<br />
limit, his opponents might sacrifice by tak<strong>in</strong>g pa<strong>in</strong>ted tricks<br />
themselves, as a way to assure that the tournament will not end<br />
(which keeps their hopes of w<strong>in</strong>n<strong>in</strong>g). This <strong>in</strong>ternal structure<br />
of the game calls for use of the <strong>MP</strong>-Mix <strong>algorithm</strong>.<br />
2) Experiments’ design: We implemented a Hearts play<strong>in</strong>g<br />
environment and experimented with the follow<strong>in</strong>g players:<br />
1) Random (RND) - This player selects the next move<br />
randomly from the set of allowable moves.<br />
2) Weak rational (WRT) - This player picks the lowest<br />
possible card if he is start<strong>in</strong>g or follow<strong>in</strong>g a trick, and<br />
picks the highest card if it does not need to follow suit.<br />
3) MaxN (MAXN) - Runs the MaxN <strong>algorithm</strong>.<br />
4) Paranoid (PAR) - Runs the Paranoid <strong>algorithm</strong>.<br />
5) <strong>MP</strong>-Mix (<strong>MIX</strong>) - Runs the <strong>MP</strong>-Mix <strong>algorithm</strong> (thresholds<br />
are given as <strong>in</strong>put).<br />
In Hearts, players cannot view their opponent’s hands. In<br />
order to deal with the imperfect nature of the game the <strong>algorithm</strong><br />
uses a Monte-Carlo sampl<strong>in</strong>g based technique (adopted<br />
from [2]) with a uniform distribution function on the cards.<br />
It randomly simulates the opponent’s cards a large number<br />
of times (fixed to 1000 <strong>in</strong> our experiments), runs the search<br />
on each of the simulated hands and selects a card to play.<br />
<strong>The</strong> card f<strong>in</strong>ally played is the one that was selected the most<br />
among all simulations. <strong>The</strong> sampl<strong>in</strong>g technique is crucial <strong>in</strong><br />
2 In our variation of the game we did not use the “shoot the moon” rule <strong>in</strong><br />
order to simplify the heuristic construction process.