
Figure 6.9 displays the mean P(A, H) for Terran (in TvZ) attack decision-making for the 32 most probable type/region tactical couples. It is in this kind of landscape (though steeper, because Fig. 6.9 is a mean) that we sample (or pick the most probable couple) to take a decision.

Figure 6.9: Mean P(A, H) for all H values and the top 8 P(A_i, H_i) values, for Terran in TvZ. The larger the white square area, the higher P(A_i, H_i). A simple way of taking a tactical decision according to this model, and the learned parameters, is by sampling in this distribution.
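As a concrete illustration of this decision step, the following Python sketch samples a (type, region) couple proportionally to a joint table P(A, H), or alternatively picks the most probable couple. The attack types, regions and probability values are placeholders (they are not taken from the model's learned parameters); only the sampling mechanism is the point.

```python
import numpy as np

# Illustrative joint P(A, H): rows are attack types, columns are regions.
# In practice this table comes from the tactical model's fusion of learned parameters.
rng = np.random.default_rng()

attack_types = ["ground", "air", "drop", "invisible"]
regions = ["r1", "r2", "r3"]

joint = np.array([
    [0.30, 0.10, 0.05],
    [0.05, 0.05, 0.02],
    [0.15, 0.08, 0.05],
    [0.05, 0.05, 0.05],
])
joint /= joint.sum()  # normalize so it is a proper distribution


def sample_decision(joint, rng):
    """Sample a (type, region) couple proportionally to P(A, H)."""
    flat_idx = rng.choice(joint.size, p=joint.ravel())
    i, j = np.unravel_index(flat_idx, joint.shape)
    return attack_types[i], regions[j]


def argmax_decision(joint):
    """Alternative: pick the most probable couple instead of sampling."""
    i, j = np.unravel_index(joint.argmax(), joint.shape)
    return attack_types[i], regions[j]


print(sample_decision(joint, rng))
print(argmax_decision(joint))
```

Sampling keeps some exploration (and unpredictability against the opponent), while the argmax variant always commits to the single most probable couple.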

Finally, we can steer our technological growth towards the opponent's weaknesses. A question that we can ask our model (at time t) is P(TT), or, in two parts: we first find the i, h_i which maximize P(A_{1:n}, H_{1:n}) at time t+1, and then ask the more directive question:

P(TT | h_i) ∝ Σ_{HP} P(h_i | HP) P(HP | TT) P(TT)

so that it gives us a distribution on the tech trees (TT) needed to be able to perform the wanted attack type. To take a decision on our technology direction, we can consider the distances between our current tt^t and all the probable values of TT^{t+1}.
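A minimal sketch of this query is given below, assuming the conditional tables P(HP | TT) and P(h_i | HP) are already available (here they are filled with placeholder values). The final trade-off between the posterior and the distance to the current tech tree is only one possible heuristic, not the rule used in the thesis.

```python
import numpy as np

# Hypothetical sketch of P(TT | h_i) ∝ Σ_HP P(h_i | HP) P(HP | TT) P(TT).
# All tables are placeholders; in the model they are learned from replays.
tech_trees = ["bio", "mech", "air"]            # illustrative TT values
n_TT, n_HP = len(tech_trees), 4                # n_HP: number of HP ("how") values

P_TT = np.full(n_TT, 1.0 / n_TT)               # prior P(TT)
P_HP_given_TT = np.random.dirichlet(np.ones(n_HP), size=n_TT)  # P(HP | TT), rows sum to 1
P_hi_given_HP = np.array([0.6, 0.2, 0.1, 0.1])                 # P(h_i | HP) for the chosen h_i


def tech_tree_posterior(P_hi_given_HP, P_HP_given_TT, P_TT):
    """Compute P(TT | h_i) ∝ Σ_HP P(h_i | HP) P(HP | TT) P(TT)."""
    unnorm = (P_HP_given_TT @ P_hi_given_HP) * P_TT
    return unnorm / unnorm.sum()


posterior = tech_tree_posterior(P_hi_given_HP, P_HP_given_TT, P_TT)

# Weigh each probable TT^{t+1} against its distance from our current tt^t
# (made-up distances; one possible trade-off, not the thesis' decision rule).
distance_from_current = np.array([2.0, 5.0, 7.0])
scores = posterior / (1.0 + distance_from_current)
print(tech_trees[int(np.argmax(scores))])
```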

6.7 Discussion

6.7.1 In-game learning

The prior on A can be tailored to the opponent (P(A | opponent)). In a given match, it should be initialized to uniform and progressively learn the preferred attack regions of the opponent for better predictions. We can also penalize or reward ourselves and learn the regions in which our attacks fail or succeed for decision-making. Both of these can easily be done with some form of weighted counting ("add-n smoothing") or even Laplace's rule of succession.
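The sketch below shows how such an in-game prior could be maintained with add-n smoothing (Laplace's rule of succession for n = 1). The class and parameter names are illustrative, not taken from the thesis' implementation.

```python
from collections import defaultdict


class OpponentAttackPrior:
    """Add-n smoothed estimate of P(A | opponent) over regions, updated
    during a match. Starts uniform and converges to observed frequencies."""

    def __init__(self, regions, n=1.0):
        self.n = n                      # pseudo-count ("add-n"); n = 1 is Laplace's rule
        self.regions = list(regions)
        self.counts = defaultdict(float)

    def observe_attack(self, region, weight=1.0):
        # weight can penalize or reward outcomes (e.g. failed vs. successful attacks)
        self.counts[region] += weight

    def probability(self, region):
        total = sum(self.counts.values())
        return (self.counts[region] + self.n) / (total + self.n * len(self.regions))


prior = OpponentAttackPrior(regions=["r1", "r2", "r3"])   # uniform at first: 1/3 each
prior.observe_attack("r2")
prior.observe_attack("r2")
print([round(prior.probability(r), 3) for r in ["r1", "r2", "r3"]])  # [0.2, 0.6, 0.2]
```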

In a match situation against a given opponent, for inputs that we can unequivocally attribute to their intention (style and general strategy), we can also refine the probability tables of P(E, T, TA, B | A, opponent), P(AD, GD, ID | H, opponent) and P(H | HP, opponent) (with Laplace's rule of succession). For instance, we can refine Σ_{E,T,TA} P(E, T, TA, B | A, opponent) corresponding to their aggressiveness (aggro) or our successes and failures. Indeed, if we sum over E, T and TA, we consider the inclination of our opponent to venture into enemy territory or the interest that we have to do so by counting our successes with aggressive or defensive
