
Bayesian Programming and Learning for Multi-Player Video Games ...


9.2.4 Inter-game Adaptation (Meta-game)

In a given match, and/or against a given player, players tend to learn from their immediate mistakes, and they adapt their strategies to each other's. While this can be seen as a continuous learning problem (reinforcement learning, with an exploration-exploitation trade-off), there is more to it. Human players call this the meta-game*, as they enter the "I think that he thinks that I think..." game until arriving at fixed points. This "meta-game" is closely related to the balance of the game, and the fact that there are several equilibria is part of what makes StarCraft interesting. Also, clearly, for human players there is psychology involved.

Continuous learning

For all strategic models, the possible improvements (subsections 7.5.3, 7.6.3, 7.7.2) would include learning specific sets of parameters against the opponent's strategies. The problem here is that (contrary to battles/tactics) there are not many observations (games against a given opponent) to learn from. For instance, a naive approach would be to learn a Laplace's law of succession directly:

P(ETechTrees = ett | Player = p) = (1 + nbgames(ett, p)) / (#ETT + nbgames(p))

and do the same for EClusters, but this could require several games. Even if we see more than one tech tree per game for the opponent, a few games will still only show a sparse subset of ETT. Another part of this problem could arise if we want to learn really in-game, as we would only have partial observations.
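The Laplace's law of succession estimate above can be sketched in a few lines; the function name and signature are illustrative, not part of the thesis's code:

```python
from collections import Counter

def laplace_tech_tree_probability(observed, ett, num_tech_trees):
    """P(ETechTrees = ett | Player = p) by Laplace's law of succession.

    observed: tech trees seen in games against player p (one per game,
              so nbgames(p) = len(observed)).
    ett: the tech tree whose probability we estimate.
    num_tech_trees: #ETT, the total number of possible tech trees.
    """
    counts = Counter(observed)  # counts[ett] = nbgames(ett, p); 0 if unseen
    return (1 + counts[ett]) / (num_tech_trees + len(observed))
```

With no observations, every tech tree gets the uniform prior 1/#ETT; each observed game then shifts mass toward the tech trees actually seen, which illustrates why several games are needed before the estimate becomes informative.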

Mehta et al. [2010] approached "meta-level behavior adaptation in RTS games" as a means for their case-based planning AI to learn from its own failures in Wargus. The opponent is not considered at all, but this could be an interesting entry point to discover bugs or flaws in the bot's parameters.

Bot's psychological warfare

Our main idea for real meta-game playing by our AI would be to use our models recursively. As some of our models (tactics and strategy) can be used both for prediction and decision-making, we could have a full model of the enemy by maintaining the state of a model from their point of view, with their inputs (and continually learning some of its parameters). For instance, if we have our army adaptation model for ourselves and for the opponent, we need to incorporate the output of their model as an input of our model, in the part which predicts the opponent's future army composition. If we cycle (iterate) the reasoning ("I will produce this army because they will have this one"...), we should reach these meta-game equilibria.
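A minimal sketch of this cycled reasoning, assuming a hypothetical deterministic `best_response` function standing in for the army adaptation model's output (the thesis does not specify this interface):

```python
def meta_game_fixed_point(best_response, my_army, their_army, max_iters=50):
    """Iterate the 'I think that they think...' reasoning until it stabilizes.

    best_response(army) -> the composition that counters `army`
    (a hypothetical stand-in for the army adaptation model).
    Stops at a fixed point, on a revisited state (a meta-game cycle),
    or after max_iters.
    """
    seen = set()
    for _ in range(max_iters):
        state = (my_army, their_army)
        if state in seen:
            break  # fixed point or cycle of the meta-game reached
        seen.add(state)
        # each side best-responds to the other's current predicted army
        my_army, their_army = best_response(their_army), best_response(my_army)
    return my_army, their_army
```

With a toy counter table such as `{"bio": "mech", "mech": "air", "air": "air"}`, starting from `("bio", "bio")` the iteration settles on `("air", "air")`; with a rock-paper-scissors-like table it instead detects a cycle, mirroring the fact that some match-ups have no pure fixed point.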

Final words

Finally, bots are as far from having a psychological model of their opponent as they are from beating the best human players. I believe it is our adaptability, our continuous learning, that allows human players (even simply "good" ones like me) to beat RTS game bots consistently. When AI bots start winning against human players, we may want to constrain them to only partial vision of the world (as humans have through a screen) and a limited number of concurrent actions (humans use a keyboard and a mouse, so they have limited APM*). At this point, they will need an attention model and some form of hierarchical action selection. Before that, all the problems raised in this thesis should be solved, at least at the scale of the RTS domain.

