Bayesian Programming and Learning for Multi-Player Video Games ...
9.2.4 Inter-game Adaptation (Meta-game)
In a given match, and/or against a given player, players tend to learn from their immediate mistakes, and they adapt their strategies to each other's. While this can be framed as a continuous learning problem (reinforcement learning, with an exploration-exploitation trade-off), there is more to it. Human players call this the meta-game*: they enter the "I think that he thinks that I think..." regress until they arrive at fixed points. This "meta-game" is closely related to the balance of the game, and the existence of several equilibria is part of what makes StarCraft interesting. Also, for human players, there is clearly psychology involved.
Continuous learning
For all strategic models, the possible improvements (subsections 7.5.3, 7.6.3, 7.7.2) include learning specific sets of parameters against the opponent's strategies. The problem here is that (contrary to battles/tactics) there are few observations (games against a given opponent) to learn from. For instance, a naive approach would be to learn a Laplace's law of succession directly on

$P(ETechTrees = ett \mid Player = p) = \frac{1 + nbgames(ett, p)}{\#ETT + nbgames(p)}$

and do the same for EClusters, but this could require several games. Even if we see more than one tech tree per game from the opponent, a few games will still only show a sparse subset of ETT. Another part of this problem arises if we want to learn during the game itself, as we would then only have partial observations.
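As a minimal sketch of the Laplace's law of succession estimate above, the following (with hypothetical names; the thesis does not prescribe this interface) counts tech trees observed against a given opponent and smooths the counts so that unseen tech trees keep non-zero probability:

```python
from collections import Counter

def laplace_tech_tree_probability(observed_games, all_tech_trees, opponent):
    """Estimate P(ETechTrees = ett | Player = p) with Laplace smoothing.

    observed_games: list of (player, tech_tree) pairs, one per observation.
    all_tech_trees: the full enumeration ETT of possible tech trees.
    opponent: the player p we condition on.
    Returns P(ett | p) = (1 + nbgames(ett, p)) / (#ETT + nbgames(p)).
    """
    counts = Counter(ett for (player, ett) in observed_games
                     if player == opponent)
    nbgames_p = sum(counts.values())          # total games observed vs. p
    denom = len(all_tech_trees) + nbgames_p   # #ETT + nbgames(p)
    return {ett: (1 + counts[ett]) / denom for ett in all_tech_trees}
```

With three possible tech trees and a single observed game, the seen tech tree gets probability 2/4 and each unseen one 1/4, which illustrates why several games are needed before the distribution becomes informative.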
Mehta et al. [2010] approached "meta-level behavior adaptation in RTS games" as a means for their case-based planning AI to learn from its own failures in Wargus. The opponent is not considered at all, but this could be an interesting entry point to discover bugs or flaws in the bot's parameters.
Bot’s psychological warfare<br />
Our main idea for real meta-game playing by our AI would be to use our models recursively. As some of our models (tactics & strategy) can be used both for prediction and decision-making, we could have a full model of the enemy by maintaining the state of a model from their point of view, with their inputs (and continually learning some of the parameters). For instance, if we have our army adaptation model both for ourselves and for the opponent, we need to incorporate the output of their model as an input of our model, in the part which predicts the opponent's future army composition. If we cycle (iterate) this reasoning ("I will produce this army because they will have that one"...), we should reach these meta-game equilibria.
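The cycled reasoning above can be sketched as a fixed-point iteration over mutual best responses. This is a deliberately simplified stand-in (a lookup table of counter-compositions rather than the thesis's Bayesian army model, and the table below is invented for illustration): each round, the opponent best-responds to my predicted composition, and I best-respond to theirs, until a composition repeats:

```python
def meta_game_iteration(best_response, my_army, max_iter=50):
    """Iterate 'I produce X because they will have Y' reasoning.

    best_response: dict mapping an army composition to its counter.
    my_army: my initial composition.
    Returns (composition, cycle_length) once a composition repeats:
    cycle_length == 1 is a fixed point; > 1 is a cycle of equilibria.
    """
    seen = {}
    state = my_army
    for step in range(max_iter):
        if state in seen:
            return state, step - seen[state]
        seen[state] = step
        their_army = best_response[state]   # opponent counters my plan
        state = best_response[their_army]   # I counter their counter
    return state, None                      # no repeat within max_iter
```

On a rock-paper-scissors-like counter table ({"R": "P", "P": "S", "S": "R"}) this detects a cycle of length 3, mirroring the claim that several equilibria can coexist; a self-countering composition would instead yield a fixed point of length 1.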
Final words

Finally, bots are as far from having a psychological model of their opponent as they are from beating the best human players. I believe that it is our adaptability, our continuous learning, which allows human players (even simply "good" ones like me) to beat RTS game bots consistently. When robotic AIs start winning against human players, we may want to constrain them to only partial vision of the world (as humans have through a screen) and a limited number of concurrent actions (humans use a keyboard and a mouse, so they have limited APM*). At that point, they will need an attention model and some form of hierarchical action selection. Before that, all the problems raised in this thesis should be solved, at least at the scale of the RTS domain.