NUI Galway – UL Alliance First Annual ENGINEERING AND - ARAN ...
Applications of Reinforcement Learning in Gaming Domains
Frank G. Glavin, Michael G. Madden
College of Engineering and Informatics, National University of Ireland, Galway
Frank.Glavin@gmail.com, Michael.Madden@nuigalway.ie
Abstract
This paper introduces the concept of Reinforcement
Learning (RL) and then describes the elements of a
reinforcement learning system. Some related work is
briefly mentioned and some potentially relevant
domains are then discussed. The paper concludes by
stating the goal of this research.
1. Introduction
Reinforcement learning is a branch of Artificial<br />
Intelligence in which a learner, often called an agent,<br />
interacts with an environment in order to achieve an<br />
explicit goal. The agent receives feedback for its actions<br />
in the form of numerical rewards. The agent learns from<br />
its interactions with the environment and aims to<br />
maximize the reward values that it receives over time.<br />
The agent must make a tradeoff between exploring the<br />
effects of taking novel actions and exploiting the<br />
knowledge that has been acquired from earlier<br />
exploration.<br />
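The exploration/exploitation tradeoff described above can be illustrated with a minimal sketch of an epsilon-greedy agent on a multi-armed bandit. This is an illustration only and is not taken from the paper: the function name `epsilon_greedy_bandit`, the Gaussian reward noise, and the parameter values are all assumptions.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Run an epsilon-greedy agent on a Gaussian multi-armed bandit.

    true_means is a hypothetical list of mean rewards, one per action (arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms   # running estimate of each arm's mean reward
    counts = [0] * n_arms        # number of times each arm has been chosen
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try an arbitrary action to learn about its effect.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: choose the action currently believed to be best.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)   # noisy numerical reward
        counts[arm] += 1
        # Incremental sample-average update of the reward estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward
```

With a small `epsilon`, the agent spends most of its time exploiting its current estimates while still occasionally sampling the other actions, so a poor early estimate can be corrected.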
2. Reinforcement Learning System<br />
In addition to the agent and its environment, Sutton<br />
and Barto [1] have identified four primary sub-elements<br />
that form a reinforcement learning system. These are: a<br />
policy; a reward function; a value function; and a model<br />
of the environment. A policy defines the agent’s
behaviour in a given situation. This is
essentially a mapping from states to actions. The reward
function assigns a single numeric reward value to each<br />
state in the environment to represent the desirability of<br />
being in the state. These values can be used as the basis<br />
of altering the agent’s policy. The value function<br />
estimates the amount of reward that an agent can expect<br />
to acquire from the current state over possible future<br />
states. The values are estimated with a view to increasing
the amount of reward achieved over time. A model
consists of the agent’s internal representation of the
environment and is used to predict future states and<br />
rewards before they are actually experienced, which is<br />
useful for planning ahead.<br />
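As a rough illustration of how these four sub-elements fit together, the sketch below uses a hypothetical five-state corridor in which the rightmost state is the goal. The environment, the discount factor `GAMMA`, and all function names are assumptions made for this example. The value function is computed by value iteration, which plans ahead using the model rather than learning from experience.

```python
# Hypothetical corridor: states 0..4, where state 4 is a terminal goal state.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)   # move left or move right
GAMMA = 0.9          # discount factor (assumed value)

def model(state, action):
    """Model: predicts the next state for a given state and action."""
    return min(max(state + action, 0), GOAL)

def reward(state):
    """Reward function: a single number giving the desirability of a state."""
    return 1.0 if state == GOAL else 0.0

def backup(state, action, values):
    """Reward on entering the predicted next state plus its discounted value."""
    nxt = model(state, action)
    return reward(nxt) + GAMMA * values[nxt]

def value_iteration(sweeps=50):
    """Value function: discounted reward obtainable from each state."""
    values = [0.0] * N_STATES          # the terminal goal keeps value 0
    for _ in range(sweeps):
        for s in range(GOAL):
            values[s] = max(backup(s, a, values) for a in ACTIONS)
    return values

def greedy_policy(values):
    """Policy: a mapping from each non-terminal state to an action."""
    return {s: max(ACTIONS, key=lambda a: backup(s, a, values))
            for s in range(GOAL)}
```

Calling `greedy_policy(value_iteration())` yields a policy that moves right in every state, since states closer to the goal have higher estimated value.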
3. Related Work<br />
Gaming environments have been widely used as test<br />
beds for reinforcement learning algorithms. One of the<br />
most successful applications was Gerald Tesauro’s TD-<br />
Gammon [2] which was developed in the early 1990s.<br />
This used an Artificial Neural Network which was<br />
trained using a temporal difference learning algorithm<br />
called TD-Lambda [3]. It achieved a level of play close to
the top human players in the world.<br />
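The temporal difference idea can be sketched in a much simpler setting than backgammon. The example below is not TD-Gammon: it replaces the neural network with a lookup table and applies TD-Lambda with accumulating eligibility traces to a hypothetical five-state random walk. The task, the function name, and the parameter values are assumptions chosen for illustration.

```python
import random

def td_lambda_random_walk(episodes=3000, alpha=0.02, lam=0.8, gamma=1.0, seed=0):
    """Tabular TD(lambda) value prediction on a five-state random walk.

    States 1..5 are non-terminal; states 0 and 6 are terminal. A reward of 1
    is given only when the walk ends at state 6.
    """
    rng = random.Random(seed)
    n = 5
    values = [0.0] * (n + 2)          # terminal states keep a value of 0
    for _ in range(episodes):
        traces = [0.0] * (n + 2)      # eligibility traces, reset each episode
        state = (n + 1) // 2          # start in the middle state
        while 0 < state < n + 1:
            next_state = state + rng.choice((-1, 1))
            r = 1.0 if next_state == n + 1 else 0.0
            # TD error: difference between successive predictions.
            delta = r + gamma * values[next_state] - values[state]
            traces[state] += 1.0      # mark the current state as eligible
            for s in range(1, n + 1):
                values[s] += alpha * delta * traces[s]
                traces[s] *= gamma * lam   # decay credit for older states
            state = next_state
    return values[1:n + 1]
```

For this walk the true values are 1/6, 2/6, ..., 5/6, so the learned estimates should increase from the leftmost state to the rightmost. The `lam` parameter controls how far back along the visited states each TD error is propagated.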
4. Potentially Relevant Domains<br />
4.1. RoboCup Soccer Tournament<br />
The objective of this annually held tournament is to<br />
promote research into robotics and Artificial<br />
Intelligence. Teams of researchers from around the<br />
world compete every year in both the robotic and<br />
software simulation competitions. The overall goal is to
produce, by 2050, a team of fully autonomous humanoid
robots that can play against, and beat, the reigning
World Cup holders [4].
4.2. First Person Shooter (FPS) Bots
As graphics in modern computer games move closer<br />
to photorealism, the emphasis is switching to improving<br />
in-game artificial intelligence. Rule-based and<br />
traditional scripting systems are being replaced by<br />
intelligent reinforcement learning agents. There has<br />
been some recent promising work in this area but there<br />
is plenty of scope for improvement.<br />
4.3. Educational and Training Software<br />
This would involve creating an intelligent agent that<br />
could build up a user profile based on what it learns<br />
from the user’s interactions. This could possibly be
applied to “brain training”, “typing tutor” or similar
games. The practicability of such an approach
would have to be tested, as there is very little reported
work on applying reinforcement learning to such
problems in the literature.
5. Conclusion<br />
The overall goal of this research will be to examine the<br />
current state of the art in the application of<br />
Reinforcement Learning to gaming domains. Future<br />
work will involve identifying a specific gaming domain<br />
following an extensive literature review. Novel<br />
experimentation will then be carried out in this area.<br />
6. References<br />
[1] Sutton, R. S., and Barto, A. G.,
Reinforcement Learning: An Introduction, MIT Press,
Cambridge, MA, 1998.
[2] Tesauro, G., “TD-Gammon, a self-teaching backgammon
program, achieves master-level play,” Neural
Computation, 6(2), 215-219, 1994.
[3] Sutton, R. S., “Learning to predict by the methods of
temporal differences,” Machine Learning, 3, 9-44,
1988.
[4] http://www.robocup.org