29.06.2013 Views

NUI Galway – UL Alliance First Annual ENGINEERING AND - ARAN ...


Applications of Reinforcement Learning in Gaming Domains

Frank G. Glavin and Michael G. Madden

College of Engineering and Informatics, National University of Ireland, Galway

Frank.Glavin@gmail.com, Michael.Madden@nuigalway.ie

Abstract

This paper introduces the concept of Reinforcement Learning (RL) and describes the elements of a reinforcement learning system. It then briefly mentions some related work, discusses some potentially relevant domains, and concludes by stating the goal of this research.

1. Introduction

Reinforcement learning is a branch of Artificial Intelligence in which a learner, often called an agent, interacts with an environment in order to achieve an explicit goal. The agent receives feedback for its actions in the form of numerical rewards. The agent learns from its interactions with the environment and aims to maximize the reward values that it receives over time. The agent must make a tradeoff between exploring the effects of taking novel actions and exploiting the knowledge that has been acquired from earlier exploration.
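
The exploration/exploitation tradeoff can be illustrated with an epsilon-greedy rule on a multi-armed bandit. The following is a minimal sketch only: the bandit task, the function names `epsilon_greedy` and `run_bandit`, the Gaussian reward noise and all parameter values are assumptions made for illustration and are not taken from this paper.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Return a random action with probability epsilon, else the best-known one."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # explore a possibly novel action
    # exploit the knowledge acquired so far (ties go to the lowest index)
    return max(range(len(estimates)), key=lambda a: estimates[a])


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Learn reward estimates for a hypothetical bandit with Gaussian rewards."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # estimated reward of each action
    counts = [0] * len(true_means)       # how often each action was taken
    for _ in range(steps):
        action = epsilon_greedy(estimates, epsilon, rng)
        reward = rng.gauss(true_means[action], 1.0)  # noisy numerical reward
        counts[action] += 1
        # incremental sample average of the rewards seen for this action
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_bandit([0.1, 0.5, 1.5])
```

With `epsilon` set to 0 the agent never explores and can remain on a poor action; with `epsilon` set to 1 it never uses what it has learned. Intermediate values trade one off against the other.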

2. Reinforcement Learning System

In addition to the agent and its environment, Sutton and Barto [1] have identified four primary sub-elements that form a reinforcement learning system. These are: a policy; a reward function; a value function; and a model of the environment. A policy defines the agent’s behaviour in a given situation; it is essentially a mapping from states to actions. The reward function assigns a single numeric reward value to each state in the environment to represent the desirability of being in that state. These values can be used as the basis for altering the agent’s policy. The value function estimates the total amount of reward that an agent can expect to accumulate from the current state onwards, over possible future states. The values are estimated with a view to increasing the amount of reward achieved over time. A model is the agent’s internal representation of the environment and is used to predict future states and rewards before they are actually experienced, which is useful for planning ahead.
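
The four sub-elements can be made concrete on a toy problem. The sketch below assumes a hypothetical five-state corridor with a goal at one end; the environment, the discount factor `GAMMA` and all function names are illustrative assumptions, not part of Sutton and Barto's definitions or of this paper. The value function is estimated by planning with the model (value iteration), and the policy is greedy with respect to those values.

```python
STATES = range(5)    # hypothetical corridor: states 0..4
ACTIONS = (-1, +1)   # step left or step right
GOAL = 4             # terminal state
GAMMA = 0.9          # assumed discount factor


def reward(state):
    """Reward function: the desirability of being in a state."""
    return 1.0 if state == GOAL else 0.0


def model(state, action):
    """Model: predicts the next state without acting in the environment."""
    return min(max(state + action, 0), GOAL)


def backup(state, action, values):
    """One-step lookahead: predicted reward plus discounted future value."""
    nxt = model(state, action)
    return reward(nxt) + GAMMA * values[nxt]


def plan_values(sweeps=50):
    """Value function: expected future reward, estimated by planning."""
    values = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES:
            if s != GOAL:  # the terminal state keeps value 0
                values[s] = max(backup(s, a, values) for a in ACTIONS)
    return values


def policy(state, values):
    """Policy: a mapping from states to actions, greedy in the values."""
    return max(ACTIONS, key=lambda a: backup(state, a, values))


values = plan_values()
actions = [policy(s, values) for s in STATES if s != GOAL]
```

In this sketch the reward is non-zero only at the goal, yet the value function is non-zero everywhere, which is what allows the policy to choose sensibly in states that are far from any reward.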

3. Related Work

Gaming environments have been widely used as test beds for reinforcement learning algorithms. One of the most successful applications was Gerald Tesauro’s TD-Gammon [2], which was developed in the early 1990s. This used an Artificial Neural Network which was trained using a temporal difference learning algorithm called TD(λ) [3]. It achieved a level of play close to that of the top human players in the world.
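
To give a flavour of temporal difference learning, the following is a minimal tabular sketch of TD(λ) with accumulating eligibility traces. It is not TD-Gammon's method, which trained a neural network on backgammon positions; the five-state random walk task, the function name and the parameter values are assumptions chosen for illustration.

```python
import random


def td_lambda_random_walk(episodes=5000, alpha=0.01, lam=0.8, gamma=1.0, seed=0):
    """Estimate state values on a random walk with tabular TD(lambda).

    States 1..5 are non-terminal; 0 and 6 are terminal. Every episode starts
    in state 3 and moves left or right at random. The reward is 1 on reaching
    state 6 and 0 otherwise.
    """
    rng = random.Random(seed)
    values = [0.0] * 7  # terminal entries are never updated and stay at 0
    for _ in range(episodes):
        traces = [0.0] * 7  # eligibility traces are reset each episode
        state = 3
        while state not in (0, 6):
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == 6 else 0.0
            # temporal difference error for this transition
            td_error = reward + gamma * values[next_state] - values[state]
            traces[state] += 1.0  # accumulating trace for the visited state
            for s in range(1, 6):
                # recently visited states share in the update, then decay
                values[s] += alpha * td_error * traces[s]
                traces[s] *= gamma * lam
            state = next_state
    return values[1:6]


estimated = td_lambda_random_walk()
```

Setting `lam` to 0 reduces this to one-step TD learning, while values close to 1 spread each error further back along the states visited earlier in the episode.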


4. Potentially Relevant Domains

4.1. RoboCup Soccer Tournament

The objective of this annual tournament is to promote research into robotics and Artificial Intelligence. Teams of researchers from around the world compete every year in both the robotic and software simulation competitions. The overall goal is to produce, by 2050, a team of fully autonomous humanoid robots that can play against, and beat, the reigning World Cup holders [4].

4.2. First Person Shooter (FPS) Bots

As graphics in modern computer games move closer to photorealism, the emphasis is switching to improving in-game artificial intelligence. Rule-based and traditional scripting systems are being replaced by intelligent reinforcement learning agents. There has been some recent promising work in this area but there is plenty of scope for improvement.

4.3. Educational and Training Software

Work in this domain would involve creating an intelligent agent that builds up a user profile based on what it learns from the user’s interactions. This could possibly be applied to “brain training”, “typing tutor” or similar games. The practicability of such an approach would have to be tested, as there is very little reported work in the literature on applying reinforcement learning to such problems.

5. Conclusion

The overall goal of this research is to examine the current state of the art in the application of Reinforcement Learning to gaming domains. Future work will involve identifying a specific gaming domain following an extensive literature review. Novel experimentation will then be carried out in this area.

6. References

[1] Sutton, R. S., and Barto, A. G., Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.

[2] Tesauro, G., “TD-Gammon, a self-teaching backgammon program, achieves master-level play.” Neural Computation, 6(2), 215-219, 1994.

[3] Sutton, R. S., “Learning to predict by the methods of temporal differences.” Machine Learning, 3, 9-44, 1988.

[4] Http://www.robocup.org
