NUI Galway – UL Alliance First Annual ENGINEERING AND - ARAN ...

Applications of Reinforcement Learning in Gaming Domains

Frank G. Glavin, Michael G. Madden
College of Engineering and Informatics, National University of Ireland, Galway
Frank.Glavin@gmail.com, Michael.Madden@nuigalway.ie

Abstract

This paper introduces the concept of Reinforcement Learning (RL) and then describes the elements of a reinforcement learning system. Some related work is briefly reviewed and some potentially relevant domains are discussed. The paper concludes by stating the goal of this research.

1. Introduction

Reinforcement learning is a branch of Artificial Intelligence in which a learner, often called an agent, interacts with an environment in order to achieve an explicit goal. The agent receives feedback for its actions in the form of numerical rewards. It learns from its interactions with the environment and aims to maximize the reward that it receives over time. The agent must trade off exploring the effects of novel actions against exploiting the knowledge that it has acquired from earlier exploration.

2. Reinforcement Learning System

In addition to the agent and its environment, Sutton and Barto [1] identify four primary sub-elements of a reinforcement learning system: a policy, a reward function, a value function and a model of the environment.

A policy defines the agent’s behaviour in a given situation; it is essentially a mapping from states to actions. The reward function assigns a single numeric reward to each state in the environment, representing the desirability of being in that state. These values can be used as the basis for altering the agent’s policy. The value function estimates the total reward that an agent can expect to accumulate from the current state onwards, over possible future states. These estimates are made with a view to increasing the reward achieved over time.
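As an illustration of how a policy, a reward function and a value function fit together, the following is a minimal sketch using tabular Q-learning with epsilon-greedy exploration. Q-learning is one standard reinforcement learning algorithm and is chosen here only as an example; the paper does not name it. The corridor environment and all names and parameter values (N_STATES, alpha, gamma, epsilon) are assumptions made for the sketch.

```python
import random

# Hypothetical 1-D corridor (an assumption for illustration):
# states 0..4, the agent starts in state 0 and the goal is state 4.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # step left, step right


def reward(state):
    """Reward function: a single number per state (1.0 at the goal, else 0.0)."""
    return 1.0 if state == GOAL else 0.0


def step(state, action):
    """Environment: move, clip to the corridor, return (next state, reward)."""
    next_state = min(max(state + action, 0), GOAL)
    return next_state, reward(next_state)


def greedy_action(q, state, rng):
    """Policy: map a state to a highest-valued action, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; q holds the estimated value of each (state, action)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            # Explore with probability epsilon, otherwise exploit current estimates.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q, state, rng)
            next_state, r = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate towards reward plus discounted future value.
            q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
            state = next_state
    return q


q_table = train()
learned_policy = [greedy_action(q_table, s, random.Random(0)) for s in range(GOAL)]
print(learned_policy)
```

After training, the greedy policy should step right in every non-goal state, since that is the shortest route to the only rewarded state. Setting epsilon to 0 removes exploration entirely, which is the tradeoff described in the Introduction.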
A model consists of the agent’s internal representation of the environment. It is used to predict future states and rewards before they are actually experienced, which is useful for planning ahead.

3. Related Work

Gaming environments have been widely used as test beds for reinforcement learning algorithms. One of the most successful applications was Gerald Tesauro’s TD-Gammon [2], which was developed in the early 1990s. It used an Artificial Neural Network trained with a temporal difference learning algorithm called TD-Lambda [3], and achieved a level of play close to that of the top human players in the world.

4. Potentially Relevant Domains

4.1. RoboCup Soccer Tournament

The objective of this annual tournament is to promote research into robotics and Artificial Intelligence. Teams of researchers from around the world compete every year in both the robotic and software simulation competitions. The overall goal is to produce, by 2050, a team of fully autonomous humanoid robots that can play against, and beat, the reigning World Cup holders [4].

4.2. First Person Shooter (FPS) Bots

As graphics in modern computer games move closer to photorealism, the emphasis is switching to improving in-game artificial intelligence. Rule-based and traditional scripting systems are being replaced by intelligent reinforcement learning agents. There has been some promising recent work in this area, but there is plenty of scope for improvement.

4.3. Educational and Training Software

This would involve creating an intelligent agent that builds up a user profile based on what it learns from the user’s interactions. It could possibly be applied to “brain training”, “typing tutor” or similar games. The practicability of such an approach would have to be tested, as there is very little reported work in the literature on applying reinforcement learning to such problems.

5. Conclusion

The overall goal of this research is to examine the current state of the art in the application of Reinforcement Learning to gaming domains. Future work will involve identifying a specific gaming domain following an extensive literature review. Novel experimentation will then be carried out in this area.

6. References

[1] Sutton, R. S., and Barto, A. G., Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
[2] Tesauro, G., “TD-Gammon: A self-teaching backgammon program achieves master-level play”, Neural Computation, 6(2), 215-219, 1995.
[3] Sutton, R. S., “Learning to predict by the methods of temporal differences”, Machine Learning, 3, 9-44, 1988.
[4] http://www.robocup.org

Analysis and Evolution of Strategies within Turn Based Strategy War Games

Ethan McMeekin and Colm O’Riordan
CIRG, Information Technology, College of Engineering and Informatics, N.U.I. Galway
e.mcmeekin1@nuigalway.ie, colm.oriordan@nuigalway.ie

Abstract

The aim of this research is to provide an analysis of various playing strategies and styles within a turn based strategy war game. Moreover, we wish to use evolutionary computation techniques to find an optimal strategy or strategies in order to explore under which conditions these games are solvable.

1. Introduction

The computer games industry includes the extremely diverse Massively Multiplayer Online Role Playing Game (MMORPG) genre, in which some games have as many as 12 million subscribers [1]. There are several variations of turn-based, browser MMORPGs where players control an army and attempt to grow their army’s size and power in competition with other players. A few examples of this type of game include Gatewars.com, Sgancienwars.com and KingsofChaos.com. While the game mechanics behind each of these may be different, the general game play is similar.

A common goal of all players is to gather resources to increase their unit production, so that their armies grow faster and hence can gain further resources or recover from attacks. Each player must decide what portions of their population to assign to defence, offence, intelligence, counter-intelligence and income. Building an impenetrable defence would require a large number of units, leaving a negligible income to protect. Building a massive offence with little defence would leave a player vulnerable to having that offence easily destroyed by an angered enemy. Investing in counter-intelligence can hide statistics about an account, such as troop deployment and economy size, from enemies who have not invested enough in intelligence.
These balances between economy, defence, strike, covert and anti-covert can be used to characterise multiple strategies of play. A few examples of these strategies are turtle, sniper, tank and balanced. Turtles are typically passive players who maintain relatively large defences. The opposite of a turtle is a sniper, who maintains a relatively high offence and a low defence and is very aggressive. Tanks maintain relatively high offences and defences while being moderately aggressive. A balanced player maintains moderate defence, offence and aggression.

2. Research Hypothesis

Models and simulations can be created to analyse the various strategies available within a turn-based strategy game. A detailed understanding of each strategy’s advantages and disadvantages can be recorded across various environments and situations within the game. Secondly, evolutionary computation techniques can be employed to find an optimal strategy or strategies in order to explore under which conditions these games are solvable.

3. Current Work

A simulator based on the games available online has been constructed. This allows us to run the game at an accelerated pace under a completely controlled set of parameters. With code in place to support all the actions a human player could take in one of these games, a finite state machine was designed to simulate the actions a player would make each turn. The state machine transitions are controlled by an array of parameters which influence how aggressively the artificial player behaves and how much of its population it assigns to defence, offence, economics, etc. By varying this controlling array, the artificial player can simulate players with different strategies.

4. Future Work

Firstly, by comparing how various defined strategies of play perform under a range of circumstances, a report of the strengths and weaknesses of each strategy will be compiled, e.g. comparing how an aggressive player performs amongst a population of passive players. Later, evolutionary computation techniques will be used to attempt to learn an optimum set of values for the controlling array of the artificial player under similar conditions to the earlier test cases, e.g. to find the strategy that performs best amongst a population of passive players.

5. References

[1] Blizzard Entertainment, Press Release, 7 October 2010.
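
As a rough illustration of the controlling array from Section 3 and the evolutionary search proposed in Section 4, the sketch below evolves an allocation vector against a fixed passive opponent. It is not the authors’ simulator or finite state machine: the play function is a toy stand-in, the roles are reduced to three, the TURTLE values are invented, and the search is a simple (1+1) evolutionary strategy chosen only for brevity.

```python
import random

# Simplified subset of the roles named in the text (an assumption).
ROLES = ("defence", "offence", "income")


def normalise(weights):
    """Scale positive weights so that they sum to 1 (shares of the population)."""
    total = sum(weights)
    return [w / total for w in weights]


def play(strategy, opponent, turns=20):
    """Toy stand-in for a game simulator: return the final army size of `strategy`."""
    army, enemy = 100.0, 100.0
    for _ in range(turns):
        d, o, i = (share * army for share in strategy)
        ed, eo, ei = (share * enemy for share in opponent)
        # Income grows an army; offence in excess of the other side's
        # defence destroys units. Both sides are updated simultaneously.
        army, enemy = (
            max(army + 0.2 * i - max(eo - d, 0.0), 1.0),
            max(enemy + 0.2 * ei - max(o - ed, 0.0), 1.0),
        )
    return army


def evolve(opponent, generations=200, sigma=0.1, seed=0):
    """(1+1) evolutionary strategy: keep a mutated child if it scores at least as well."""
    rng = random.Random(seed)
    parent = normalise([1.0] * len(ROLES))
    best = play(parent, opponent)
    for _ in range(generations):
        # Gaussian mutation of every parameter, kept strictly positive.
        child = normalise([max(w + rng.gauss(0.0, sigma), 1e-6) for w in parent])
        score = play(child, opponent)
        if score >= best:
            parent, best = child, score
    return parent, best


# Hypothetical passive "turtle" opponent: mostly defence, little offence.
TURTLE = normalise([0.7, 0.1, 0.2])
baseline_score = play(normalise([1.0] * len(ROLES)), TURTLE)
best_strategy, best_score = evolve(TURTLE)
print(dict(zip(ROLES, best_strategy)), best_score, baseline_score)
```

Because a child replaces its parent only when it scores at least as well, the evolved strategy can never score below the evenly split starting allocation. A population-based genetic algorithm, or a fitness measured against a whole population of opponents, would be closer to the experiments the section describes.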