
Figure 3.6: A schematic illustration of the components of an agent in ERL. The agent's genotype is a bit string that encodes the weights of two neural networks: an evaluation network that maps the agent's current state to an evaluation of that state, and an action network that maps the agent's current state to an action to be taken at the next time step. The weights on the evaluation network are constant during an agent's lifetime, but the weights on the action network can be modified via a reinforcement learning method that takes its positive or negative signal from the evaluation network. (The networks displayed here are simplified for clarity.) The agent's genotype is not modified by this learning procedure, and only the genotype is passed from an agent to its offspring. (Reprinted from Christopher G. Langton et al., eds., Artificial Life: Volume II; © 1992 Addison-Wesley Publishing Company, Inc. Reprinted by permission of the publisher.)

Evolutionary Reinforcement Learning


A second computational demonstration of the Baldwin effect was given by David Ackley and Michael Littman (1992). Their primary goal was to incorporate "reinforcement learning" (an unsupervised learning method) into an evolutionary framework and to see whether evolution could produce individuals that not only behaved appropriately but also could correctly evaluate the situations they encountered as beneficial or dangerous for future survival. In Ackley and Littman's Evolutionary Reinforcement Learning (ERL) model, individuals ("agents") move randomly on a finite two-dimensional lattice, encountering food, predators, hiding places, and other types of entities. Each agent's "state" includes the entities in its visual range, the level of its internal energy store, and other parameters.

The components making up an individual are illustrated schematically in figure 3.6. Each agent possesses two feedforward neural networks: an evaluation network that takes as input the agent's state at time t and produces on its single output unit an activation representing a judgment about how good that state is for the agent, and an action network that takes as input the agent's state at time t and produces on its two output units a code for the action the agent is to take on that time step. The only possible actions are moves from the current lattice site to one of the four neighboring sites, but actions can result in eating, being eaten, and other less radical consequences. The architectures of these two networks are common to all agents, but the weights on the links can vary between agents. The weights on a given agent's evaluation network are fixed from birth; this network represents innate goals and desires inherited from the agent's ancestors (e.g., "being near food is good"). The weights on the action network change over the agent's lifetime according to a reinforcement learning algorithm that is a combination of back-propagation and standard reinforcement learning.
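To make the division of labor between the two networks concrete, the following is a minimal Python sketch of an agent with a fixed evaluation network and a plastic action network whose weights are nudged by a signal derived from the evaluation network. The class name, the single-layer networks, and the delta-rule update are illustrative assumptions; they are not Ackley and Littman's actual architecture or learning rule, which combines back-propagation with a standard reinforcement-learning method.

    import numpy as np

    class ERLAgentSketch:
        """Illustrative stand-in for an ERL agent (hypothetical names and update rule)."""

        def __init__(self, eval_weights, action_weights, lr=0.1):
            self.eval_w = eval_weights      # fixed for life: inherited "innate goals"
            self.action_w = action_weights  # plastic: adjusted during the agent's lifetime
            self.lr = lr

        def evaluate(self, state):
            # Evaluation network: maps the state vector to a scalar "goodness" judgment.
            return float(np.tanh(self.eval_w @ state))

        def act(self, state):
            # Action network: maps the state vector to a 2-bit code for one of four moves.
            return (self.action_w @ state > 0).astype(int)

        def reinforce(self, old_state, action_bits, new_state):
            # The reinforcement signal comes from the agent's own evaluation network:
            # if the new state is judged better than the old one, push the action
            # network's outputs toward the action just taken; otherwise push them away.
            reward = self.evaluate(new_state) - self.evaluate(old_state)
            target = action_bits if reward > 0 else 1 - action_bits
            output = 1.0 / (1.0 + np.exp(-(self.action_w @ old_state)))   # sigmoid outputs
            self.action_w += self.lr * np.outer(target - output, old_state)  # delta-rule step

Note that only action_w changes during the agent's lifetime; the genotype (and hence eval_w and the initial action_w) is what reproduction passes on, which is the point of the Baldwin-effect demonstration.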

An agent's genome is a bit string encoding the permanent weights for the evaluation network and the initial weights for the action network. The network architectures are such that there are 84 possible weights, each encoded by four bits. The length of a chromosome is thus 336 bits.
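The text gives the sizes (84 weights, four bits per weight, 336 bits in all) but not the exact mapping from bit patterns to weight values, so the linear scaling in the sketch below, along with the weight range and function name, is an assumption made for illustration only.

    def decode_genome(bits, n_weights=84, bits_per_weight=4, w_min=-1.0, w_max=1.0):
        # Split the 336-bit chromosome into 84 four-bit fields and map each
        # field's integer value (0..15) linearly onto an assumed weight range.
        assert len(bits) == n_weights * bits_per_weight  # 336 bits expected
        weights = []
        for i in range(n_weights):
            chunk = bits[i * bits_per_weight:(i + 1) * bits_per_weight]
            value = int("".join(str(b) for b in chunk), 2)
            weights.append(w_min + (w_max - w_min) * value / (2 ** bits_per_weight - 1))
        return weights

Under this reading, part of the decoded list would supply the fixed evaluation-network weights and the remainder the initial action-network weights; the evolutionary operators act only on the 336-bit string, never on the weights learned during an agent's lifetime.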

Agents have an internal energy store (represented by a real number) which must be kept above a certain level to prevent death; this is accomplished by eating food that is encountered as the agent moves from site to site on the lattice. An agent must also avoid predators, or it will be killed. An agent can reproduce once it has

