An Introduction to Genetic Algorithms - Boente
Chapter 2: Genetic Algorithms in Problem Solving
space the GA searched was thus 2^128, far too large for any kind of exhaustive search.

In our main set of experiments, we set N = 149 (chosen to be reasonably large but not computationally intractable). The GA began with a population of 100 randomly generated chromosomes (generated with some initial biases; see Mitchell, Crutchfield, and Hraber 1994a for details). The fitness of a rule in the population was calculated by (i) randomly choosing 100 ICs (initial configurations) that are uniformly distributed over ρ ∈ [0.0, 1.0], with exactly half having ρ < ρc and half having ρ > ρc, (ii) running the rule on each IC either until it arrives at a fixed point or for a maximum of approximately 2N time steps, and (iii) determining whether the final pattern is correct, i.e., N zeros for ρ0 < ρc and N ones for ρ0 > ρc. The initial density, ρ0, was never exactly ρc = 1/2, since N was chosen to be odd. The rule's fitness, f100, was the fraction of the 100 ICs on which the rule produced the correct final pattern. No partial credit was given for partially correct final configurations.
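The evaluation procedure above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names `step` and `fitness_100`, the synchronous update with periodic boundary conditions, and the simple fixed-point test are all assumptions made here for concreteness.

```python
N = 149          # lattice size (odd, so the density is never exactly 1/2)
RADIUS = 3       # neighborhood radius; the rule table has 2**(2*RADIUS+1) = 128 entries
MAX_STEPS = 2 * N

def step(config, rule_table):
    """Apply the CA rule synchronously to every site (periodic boundaries)."""
    n = len(config)
    new = []
    for i in range(n):
        # Read the 7-site neighborhood as a binary number indexing the rule table.
        idx = 0
        for j in range(i - RADIUS, i + RADIUS + 1):
            idx = (idx << 1) | config[j % n]
        new.append(rule_table[idx])
    return new

def fitness_100(rule_table, ics):
    """Fraction of the ICs classified correctly; no partial credit."""
    correct = 0
    for ic in ics:
        majority = 1 if sum(ic) > len(ic) // 2 else 0
        config = ic
        for _ in range(MAX_STEPS):
            nxt = step(config, rule_table)
            if nxt == config:      # fixed point reached; stop iterating
                break
            config = nxt
        if all(site == majority for site in config):
            correct += 1
    return correct / len(ics)
```

Note that a rule earns credit for an IC only if every one of the N sites ends up at the majority value, matching the all-or-nothing scoring described above.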
A few comments about the fitness function are in order. First, as was the case in Hillis's sorting-networks project, the number of possible input cases (2^149 for N = 149) was far too large to test exhaustively. Instead, the GA sampled a different set of 100 ICs at each generation. In addition, the ICs were not sampled from an unbiased distribution (i.e., equal probability of a one or a zero at each site in the IC), but rather from a flat distribution across ρ ∈ [0, 1] (i.e., ICs of each density from ρ = 0 to ρ = 1 were approximately equally represented). This flat distribution was used because under the unbiased distribution the density is binomially distributed and thus very strongly peaked at ρ = 1/2. The ICs selected from such a distribution will likely all have ρ ≈ 1/2, which are the hardest cases to classify. Using an unbiased sample made it too difficult for the GA to ever find any high-fitness CAs. (As will be discussed below, this biased distribution turns out to impede the GA in later generations: as increasingly fit rules are evolved, the IC sample becomes less and less challenging for the GA.)
Our version of the GA worked as follows. In each generation, (i) a new set of 100 ICs was generated, (ii) f100 was calculated for each rule in the population, (iii) the population was ranked in order of fitness, (iv) the 20 highest-fitness ("elite") rules were copied to the next generation without modification, and (v) the remaining 80 rules for the next generation were formed by single-point crossovers between randomly chosen pairs of elite rules. The parent rules were chosen from the elite with replacement; that is, an elite rule was permitted to be chosen any number of times. The offspring from each crossover were each mutated twice. This process was repeated for 100 generations for a single run of the GA. (More details of the implementation are given in Mitchell, Crutchfield, and Hraber 1994a.)
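One generation of this scheme might be sketched as follows. This is a hypothetical implementation, not the authors' code: in particular, the text does not specify the exact mutation operator, so here "mutated twice" is taken to mean flipping a randomly chosen bit two times (possibly the same bit):

```python
import random

CHROM_LEN = 128   # bits in a radius-3 rule table
POP_SIZE = 100
N_ELITE = 20      # the top 20% are copied unchanged

def next_generation(population, fitness_fn):
    """Rank by fitness, keep the N_ELITE best unchanged, and fill the
    remaining slots with mutated single-point-crossover offspring of
    elite pairs chosen with replacement."""
    ranked = sorted(population, key=fitness_fn, reverse=True)
    elite = ranked[:N_ELITE]
    offspring = []
    while len(offspring) < POP_SIZE - N_ELITE:
        mom, dad = random.choice(elite), random.choice(elite)
        point = random.randrange(1, CHROM_LEN)       # single crossover point
        for child in (mom[:point] + dad[point:], dad[:point] + mom[point:]):
            for _ in range(2):                       # each offspring mutated twice
                pos = random.randrange(CHROM_LEN)
                child[pos] ^= 1
            offspring.append(child)
    return elite + offspring[:POP_SIZE - N_ELITE]
```

In an actual run, `fitness_fn` would evaluate each rule on the generation's fresh sample of 100 ICs, so the elites are re-scored on new cases every generation even though their bits never change.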
Note that this version of the GA differs from the simple GA in several ways. First, rather than selecting parents with probability proportional to fitness, the rules are ranked and selection is done at random from the top 20% of the population. Moreover, all of the top 20% are copied without modification to the next generation, and only the bottom 80% are replaced. This is similar to the selection method, called "(μ + λ)", used in some evolution strategies; see Bäck, Hoffmeister, and Schwefel 1991.
This version of the GA was the one used by Packard (1988), so we used it in our experiments attempting to replicate his work (Mitchell, Hraber, and Crutchfield 1993) and in our subsequent experiments. Selecting parents by rank rather than by absolute fitness prevents initially stronger individuals from quickly dominating the population and driving the genetic diversity down too early. Also, since testing a rule on 100 ICs provides only an approximate gauge of the true fitness, saving the top 20% of the rules was a good way of making a "first cut" and allowing rules that survive to be tested over more ICs. Since a new set of ICs was produced every generation, rules that were copied without modification were always retested on this new set. If a rule performed well and thus survived over a large number of generations, then it was likely to be a genuinely better rule than those that were not selected, since it was tested with a large set of ICs. An alternative method would be to test every rule in each generation on a much larger set of ICs, but this would waste computation time. Too much effort, for example, would go into testing very weak rules, which can safely be weeded out early using our method. As in most applications, evaluating the fitness function (here, iterating each CA) takes