16.01.2013 Views

An Introduction to Genetic Algorithms - Boente


{(2, 0) (3, 0) (1, 1) (4, 1) (6, 0)} could be cut after the second locus to yield two strings: {(2, 0) (3, 0)} and {(1, 1) (4, 1) (6, 0)}. The splice operator takes two strings and splices them together. For example, [two example strings missing from source] could be spliced together to form [result string missing from source].

Chapter 4: Theoretical Foundations of Genetic Algorithms

Under the messy encoding, cut and splice always produce perfectly legal strings. The hope is that the primordial phase will have produced all the building blocks needed to create an optimal string, and in sufficient numbers so that cut and splice will be likely to create that optimal string before too long. Goldberg and his colleagues did not use mutation in the experiments they reported.
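The cut and splice operators described above can be sketched in a few lines. This is only an illustration, assuming a messy string is represented as a Python list of (locus, allele) pairs; the function names `cut` and `splice` and the list representation are choices made here, not taken from Goldberg, Korb, and Deb's implementation.

```python
def cut(string, point):
    """Cut a messy string after `point` genes and return the two pieces.

    `string` is a list of (locus, allele) pairs (an assumed representation).
    """
    if not 0 < point < len(string):
        raise ValueError("cut point must fall strictly inside the string")
    return string[:point], string[point:]


def splice(first, second):
    """Splice two messy strings together by concatenating them."""
    return first + second


# The cut example from the text: cutting after the second locus.
parent = [(2, 0), (3, 0), (1, 1), (4, 1), (6, 0)]
left, right = cut(parent, 2)

# Splicing the two pieces back together recovers the parent.
rejoined = splice(left, right)
```

Because a messy string may repeat or omit loci, any concatenation of (locus, allele) pairs is still a legal string, which is why neither function needs a repair step.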

Goldberg, Korb, and Deb (1989) performed a very rough mathematical analysis of this algorithm to argue why it should work better than a simple GA, and then showed empirically that it performed much better than a simple GA on a 30-bit deceptive problem. In this problem, the fitness function took a 30-bit string divided into ten adjacent segments of three bits each. Each three-bit segment received a fixed score: 111 received the highest score, but 000 received the second highest and was a local optimum (thus making the problem deceptive). The score S of each three-bit segment was as follows: S(000) = 28; S(001) = 26; S(010) = 22; S(011) = 0; S(100) = 14; S(101) = 0; S(110) = 0; S(111) = 30. The fitness of a 30-bit string was the sum of the scores of its ten three-bit segments. The messy GA was also tried successfully on several variants of this fitness function (all of roughly the same size).
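The fitness function just described is small enough to write out directly. The sketch below uses the segment scores quoted in the text; the names `SEGMENT_SCORE` and `fitness`, and the choice to represent a candidate as a string of '0' and '1' characters, are assumptions made for illustration.

```python
# Scores S(.) for each three-bit segment, as listed in the text.
SEGMENT_SCORE = {
    "000": 28, "001": 26, "010": 22, "011": 0,
    "100": 14, "101": 0, "110": 0, "111": 30,
}


def fitness(bits):
    """Sum the scores of the ten adjacent three-bit segments of a 30-bit string."""
    if len(bits) != 30 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of exactly 30 characters, each '0' or '1'")
    return sum(SEGMENT_SCORE[bits[i:i + 3]] for i in range(0, 30, 3))


global_optimum = fitness("1" * 30)   # ten segments of 111
local_optimum = fitness("0" * 30)    # ten segments of 000
```

The all-ones string scores 10 × 30 = 300 and the all-zeros string scores 10 × 28 = 280. Within a segment, flipping a single bit of 000 lowers the score, while most strings one or two bits away from 111 score 0, which is what pulls a simple GA toward 000.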

Two immediate problems with this approach will jump out at the reader: (1) One must know (or guess) ahead of time the minimum useful schema order k, and it is not clear how one can do this a priori for a given fitness function. (2) Even if one could guess k, for problems of realistic size the combinatorics are intractable. For Goldberg, Korb, and Deb's (1989) k = 3, l = 30 problem, the primordial stage started off with a complete enumeration of the possible three-bit schemas in a 30-bit string. In general there are

$n = 2^k \binom{l}{k}$

such schemas, where k is the order of each schema and l is the length of the entire string. (The derivation of this formula is left as an exercise.) For l = 30 and k = 3, n = 32,480, a reasonably tractable number to begin with. However, consider R1 (defined in chapter 4), in which l = 64 and k = 8 (a very reasonably sized problem for a GA). In that case, $n \approx 10^{12}$. If each fitness evaluation took a millisecond, evaluating the initial population would take over 30 years! Since most messy-GA researchers do not have that kind of time, a new approach had to be found.
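The arithmetic behind these figures is easy to check. The sketch below assumes the schema count is n = 2^k · C(l, k), that is, a choice of k defined positions out of l times the 2^k bit patterns on them, which reproduces the 32,480 figure quoted for l = 30 and k = 3. The function names and the 365.25-day year are choices made here for illustration.

```python
from math import comb

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def num_schemas(l, k):
    """Number of order-k schemas in a length-l bit string: 2**k * C(l, k)."""
    return 2 ** k * comb(l, k)


def enumeration_years(l, k, seconds_per_eval=1e-3):
    """Time to evaluate every order-k schema once, in years."""
    return num_schemas(l, k) * seconds_per_eval / SECONDS_PER_YEAR


small = num_schemas(30, 3)          # the 30-bit deceptive problem
large = num_schemas(64, 8)          # the R1 setting: l = 64, k = 8
years = enumeration_years(64, 8)    # at one millisecond per evaluation
```

For l = 64 and k = 8 the count is a little over 1.1 × 10^12, and at one millisecond per evaluation that comes to roughly 36 years, consistent with the "over 30 years" estimate.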

Goldberg, Deb, Kargupta, and Harik (1993) refer (a bit too calmly) to this combinatorial explosion as the "initialization bottleneck." Their proposed solution is to dispense with the complete enumeration of order-k schemas, and to replace it by a "probabilistically complete initialization." The combinatorics can be overcome in part by making the initial strings much longer than k (though shorter than l), so that implicit parallelism provides many order-k schemas on one string. Let the initial string length be denoted by l', and let the initial population size be denoted by n_g. Goldberg et al. calculate which pairs of l' and n_g values will ensure that, on average, each schema of order k will be present in the initial population. If l' is increased, n_g can be greatly decreased for the primordial stage.
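To get a feel for the trade-off between l' and n_g, here is a back-of-the-envelope sketch. It is not Goldberg et al.'s calculation; it assumes each initial string consists of l' distinct loci chosen uniformly at random from the l available, each with a uniformly random bit. Under that assumption a given order-k schema appears on one string with probability C(l−k, l'−k) / C(l, l') · 2^(−k), and the function below returns the smallest n_g for which the expected number of copies in the population is at least one. The names `min_population` and `l_init` are hypothetical.

```python
from math import comb


def min_population(l, k, l_init):
    """Smallest n_g giving each order-k schema an expected count >= 1.

    Assumes (for illustration only) that every initial string holds `l_init`
    distinct loci drawn uniformly from `l`, with uniformly random bit values.
    """
    if not k <= l_init <= l:
        raise ValueError("need k <= l_init <= l")
    numerator = 2 ** k * comb(l, l_init)
    denominator = comb(l - k, l_init - k)
    # Integer ceiling division avoids floating-point overflow or rounding.
    return -(-numerator // denominator)


full_enumeration = min_population(64, 8, 8)    # l' = k: reduces to 2**k * C(l, k)
longer_strings = min_population(64, 8, 32)     # l' = 32: far fewer strings needed
```

With l' = k the bound reduces to the full enumeration count 2^k · C(l, k), and under these assumptions raising l' to 32 for l = 64, k = 8 brings the required population down from about 10^12 to roughly 10^5.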
