Evolutionary Computation: A Unified Approach



of models of sufficient descriptive power are generally so large as to prohibit any form of systematic search. However, as we discussed in the previous section, EAs can often be used to search large complex spaces effectively, including the model spaces of interest here.

Consequently, ML-EAs tend to adopt a more top-down approach of searching a model space by means of a population of models that compete with each other, not unlike scientific theories in the natural sciences. In order for this to be effective, a notion of model “fitness” must be provided in order to bias the search process in a useful way. The first thought one might have is to define the fitness of a model in terms of its observed performance on the provided training examples. However, just as we see in other approaches, ML-EAs with no other feedback or bias will tend to overfit the training data at the expense of generality. So, for example, if an ML-EA is attempting to learn a set of classification rules using only training set performance as the measure of fitness, it is not at all unusual to see near-perfect rule sets emerge that consist of approximately one rule per training example!

Stated another way, for a given set of training examples, there are a large number of theories (models) that have identical performance on the training data, but can have quite different predictive power. Since, by definition, we cannot learn from unseen examples, this generality must be achieved by other means. A standard technique for doing so is to adopt some sort of “Occam’s razor” approach: all other things being equal, select a simpler model over a more complex one.

For ML-EAs this is typically achieved by augmenting the fitness function to include both performance on the training data and the parsimony of the model. Precise measurements of parsimony can be quite difficult to define in general. However, rough estimates based on the size of a model measured in terms of its basic building blocks have been shown to be surprisingly effective. For example, using the number of rules in a rule set or the number of hidden nodes in an artificial neural network as an estimate of parsimony works quite well.

Somewhat more difficult from an ML-EA designer’s point of view is finding an appropriate balance between parsimony and performance. If one puts too much weight on generality, performance will suffer, and vice versa. As we will see in a later section in this chapter, how best to apply EAs to problems involving multiple conflicting objectives is a challenging problem in general. To keep things simple, most ML-EAs adopt a fairly direct approach such as:

fitness(model) = performance(model) − w ∗ parsimony(model)

where the weight w is empirically chosen to discourage overfitting, and parsimony is a simple linear or quadratic function of model size (Smith, 1983; Bassett and De Jong, 2000).
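As a concrete illustration, here is a minimal sketch (in Python) of how such a parsimony-penalized fitness might be computed for an evolved rule-set classifier. The model interface (a classify method and a list of rules), the default weight, and the choice of a simple linear penalty are illustrative assumptions, not details prescribed by the text.

```python
# Minimal sketch of a parsimony-penalized fitness for an ML-EA that
# evolves rule-set classifiers. The model interface (classify(), rules),
# the default weight w, and the linear penalty on rule count are
# assumptions made for illustration only.

def fitness(model, training_examples, w=0.01):
    """fitness(model) = performance(model) - w * parsimony(model)."""
    # Performance: fraction of training examples classified correctly.
    correct = sum(
        1 for features, label in training_examples
        if model.classify(features) == label
    )
    performance = correct / len(training_examples)

    # Parsimony: model size in basic building blocks (here, rule count).
    parsimony = len(model.rules)

    return performance - w * parsimony
```

Note that with w = 0 this reduces to raw training-set accuracy, which rewards the degenerate one-rule-per-example solutions described above; even a small positive w breaks such ties in favor of smaller rule sets with equal training performance.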

Just as was the case in the previous sections, applying EAs to machine-learning problems is not some sort of magic wand that renders existing ML techniques obsolete. Rather, ML-EAs complement existing approaches in a number of useful ways:

• Many of the existing ML techniques are designed to learn “one-shot” classification tasks, in which problems are presented as precomputed feature vectors to be classified as belonging to a fixed set of categories. When ML-EAs are applied to such problems, they tend to converge more slowly than traditional ML techniques but often to more parsimonious models (De Jong, 1988).
