
Evolution and Optimum Seeking

Hans-Paul Schwefel


Preface

In 1963 two students at the Technical University of Berlin met and were soon to collaborate on experiments which used the wind tunnel of the Institute of Flow Engineering. During the search for the optimal shapes of bodies in a flow, which was then a matter of laborious intuitive experimentation, the idea was conceived of proceeding strategically. However, attempts with the coordinate and simple gradient strategies were unsuccessful. Then one of the students, Ingo Rechenberg, now Professor of Bionics and Evolutionary Engineering, hit upon the idea of trying random changes in the parameters defining the shape, following the example of natural mutations. The evolution strategy was born. A third student, Peter Bienert, joined them and started the construction of an automatic experimenter, which would work according to the simple rules of mutation and selection. The second student, I myself, set about testing the efficiency of the new methods with the help of a Zuse Z23 computer, for there were plenty of objections to these random strategies. In spite of an occasional lack of financial support, the Evolutionary Engineering Group which had been formed held firmly together. Ingo Rechenberg received his doctorate in 1970 for the seminal thesis Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. It contains the theory of the two membered evolution strategy and a first proposal for a multimembered strategy, which, in the nomenclature introduced here, is of the (mu+1) type. In the same year financial support from the Deutsche Forschungsgemeinschaft (German Research Association) enabled the initiation of the work which comprises most of the present book. This work was concluded, at least temporarily, in 1974 with the thesis Evolutionsstrategie und numerische Optimierung and published by Birkhäuser, Basle, Switzerland, in 1977 under the title Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie as well as by Wiley, Chichester, in 1981 as the monograph Numerical optimization of computer models.

Between 1976 and 1985 the author was not able to continue his work in the field of Evolution Strategies (nowadays abbreviated: ESs). The general interest in this type of optimum seeking algorithms was not broad enough for there to be financial support. On the other hand, the number of articles, journals, and books devoted to (mathematical) optimization has increased tremendously.

Looking back upon the development from 1964 on, when the first ES version was devoted to experimental optimization, i.e., upon 30 years, or roughly one human generation, reveals three interesting facts:

First, ESs are not at all outdated. On the contrary, three consecutive conferences on Parallel Problem Solving from Nature (PPSN) in 1990 (see Schwefel and Männer, 1991), 1992 (Männer and Manderick, 1992), and 1994 (Davidor, Schwefel, and Männer, 1994) have demonstrated a revived and increasing interest.

Secondly, the computational environment has changed over time, not only with respect to the number of (also personal) computers and their data processing power, but even more with respect to new architectures. MIMD (Multiple Instructions Multiple Data) machines with many processors working in parallel for one task seem to wait for inherently parallel problem solving concepts like ESs. Biological metaphors prevail within the new branch of Artificial Intelligence, called Artificial Life (AL).

Third, updating this dissertation from 1974/1975 once more (after adding only a few pages to Chapter 7 in 1981) can be done without rewriting the bulk of the chapters on traditional approaches. Since the emphasis always has been centered on derivative-free direct optimum-seeking methods, it should be sufficient to add material on three concepts now, i.e., Genetic Algorithms (GAs), Simulated Annealing (SA), and Tabu Search (TS). This was done with the new Sections 5.3 to 5.5 in Chapter 5.

Another innovation is a floppy disk with all those procedures which had been used for the test series in the 1970s, along with a users' manual. Hopefully, some of the earlier errors have been eliminated now, too.

A first thank-you goes again to my friend Dr. Mike Finnis, whose translation of my German original into English still forms the core of this book. Thanks go also to those who helped me in completing this update, especially Ms. Heike Bracklo, who brought the scanned ASCII text into LaTeX formats; Mr. Ulrich Hermes, Mr. Jörn Mehnen, and Mr. Joachim Sprave for the many graphs and ready-to-use computer programs; as well as all those who helped in the process of proofreading the complete work. Finally, I would like to thank the Wiley team for the fruitful collaboration during the process of editing the camera-ready script.

Dortmund, Autumn 1994 Hans-Paul Schwefel


Contents

Preface

1 Introduction

2 Problems and Methods of Optimization
   2.1 General Statement of the Problems
   2.2 Particular Problems and Methods of Solution
      2.2.1 Experimental Versus Mathematical Optimization
      2.2.2 Static Versus Dynamic Optimization
      2.2.3 Parameter Versus Functional Optimization
      2.2.4 Direct (Numerical) Versus Indirect (Analytic) Optimization
      2.2.5 Constrained Versus Unconstrained Optimization
   2.3 Other Special Cases

3 Hill climbing Strategies
   3.1 One Dimensional Strategies
      3.1.1 Simultaneous Methods
      3.1.2 Sequential Methods
         3.1.2.1 Boxing in the Minimum
         3.1.2.2 Interval Division Methods
            3.1.2.2.1 Fibonacci Division
            3.1.2.2.2 The Golden Section
         3.1.2.3 Interpolation Methods
            3.1.2.3.1 Regula Falsi Iteration
            3.1.2.3.2 Newton-Raphson Iteration
            3.1.2.3.3 Lagrangian Interpolation
            3.1.2.3.4 Hermitian Interpolation
   3.2 Multidimensional Strategies
      3.2.1 Direct Search Strategies
         3.2.1.1 Coordinate Strategy
         3.2.1.2 Strategy of Hooke and Jeeves: Pattern Search
         3.2.1.3 Strategy of Rosenbrock: Rotating Coordinates
         3.2.1.4 Strategy of Davies, Swann, and Campey (DSC)
         3.2.1.5 Simplex Strategy of Nelder and Mead
         3.2.1.6 Complex Strategy of Box
      3.2.2 Gradient Strategies
         3.2.2.1 Strategy of Powell: Conjugate Directions
      3.2.3 Newton Strategies
         3.2.3.1 DFP: Davidon-Fletcher-Powell Method (Quasi-Newton Strategy, Variable Metric Strategy)
         3.2.3.2 Strategy of Stewart: Derivative-free Variable Metric Method
         3.2.3.3 Further Extensions

4 Random Strategies

5 Evolution Strategies for Numerical Optimization
   5.1 The Two Membered Evolution Strategy
      5.1.1 The Basic Algorithm
      5.1.2 The Step Length Control
      5.1.3 The Convergence Criterion
      5.1.4 The Treatment of Constraints
      5.1.5 Further Details of the Subroutine EVOL
   5.2 A Multimembered Evolution Strategy
      5.2.1 The Basic Algorithm
      5.2.2 The Rate of Progress of the (1,lambda) Evolution Strategy
         5.2.2.1 The Linear Model (Inclined Plane)
         5.2.2.2 The Sphere Model
         5.2.2.3 The Corridor Model
      5.2.3 The Step Length Control
      5.2.4 The Convergence Criterion for mu > 1 Parents
      5.2.5 Scaling of the Variables by Recombination
      5.2.6 Global Convergence
      5.2.7 Program Details of the (mu+lambda) ES Subroutines
   5.3 Genetic Algorithms
      5.3.1 The Canonical Genetic Algorithm for Parameter Optimization
      5.3.2 Representation of Individuals
      5.3.3 Recombination and Mutation
      5.3.4 Reproduction and Selection
      5.3.5 Further Remarks
   5.4 Simulated Annealing
   5.5 Tabu Search and Other Hybrid Concepts

6 Comparison of Direct Search Strategies for Parameter Optimization
   6.1 Difficulties
   6.2 Theoretical Results
      6.2.1 Proofs of Convergence
      6.2.2 Rates of Convergence
      6.2.3 Q-Properties
      6.2.4 Computing Demands
   6.3 Numerical Comparison of Strategies
      6.3.1 Computer Used
      6.3.2 Optimization Methods Tested
      6.3.3 Results of the Tests
         6.3.3.1 First Test: Convergence Rates for a Quadratic Objective Function
         6.3.3.2 Second Test: Reliability
         6.3.3.3 Third Test: Non-Quadratic Problems with Many Variables
   6.4 Core Storage Required

7 Summary and Outlook

8 References

Appendices

A Catalogue of Problems
   A.1 Test Problems for the First Part of the Strategy Comparison
   A.2 Test Problems for the Second Part of the Strategy Comparison
   A.3 Test Problems for the Third Part of the Strategy Comparison

B Program Codes
   B.1 (1+1) Evolution Strategy EVOL
   B.2 (mu,lambda) Evolution Strategies GRUP and REKO
   B.3 (mu+lambda) Evolution Strategy KORR

C Programs
   C.1 Contents of the Floppy Disk
   C.2 About the Program Disk
   C.3 Running the C Programs
      C.3.1 How to Install OptimA on a PC Using LINUX or on a UNIX Workstation
      C.3.2 How to Install OptimA on a PC Under DOS
      C.3.3 Running OptimA
   C.4 Description of the Programs
      C.4.1 How to Incorporate New Functions
   C.5 Examples
      C.5.1 An Application of the Multimembered Evolution Strategy to the Corridor Model
      C.5.2 OptimA Working in Batch Mode

Index


Chapter 1

Introduction

There is scarcely a modern journal, whether of engineering, economics, management, mathematics, physics, or the social sciences, in which the concept optimization is missing from the subject index. If one abstracts from all specialist points of view, the recurring problem is to select a better or best (Leibniz, 1710; eventually, he introduced the term optimal) alternative from among a number of possible states of affairs. However, if one were to follow the hypothesis of Leibniz, as presented in his Theodicee, that our world is the best of all possible worlds, one could justifiably sink into passive fatalism. There would be nothing to improve or to optimize.

Biology, especially since Darwin, has replaced the static world picture of Leibniz' time by a dynamic one, that of the more or less gradual development of the species culminating in the appearance of man. Paleontology is providing an increasingly complete picture of organic evolution. So-called missing links repeatedly turn out to be not missing, but rather hitherto undiscovered stages of this process. Very much older than the recognition that man is the result (or better, intermediate state) of a meliorization process is the seldom-questioned assumption that he is a perfect end product, the "pinnacle of creation." Furthermore, long before man conceived of himself as an active participant in the development of things, he had unconsciously influenced this evolution. There can be no doubt that his ability and efforts to make the environment meet his needs raised him above other forms of life and have enabled him, despite physical inferiority, to find, to hold, and to extend his place in the world, so far at least. As long as mankind has existed on our planet, spaceship earth, we, together with other species, have mutually influenced and changed our environment. Has this always been done in the sense of meliorization?

In 1759, the French philosopher Voltaire (1759), dissatisfied with the conditions of his age, was already taking up arms against Leibniz' philosophical optimism and calling for conscious effort to change the state of affairs. In the same way today, when we optimize, we find that we are both the subject and object of the history of development. In the desire to improve an object, a process, or a system, Wilde and Beightler (1967) see an expression of the human striving for perfection. Whether such a lofty goal can be attained depends on many conditions.

It is not possible to optimize when there is only one way to carry out a task; then one has no alternative. If it is not even known whether the problem at hand is soluble, the situation calls for an invention or discovery and not, at that stage, for any optimization.

But wherever two or more solutions exist and one must decide upon one of them, one should choose the best, that is to say optimize. Those independent features that distinguish the results from one another are called (independent) variables or parameters of the object or system under consideration; they may be represented as binary, integer, otherwise discrete, or real values. A rational decision between the real or imagined variants presupposes a value judgement, which requires a scale of values, a quantitative criterion of merit, according to which one solution can be classified as better, another as worse. This dependent variable is usually called an objective (function) because it depends on the objective of the system, the goal to be attained with it, and is functionally related to the parameters. There may even exist several objectives at the same time, the normal case in living systems, where the mix of objectives also changes over time and may, in fact, be induced by the actual course of the evolutionary paths themselves.

Sometimes the hardest part of optimization is to define clearly an objective function. For instance, if several subgoals are aimed at, a relative weight must be attached to each of the individual criteria. If these are contradictory, one can only hope to find a compromise on a trade-off subset of non-dominated solutions. Variability and a distinct order of merit are the unavoidable conditions of any optimization. One may sometimes also think one has found the right objective for a subsystem, only to realize later that, in doing so, one has provoked unwanted side effects, the ramifications of which have worsened the disregarded total objective function. We are just now experiencing how narrow-minded scales of value can steer us into dangerous plights, and how it is sometimes necessary to consider the whole Earth as a system, even if this is where differences of opinion about value criteria are the greatest.

The second difficulty in optimization, particularly of multiparameter objectives or processes, lies in the choice or design of a suitable strategy for proceeding. Even when the objective has been sufficiently clearly defined, indeed even when the functional dependence on the independent variables has been mathematically (or computationally) formulated, it often remains difficult enough, if not completely impossible, to find the optimum, especially in the time available.

The uninitiated often think that it must be an easy matter to solve a problem expressed in the language of mathematics, that most exact of all sciences. Far from it: The problem of how to solve problems is unsolved, and mathematicians have been working on it for centuries. For giving exact answers to questions of extremal values and corresponding positions (or conditions) we are indebted, for example, to the differential and variational calculus, of which the development in the 18th century is associated with such illustrious names as Newton, Euler, Lagrange, and Bernoulli. These constitute the foundations of the present methods referred to as classical, and of the further developments in the theory of optimization. Still, there is often a long way from the theory, which is concerned with establishing necessary (and sufficient) conditions for the existence of minima and maxima, to the practice, the determination of these most desirable conditions. Practically significant solutions of optimization problems first became possible with the arrival of (large and) fast programmable computers in the mid-20th century. Since then the flood of publications on the subject of optimization has been steadily rising in volume; it is a simple matter to collect several thousand published articles about optimization methods.

Even an interested party finds it difficult to keep pace nowadays with the development that is going on. It seems far from being over, for there still exists no all-embracing theory of optimization, nor is there any universal method of solution. Thus it is appropriate, in Chapter 2, to give a general survey of optimization problems and methods. The special rôle of direct, static, non-discrete, and non-stochastic parameter optimization emerges here, for many of these methods can be transferred to other fields; the converse is less often possible. In Chapter 3, some of these strategies are presented in more depth, principally those that extract the information they require only from values of the objective function, that is to say without recourse to analytic partial derivatives (derivative-free methods). Methods of a probabilistic nature are omitted here.

Methods which use chance as an aid to decision making are treated separately in Chapter 4. In numerical optimization, chance is simulated deterministically by means of a pseudorandom number generator able to produce some kind of deterministic chaos only. One of the random strategies proves to be extremely promising. It imitates, in a highly simplified manner, the mutation-selection game of nature. This concept, a two membered evolution strategy, is formulated in a manner suitable for numerical optimization in Chapter 5, Section 5.1. Following the hypothesis put forward by Rechenberg, that biological evolution is, or possesses, an especially advantageous optimization process and is therefore worthy of imitation, an extended multimembered scheme that imitates the population principle of evolution is introduced in Chapter 5, Section 5.2. It permits a more natural as well as more effective specification of the step lengths than the two membered scheme and actually invites the addition of further evolutionary principles, such as, for example, sexual propagation and recombination. An approximate theory of the rate of convergence can also be set up for the (1,lambda) evolution strategy, in which only the best of the lambda descendants of a generation become parents of the following one.

A short excursion, new to this edition, introduces nearly concurrent developments that the author was unaware of when compiling his dissertation in the early 1970s, i.e., genetic algorithms, simulated annealing, and tabu search.

Chapter 6 then makes a comparison of the evolution strategies with the direct search methods of zero, first, and second order, which were treated in detail in Chapter 3. Since the predictive power of theoretical proofs of convergence and statements of rates of convergence is limited to simple problem structures, the comparison includes mainly numerical tests employing various model objective functions. The results are evaluated from two points of view:

Efficiency, or speed of approach to the objective

Effectivity, or reliability under varying conditions

The evolution strategies are highly successful in the test of effectivity or robustness. Contrary to the widely held opinion that biological evolution is a very wasteful method of optimization, the convergence rate test shows that, in this respect too, the evolution methods can hold their own and are sometimes even more efficient than many purely deterministic methods. The circle is closed in Chapter 7, where the analogy between iterative optimization and evolution is raised once again for discussion, with a look at some natural improvements and extensions of the concept of the evolution strategy.

The list of test problems that were used can be found in Appendix A, and FORTRAN codes of the evolution strategies, with detailed guidance for users, are in Appendix B. Finally, Appendix C explains how to use the C and FORTRAN programs on the floppy disk.


Chapter 2

Problems and Methods of Optimization

2.1 General Statement of the Problems

According to whether one emphasizes the theoretical aspect (existence conditions of optimal solutions) or the practical (procedures for reaching optima), optimization nowadays is classified as a branch of applied or numerical mathematics, of operations research, or of computer-assisted systems (engineering) design. In fact many optimization methods are based on principles which were developed in linear and non-linear algebra. Whereas for equations, or systems of equations, the problem is to determine a quantity or set of quantities such that functions which depend on them have specified values, in the case of an optimization problem an initially unknown extremal value is sought. Many of the current methods of solution of systems of linear equations start with an approximation and successively improve it by minimizing the deviation from the required value. For non-linear equations and for incomplete or overdetermined systems this way of proceeding is actually essential (Ortega and Rheinboldt, 1970). Thus many seemingly quite different and apparently unrelated problems turn out, after a suitable reformulation, to be optimization problems.

Into this class come, for example, the solution of differential equations (boundary and initial value problems) and eigenvalue problems, as well as problems of observational calculus, adaptation, and approximation (Stiefel, 1965; Schwarz, Rutishauser, and Stiefel, 1968; Collatz and Wetterling, 1971). In the first case, the basic problem again is to solve equations; in the second, the problem is often reduced to minimizing deviations in the Gaussian sense (sum of squares of residues) or the Tchebycheff sense (maximum of the absolute residues). Even game theory (Vogelsang, 1963) and pattern or shape recognition as a branch of information theory (Andrews, 1972; Niemann, 1974) have features in common with the theory of optimization. In one case, from among a stored set of idealized types, a pattern will be sought that has the maximum similarity to the one presented; in another case, the search will be for optimal courses of action in conflict situations. Here, two or more interests are competing. Each player tries to maximize his chance of winning with regard to the way in which his opponent supposedly plays. Most optimization problems, however, are characterized by a single interest, to reach an objective that is not influenced by others.

The engineering aspect of optimization has manifested itself especially clearly with the design of learning robots, which have to adapt their operation to the prevailing conditions (see for example Feldbaum, 1962; Zypkin, 1970). The feedback between the environment and the behavior of the robot is effected here by a program, a strategy, which can perhaps even alter itself. Wiener (1963) goes even further and considers self-reproducing machines, thus arriving at a consideration of robots that are similar to living beings. Computers are often regarded as the most highly developed robots, and it is therefore tempting to make comparisons with the human brain and its neurons and synapses (von Neumann, 1960, 1966; Marfeld, 1970; Steinbuch, 1971). They are nowadays the most important aid to optimization, and many problems are intractable without them.

2.2 Particular Problems and Methods of Solution

The lack of a universal method of optimization has led to the present availability of numerous procedures that each have only limited application to special cases. No attempt will be made here to list them all. A short survey should help to distinguish the parameter optimization strategies, treated in detail later, from the other procedures, while at the same time exhibiting some features they have in common. The chosen scheme of presentation is to discuss two opposing concepts together.

2.2.1 Experimental Versus Mathematical Optimization

If the functional relation between the variables and the objective function is unknown, one is forced to experiment either on the real object or on a scale model. To do so one must be as free as possible to vary the independent variables and have access to measuring instruments with which the dependent variable, the quality, can be measured. Systematic investigation of all possible states of the system will be too costly if there are many variables, and random sampling of various combinations is too unreliable for achieving the desired result. A procedure must be significantly more effective if it systematically exploits whatever information is retained about preceding attempts. Such a plan is also called a strategy. The concept originated in game theory and was formulated by von Neumann and Morgenstern (1961).

Many of the search strategies of mathematical optimization to be discussed later were also applied under experimental conditions, not always successfully. An important characteristic of the experiment is the unavoidable effect of (stochastic) disturbances on the measured results. A good experimental optimization strategy has to take account of this fact and approach the desired extremum with the least possible cost in attempts.

Two methods in particular are most frequently mentioned in this connection:

The EVOP (evolutionary operation) method proposed by G. E. P. Box (1957), a development of the experimental gradient method of Box and Wilson (1951)

The strategy of artificial evolution designed by Rechenberg (1964)



The algorithm of Rechenberg's evolution strategy will be treated in detail in Chapter 5. In the experimental field it has often been applied successfully: for example, to the solution of multiparameter shaping problems (Rechenberg, 1964; Schwefel, 1968; Klockgether and Schwefel, 1970). All variables are simultaneously changed by a small random amount. The changes are (binomially or) normally distributed. The expected value of the random vector is zero (for all components). Failures leave the starting condition unchanged; only successes are adopted. Stochastic disturbances or perturbations, brought about by errors of measurement, do not affect the reliability but influence the speed of convergence according to their magnitude. Rechenberg (1973) gives rules for the optimal choice of a common variance of the probability density distribution of the random changes for both the unperturbed and the perturbed cases.
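A minimal sketch of this mutation-selection rule, in C, might look as follows. It is not the EVOL subroutine of Appendix B; the sphere objective, the fixed step size sigma, and the simple Box-Muller normal generator are illustrative assumptions made here for brevity.

/* Two membered (1+1) scheme: mutate all variables at once with zero-mean
 * normal deviates; adopt the mutant only on success. */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define N 3                                   /* number of variables */

static double objective(const double *x)      /* sphere model: sum of squares */
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += x[i] * x[i];
    return s;
}

static double normal(double sigma)            /* Box-Muller normal deviate */
{
    double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return sigma * sqrt(-2.0 * log(u1)) * cos(2.0 * 3.14159265358979 * u2);
}

int main(void)
{
    double parent[N] = {5.0, -3.0, 4.0}, child[N];
    double fp = objective(parent), sigma = 0.5;

    for (int t = 0; t < 10000; t++) {
        for (int i = 0; i < N; i++)           /* change every component simultaneously */
            child[i] = parent[i] + normal(sigma);
        double fc = objective(child);
        if (fc < fp) {                        /* success: the mutant is adopted  */
            for (int i = 0; i < N; i++)
                parent[i] = child[i];
            fp = fc;
        }                                     /* failure: the parent stays unchanged */
    }
    printf("best objective value found: %g\n", fp);
    return 0;
}

In a real application sigma would be adapted during the search, following the rules of Rechenberg (1973) referred to above; here it is simply held constant.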

The EVOP method of G. E. P. Box changes only two or three parameters at a time, if possible those which have the strongest influence. A square or cube is constructed with an initial condition at its midpoint; its 2^2 = 4 or 2^3 = 8 corners represent the points in a cycle of trials. These deterministically established states are tested sequentially, several times if perturbations are acting. The state with the best result becomes the midpoint of the next pattern of points. Under some conditions, one also changes the scaling of the variables or exchanges one or more parameters for others. Details of this altogether heuristic way of proceeding are described by Box and Draper (1969, 1987). The method has mainly been applied to the dynamic optimization of chemical processes. Experiments are performed on the real system, sometimes over a period of several years.
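One cycle of trials of such a pattern scheme, for n = 2, can be sketched as follows. The response function, the edge length, and the fixed cycle count are illustrative assumptions, and Box's repeated testing under noise, rescaling, and parameter-exchange rules are omitted.

/* One EVOP-style pattern: a square around the current midpoint whose
 * 2^2 = 4 corners form a cycle of trials; the best state becomes the
 * midpoint of the next pattern. */
#include <stdio.h>

static double f(double x1, double x2)         /* stand-in for a measured response */
{
    return (x1 - 1.0) * (x1 - 1.0) + (x2 - 2.0) * (x2 - 2.0);
}

int main(void)
{
    double c1 = 0.0, c2 = 0.0, d = 0.25;      /* midpoint and half edge length */

    for (int cycle = 0; cycle < 25; cycle++) {
        double best = f(c1, c2), b1 = c1, b2 = c2;
        for (int k = 0; k < 4; k++) {         /* the 2^2 = 4 corners */
            double x1 = c1 + ((k & 1) ? d : -d);
            double x2 = c2 + ((k & 2) ? d : -d);
            double v = f(x1, x2);
            if (v < best) { best = v; b1 = x1; b2 = x2; }
        }
        c1 = b1;                              /* best state becomes new midpoint */
        c2 = b2;
    }
    printf("midpoint after 25 cycles: (%g, %g)\n", c1, c2);
    return 0;
}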

The counterpart to experimental optimization is mathematical optimization. The functional relation between the criterion of merit or quality and the variables is known, at least approximately; to put it another way, a more or less simplified mathematical model of the object, process, or system is available. In place of experiments there appears the manipulation of variables and the objective function. It is sometimes easy to set up a mathematical model, for example if the laws governing the behavior of the physical processes involved are known. If, however, these are largely unresearched, as is often the case for ecological or economic processes, the work of model building can far exceed that of the subsequent optimization.

Depending on what deliberate influence one can have on the process, one is either restricted to the collection of available data or one can uncover the relationships between independent and dependent variables by judiciously planning and interpreting tests. Such methods (Cochran and Cox, 1950; Kempthorne, 1952; Davies, 1954; Cox, 1958; Fisher, 1966; Vajda, 1967; Yates, 1967; John, 1971) were first applied only to agricultural problems, but later spread into industry. Since the analyst is intent on building the best possible model with the fewest possible tests, such an analysis itself constitutes an optimization problem, just as does the synthesis that follows it. Wald (1966) therefore recommends proceeding sequentially, that is, to construct a model as a hypothesis from initial experiments or given a priori information, and then to improve it in a stepwise fashion by a further series of tests, or, alternatively, to sometimes reject the model completely. The fitting of model parameters to the measured data can be considered as an optimization problem insofar as the expected error or maximum risk is to be minimized. This is a special case of optimization, called calculus of observations, which involves statistical tests like regression and variance analyses on data subject to errors, for which the principle of maximum likelihood or minimum chi-square is used (see Heinhold and Gaede, 1972).

The cost of constructing a model of large systems with many variables, or of very complicated objects, can become so enormous that it is preferable to get to the desired optimal condition by direct variation of the parameters of the process, in other words to optimize experimentally. The fact that one tries to analyze the behavior of a model or system at all is founded on the hope of understanding the processes more fully and of being able to solve the synthesis problem in a more general way than is possible in the case of experimental optimization, which is tied to a particular situation.

If one has succeeded in setting up a mathematical model of the system under consideration, then the optimization problem can be expressed mathematically as follows:

F(\mathbf{x}) = F(x_1, x_2, \ldots, x_n) \rightarrow \mathrm{extr}

The round brackets symbolize the functional relationship between the n independent variables

\{x_i,\ i = 1(1)n\}^{1}

and the dependent variable F, the quality or objective function. In the following it is always a scalar quantity. The variables can be scalars or functions of one or more parameters. Whether a maximum or a minimum is sought for is of no consequence for the method of optimization because of the relation

\max\{F(\mathbf{x})\} = -\min\{-F(\mathbf{x})\}

Without loss of generality one can concentrate on one of the types of problem; usually the minimum problem is considered. Restrictions do arise, insofar as in many practical problems the variables cannot be chosen arbitrarily. They are called constraints. The simplest of these are the non-negative conditions:

x_i \geq 0 \quad \text{for all } i = 1(1)n

They are formulated more generally like the objective function:

G_j(\mathbf{x}) = G_j(x_1, x_2, \ldots, x_n) \left\{\begin{matrix} \geq \\ = \\ \leq \end{matrix}\right\} 0 \quad \text{for all } j = 1(1)m

The notation chosen here follows the convention of parameter optimization. One distinguishes between equalities and inequalities. Each equality constraint reduces the number of true variables of the problem by one. Inequalities, on the other hand, simply reduce the size of the allowed space of solutions without altering its dimensionality. The sense of the inequality is not critical. Like the interchanging of minimum and maximum problems, one can transform one type into the other by reversing the signs of the terms. It is sufficient to limit consideration to one formulation. For minimum problems this is normally the type G_j(x) >= 0. Points on the edge of the (closed) allowed space are thereby permitted. A different situation arises if the constraint is given as a strict inequality of the form G_j(x) > 0. Then the allowed space can be open if G_j(x) is continuous. If for G_j(x) >= 0, with other conditions the same, the minimum lies on the border G_j(x) = 0, then for G_j(x) > 0 there is no true minimum. One refers here to an infimum, the greatest lower bound, at which actually G_j(x) = 0. In analogous fashion one distinguishes between maxima and suprema (smallest upper bounds). Optimization in the following always means to find a maximum or a minimum, perhaps under inequality constraints.

1 The term 1(1)n stands for 1, 2, 3, ..., n.
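These conventions are easily illustrated in code. The following sketch handles a maximum problem by minimizing -F over the points satisfying one inequality constraint G(x) >= 0; the one-dimensional functions and the crude grid scan are illustrative assumptions, not one of the strategies treated later in this book.

/* Standard minimum formulation: max{F(x)} = -min{-F(x)},
 * feasibility given by the inequality G(x) >= 0. */
#include <stdio.h>

static double F(double x) { return -(x - 2.0) * (x - 2.0); } /* to be maximized  */
static double G(double x) { return 3.0 - x; }                /* feasible: G >= 0 */

int main(void)
{
    double best_x = 0.0, best = 1e30;

    for (double x = 0.0; x <= 5.0; x += 0.001) {
        if (G(x) < 0.0)                       /* outside the allowed space */
            continue;
        double v = -F(x);                     /* minimize -F instead of maximizing F */
        if (v < best) { best = v; best_x = x; }
    }
    printf("constrained maximizer: x = %g, F(x) = %g\n", best_x, -best);
    return 0;
}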

2.2.2 Static Versus Dynamic Optimization

The term static optimization means that the optimum is time invariant or stationary. It is sufficient to determine its position and size once and for all. Once the location of the extremum has been found, the search is over. In many cases one cannot control all the variables that influence the objective function. Then it can happen that these uncontrollable variables change with time and displace the optimum (non-stationary case). The goal of dynamic optimization^2 is therefore to maintain an optimal condition in the face of varying conditions of the environment. The search for the extremum becomes a more or less continuous process. According to the speed of movement of the optimum, it may be necessary, instead of making the slow adjustment of the independent variables by hand (as for example in the EVOP method; see Chap. 2, Sect. 2.2.1), to give the task to a robot or automaton.

2 Some authors use the term dynamic optimization in a different way than is done here.

The automaton and the process together form a control loop. However, unlike conventional control loops this one is not required to maintain a desired value of a quantity but to discover the most favorable value of an unknown and time-dependent quantity. Feldbaum (1962), Frankovic et al. (1970), and Zach (1974) investigate in detail such automatic optimization systems, known as extreme value controllers or optimizers. In each case they are built around a search process. For only one variable (adjustable setting) a variety of schemes can be designed. It is significantly more complicated for an optimal value loop when several parameters have to be adjusted.

Many of the search methods are so very costly because there is no a priori information about the process to be controlled. Hence nowadays one tries to build adaptive control systems that use information gathered over a period of time to set up an internal model of the system, or that, in a sense, learn. Oldenburger (1966) and, in more detail, Zypkin (1970) tackle the problems of learning and self-learning robots. Adaptation is said to take place if the change in the control characteristics is made on the basis of measurements of those input quantities to the process that cannot be altered, also known as disturbing variables. If the output quantities themselves are used (here the objective function) to adjust the control system, the process is called self-learning or self-adaptation. The latter possibility is more reliable but, because of the time lag, slower. Cybernetic engineering is concerned with learning processes in a more general form and always sees or even seeks links with natural analogues.

An example of a robot that adapts itself to the environment is the homeostat of Ashby (1960). Nowadays, however, one does not build one's own optimizer every time there is a given problem to be solved. Rather one makes use of so-called process computers, which for a new task only need another special program. They can handle large and complicated problems and are coupled to the process by sensors and transducers in a closed loop (on-line) (Levine and Vilis, 1973; McGrew and Haimes, 1974). The actual computer usually works digitally, so that analogue-digital and digital-analogue converters are required for input and output. Process computers are employed for keeping process quantities constant and maintaining required profiles as well as for optimization. In the latter case an internal model (a computer program) usually serves to determine the optimal process parameters, taking account of the latest measured data values in the calculation.

If the position of the optimum in a dynamic process is shifting very rapidly, the manner in which the search process follows the extremum takes on a greater significance for the overall quality. In this case one has to go about setting up a dynamic model and specifying all variables, including the controllable ones, as functions of time. The original parameter optimization goes over to functional optimization.

2.2.3 Parameter Versus Functional Optimization

The case when not only the objective function but also the independent variables are scalar quantities is called parameter optimization. Numerical values

\{x_i^*,\ i = 1(1)n\}

of the variables or parameters are sought for which the value of the objective function

F^* = F(\mathbf{x}^*) = \mathrm{extr}\{F(\mathbf{x})\}

is an optimum. The number of parameters describing a state of the object or system is finite. In the simplest case of only one variable (n = 1), the behavior of the objective function is readily visualized on a diagram with two orthogonal axes. The value of the parameter is plotted on the abscissa and that of the objective function on the ordinate. The functional dependence appears as a curve. For n = 2 a three dimensional Cartesian coordinate system is required. The state of the system is represented as a point in the horizontal plane and the value of the objective function as the vertical height above it. A mountain range is obtained, the surface of which expresses the relation of dependent to independent variables. To further simplify the representation, the curves of intersection between the mountain range and parallel horizontal planes are projected onto the base plane, which provides a contour diagram of the objective function. From this three dimensional picture and its two dimensional projection, concepts like peak, plateau, valley, ridge, and contour line are readily transferred to the n-dimensional case, which is otherwise beyond our powers of description and visualization.

In functional optimization, instead of optimal points in three dimensional Euclidean space, optimal trajectories in function spaces (such as Banach or Hilbert space) are to be determined. Thus one refers also to infinite dimensional optimization as opposed to the finite dimensional parameter optimization. Since the variables to be determined are themselves functions of one or more parameters, the objective function is a function of a function, or a functional.

A classical problem is to determine the smooth curve down which a point mass will slide between two points in the shortest time, acted upon by the force of gravity and without friction. Known as the brachistochrone problem, it can be solved by means of the ordinary variational calculus (Courant and Hilbert, 1968a,b; Denn, 1969; Clegg, 1970). If the functions to be determined depend on several variables, it is a multidimensional variational problem (Klötzler, 1970). In many cases the time t appears as the only parameter. The objective function is commonly an integral, in the integrand of which will appear not only the independent variables

\mathbf{x}(t) = \{x_1(t), x_2(t), \ldots, x_n(t)\}

but also their derivatives \dot{x}_i(t) = \partial x_i / \partial t and sometimes also the parameter t itself:

F(\mathbf{x}(t)) = \int_{t_1}^{t_2} f(\mathbf{x}(t), \dot{\mathbf{x}}(t), t)\; dt \;\rightarrow\; \mathrm{extr}

Such problems are typical in control theory, where one has to find optimal controlling functions for control processes (e.g., Chang, 1961; Lee, 1964; Leitmann, 1964; Hestenes, 1966; Balakrishnan and Neustadt, 1967; Karreman, 1968; Demyanov and Rubinov, 1970).
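As a concrete instance of such a functional, the brachistochrone problem mentioned above can be cast in exactly this form. The following display is the standard textbook formulation (with y measured downward from the starting point and g the gravitational acceleration), not a formula reproduced from this chapter:

% Time functional of the brachistochrone, obtained from energy
% conservation v = sqrt(2 g y):
T(y) \;=\; \int_{x_1}^{x_2} \sqrt{\frac{1 + y'(x)^2}{2\,g\,y(x)}}\; dx
\;\rightarrow\; \min
% The Euler-Lagrange equation of the variational calculus identifies
% the minimizing curve as a cycloid.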

Whereas the variational calculus and its extensions provide the mathematical basis of functional optimization (in the language of control engineering: optimization with distributed parameters), parameter optimization (with localized parameters) is based on the theory of maxima and minima from the elementary differential calculus. Consequently both branches have followed independent paths of development and become almost separate disciplines. The functional analysis theory of Dubovitskii and Milyutin (see Girsanov, 1972) has bridged the gap between the problems by allowing them to be treated as special cases of one fundamental problem, and it could thus lead to a general theory of optimization. However different their theoretical bases, in cases of practical significance the problems must be solved on a computer, and the iterative methods employed are then broadly the same.

One of these iterative methods is the dynamic programming or stepwise optimization of Bellman (1967). It was originally conceived for the solution of economic problems, in which time-dependent variables are changed in a stepwise way at fixed points in time. The method is a discrete form of functional optimization in which the trajectory sought appears as a steplike function. At each step a decision is taken, the sequence of which is called a policy. Assuming that the state at a given step depends only on the decision at that step and on the preceding state (i.e., there is no feedback), then dynamic programming can be applied. The Bellman optimum principle implies that each piece of the optimal trajectory that includes the end point is also optimal. Thus one begins by optimizing the final decision at the transition from the last-but-one to the last step. Nowadays dynamic programming is frequently applied to solving discrete problems of optimal control and regulation (Gessner and Spremann, 1972; Lerner and Rosenman, 1973). Its advantage compared to other, analytic methods is that its algorithm can be formulated as a program suitable for digital computers, allowing fairly large problems to be tackled (Gessner and Wacker, 1972). Bellman's optimum principle can, however, also be expressed in differential form and applied to an area of continuous functional optimization (Jacobson and Mayne, 1970).

The principle of stepwise optimization can be applied to problems of parameter optimization if the objective function is separable (Hadley, 1969): that is, it must be expressible as a sum of partial objective functions in which just one or a very few variables appear at a time. The number of steps (k) corresponds to the number of the partial functions; at each step a decision is made only on the (l) variables in the partial objective. They are also called control or decision variables. Subsidiary conditions (number m) in the form of inequalities can be taken into account. The constraint functions, like the variables, are allowed to take a finite number (b) of discrete values and are called state variables. The recursion formula for the stepwise optimization will not be discussed here. Only the number of required operations (N) in the calculation will be mentioned, which is of the order

N \sim k \, b^{\,m + \ell}

For this reason the usefulness of dynamic programming is mainly restricted to the case l = 1, k = n, and m = 1. Then at each of the n steps, just one control variable is specified with respect to one subsidiary condition. In the other limiting case, where all variables have to be determined at one step, the normal case of parameter optimization, the process goes over to a grid method (complete enumeration) with a computational requirement of order O(b^(n+m)). Herein lies its capability for locating global optima, even of complicated multimodal objective functions. However, it is only especially advantageous if the structure of the objective function permits the enumeration to be limited to a small part of the allowed region.
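A minimal sketch of the backward recursion for a separable objective follows, with k = 4 steps, b = 5 discrete state values, and one decision variable per step. The partial cost function and the identification of the decision with the next state are illustrative assumptions, not the book's recursion formula; the work here grows as k*b*b, in line with the order estimate above.

/* Bellman's stepwise optimization: begin by optimizing the final
 * decision, then work backwards through the steps. */
#include <stdio.h>

#define STEPS  4
#define STATES 5

static double partial_cost(int k, int s, int u)  /* cost of decision u in state s at step k */
{
    return (s - u) * (s - u) + 0.1 * (double)(u + k);
}

int main(void)
{
    double best_to_go[STEPS + 1][STATES] = {{0.0}}; /* terminal costs are zero */

    for (int k = STEPS - 1; k >= 0; k--)
        for (int s = 0; s < STATES; s++) {
            double best = 1e30;
            for (int u = 0; u < STATES; u++) {      /* decision u = next state */
                double v = partial_cost(k, s, u) + best_to_go[k + 1][u];
                if (v < best) best = v;
            }
            best_to_go[k][s] = best;                /* optimal cost-to-go from (k, s) */
        }
    printf("minimal total cost from state 0: %g\n", best_to_go[0][0]);
    return 0;
}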

Digital computers are poorly suited to solving continuous problems because they cannot operate directly with functions. Numerical integration procedures are possible, but costly. Analogue computers are more suitable because they can directly imitate dynamic processes. Compared to digital computers, however, they have a small numerical range and low accuracy and are not so easily programmable. Thus sometimes digital and analogue computers are coupled for certain tasks as so-called hybrid computers. With such systems a set of differential equations can be tackled to the same extent as a problem in functional optimization (Volz, 1965, 1973). The digital computer takes care of the iteration control, while on the analogue computer the differentiation and integration operations are carried out according to the parameters supplied by the digital computer. Korn and Korn (1964), and Bekey and Karplus (1971), describe the operations involved in trajectory optimization and the solution of differential equations by means of hybrid computers. The fact that random methods are often used for such problems has to do with the computational imprecision of the analogue part, with which deterministic processes usually fail to cope. If requirements for accuracy are very high, however, purely digital computation has to take over, with the consequent greater cost in computation time.


2.2.4 Direct (Numerical) Versus Indirect (Analytic) Optimization

The classification of mathematical methods of optimization into direct and indirect procedures is attributed to Edelbaum (1962). Especially if one has a computer model of a system, with which one can perform simulation experiments, the search for a certain set of exogenous parameters to generate excellent results asks for robust direct optimization methods. Direct or numerical methods are those that approach the solution in a stepwise manner (iteratively), at each step (hopefully) improving the value of the objective function. If this cannot be guaranteed, a trial and error process results.

An indirect or analytic procedure attempts to reach the optimum in a single (calculation) step, without tests or trials. It is based on the analysis of the special properties of the objective function at the position of the extremum. In the simplest case, parameter optimization without constraints, one proceeds on the assumption that the tangent plane at the optimum is horizontal, i.e., that the first partial derivatives of the objective function exist and vanish at x*:
\[ \left. \frac{\partial F}{\partial x_i} \right|_{x = x^*} = 0 \quad \text{for all } i = 1(1)n \tag{2.1} \]

This system of equations can be expressed with the so-called Nabla operator (∇) as a single vector equation for the stationary point x*:
\[ \nabla F(x^*) = 0 \tag{2.2} \]

Equation (2.1) or (2.2) transforms the original optimization problem into a problem of solving a set of, perhaps non-linear, simultaneous equations. If F(x) or one or more of its derivatives are not continuous, there may be extrema that do not satisfy the otherwise necessary conditions. On the other hand, not every point in ℝⁿ (the n-dimensional space of real variables) that satisfies conditions (2.1) need be a minimum; it could also be a maximum or a saddle point. Equation (2.2) is referred to as a necessary condition for the existence of a local minimum.

To give sufficient conditions requires further processes of differentiation. In fact, differentiations must be carried out until the determinant of the matrix of the second or higher partial derivatives at the point x* is non-zero. Things remain simple in the case of only one variable, when it is required that the lowest order non-vanishing derivative is positive and of even order. Then, and only then, is there a minimum. If the derivative is negative, x* represents a maximum. A saddle point exists if the order is odd.

For n variables, at least the n(n+1)/2 second partial derivatives
\[ \frac{\partial^2 F(x)}{\partial x_i\,\partial x_j} \quad \text{for all } i,j = 1(1)n \]
must exist at the point x*. The determinant of the Hessian matrix ∇²F(x*) must be positive, as well as the further n − 1 principal subdeterminants of this matrix.

While MacLaurin had already completely formulated the sufficient conditions for the existence of minima and maxima of one-parameter functions in 1742, the corresponding theory



for functions of several variables was only completed nearly 150 years later by Scheeffer (1886) and Stolz (1893) (see also Hancock, 1960).

Sufficient conditions can only be applied to check a solution that was obtained from the necessary conditions. The analytic path thus always leads the original optimization problem back to the problem of solving a system of simultaneous equations (Equation (2.2)). If the objective function is of second order, one is dealing with a linear system, which can be solved with the aid of one of the usual methods of linear algebra. Even if non-iterative procedures are used, such as the Gaussian elimination algorithm or the matrix decomposition method of Cholesky, this cannot be done with a single-step calculation. Rather, the number of operations grows as O(n³). With fast digital computers it is certainly a routine matter to solve systems of equations with even thousands of variables; however, the inevitable rounding errors mean that complete accuracy is never achieved (Broyden, 1973).
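To make the indirect route concrete: for a quadratic objective F(x) = ½xᵀAx + bᵀx + c, condition (2.2) becomes the linear system Ax* = −b. The following minimal Python sketch (the 2 × 2 matrix and vector are invented for illustration, not taken from the text) solves it via the Cholesky decomposition named above:

```python
import numpy as np

# Hypothetical quadratic objective F(x) = 0.5 x^T A x + b^T x + c with A
# symmetric positive definite; A and b are illustrative values only.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([-1.0, -2.0])

# Necessary condition (2.2): grad F(x*) = A x* + b = 0, a linear system.
L = np.linalg.cholesky(A)            # A = L L^T, exploits symmetry
y = np.linalg.solve(L, -b)           # forward substitution
x_star = np.linalg.solve(L.T, y)     # back substitution

print(x_star)                        # stationary point
print(A @ x_star + b)                # residual, ~0 up to rounding error
```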

One can normally be satisfied with a sufficiently good approximation. Here relaxation methods, which are iterative, show themselves to be comparable or superior; it depends in detail on the structure of the coefficient matrix. Starting from an initial approximation, the error as measured by the residues of the equations is minimized. Relaxation procedures are therefore basically optimization methods, but of a special kind, since the value of the objective function at the optimum is known beforehand. This a priori information can be exploited to make savings in the computations, as can the fact that each component of the residue vector must individually go to zero (e.g., Traub, 1964; Wilkinson and Reinsch, 1971; Hestenes, 1973; Hestenes and Stein, 1973).

Objective functions having terms or members of higher than second order lead to non-linear equations as the necessary conditions for the existence of extrema. In this case, the stepwise approach to the null position is essential, e.g., with the interpolation method, which was conceived in its original form by Newton (Chap. 3, Sect. 3.1.2.3.2). The equations are linearized about the current approximation point; linear relations for the correcting terms are then obtained. In this way a complete system of n linear equations has to be solved at each step of the iteration. Occasionally a more convenient approach is to search for the minimum of the function

\[ \tilde{F}(x) = \sum_{i=1}^{n} \left( \frac{\partial F}{\partial x_i} \right)^2 \]

with the help of a direct optimization method. Besides the fact that F̃(x) goes to zero not only at the sought-for minimum of F(x) but also at its maxima and saddle points, it can sometimes yield non-zero minima of no interest for the solution of the original problem. Thus it is often preferable not to proceed via the conditions of Equation (2.2) but to minimize F(x) directly. Only in special cases do indirect methods lead to faster, more elegant solutions than direct methods. Such is, for example, the case if the necessary existence condition for minima with one variable leads to an algebraic equation, and sectioning algorithms like the computational scheme of Horner can be used; or if objective functions are in the form of so-called posynomials, for which Duffin, Peterson, and Zener (1967) devised geometric programming, an entirely indirect method.




Subsidiary conditions, or constraints, complicate matters. In rare cases equality constraints can be solved for individual variables, which can then be eliminated from the objective function, or constraints in the form of inequalities can be made superfluous by a transformation of the variables. Otherwise there are the methods of bounded variation and Lagrange multipliers, in addition to penalty functions and the procedures of mathematical programming.

The situation is very similar for functional optimization, except that here the indirect methods are still dominant even today. The variational calculus provides as conditions for optima differential instead of ordinary equations: ordinary differential equations (Euler-Lagrange) or partial differential equations (Hamilton-Jacobi). In only a few cases can such a system be solved in a straightforward way for the unknown functions. One must usually resort again to the help of a computer. Whether it is advantageous to use a digital or an analogue computer depends on the problem; it is a matter of speed versus accuracy. A hybrid system often turns out to be especially useful. If, however, the solution cannot be found by a purely analytic route, why not choose from the start the direct procedure also for functional optimization? In fact, with the increasing complexity of practical problems in numerical optimization, this field is becoming more important, as illustrated by the work of Daniel (1969), who takes over methods without derivatives from parameter optimization and applies them to the optimization of functionals. An important point in this is the discretization or parameterization of the originally continuous problem, which can be achieved in at least two ways:

- By approximation of the desired functions using a sum of suitable known functions or polynomials, so that only the coefficients of these remain to be determined (Sirisena, 1973)
- By approximation of the desired functions using step functions or sides of polygons, so that only the heights and positions of the connecting points remain to be determined

Recasting a functional into a parameter optimization problem has the great advantage that a digital computer can be used straightaway to find the solution numerically. The disadvantage that the result only represents a suboptimum is often not serious in practice, because the assumed values of parameters of the process are themselves not exactly known (Dixon, 1972a). The experimentally determined numbers are prone to errors or to statistical uncertainties. In any case, large and complicated functional optimization problems cannot be completely solved by the indirect route.
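A minimal sketch of the second discretization route (polygon approximation), assuming an invented model functional and SciPy's derivative-free minimizer; every name and constant below is illustrative, not taken from the text:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative functional: J[u] = integral_0^1 (u'(t)^2 + u(t)^2) dt with
# boundary values u(0) = 0, u(1) = 1. The unknown function is replaced by a
# polygon over n interior nodes, so only the node heights remain as variables.
n = 9
t = np.linspace(0.0, 1.0, n + 2)
h = t[1] - t[0]

def J(heights):
    u = np.concatenate(([0.0], heights, [1.0]))      # fix the boundary values
    du = np.diff(u) / h                              # slopes of the polygon sides
    um = 0.5 * (u[:-1] + u[1:])                      # midpoint values per segment
    return np.sum((du**2 + um**2) * h)               # simple quadrature of J

res = minimize(J, x0=t[1:-1], method="Nelder-Mead")  # direct search, no derivatives
print(res.x)   # suboptimal polygon approximating the exact solution sinh(t)/sinh(1)
```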

The direct procedure can either start directly with the functional to be minimized, if the integration over the substituted function can be carried out (Rayleigh-Ritz method), or with the necessary conditions, the differential equations, which specify the optimum. In the latter case the integral is replaced by a finite sum of terms (Beveridge and Schechter, 1970). In this situation gradient methods are readily applied (Kelley, 1962; Klessig and Polak, 1973). The detailed way to proceed depends very much on the subsidiary conditions or constraints of the problem.



2.2.5 Constrained Versus Unconstrained Optimization

Special techniques have been developed for handling problems of optimization with constraints. In parameter optimization these are the methods of penalty functions and mathematical programming. In the first case a modified objective function is set up, which

- For the minimum problem takes the value F̃(x) = +∞ in the forbidden region, but which remains unchanged in the allowed (feasible) region (barrier method; e.g., used within the evolution strategies, see Chap. 5)
- Only near the boundary inside the allowed region yields values different from F(x), and thus keeps the search at a distance from the edge (partial penalty function; e.g., used within Rosenbrock's strategy, see Chap. 3, Sect. 3.2.1.3)
- Differs from F(x) over the whole space spanned by the variables (global penalty function)

This last is the most common way of treating constraints in the form of inequalities. The main ideas here are due to Carroll (1961; created response surface technique) and to Fiacco and McCormick (1964, 1990; SUMT, sequential unconstrained minimization technique). For the problem
\[ F(x) \to \min \]
\[ G_j(x) \ge 0 \quad \text{for all } j = 1(1)m \]
\[ H_k(x) = 0 \quad \text{for all } k = 1(1)\ell \]
the penalty function is of the form (with r, v_j, w_k > 0 and G_j > 0)
\[ \tilde{F}(x) = F(x) + r \sum_{j=1}^{m} v_j\,\frac{1}{G_j(x)} + \frac{1}{r} \sum_{k=1}^{\ell} w_k\,[H_k(x)]^2 \]

The coefficients v_j and w_k are weighting factors for the individual constraints, and r is a free parameter. The optimum of F̃(x) will depend on the choice of r, so it is necessary to alter r in a stepwise way. The original extreme value problem is thereby solved by a sequence of optimizations in which r is gradually reduced to zero. One can hope in this way at least to find good approximations for the required minimum problem within a finite sequence of optimizations.
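A minimal sketch of this sequential reduction of r, on an invented toy problem with a single inequality constraint (the functions, constants, and the use of SciPy's derivative-free minimizer are all illustrative assumptions, not from the book):

```python
import numpy as np
from scipy.optimize import minimize

F = lambda x: (x[0] - 2.0)**2 + (x[1] - 1.0)**2      # invented objective
G = lambda x: 1.0 - x[0] - x[1]                       # feasible iff G(x) >= 0

def F_tilde(x, r, v=1.0):
    g = G(x)
    if g <= 0.0:                                      # outside: reject (barrier)
        return np.inf
    return F(x) + r * v / g                           # interior penalty term

x = np.array([0.0, 0.0])                              # feasible starting point
for r in [1.0, 0.1, 0.01, 0.001]:                     # r is driven toward zero
    x = minimize(lambda z: F_tilde(z, r), x, method="Nelder-Mead").x
print(x)                                              # approaches the constrained optimum near (1, 0)
```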

The choice of suitable values for r is not, however, easy. Fiacco (1974) and Fiacco and McCormick (1968, 1990) give some indications, and also suggest further possibilities for penalty functions. These procedures are usually applied in conjunction with gradient methods. The hemstitching method and the riding the constraints method of Roberts and Lyvers (1961) work by changing the chosen direction whenever a constraint is violated, without using a modified objective function. They orient themselves with respect to the gradient of the objective and the derivatives of the constraint functions (Jacobian matrix). In hemstitching, there is always a return into the feasible region, while in riding the constraints the search runs along the active constraint boundaries. The variables are



reset into the allowed region by the complex method of M. J. Box (1965) (a direct search strategy) whenever explicit bounds are crossed. Implicit constraints, on the other hand, are treated as barriers (see Chap. 3, Sect. 3.2.1.6).

The methods of mathematical programming, both linear and non-linear, treat the constraints as the main aspect of the problem. They were specially evolved for operations research (Muller-Merbach, 1971) and assume that all variables must always be positive. Such non-negativity conditions allow special solution procedures to be developed. The simplest models of economic processes are linear; there are often no better ones available. For this purpose Dantzig (1966) developed the simplex method of linear programming (see also Krelle and Kunzi, 1958; Hadley, 1962; Weber, 1972).

The linear constraints, together with the condition on the signs of the variables, span the feasible region in the form of a polygon (for n = 2) or a polyhedron, sometimes called a simplex. Since the objective function is also linear, except in special cases the desired extremum must lie in a corner of the polyhedron. It is therefore sufficient just to examine the corners. The simplex method of Dantzig does this in a particularly economical way, since only those corners are considered in which the objective function takes progressively better values. It can even be thought of as a gradient method along the edges of the polyhedron. It can be applied in a straightforward way to many hundreds, even thousands, of variables and constraints. For very large problems, which may have a particular structure, special methods have also been developed (Kunzi and Tan, 1966; Kunzi, 1967). Into this category come the revised and the dual simplex methods, the multiphase and duplex methods, and decomposition algorithms. An unpleasant property of linear programs is that sometimes just small changes of the coefficients in the objective function or the constraints can cause a big alteration in the solution. To reveal such dependencies, methods of parametric linear programming and sensitivity analysis have been developed (Dinkelbach, 1969).
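As a small illustration of a linear program in the above sense, a sketch using SciPy's linprog (all numbers invented); note that linprog's default bounds already impose the non-negativity conditions just mentioned:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative LP: maximize 3*x1 + 5*x2 subject to x1 <= 4, 2*x2 <= 12,
# 3*x1 + 2*x2 <= 18, and x1, x2 >= 0 (the default bounds).
c = np.array([-3.0, -5.0])           # linprog minimizes, so negate the profit
A_ub = np.array([[1.0, 0.0],
                 [0.0, 2.0],
                 [3.0, 2.0]])
b_ub = np.array([4.0, 12.0, 18.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub)
print(res.x, -res.fun)                # optimal corner (2, 6) with value 36
```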

Most strategies of non-linear programming resemble the simplex method or use it as a subprogram (Abadie, 1972). This is the case in particular for the techniques of quadratic programming, which are conceived for quadratic objective functions and linear constraints. The theory of non-linear programming is based on the optimality conditions developed by Kuhn and Tucker (1951), an extension of the theory of maxima and minima to problems with constraints in the form of inequalities. These can be expressed geometrically as follows: at the optimum (in a corner of the allowed region) the gradient of the objective function lies within the cone formed by the gradients of the active constraints. To start with, this is only a necessary condition. It becomes sufficient under certain assumptions concerning the structure of the objective and constraint functions. For minimum problems, the objective function and the feasible region must be convex, that is, the constraints must be concave. Such a problem is also called a convex program. Finally, the Kuhn-Tucker theorem transforms a convex program into an equivalent saddle point problem (Arrow and Hurwicz, 1956), just as the Lagrange multiplier method does for constraints in the form of equalities. A complete theory of equality constraints is due to Apostol (1957).

Non-linear programming is therefore only applicable to convex optimization, in which, to be precise, one must distinguish at least seven types of convexity (Ponstein, 1967). In addition, all the functions are usually required to be continuously differentiable, with an



analytic specification of their partial derivatives. There is an extensive literature on this subject, of which the books by Arrow, Hurwicz, and Uzawa (1958), Zoutendijk (1960), Vajda (1961), Kunzi, Krelle, and Oettli (1962), Kunzi, Tzschach, and Zehnder (1966, 1970), Kunzi and Krelle (1969), Zangwill (1969), Suchowitzki and Awdejewa (1969), Mangasarian (1969), Stoer and Witzgall (1970), Whittle (1971), Luenberger (1973), and Varga (1974) are but a small sample. Kappler (1967) considers some of the procedures from the point of view of gradient methods. Kunzi and Oettli (1969) give a survey of the more extended procedures together with an extensive bibliography. FORTRAN programs are to be found in McMillan (1970), Kuester and Mize (1973), and Land and Powell (1973).

Of special importance in control theory are optimization problems in which the constraints are partly specified as differential equations; these are also called non-holonomous constraints. Pontrjagin et al. (1967) have given necessary conditions for the existence of optima in these problems. Their trick was to distinguish between the free control functions to be determined and the local or state functions, which are bound by constraints. Although the theory has given a strong foothold to the analytic treatment of optimal control processes, it must be regarded as a case of good luck if a practical problem can be made to yield an exact solution in this way. One must usually resort in the end to numerical approximation methods in order to obtain the desired optimum (e.g., Balakrishnan and Neustadt, 1964, 1967; Rosen, 1966; Leitmann, 1967; Kopp, 1967; Mufti, 1970; Tabak, 1970; Canon, Cullum, and Polak, 1970; Tolle, 1971; Unbehauen, 1971; Boltjanski, 1972; Luenberger, 1972; Polak, 1973).

2.3 Other Special Cases

According to the type of variables there are still other special areas of mathematical optimization. In parameter optimization, for example, the variables can sometimes be restricted to discrete or integer values. The extreme case is if a parameter may only take two distinct values, zero and unity. Mixed variable types can also appear in the same problem; hence the terms discrete, integer, binary (or zero-one), and mixed-integer programming. Most of the solution procedures that have been worked out deal with linear integer problems (e.g., those proposed by Gomory, Balas, and Beale). An important class of methods, the branch and bound methods, is described for example by Weinberg (1968). They are classed together with dynamic programming as decision tree strategies.

For the general non-linear case, a last resort can be to try out all possibilities. This kind of optimization is referred to as complete enumeration. Since the cost of such a procedure is usually prohibitive, heuristic approaches are also tried, with which usable, not necessarily optimal, solutions can be found (Weinberg and Zehnder, 1969). More clever ways of proceeding in special cases, for example by applying non-integer techniques of linear and non-linear programming, can be found in Korbut and Finkelstein (1971), Greenberg (1971), Plane and McMillan (1971), Burkard (1972), Hu (1972), and Garfinkel and Nemhauser (1972, 1973).

By stochastic programming is meant the solution of problems with objective functions, and sometimes also constraints, that are subject to statistical perturbations (Faber, 1970). It is simplest if such problems can be reduced to deterministic ones, for example by working



with expectation values. However, there are some problems in which the probability distributions significantly influence the optimal solution. Operational methods at first only existed for special cases such as, for example, warehouse problems (Beckmann, 1971). Their numbers as well as the fields of application are growing steadily (Hammer, 1984; Ermoliev and Wets, 1988; Ermakov, 1992). In general, one has to make a clear distinction between deterministic solution methods for more or less noisy or stochastic situations, and stochastic methods for deterministic but difficult situations like multimodal or fractal topologies. Here we refer to the former; in Chapter 4 we will do so for the latter, especially under the aspect of global optimization.

In a rather new branch within the mathematical programming field, called non-smooth or non-differentiable optimization, more or less classical gradient-type methods for finding solutions still persist (e.g., Balinski and Wolfe, 1975; Lemarechal and Mifflin, 1978; Nurminski, 1982; Kiwiel, 1985).

For successively approaching the zero or extremum of a function when the measured values are subject to uncertainties, a familiar strategy is that of stochastic approximation (Wasan, 1969). The original concept is due to Robbins and Monro (1951). Kiefer and Wolfowitz (1952) have adapted it for problems in which the maximum of a unimodal regression function is sought. Blum (1954a) has proved that the method is certain to converge. It distinguishes between test or trial steps and work steps. With one variable, starting at the point x^{(k)}, the value of the objective function is obtained at the two positions x^{(k)} ± c^{(k)}. The slope is then calculated as
\[ y^{(k)} = \frac{F(x^{(k)} + c^{(k)}) - F(x^{(k)} - c^{(k)})}{2\,c^{(k)}} \]

A work step follows from the recursion formula (for minimum searches)
\[ x^{(k+1)} = x^{(k)} - 2\,a^{(k)}\,y^{(k)} \]
The choice of the positive sequences c^{(k)} and a^{(k)} is important for convergence of the process. These should satisfy the relations
\[ \lim_{k\to\infty} c^{(k)} = 0, \qquad \sum_{k=1}^{\infty} a^{(k)} = \infty, \qquad \sum_{k=1}^{\infty} a^{(k)}\,c^{(k)} < \infty, \qquad \sum_{k=1}^{\infty} \left( \frac{a^{(k)}}{c^{(k)}} \right)^2 < \infty \]
One chooses for example the sequences
\[ a^{(k)} = \frac{a^{(0)}}{k}, \quad a^{(0)} > 0, \qquad c^{(k)} = \frac{c^{(0)}}{\sqrt[4]{k}}, \quad c^{(0)} > 0, \; k > 0 \]



This means that the work step length goes to zero very much faster than the test step length, in order to compensate for the growing influence of the perturbations.
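A minimal sketch of the Kiefer-Wolfowitz scheme with exactly these sequences, on an invented noisy one dimensional objective (the constants a0, c0 and the noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented noisy objective: true minimum at x = 3, with small perturbations.
def F_noisy(x):
    return (x - 3.0)**2 + 0.01 * rng.standard_normal()

x, a0, c0 = 0.0, 0.5, 0.5
for k in range(1, 2001):
    a_k = a0 / k                    # work step lengths: sum a_k diverges
    c_k = c0 / k**0.25              # test step lengths: c_k -> 0 more slowly
    y_k = (F_noisy(x + c_k) - F_noisy(x - c_k)) / (2.0 * c_k)   # slope estimate
    x = x - 2.0 * a_k * y_k         # work step (minimum search)

print(x)                            # close to the true minimum at x = 3
```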

Blum (1954b) and Dvoretzky (1956) describe how to apply this process to multidimensional problems. The increment in the objective function, hence an approximation to the gradient vector, is obtained from n + 1 observations. Sacks (1958) uses 2n trial steps. The stochastic approximation can thus be regarded, in a sense, as a particular gradient method.

Yet other basic strategies have been proposed; these adopt only the choice of step lengths from the stochastic approximation, while the directions are governed by other criteria. Thomas and Wilde (1964), for example, combine the stochastic approximation with the relaxation method of Southwell (1940, 1946). Kushner (1963) and Schmitt (1969) even take random directions into consideration. All the proofs of convergence of the stochastic approximation assume unimodal objective functions. A further disadvantage is that stability against perturbations is bought at a very high cost, especially if the number of variables is large. How many steps are required to achieve a given accuracy can only be stated if the probability density distribution of the stochastic perturbations is known. Many authors have tried to devise methods in which the basic procedure can be accelerated: e.g., Kesten (1958), who only reduces the step lengths after a change in direction of the search, or Odell (1961), who makes the lengths of the work steps dependent on measured values of the objective function. Other attempts are directed towards reducing the effect of the perturbations (Venter, 1967; Fabian, 1967), for example by making only the direction and not the size of the gradients determine the step lengths. Bertram (1960) describes various examples of applications. More of such work is that of Krasulina (1972) and Engelhardt (1973).

In this introduction many classes of possible or practically occurring optimization problems and methods have been sketched briefly, but the coverage is far from complete. No mention has been made, for example, of broken rational programming, nor of graphical methods of solution. In operations research especially (Henn and Kunzi, 1968) there are many special techniques for solving transport, allocation, routing, queuing, and warehouse problems, such as network planning and other graph theoretical methods. This excursion into the vast realm of optimization problems was undertaken because some of the algorithms to be studied in more depth in what follows, especially the random methods of Chapter 4, owe their origin and nomenclature to other fields. It should also be seen to what extent methods of direct parameter optimization permeate the other branches of the subject, and how they are related to each other. An overall scheme of how the various branches are interrelated can be found in Saaty (1970).

If there are two or more objectives at the same time and occasion, and especially if these are not conflict-free, single solution points in the decision variable space can no longer give the full answer to an optimization question, not even in the otherwise simplest situation. How to look for the whole subset of efficient, non-dominated, or Pareto-optimal solutions can be found under keywords like vector optimization, polyoptimization, or multiple criteria decision making (MCDM) (e.g., Bell, Keeney, and Raiffa, 1977; Hwang and Masud, 1979; Peschel, 1980; Grauer, Lewandowski, and Wierzbicki, 1982; Steuer, 1986). Game theory comes into play when several decision makers have access to different



parts of the decision variable set only (e.g., Luce and Raiffa, 1957; Maynard Smith, 1982; Axelrod, 1984; Sigmund, 1993). No consideration is given here to these special fields.




Chapter 3

Hill climbing Strategies

In this chapter some of the direct, mathematical parameter optimization methods will be treated in more detail for static, non-discrete, non-stochastic, mostly unconstrained functions. They come under the general heading of hill climbing strategies because their manner of searching for a maximum corresponds closely to the intuitive way a sightless climber might feel his way from a valley up to the highest peak of a mountain. For minimum problems the sense of the displacements is simply reversed; otherwise uphill or ascent and downhill or descent methods (Bach, 1969) are identical. Whereas methods of mathematical programming are dominant in operations research and the special methods of functional optimization in control theory, the hill climbing strategies are most frequently applied in engineering design. Analytic methods often prove unsuitable in this field:

- Because the assumptions are not satisfied under which necessary conditions for extrema can be stated (e.g., continuity of the objective function and its derivatives)
- Because there are difficulties in carrying out the necessary differentiations
- Because a solution of the equations describing the conditions does not always lead to the desired optimum (it can be a local minimum, maximum, or saddle point)
- Because the equations describing the conditions, in general a system of simultaneous non-linear equations, are not immediately soluble

To what extent hill climbing strategies take care of these particular characteristics depends on the individual method. Very thorough presentations covering some topics can be found in Wilde (1964), Rosenbrock and Storey (1966), Wilde and Beightler (1967), Kowalik and Osborne (1968), Box, Davies, and Swann (1969), Pierre (1969), Pun (1969), Converse (1970), Cooper and Steinberg (1970), Hoffmann and Hofmann (1970), Beveridge and Schechter (1970), Aoki (1971), Zahradnik (1971), Fox (1971), Cea (1971), Daniel (1971), Himmelblau (1972b), Dixon (1972a), Jacoby, Kowalik, and Pizzo (1972), Stark and Nicholls (1972), Brent (1973), Gottfried and Weisman (1973), Vanderplaats (1984), and Papageorgiou (1991). More variations or theoretical and numerical studies of older methods can be found as individual publications in a wide variety of journals, or in the volumes of collected articles such as Graves and Wolfe (1963), Blakemore and Davis (1964), Lavi




and Vogl (1966), Klerer and Korn (1967), Abadie (1967, 1970), Fletcher (1969a), Rosen, Mangasarian, and Ritter (1970), Geoffrion (1972), Murray (1972a), Lootsma (1972a), Szego (1972), and Sebastian and Tammer (1990).

Formulated as a minimum problem without constraints, the task can be stated as follows:
\[ \min_{x} \{ F(x) \mid x \in \mathbb{R}^n \} \tag{3.1} \]

The column vector x* (at the extreme position) is required,
\[ x^* = \begin{bmatrix} x_1^* \\ x_2^* \\ \vdots \\ x_n^* \end{bmatrix} = (x_1^*, x_2^*, \ldots, x_n^*)^T \]
and the associated extreme value F* = F(x*) of the objective function F(x), in this case the minimum. The expression x ∈ ℝⁿ means that the variables are allowed to take all real values; x can thus be represented by any point in an n-dimensional Euclidean space ℝⁿ. Different types of minima are distinguished: strong and weak, local and global.

For a local minimum x* the following relationship holds:
\[ F(x^*) \le F(x) \tag{3.2} \]
for x ∈ ℝⁿ and
\[ 0 \le \| x - x^* \| = \sqrt{ \sum_{i=1}^{n} (x_i - x_i^*)^2 } \le \varepsilon \]

This means that in the neighborhood of x*, defined by the size of ε, there is no vector x for which F(x) is smaller than F(x*). If the equality sign in Equation (3.2) only applies when x = x*, the minimum is called strong, otherwise it is weak. An objective function that displays only one minimum (or maximum) is referred to as unimodal. In many cases, however, F(x) has several local minima (and maxima), which may be of different heights. The smallest, absolute or global minimum (minimum minimorum) of a multimodal objective function satisfies the stronger condition

\[ F(x^*) \le F(x) \quad \text{for all } x \in \mathbb{R}^n \tag{3.3} \]
This is always the preferred object of the search.

If there are also constraints, in the form of inequalities
\[ G_j(x) \ge 0 \quad \text{for all } j = 1(1)m \tag{3.4} \]
or equalities
\[ H_k(x) = 0 \quad \text{for all } k = 1(1)\ell \tag{3.5} \]



then ℝⁿ in Equations (3.1) to (3.3) must either be replaced by the hopefully non-empty subset M ⊆ ℝⁿ, representing the feasible region in ℝⁿ defined by Equation (3.4), or by ℝ^{n−ℓ}, the subspace of lower dimensionality spanned by the variables that now depend on each other according to Equation (3.5). If solutions at infinity are excluded, then the theorem of Weierstrass holds (see for example Rothe, 1959): "In a closed compact region a ≤ x ≤ b every function which is continuous there has at least one (i.e., an absolute) minimum and maximum." This can lie inside or on the boundary. In the case of discontinuous functions, every point of discontinuity is also a potential candidate for the position of an extremum.

3.1 One Dimensional Strategies

The search for a minimum is especially easy if the objective function only depends on one variable.

[Figure 3.1: Special points of a function of one variable.
a: local maximum at the boundary
b: local minimum at a point of discontinuity of F_x(x)
c: saddle point, or point of inflection
d-e: weak local maximum
f: local minimum
g: maximum (may be global) at a point of discontinuity of F(x)
h: global minimum at the boundary]



This problem would be of little practical interest, however, were it not for the fact that many of the multidimensional strategies make use of one dimensional minimizations in selected directions, referred to as line searches. Figure 3.1 shows some possible ways minima and other special points can arise in the one dimensional case.

3.1.1 Simultaneous Methods

One possible way of discovering the minimum of a function with one parameter is to determine the value of the objective function at a number of points and then to declare the point with the smallest value the minimum. Since in principle all trials can be carried out at the same time, this procedure is referred to as simultaneous optimization. How closely the true minimum is approached depends on the choice of the number and location of the trial points. The more trials are made, the more accurate the solution can be. One will be concerned, however, to obtain a result at the lowest cost in time and computation (or material). The two requirements of high accuracy and lowest cost are contradictory; thus an optimum compromise must be sought.

The effectiveness of a search method is judged by the size of the largest remaining interval of uncertainty (in the least favorable case) relative to the position of the minimum for a given number of trials (the so-called minimax concept; see Wilde, 1964; Beamer and Wilde, 1973). Assuming that the points in the series of trials are so densely distributed that several at a time are in the neighborhood of a local minimum, then the length of the interval of uncertainty is the same as the distance between the two points in the neighborhood of the smallest value of F(x). The number of necessary trials can thus get very large unless one has at least some idea of whereabouts the desired minimum is situated. In practice one must limit investigation of the objective function to a finite interval [a, b]. It is obvious, and it can be proved theoretically, that the optimal choice for all simultaneous search methods is the one in which the trial points are evenly distributed over the interval [a, b] (Boas, 1962, 1963a-d).

If N equidistant points are used, the interval of uncertainty is of length
\[ \ell_N = \frac{2\,(b-a)}{N+1} \]
and the effectiveness takes the value \( 2/(N+1) \). Put another way: to be sure of achieving an accuracy of ε > 0, the equidistant search (also called lattice, grid, or tabulation method) requires N trials, where
\[ N \ge \frac{2\,(b-a)}{\varepsilon} - 1 \]



Even more effective search schemes can be devised if the objective function is unimodal in the interval [a, b]. Wilde and Beightler (1967) describe a procedure, using evenly distributed pairs of points, which is also referred to as a simultaneous dichotomous search. The distance δ between the two points of a pair must be chosen sufficiently large that their objective function values are different. As δ → 0 the dichotomous search with an even number of trials (even block search) is the best. The number of trials required is
\[ N \ge \frac{2\,(b-a)}{\varepsilon} - 2 \]



3.1.2 Sequential Methods

Sequential search methods use the results of earlier trials to place the later ones. The choice of favorable conditions for the next trial presupposes a more or less precise internal model of the objective function; the better the model corresponds to reality, the better will be the results of the interpolation and extrapolation processes. The simplest assumption is that the objective function is unimodal, which means that local minima also always represent global minima. On this basis a number of sequential interval-dividing procedures have been constructed (Sect. 3.1.2.2). Iterative interpolation methods demand more "smoothness" of the objective function (Sect. 3.1.2.3). In the former case it is necessary, in the latter useful, to determine at the outset a suitable interval, [a^{(0)}, b^{(0)}], in which the desired extremum lies (Sect. 3.1.2.1).

3.1.2.1 Boxing in the Minimum

If there are no clues as to whereabouts the desired minimum might be situated, one can start with two points x^{(0)} and x^{(1)} = x^{(0)} + s and determine the objective function there. If F(x^{(1)}) ≤ F(x^{(0)}), the chosen direction is retained and one continues stepwise with
\[ x^{(k+1)} = x^{(k)} + s \quad \text{for } k \ge 1 \]
as long as F(x^{(k+1)}) ≤ F(x^{(k)}). If, however, F(x^{(1)}) > F(x^{(0)}), one chooses the opposite direction:
\[ x^{(2)} = x^{(0)} - s \]
and
\[ x^{(k+1)} = x^{(k)} - s \quad \text{for } k \ge 2 \]
similarly, until a step past the minimum is taken; one has thus determined the minimum of the unimodal function to within an uncertainty interval of length 2s (Beveridge and Schechter, 1970).

In numerical optimization problems the values of the variables often run through several powers of 10, or alternatively they must be precisely determined at many points. In this case the boxing-in method with a very small fixed step length is too costly. Box, Davies, and Swann (1969) therefore suggest starting with an initial step length s^{(0)} and doubling it at each successful step. Their recursion formula is as follows:
\[ x^{(k+1)} = x^{(0)} + 2^k\,s^{(0)} \]
It is applied as long as F(x^{(k+1)}) ≤ F(x^{(k)}) holds. As soon as F(x^{(k+1)}) > F(x^{(k)}) is registered, however, b^{(0)} = x^{(k+1)} is set as the upper bound of the interval and the starting point x^{(0)} is returned to. The lower bound a^{(0)} is found by a corresponding process with negative step lengths going in the opposite direction. In this way a starting interval [a^{(0)}, b^{(0)}] is obtained for the one dimensional search procedures to be described below. It can happen, because of the convention for equality of two function values, that the search for a bound of the interval does not end if the objective function reaches a constant horizontal level. It is therefore useful to specify a maximum step length that may not be exceeded.



The boxing-in method has also been proposed occasionally as a one dimensional optimization strategy in its own right (Rosenbrock, 1960; Berman, 1966). In order not to waste too many trials far from the target when the accuracy requirement is very high, it is useful to start with relatively large steps. Each time a loop ends with a failure, the step length is reduced by a factor less than 0.5, e.g., 0.25. If the above rules for increasing and reducing the step lengths are combined, a very flexible procedure is obtained. Dixon (1972a) calls it the success/failure routine. If a starting interval [a^{(0)}, b^{(0)}] is already at hand, however, there are significantly better strategies for successively reducing the size of the interval.
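A minimal sketch of the bracketing scheme with step doubling described above (objective, starting point, and step length are invented for illustration):

```python
# Boxing in the minimum with doubled step lengths, x(k+1) = x(0) + 2^k s(0).
def bracket(F, x0, s0):
    def probe(sign):
        k, F_prev = 0, F(x0)
        while True:
            x_next = x0 + sign * (2**k) * s0   # doubled step from the start point
            F_next = F(x_next)
            if F_next > F_prev:                # first failure: bound found
                return x_next
            F_prev, k = F_next, k + 1
    b = probe(+1.0)                            # upper bound b(0)
    a = probe(-1.0)                            # lower bound a(0), from x(0) again
    return a, b

F = lambda x: (x - 5.3)**2                     # invented unimodal test function
print(bracket(F, 0.0, 0.1))                    # an interval containing 5.3
```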

3.1.2.2 Interval Division Methods

If an equidistant division method is applied repeatedly, the interval of uncertainty is reduced at each step by the same factor, and thus for k steps by the k-th power of this factor. This exponential progression is considerably stronger than the merely linear dependence of the reduction on the number of trials per step. Thus as few simultaneous trials as possible should be used. A comparison of two schemes, with two and three simultaneous trials, shows that, except in the first loop, only two new objective function values must be obtained at a time in both cases, since of three trial points in one step, one coincides with a point of the previous step. The total number of trials required with sequential application of the equidistant three point scheme is
\[ N \simeq 1 + \frac{2 \log \frac{b-a}{\varepsilon}}{\log 2} \]


3.1.2.2.1 Fibonacci Division. This sequential method divides the interval in ratios of successive Fibonacci numbers, which obey the recursion
\[ f_k = f_{k-1} + f_{k-2} \quad \text{for } k \ge 2, \qquad f_0 = f_1 = 1 \]

An initial interval [a^{(0)}, b^{(0)}] is required, containing the extremum, together with a number N, which represents the total number of intended interval divisions. If the general interval is called [a^{(k)}, b^{(k)}], the lengths
\[ s^{(k)} = t^{(k)}\,(b^{(k)} - a^{(k)}) = b^{(k+1)} - a^{(k+1)} \]
are subtracted from its ends, with the reduction factor
\[ t^{(k)} = \frac{f_{N-k-1}}{f_{N-k}} \]
giving
\[ c^{(k)} = a^{(k)} + s^{(k)}, \qquad d^{(k)} = b^{(k)} - s^{(k)} \tag{3.10} \]

The values of the objective function at c^{(k)} and d^{(k)} are compared, and whichever subinterval contains the better (in a minimum search, lower) value is taken as defining the interval for the next step:

If F(d^{(k)}) > F(c^{(k)}), then
\[ a^{(k+1)} = d^{(k)}, \qquad b^{(k+1)} = b^{(k)} \]
otherwise
\[ a^{(k+1)} = a^{(k)}, \qquad b^{(k+1)} = c^{(k)} \]

A consequence of the Fibonacci series is that, except for the first interval division, at all of the following steps one of the two new points c^{(k+1)} and d^{(k+1)} is always already known: in the first case d^{(k+1)} = c^{(k)}, in the second c^{(k+1)} = d^{(k)}, so that each time only one new value of the objective function needs to be obtained.

[Figure 3.2: Interval division in the Fibonacci search]

Figure 3.2 illustrates two steps of the procedure. The process is continued until k = N − 2. At the next division, because f₂ = 2f₁, d^{(k)} and c^{(k)} coincide. A further interval reduction can only be achieved by slightly displacing one of the test points. The displacement δ must be at least big enough for the two objective function values to still be distinguishable. Then the remaining interval after N trials is of length
\[ \ell_N = \frac{1}{f_N}\,(b^{(0)} - a^{(0)}) + \delta \]

As δ → 0 the effectiveness tends to f_N^{-1}. Johnson (1956) and Kiefer (1957) show that this value is optimal in the sense of the minimax concept, according to which the Fibonacci search is the best of all sequential interval division procedures. However, by taking account of the displacement δ, not only at the last but at all the steps, Oliver and Wilde (1964) give a recursion formula that for the same number of trials yields a slightly smaller residual interval. Avriel and Wilde (1966a) provide a proof of optimality. If one has a priori information about the structure of the objective function, it can be exploited to advantage (Gal, 1971) in order to reduce further the number of trials. Overholt (1967a, 1973) suggests that in general there is no a priori information available to fix δ suitably, and it is therefore better to omit the final division using a displacement rule and to choose N one bigger from the start. In order to obtain the minimum with accuracy ε > 0 one



should choose N such that
\[ f_N > \frac{b^{(0)} - a^{(0)}}{\varepsilon} \]
Then the effectiveness of the procedure becomes \( 2/f_{N+1} \), and since (Lucas, 1876)
\[ f_N = \frac{1}{\sqrt{5}} \left[ \left( \frac{1+\sqrt{5}}{2} \right)^{N+1} - \left( \frac{1-\sqrt{5}}{2} \right)^{N+1} \right] \simeq \frac{1}{\sqrt{5}} \left( \frac{1+\sqrt{5}}{2} \right)^{N+1} \]
the number of trials is approximately
\[ N \simeq \frac{ \log \frac{b^{(0)} - a^{(0)}}{\varepsilon} + \log \sqrt{5} }{ \log \frac{1+\sqrt{5}}{2} } - 1 \tag{3.11} \]

Overholt (1965) shows by means of numerical tests that the procedure must often be terminated prematurely, as F(d^{(k+1)}) becomes equal to F(c^{(k+1)}), for example because of computing with a finite number of significant figures. Further divisions of the interval of uncertainty are then pointless.

For the boxing-in method of determining the initial interval one would fix an initial step length of about 10ε and a maximum step length of about 5·10⁹ ε, so that for a 36-bit computer the number range of integers is not exceeded by the largest required Fibonacci number. Finally, two further applications of the Fibonacci procedure may be mentioned. By reversing the scheme, Wilde and Beightler (1967) obtain a method of boxing in the minimum. Kiefer (1957) shows how to proceed if values of the objective function can only be obtained at discrete, not necessarily equidistant, points. More about such lattice search problems can be found in Wilde (1964), and Beveridge and Schechter (1970).
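The complete division scheme can be sketched as follows (test function, interval, and N are invented; the final δ-displacement discussed above is omitted for brevity):

```python
# Minimal Fibonacci search sketch for a unimodal function on [a, b].
def fibonacci_search(F, a, b, N):
    f = [1, 1]                                   # f0 = f1 = 1
    while len(f) <= N:
        f.append(f[-1] + f[-2])
    c = a + (f[N - 1] / f[N]) * (b - a)          # right point, t(0) = f_{N-1}/f_N
    d = b - (f[N - 1] / f[N]) * (b - a)          # left point
    Fc, Fd = F(c), F(d)
    for k in range(1, N - 1):
        t = f[N - k - 1] / f[N - k]              # reduction factor t(k)
        if Fd > Fc:                              # minimum in [d, b]
            a, d, Fd = d, c, Fc                  # old c is reused as new d
            c = a + t * (b - a)
            Fc = F(c)
        else:                                    # minimum in [a, c]
            b, c, Fc = c, d, Fd                  # old d is reused as new c
            d = b - t * (b - a)
            Fd = F(d)
    return (a + b) / 2.0

print(fibonacci_search(lambda x: (x - 1.234)**2, 0.0, 2.0, 20))
```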

3.1.2.2.2 The Golden Section. It can sometimes be inconvenient to have to specify in advance the number of interval divisions. In this case Kiefer (1953) and Johnson (1956) propose, instead of the reduction factor t^{(k)}, which varies with the iteration number in the Fibonacci search, a constant factor
\[ t = \frac{2}{1 + \sqrt{5}} \simeq 0.618 \quad \text{(positive root of } t^2 + t = 1\text{)} \tag{3.12} \]
For large N − k, t^{(k)} reduces to t. In addition, t is identical to the ratio of lengths a to b, which is obtained by dividing a total length of a + b into two pieces such that the smaller, a, has the same ratio to the larger, b, as the larger to the total. This harmonic division (after Euclid) is also known as the golden section, which gave the procedure its name (Wilde, 1964). After N function calls the uncertainty interval is of length
\[ \ell_N = t^{N-1}\,(b^{(0)} - a^{(0)}) \]



For the limiting case N → ∞, since
\[ \lim_{N\to\infty} \left( t^{N-1} f_N \right) = 1.17 \]
the number of trials compared to the Fibonacci procedure is about 17% higher. Compared to the Fibonacci search without displacement, since
\[ \lim_{N\to\infty} \frac{1}{2}\, t^{N-1} f_{N+1} \simeq 0.95 \]
the number of trials is about 5% lower. It should further be noted that, when using the Fibonacci method on digital computers, the Fibonacci numbers must first be generated, or a sufficient number of them must be provided and stored. The number of trials needed for a sequential golden section is
\[ N = \left\lceil \frac{\log \frac{b^{(0)} - a^{(0)}}{\varepsilon}}{\log(1/t)} \right\rceil - 1 \sim \log \frac{b^{(0)} - a^{(0)}}{\varepsilon} \tag{3.13} \]

Other properties of the iteration sequence, including the criterion for termination at equal function values, are the same as those of the method of interval division according to Fibonacci numbers. Further details can be found, for example, in Avriel and Wilde (1968). Complete programs for the interval division procedures have been published by Pike and Pixner (1965), and Overholt (1967b,c) (see also Boothroyd, 1965; Pike, Hill, and James, 1967; Overholt, 1967a).
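A minimal golden-section sketch with the same point-reuse scheme, now using the constant factor t of Equation (3.12); the test function and tolerance are invented:

```python
# Golden-section line search for a unimodal function on [a, b].
def golden_section(F, a, b, eps):
    t = 2.0 / (1.0 + 5.0**0.5)           # constant reduction factor ~0.618
    c, d = a + t * (b - a), b - t * (b - a)
    Fc, Fd = F(c), F(d)
    while b - a > eps:
        if Fd > Fc:                      # minimum in [d, b]
            a, d, Fd = d, c, Fc          # the surviving interior point is reused
            c = a + t * (b - a)
            Fc = F(c)
        else:                            # minimum in [a, c]
            b, c, Fc = c, d, Fd
            d = b - t * (b - a)
            Fd = F(d)
    return (a + b) / 2.0

print(golden_section(lambda x: (x - 0.7)**2 + 0.3, 0.0, 2.0, 1e-8))
```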

3.1.2.3 Interpolation Methods

In many cases one is dealing with a continuous function, the minimum of which is to be determined. If, in addition to the value of the objective function, its slope can be specified everywhere, many methods can be derived that may converge faster than the optimal elimination methods. One of the oldest schemes is based on the procedure named after Bolzano for determining the zeros of a function. Assuming that one has two points at which the slopes of the objective function have opposite signs, one bisects the interval between them and determines the slope at the midpoint. This replaces the interval end point which has a slope of the same sign. The procedure can then be repeated iteratively. At each trial the interval is halved. If the slope has to be calculated from the difference of two objective function values, the bisection or midpoint strategy becomes the sequential dichotomous search. Avriel and Wilde (1966b) propose, as a variant of the Bolzano search, evaluating the slope at two points in the interval so as to increase the reduction factor. They show that their diblock strategy is slightly superior to the dichotomous search.

If derivatives of the objective function are available, or at least if it can be assumed that these exist, i.e., the function F(x) is continuous and differentiable, far better strategies for the minimum search can be devised. They determine analytically the minimum of a trial function that coincides with the objective function, and possibly also its derivatives, at selected argument values. One distinguishes linear, quadratic, and cubic models according to the order of the trial polynomial. Polynomials of higher order are virtually never used. They require too much information about the function F(x). Furthermore, it turns out that, in contrast to all the methods referred to so far, such strategies do not always converge, for reasons other than rounding error.

3.1.2.3.1 Regula Falsi Iteration. Given two points a^{(k)} and b^{(k)}, with their function values F(a^{(k)}) and F(b^{(k)}), the simplest approximation formula for a zero c^{(k)} of F(x) is
\[ c^{(k)} = a^{(k)} - F(a^{(k)})\,\frac{b^{(k)} - a^{(k)}}{F(b^{(k)}) - F(a^{(k)})} \]
This technique, known as regula falsi or regula falsorum, predicts the position of the zero correctly if F(x) depends linearly on x. For one dimensional minimization it can be applied to find a zero of F_x(x) = dF(x)/dx:
\[ c^{(k)} = a^{(k)} - F_x(a^{(k)})\,\frac{b^{(k)} - a^{(k)}}{F_x(b^{(k)}) - F_x(a^{(k)})} \tag{3.14} \]

The underlying model here is a second order polynomial with linear slope. If F_x(a^{(k)}) and F_x(b^{(k)}) have opposite sign, c^{(k)} lies between a^{(k)} and b^{(k)}. If F_x(c^{(k)}) ≠ 0, the procedure can be continued iteratively by using the reduced interval [a^{(k+1)}, b^{(k+1)}] = [a^{(k)}, c^{(k)}] if F_x(c^{(k)}) and F_x(b^{(k)}) have the same sign, or [a^{(k+1)}, b^{(k+1)}] = [c^{(k)}, b^{(k)}] if F_x(c^{(k)}) and F_x(a^{(k)}) have the same sign. If F_x(a^{(k)}) and F_x(b^{(k)}) have the same sign, c^{(k)} must lie outside [a^{(k)}, b^{(k)}]. If F_x(c^{(k)}) has the same sign again, c^{(k)} replaces the argument value at which |F_x| is greatest. This extrapolation is also called the secant method. If F_x(c^{(k)}) has the opposite sign, one can continue using regula falsi to interpolate iteratively. As a termination criterion one can apply F_x(c^{(k)}) = 0 or |F_x(c^{(k)})| ≤ ε, ε > 0. A minimum can only be found reliably in this way if the starting point of the search lies in its neighborhood. Otherwise the iteration sequence can also converge to a maximum, at which, of course, the slope also goes to zero if F_x(x) is continuous.
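A minimal sketch of regula falsi applied to the slope F_x, as in Equation (3.14); the function, its derivative, the bracket, and the tolerance are invented for illustration:

```python
# Regula falsi on the derivative: finds a zero of Fx inside a sign-change bracket.
def regula_falsi_min(Fx, a, b, eps):
    # assumes Fx(a) and Fx(b) have opposite signs (minimum bracketed)
    while True:
        c = a - Fx(a) * (b - a) / (Fx(b) - Fx(a))   # zero of the secant
        if abs(Fx(c)) <= eps:                       # termination on small slope
            return c
        if Fx(c) * Fx(b) > 0.0:                     # same sign as at b: keep [a, c]
            b = c
        else:                                       # same sign as at a: keep [c, b]
            a = c

Fx = lambda x: 4.0 * x**3 - 8.0 * x                 # slope of F(x) = x^4 - 4x^2
print(regula_falsi_min(Fx, 1.0, 3.0, 1e-6))         # converges toward sqrt(2)
```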

Whereas in the Bolzano interval bisection method only the sign of the function whose zero is sought needs to be known at the argument values, the regula falsi method also makes use of the magnitude of the function. This extra information should enable it to converge more rapidly. As Ostrowski (1966) and Jarratt (1967, 1968) show, for example, this is only the case if the function corresponds closely enough to the assumed model. The simpler bisection method is better, even optimal (as a zero method in the minimax sense), if the function has opposite signs at the two starting points, is not linear, and not convex. In this case the linear interpolation sometimes converges very slowly. According to Stanton (1969), a cubic interpolation as a line search in the eccentric quadratic case often yields even worse results. Dixon (1972a) names two variants of the regula falsi recursion formula, but it is not known whether they lead to better convergence. Fox (1971) proposes a combination of the Bolzano method with the linear interpolation. Dekker (1969) (see also Forsythe, 1969) accredits this procedure with better than linear convergence. Even greater reliability and speed is attributed to the algorithm of Brent (1971), which follows Dekker's method by a quadratic interpolation process as soon as the latter promises to be successful.



It is inconvenient when dealing with minimization problems that the derivatives of the function are required. If the slopes are obtained from function values by a difference method, difficulties can arise from the finite accuracy of such a process. For this reason Brent (1973) combines regula falsi iteration with division according to the golden section. Further variations can be found in Schmidt and Trinkaus (1966), Dowell and Jarratt (1972), King (1973), and Anderson and Bjorck (1973).

3.1.2.3.2 Newton-Raphson Iteration. Newton's interpolation formula for improving an approximate solution x^(k) of the equation F(x) = 0 (see for example Madsen, 1973),

$$x^{(k+1)} = x^{(k)} - \frac{F(x^{(k)})}{F_x(x^{(k)})}$$

uses only one argument value, but requires the value of the derivative of the function as well as the function itself. If F(x) is linear in x, the zero is correctly predicted here; otherwise at best an improved approximation is obtained, and the process must be repeated. Like regula falsi, Newton's recursion formula can also be applied to determining F_x(x) = 0, with of course the reservations already stated. The so-called Newton-Raphson rule is then

$$x^{(k+1)} = x^{(k)} - \frac{F_x(x^{(k)})}{F_{xx}(x^{(k)})} \qquad (3.15)$$

If F(x) is not quadratic, as many iterations must be made as are necessary to satisfy a termination criterion. Dixon (1972a), for example, uses the condition |x^(k+1) − x^(k)| < ε.
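A corresponding sketch of rule (3.15), under the assumption that first and second derivatives are available as separate functions; all names are again illustrative.

```python
def newton_raphson_min(dF, ddF, x0, eps=1e-10, max_iter=50):
    """Seek a zero of the slope dF = F' by the Newton-Raphson rule (3.15).
    The iteration converges to the nearest stationary point, which need
    not be a minimum unless ddF = F'' stays positive."""
    x = x0
    for _ in range(max_iter):
        x_new = x - dF(x) / ddF(x)
        if abs(x_new - x) < eps:             # Dixon's termination condition
            return x_new
        x = x_new
    return x

# Example: F(x) = x**4 - 3*x with F'(x) = 4*x**3 - 3, F''(x) = 12*x**2
x_min = newton_raphson_min(lambda x: 4 * x**3 - 3, lambda x: 12 * x**2, 1.0)
```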


3.1.2.3.3 Lagrangian Interpolation. This method takes a parabola as the model function (quadratic interpolation). Assuming that the three points are a^(k) < b^(k) < c^(k), with the objective function values F(a^(k)), F(b^(k)), and F(c^(k)), the trial parabola P(x) has a vanishing first derivative at the point

$$d^{(k)} = \frac{1}{2}\;\frac{\bigl[(c^{(k)})^2 - (b^{(k)})^2\bigr]F(a^{(k)}) + \bigl[(a^{(k)})^2 - (c^{(k)})^2\bigr]F(b^{(k)}) + \bigl[(b^{(k)})^2 - (a^{(k)})^2\bigr]F(c^{(k)})}{\bigl[c^{(k)} - b^{(k)}\bigr]F(a^{(k)}) + \bigl[a^{(k)} - c^{(k)}\bigr]F(b^{(k)}) + \bigl[b^{(k)} - a^{(k)}\bigr]F(c^{(k)})} \qquad (3.16)$$

This point is a minimum only if the denominator is positive. Otherwise d^(k) represents a maximum or a saddle point. In the case of a minimum, d^(k) is introduced as a new argument value and one of the old ones is deleted, so that the minimum remains bracketed:

$$[a^{(k+1)},\,b^{(k+1)},\,c^{(k+1)}] = \begin{cases}
[a^{(k)},\,d^{(k)},\,b^{(k)}] & \text{if } a^{(k)} < d^{(k)} < b^{(k)} \text{ and } F(d^{(k)}) \le F(b^{(k)})\\
[d^{(k)},\,b^{(k)},\,c^{(k)}] & \text{if } a^{(k)} < d^{(k)} < b^{(k)} \text{ and } F(d^{(k)}) > F(b^{(k)})\\
[b^{(k)},\,d^{(k)},\,c^{(k)}] & \text{if } b^{(k)} < d^{(k)} < c^{(k)} \text{ and } F(d^{(k)}) \le F(b^{(k)})\\
[a^{(k)},\,b^{(k)},\,d^{(k)}] & \text{if } b^{(k)} < d^{(k)} < c^{(k)} \text{ and } F(d^{(k)}) > F(b^{(k)})
\end{cases}$$


[Figure 3.3: Lagrangian quadratic interpolation. The trial parabola P(x) is fitted through the points a^(k), b^(k), c^(k) of F(x); its minimum at d^(k) yields the new triple a^(k+1), b^(k+1), c^(k+1).]

The speed of convergence depends on the agreement between the objective function and the trial function. In the most favorable case the objective function is itself quadratic; then one iteration is sufficient. This is why it can be advantageous to use an interpolation method rather than an interval division method such as the optimal Fibonacci search. Dijkhuis (1971) describes a variant of the basic procedure in which four argument values are taken. The two inner ones, and each of the outer ones in turn, are used for two separate quadratic interpolations. The weighted mean of the two results yields a new iteration point. This procedure is claimed to increase the reliability of the minimum search for non-quadratic objective functions.
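The vertex formula can be written as a single step; the following Python sketch returns None where the model predicts a maximum or saddle point, a convention added here for illustration.

```python
def quadratic_interpolation_step(F, a, b, c):
    """One Lagrangian quadratic interpolation step (Equation 3.16):
    the abscissa d of the vertex of the parabola through
    (a, F(a)), (b, F(b)), (c, F(c)) for a < b < c."""
    Fa, Fb, Fc = F(a), F(b), F(c)
    num = (c*c - b*b) * Fa + (a*a - c*c) * Fb + (b*b - a*a) * Fc
    den = (c - b) * Fa + (a - c) * Fb + (b - a) * Fc
    if den <= 0:
        return None          # model predicts a maximum or a saddle point
    return 0.5 * num / den

# Example: F(x) = (x - 0.5)**2 gives d = 0.5 from the triple (0, 1, 2)
d = quadratic_interpolation_step(lambda x: (x - 0.5)**2, 0.0, 1.0, 2.0)
```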

3.1.2.3.4 Hermitian Interpolation. If one chooses, instead of a parabola, a third order polynomial as a test function, more information is needed to make it agree with the objective function. Beveridge and Schechter (1970) describe such a cubic interpolation procedure. In place of four argument values with associated objective function values, two points a^(k) and b^(k) are enough if, in addition to the values of the objective function, the values of its slope, i.e., the first order differentials, are available. This Hermitian interpolation is mainly used in conjunction with gradient or quasi-Newton methods, because these require the partial derivatives of the objective function in any case, or approximate them using finite difference methods.

The interpolation formula is:

$$c^{(k)} = a^{(k)} + \bigl(b^{(k)} - a^{(k)}\bigr)\,\frac{w - F_x(a^{(k)}) - z}{2w + F_x(b^{(k)}) - F_x(a^{(k)})} \qquad (3.17)$$

where

$$z = \frac{3\,\bigl[F(a^{(k)}) - F(b^{(k)})\bigr]}{a^{(k)} - b^{(k)}} - F_x(a^{(k)}) - F_x(b^{(k)})$$

and

$$w = +\sqrt{z^2 - F_x(a^{(k)})\,F_x(b^{(k)})} \qquad (3.18)$$

Recursive exchange of the argument values takes place according to the sign of F_x(c^(k)), in a similar way to the Bolzano method. It should also be verified here that a^(k) and b^(k) always bound the minimum. Fletcher and Reeves (1964) use Hermitian interpolation in their conjugate gradient method as a subroutine to approximate a relative minimum in specified directions. They terminate the iteration as soon as |a^(k) − b^(k)| falls below a prescribed accuracy bound.
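A minimal sketch of one such step; the square root is real only when the two points actually bracket a minimum, which the sketch assumes rather than verifies.

```python
from math import sqrt

def hermitian_interpolation_step(F, dF, a, b):
    """One cubic (Hermitian) interpolation step, Equations (3.17)/(3.18):
    an estimate of the minimum between a and b from the function values
    and slopes at the two end points."""
    z = 3.0 * (F(a) - F(b)) / (a - b) - dF(a) - dF(b)
    w = sqrt(z * z - dF(a) * dF(b))      # real if a minimum is bracketed
    return a + (b - a) * (w - dF(a) - z) / (2.0 * w + dF(b) - dF(a))

# Example: F(x) = (x - 1)**2 yields the exact minimum x = 1 in one step
c = hermitian_interpolation_step(lambda x: (x - 1)**2,
                                 lambda x: 2 * (x - 1), 0.0, 3.0)
```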



the second variable. Both end results are then used to reject one of the values of the first variable that were held constant, and to reduce the size of the interval with respect to this parameter. By analogy, a three dimensional minimization consists of a recursive sequence of two dimensional Fibonacci searches. If the number of function calls needed to reduce the uncertainty interval [a_i, b_i] sufficiently with respect to the variable x_i is N_i, then the total number N also obeys Equation (3.19). The advantage compared to the grid method is simply that N_i depends logarithmically on the ratio of initial interval size to accuracy (see Equation (3.11)). Aside from the fact that each variable must be suitably fixed in advance, and that the unimodality requirement of the objective function only guarantees that local minima are approached, there is furthermore no guarantee that a desired accuracy will be reached within a finite number of objective function calls (Kaupe, 1964).

Other elimination procedures have been extended in a similar way to the multivariable case, such as, for example, the dichotomous search (Wilde, 1965) and a sequential boxing-in method (Berman, 1969). In each case the effort rises exponentially with the number of variables. Another elimination concept for the multidimensional case, the method of contour tangents, is due to Wilde (1963) (see also Beamer and Wilde, 1969). It requires, however, the determination of gradient vectors. Newman (1965) indicates how to proceed in the two dimensional case, and also for discrete values of the variables (lattice search). He requires that F(x) be convex and unimodal. Then the cost should only increase linearly with the number of variables. For n ≥ 3, however, no applications of the contour tangent method are as yet known.

Transferring interpolation methods to the n-dimensional case means transforming the original minimum problem into a series of problems, in the form of a set of equations to be solved. As non-linear equations can only be solved iteratively, this procedure is limited to the special case of linear interpolation with quadratic objective functions. Practical algorithms based on the regula falsi iteration can be found in Schmidt and Schwetlick (1968) and Schwetlick (1970). The procedure is not widely used as a minimization method (Schmidt and Vetters, 1970). The slopes of the objective function that it requires are implicitly calculated from function values. The secant method described by Wolfe (1959b) for solving a system of non-linear equations also works without derivatives of the functions. From n + 1 current argument values, it extracts the required information about the structure of the n equations.

Just as the transition from simultaneous to sequential one dimensional search methods reduces the effort required at the expense of global convergence, so each further acceleration in the multidimensional case is bought by a reduction in reliability. High convergence rates are achieved by gathering more information and interpreting it in the form of a model of the objective function. If assumptions and reality agree, this procedure is successful; if they do not, extrapolations lead to worse predictions and possibly even to abandoning an optimization strategy. Figure 3.4 shows the contour diagram of a smooth two parameter objective function.

All the strategies to be described assume a degree of smoothness in the objective function. They do not converge with certainty to the global minimum, but at best to one of the local minima, or sometimes only to a saddle point.


[Figure 3.4: Contour lines of a two parameter function F(x_1, x_2); a: global minimum, b: local minimum, c: local maxima, d, e: saddle points]

Various methods are distinguished according to the kind of information they need, namely:

Direct search methods, which only need objective function values F(x)

Gradient methods, which also use the first partial derivatives ∇F(x) (first order strategies)

Newton methods, which in addition make use of the second partial derivatives ∇²F(x) (second order strategies)

The emphasis here will be placed on derivative-free strategies, that is, on direct search methods, and on such higher order procedures as glean their required information about derivatives from a sequence of function values. The recursion scheme of most multidimensional strategies is based on the formula:

$$x^{(k+1)} = x^{(k)} + s^{(k)}\,v^{(k)} \qquad (3.20)$$

They differ from each other with regard to the choice of the step length s^(k) and the search direction v^(k), the former being a scalar and the latter a vector of unit length.

3.2.1 Direct Search Strategies

Direct search strategies do without constructing a model of the objective function. Instead, the directions, and to some extent also the step lengths, are fixed heuristically or by a scheme of some sort, not always in an optimal way under the assumption of a specified internal model. Thus the risk is run of not being able to improve the objective function value at each step. Failures must accordingly be planned for, provided something can also be "learned" from them. This trial character of search strategies has earned them the name of trial-and-error methods. The most important of them that are still in current use will be presented in the following chapters. Their attraction lies not in theoretical proofs of convergence and rates of convergence, but in their simplicity and the fact that they have proved themselves in practice. In the case of convex or quadratic unimodal objective functions, however, they are generally inferior to the first and second order strategies to be described later.

3.2.1.1 Coordinate Strategy

The oldest of the multidimensional search procedures trades under a variety of names (e.g., successive variation of the variables, relaxation, parallel axis search, univariate or univariant search, one-variable-at-a-time method, axial iteration technique, cyclic coordinate ascent method, alternating variable search, sectioning method, Gauss-Seidel strategy) and manifests itself in a large number of variations.

The basic idea of the coordinate strategy, as it will be called here, comes from linear algebra and was first put into practice by Gauss and Seidel in the single step relaxation method of solving systems of linear equations (see Ortega and Rockoff, 1966; Ortega and Rheinboldt, 1967; Van Norton, 1967; Schwarz, Rutishauser, and Stiefel, 1968). As an optimization strategy it is attributed to Southwell (1940, 1946) or Friedmann and Savage (1947) (see also D'Esopo, 1959; Zangwill, 1969; Zadeh, 1970; Schechter, 1970).

The parameters in the iteration formula (3.20) are varied in turn individually, i.e., the search directions are fixed by the rule:

$$v^{(k)} = e_\ell \qquad \text{with } \ell = \begin{cases} n & \text{if } k = p\,n,\ p \text{ integer}\\ k \;(\mathrm{mod}\; n) & \text{otherwise} \end{cases}$$

where e_ℓ is the unit vector whose components have the value zero for all i ≠ ℓ and unity for i = ℓ. In its simplest form the coordinate strategy uses a constant step length s^(k). Since, however, the direction to the minimum is unknown, both positive and negative values of s^(k) must be tried. In a first and easy improvement on the basic procedure, a successful step is followed by further steps in the same direction, until a worsening of the objective function is noted. It is clear that the choice of step length strongly influences the number of trials required on the one hand and the accuracy that can be achieved in the approximation on the other.

One can avoid the problem of the choice of step length most effectively by using a line search method each time to locate the relative optimum in the chosen direction. Besides the interval division methods, the Fibonacci search and the golden section, Lagrangian interpolation can also be used, since all these procedures work without knowledge of the partial derivatives of the objective function. A further strategy for boxing in the minimum must be added, in order to establish a suitable starting interval for each one dimensional minimization.


The algorithm can be described as follows:

Step 0: (Initialization)
Establish a starting point x^(0,0) and choose an accuracy bound ε > 0 for the one dimensional search.
Set k = 0 and i = 1.

Step 1: (Boxing in the minimum)
Starting from x^(k,i−1), with an initial step length s = s_min (e.g., s_min = 10 ε), box in the minimum in the direction e_i.
Double the step length at each successful trial, as long as s ...


[Figure 3.5: Coordinate strategy; search path from the starting point x^(0,0) via the relative optima x^(k,i) to the end point x^(3,0), with line searches parallel to the unit vectors e_1 and e_2]

Numbering   Iteration index k   Direction index i   x_1   x_2
(0)         0                   0                   0     9
(1)         0                   1                   3     9
(2)         0                   2                   3     5
carried over
(2)         1                   0                   3     5
(3)         1                   1                   7     5
(4)         1                   2                   7     3
carried over
(4)         2                   0                   7     3
(5)         2                   1                   9     3
(6)         2                   2                   9     2
carried over
(6)         3                   0                   9     2
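Condensed into code, the procedure might look as follows; the golden section stands in for any of the derivative-free line searches named above, and the fixed search span replaces the separate boxing-in strategy. Both of these, like all names here, are simplifying assumptions.

```python
import math

def golden_section(F, a, b, eps=1e-8):
    """Derivative-free line search by golden section over [a, b]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    while b - a > eps:
        c, d = b - g * (b - a), a + g * (b - a)
        if F(c) < F(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

def coordinate_strategy(F, x, span=10.0, eps=1e-8, max_sweeps=100):
    """Successive variation of one variable at a time, each time locating
    the relative optimum along the current coordinate direction."""
    n = len(x)
    for _ in range(max_sweeps):
        x_old = list(x)
        for i in range(n):                    # one sweep along e_1 .. e_n
            line = lambda t: F(x[:i] + [t] + x[i+1:])
            x[i] = golden_section(line, x[i] - span, x[i] + span, eps)
        if max(abs(a - b) for a, b in zip(x, x_old)) < eps:
            break                             # no significant change left
    return x

# Example: minimum of F(x) = (x1 - 3)^2 + 2 (x2 + 1)^2 at (3, -1)
x_min = coordinate_strategy(lambda v: (v[0] - 3)**2 + 2 * (v[1] + 1)**2,
                            [0.0, 9.0])
```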

Eventually the changes near convergence are so small that the number of significant figures to which data are handled by the computer is insufficient for the variables to be significantly altered.

Numerical tests with the coordinate strategy show that an exact determination of the relative minima is unnecessary, at least at distances far from the objective. It can even happen that one inaccurate line search makes the next one particularly effective. This phenomenon is exploited in the procedures known as under- or overrelaxation (Engeli, Ginsburg, Rutishauser, and Stiefel, 1959; Varga, 1962; Schechter, 1962, 1968; Cryer, 1971). Although the relative optimum is determined as before, either an increment is added on in the same direction, or an iteration point is defined on the route between the start and finish of the one dimensional search. The choice of the under- or overrelaxation factor requires assumptions about the structure of the problem. The necessary information is available for the problem of solving systems of linear equations with a positive definite matrix of coefficients, but not for general optimization problems.

Further possible variations of the coordinate strategy are obtained if the sequence of searches parallel to the axes does not follow the cyclic scheme. Southwell (1946), for example, always selects either the direction in which the slope of the objective function

$$F_{x_i}(x) = \frac{\partial F(x)}{\partial x_i}$$

is maximum, or the direction in which the largest step can be taken. To evaluate the choice of direction, Synge (1944) uses the ratio F_{x_i}/F_{x_i x_i} of first to second partial derivatives at the point x^(k). Whether or not the additional effort for this scheme is worthwhile depends on the particular topology of the contour surface. Adding directions other than parallel to the axes is also often found to accelerate the convergence (Pinsker and Tseitlin, 1962; Elkin, 1968).

Its great simplicity has always made the coordinate strategy attractive, despite its sometimes slow convergence. Rules for handling constraints (not counting here penalty function methods) have been devised, for example, by Singer (1962), Murata (1963), and Mugele (1961, 1962, 1966). Singer's maze method departs from the coordinate directions as soon as a constraint is violated and progresses into the feasible region or along the boundary. For this, however, the gradient of the active constraints must be known. Mugele's poor man's optimizer, a discrete coordinate strategy without line searches, not only handles active constraints, but can also cope with narrow valleys that do not run parallel to the coordinate axes. In this case diagonal steps are permitted. Similar to this strategy is the direct search method of Hooke and Jeeves, which, because it has become very widely used, will be treated in detail in the following chapter.

3.2.1.2 Strategy of Hooke and Jeeves: Pattern Search

The direct pattern search of Hooke and Jeeves (1961) was originally devised as an automatic experimental strategy (see Hooke, 1957; Hooke and Van Nice, 1959). It is nowadays much more widely used as a numerical parameter optimization procedure.

The method by which the direct pattern search works is characterized by two types of move. At each iteration there is an exploratory move, which represents a simplified Gauss-Seidel variation with one discrete step per coordinate direction. No line searches are made. On the assumption that the line joining the first and last points of the exploratory move represents an especially favorable direction, an extrapolation is made along it (pattern move) before the variables are varied again individually. The extrapolations do not necessarily lead to an improvement in the objective function value. The success of the iteration is only checked after the following exploratory move. The length of the pattern step is thereby increased each time, while the optimal search direction changes only gradually. This pays off to most advantage where there are narrow valleys. An ALGOL implementation of the strategy is due to Kaupe (1963). It was improved by Bell and Pike (1966), as well as by Smith (1969) (see also DeVogelaere, 1968; Tomlin and Smith, 1969). In the first case, the sequence of plus and minus exploratory steps in the coordinate directions is modified to suit the conditions at any instant. The second improvement aims at permitting a retrospective scaling of the variables, as the step lengths can be chosen individually to be different from each other.

The algorithm runs as follows:

Step 0: (Initialization)
Choose a starting point x^(0,0) = x^(−1,n), an accuracy bound ε > 0, and initial step lengths s_i^(0) ≠ 0 for all i = 1(1)n (e.g., s_i^(0) = 1 if no more plausible values are at hand).
Set k = 0 and i = 1.


Step 1: (Exploratory move)
Construct x′ = x^(k,i−1) + s_i^(k) e_i (discrete step in the positive direction).
If F(x′) ...

Step 10: (Iteration loop)
Increase k ← k + 1, set i = 1, and go to step 1.

Figure 3.6, together with the following table, presents a possible sequence of iteration points. From the starting point (0), a successful step, (1) and (3), is taken in each coordinate direction. Since the end point of this exploratory move is better than the starting point, it serves as a basis for the first extrapolation. This leads to (4). It is not checked here whether or not any improvement over (3) has occurred. At the next exploratory move, from (4) to (5), the objective function value can only be improved in one coordinate direction. It is now checked whether the condition at (5) is better than that of point (3). This is the case. The next extrapolation step, to (8), has a changed direction because of the partial failure of the exploration, but maintains its increased length. Now it will be assumed that, starting from (8) with the hitherto constant exploratory step length, no success will be scored in any coordinate direction compared to (8). The comparison with (5) shows that a reduction in the value of the objective function has nevertheless occurred. Thus the next extrapolation, to (13), remains the same as the previous one with respect to direction and step length. The next exploratory move leads to a point (15) which, although better than (13), is worse than (8). Now there is a return to (8). Only after the exploration again has no success here are the step lengths halved in order to make further progress possible. The fact that at some points in this case the objective function was tested several times is not typical for n > 2.

[Figure 3.6: Strategy of Hooke and Jeeves; iteration points (0) to (25), with the starting point, successes, failures, extrapolations, and the final point marked]


Numbering  k  i   x_1  x_2   Comparison point   s_1  s_2   Remarks
(0)        0  0    0    9    -                   2    2    starting point
(1)        0  1    2    9    (0)                           success
(2)        0  2    2   11    (1)                           failure
(3)        0  2    2    7    (1)                           success
(4)        1  0    4    5    -                   2   -2    extrapolation
(5)        1  1    6    5    (4),(3)                       success, success
(6)        1  2    6    3    (5)                           failure
(7)        1  2    6    7    (5)                           failure
(8)        2  0   10    3    -,(5)               2   -2    extrapolation, success
(9)        2  1   12    3    (8)                           failure
(10)       2  1    8    3    (8)                           failure
(11)       2  2   10    1    (8)                           failure
(12)       2  2   10    5    (8)                           failure
(13)       3  0   14    1    -                   2   -2    extrapolation
(14)       3  1   16    1    (13)                          failure
(15)       3  1   12    1    (13),(8)                      success, failure
(16)       3  2   12   -1    (15)                          failure
(17)       3  2   12    3    (15)                          failure
(8)        4  0   10    3    -                   2   -2    return
(18)       4  1   12    3    (8)                           failure
(19)       4  1    8    3    (8)                           failure
(20)       4  2   10    1    (8)                           failure
(21)       4  2   10    5    (8)                           failure
(8)        5  0   10    3    -                   1   -1    step lengths halved
(22)       5  1   11    3    (8)                           failure
(23)       5  1    9    3    (8)                           success
(24)       5  2    9    2    (23),(8)                      success, success
(25)       6  0    8    1    -                  -1   -1    extrapolation

A proof of convergence of the direct search of Hooke and Jeeves has been derived by Céa (1971); it is valid under the condition that the objective function F(x) is strictly convex and continuously differentiable. The computational operations are very simple and even in unforeseen circumstances cannot lead to invalid arithmetical manipulations such as, for example, division by zero. A further advantage of the strategy is its small storage requirement, which is of order O(n). The selected pattern accelerates the search in valleys, provided they are not sharply bent. The extrapolation steps follow, in an approximate way, the gradient trajectory. However, the limitation of the trial steps to coordinate directions can also lead to a premature termination of the search here, as in the coordinate strategy.
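Since the intermediate steps of the algorithm are missing above, the following Python sketch reconstructs only the two kinds of move in their simplest form; the loop structure and all names are assumptions in the spirit of the description, not Hooke and Jeeves' exact step list.

```python
def hooke_jeeves(F, x0, s=1.0, eps=1e-8):
    """Pattern search: exploratory moves along the coordinate axes,
    then an extrapolation (pattern move) along the overall success
    direction; step lengths are halved after a total failure."""
    def explore(base, step):
        x = list(base)
        for i in range(len(x)):
            for d in (step, -step):          # discrete +/- trial steps
                trial = list(x)
                trial[i] += d
                if F(trial) < F(x):
                    x = trial
                    break
        return x

    x_base = list(x0)
    while s > eps:
        x_new = explore(x_base, s)
        if F(x_new) < F(x_base):
            while True:                      # repeat pattern moves while
                x_pat = [2 * xn - xb         # they keep paying off
                         for xn, xb in zip(x_new, x_base)]
                x_base, x_new = x_new, explore(x_pat, s)
                if F(x_new) >= F(x_base):
                    break
        else:
            s *= 0.5                         # total failure: halve steps
    return x_base

x_min = hooke_jeeves(lambda v: (v[0] - 3)**2 + 2 * (v[1] + 1)**2, [0.0, 9.0])
```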

Further variations on the method, which have not achieved such popularity, are due to, among others, Wood (1960, 1962, 1965; see also Weisman and Wood, 1966; Weisman, Wood, and Rivlin, 1965), Emery and O'Hagan (1966; spider method), Fend and Chandler (1961; moment rosetta search), Bandler and MacDonald (1969; razor search; see also Bandler, 1969a,b), Pierre (1969; bunny-hop search), Erlicki and Appelbaum (1970), and Houston and Huffman (1971). A more detailed enumeration of older methods can be found in Lavi and Vogl (1966). Some of these modifications allow constraints in the form of inequalities to be taken into account directly. Similar to them is a program designed by M. Schneider (see Drenick, 1967). Aside from the fact that in order to use it one must specify which of the variables enter the individual constraints, it does not appear to work very effectively. Excessively long computation times and inaccurate results, especially with many variables, made it seem reasonable to omit M. Schneider's procedure from the strategy comparison (see Chap. 6). The problem of how to take constraints into account in a direct search has also been investigated by Klingman and Himmelblau (1964) and Glass and Cooper (1965). The resulting methods, to a greater or lesser extent, transform the original problem. They have nowadays been superseded by the general penalty function methods. Automatic "optimizators" for on-line optimization of chemical processes, which once were well known under the names Opcon (Bernard and Sonderquist, 1959) and Optimat (Weiss, Archer, and Burt, 1961), also apply modified versions of the direct search method. Another application is described by Sawaragi et al. (1971).

3.2.1.3 Strategy of Rosenbrock: Rotating Coordinates

Rosenbrock's idea (1960) was to remove the limitation on the number of search directions in the coordinate strategy, so that the search steps can move parallel to the axes of a coordinate system that can rotate in the space IR^n. One of the axes is set to point in the direction that appears most favorable. For this purpose the experience of successes and failures gathered in the course of the iterations is used, in the manner of Hooke and Jeeves' direct search. The remaining directions are fixed normal to the first and mutually orthogonal.

To start with, the search directions comprise the unit vectors

v_i^(0) = e_i for all i = 1(1)n

Starting from the point x^(0,0), a trial is made in each direction with the discrete initial step lengths s_i^(0,0) for all i = 1(1)n. When a success is scored (including equality of the objective function values), the changed variable vector is retained and the step length is multiplied by a positive factor α > 1; for a failure, the vector of variables is left unchanged and the step length is multiplied by a negative factor β, −1 < β < 0. This is continued until at least one success followed by a failure has occurred in each direction; then new direction vectors are generated by Gram-Schmidt orthonormalization:

$$v_i^{(k+1)} = \frac{w_i}{\|w_i\|}\,, \qquad
w_i = \begin{cases} a_i & \text{for } i = 1\\[1.5ex] a_i - \sum\limits_{j=1}^{i-1} \left(a_i^{\,T} v_j^{(k+1)}\right) v_j^{(k+1)} & \text{for } i = 2(1)n \end{cases} \qquad (3.21)$$

where

$$a_i = \sum_{j=i}^{n} d_j^{(k)}\, v_j^{(k)} \qquad \text{for all } i = 1(1)n$$

A scalar d_i^(k) represents the distance covered in direction v_i^(k) in the kth iteration. Thus v_1^(k+1) points in the overall successful direction of step k. It is expected that a particularly large search step can be taken in this direction at the next iteration. The requirement of waiting for at least one success in each direction has the effect that no direction is lost, and the v_i^(k) always span the full n-dimensional Euclidean space. The termination rule, or convergence criterion, is determined by the lengths of the vectors a_1^(k) and a_2^(k). Before each orthonormalization there is a test whether ‖a_1^(k)‖ < ε and ‖a_2^(k)‖ > 0.3 ‖a_1^(k)‖. When this condition is satisfied in six consecutive iterations, the search is ended. The second condition is designed to ensure that a premature termination of the search does not occur just because the distances covered have become small. More significantly, the requirement is also that the main success direction changes sufficiently rapidly, something that Rosenbrock regards as a sure sign of the proximity of a minimum. As the strategy comparison will show (see Chap. 6), this requirement is often too strong. It even hinders the ending of the procedure in many cases.
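The direction update (3.21) is easily expressed with NumPy; the function name and the storage of directions as matrix rows are illustrative choices.

```python
import numpy as np

def rosenbrock_directions(V, d):
    """Generate the rotated direction set after one Rosenbrock iteration
    (Equation 3.21): V is an n x n matrix whose rows v_i are the old unit
    directions, d the distances covered along them (not all zero)."""
    n = len(d)
    # a_i = sum_{j=i}^{n} d_j v_j : accumulated success vectors
    A = np.array([sum(d[j] * V[j] for j in range(i, n)) for i in range(n)])
    W = np.empty_like(A)
    for i in range(n):
        w = A[i].copy()
        for j in range(i):                 # subtract the projections onto
            w -= (A[i] @ W[j]) * W[j]      # already orthonormalized rows
        W[i] = w / np.linalg.norm(w)       # normalize to unit length
    return W

# Example for n = 2: distances 3 and 1 along the old unit vectors;
# the first new axis points along the overall success direction (3, 1)
V_new = rosenbrock_directions(np.eye(2), [3.0, 1.0])
```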

In his original publication Rosenbrock already gave detailed rules for the treatment of inequality constraints. His procedure can be viewed as a partial penalty function method, since the objective function is only altered in the neighborhood of the boundaries. Immediately after each variation of the variables, the objective function value is tested. If the comparison is unfavorable, a failure is registered as in the unconstrained case. On equality or an improvement, however, if the iteration point lies near a boundary of the region, the success criterion changes. For example, for constraints of the form G_j(x) ≥ 0 for all j = 1(1)m, the extended objective function F̃(x) takes the form (this is one of several suggestions of Rosenbrock):

$$\tilde F(x) = F(x) + \sum_{j=1}^{m} \varphi_j(x)\,\bigl(f_j - F(x)\bigr)$$

in which

$$\varphi_j(x) = \begin{cases} 0 & \text{if } G_j(x) \ge \delta\\ 3\lambda - 4\lambda^2 + 2\lambda^3 & \text{if } 0 < G_j(x) < \delta\\ 1 & \text{if } G_j(x) \le 0 \end{cases} \qquad (3.22)$$

and

$$\lambda = 1 - \frac{1}{\delta}\, G_j(x)$$


The parameter δ defines the width of the boundary zone (for constraints of the form a_j(x) ≤ G_j(x) ≤ b_j(x) this kind of double sided bounding is not always given, however). The basis of the procedure is fully described in Rosenbrock and Storey (1966).
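A sketch of the extended objective in Python, applying the corrections of Equation (3.22) sequentially as in step 3 of the algorithm below; the list-based bookkeeping of the f_j and all names are assumptions for illustration.

```python
def rosenbrock_penalty(F, G_list, f, x, delta=1e-4):
    """Extended objective F~ near the boundaries (Equation 3.22);
    f[j] holds the objective value last recorded while constraint j
    was comfortably satisfied (G_j(x) >= delta)."""
    F_t = F(x)
    for j, G in enumerate(G_list):
        g = G(x)
        if g >= delta:
            phi = 0.0                       # far inside: no correction
        elif g <= 0.0:
            phi = 1.0                       # violated: full correction
        else:
            lam = 1.0 - g / delta           # boundary zone, 0 < lam < 1
            phi = 3*lam - 4*lam**2 + 2*lam**3
        F_t = F_t + phi * (f[j] - F_t)      # sequential correction
    return F_t
```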

Using the notations

x_i  object variables
s_i  step sizes
v_i  direction components
d_i  distances travelled
λ_i  success/failure indicators

the extended algorithm of the strategy runs as follows:

Step 0: (Initialization)
Choose a starting point x^(0,0) such that G_j(x^(0,0)) > 0 for all j = 1(1)m.
Choose accuracy parameters ε > 0 and δ > 0 (Rosenbrock takes ε = 10^−4, δ = 10^−4).
Set v_i^(0) = e_i for all i = 1(1)n.
Set k = 0 (outer loop counter) and ℓ = 0 (inner loop counter).
If there are constraints (m > 0), set f_j = F(x^(0,0)) for all j = 1(1)m.

Step 1: (Initialization of step sizes, distances travelled, and indicators)
Set s_i^(k,0) = 0.1, d_i^(k) = 0, and λ_i^(k) = −1 for all i = 1(1)n.
Set ℓ = 0 and i = 1.

Step 2: (Trial step)
Construct x′ = x^(k,nℓ+i−1) + s_i^(k,ℓ) v_i^(k).
If F(x′) > F(x^(k,nℓ+i−1)), go to step 6;
otherwise, if m = 0, go to step 5;
if m ≠ 0, set F̃ = F(x′) and j = 1.

Step 3: (Test of feasibility)
If G_j(x′) ≤ 0, go to step 7;
if G_j(x′) ≥ δ, set f_j = F(x′) and go to step 4;
otherwise replace F̃ ← F̃ + φ_j(x′)(f_j − F̃) according to Equation (3.22).
If F̃ > F(x^(k,nℓ+i−1)), go to step 6.

Step 4: (Constraints loop)
If j < m, increase j ← j + 1 and go to step 3.


Step 5: (Store the success and update the internal memory)
Set x^(k,nℓ+i) = x′ and s_i^(k,ℓ+1) = 3 s_i^(k,ℓ), and replace d_i^(k) ← d_i^(k) + s_i^(k,ℓ).
If λ_i^(k) = −1, set λ_i^(k) = 0.
Go to step 7.

Step 6: (Internal memory update in case of failure)
Set x^(k,nℓ+i) = x^(k,nℓ+i−1) and s_i^(k,ℓ+1) = −(1/2) s_i^(k,ℓ).
If λ_i^(k) = 0, set λ_i^(k) = 1.

Step 7: (Main loop)
If λ_j^(k) = 1 for all j = 1(1)n, go to step 8;
otherwise, if i < n increase i ← i + 1, else set i = 1 and increase ℓ ← ℓ + 1, and go to step 2.


[Figure 3.7: Strategy of Rosenbrock; search path with starting point, successes, failures, and overall successes, together with the direction vectors v_1^(k) and v_2^(k) before and after each orthogonalization at the points (0), (4), and (9)]


Numbering  k  nℓ+i   x_1    x_2    s_1  s_2   Remarks
(0)        0  0       0      9      2    2    starting point
(1)        0  1       2      9      2    -    success
(2)        0  2       2     11      -    2    failure
(3)        0  3       8      9      6    -    failure
(4)        0  4       2      8      -   -1    success
(5)        0  5      -1      8     -3    -    failure
(6)        0  6       2      5      -   -3    failure
(4)        1  0       2      8      2    2    transformation and orthogonalization
(7)        1  1       3.8    7.1    2    -    success
(8)        1  2       2.9    5.3    -    2    success
(9)        1  3       8.3    2.6    6    -    success
(10)       1  4       5.6   -2.7    -    6    failure
(11)       1  5      24.4   -5.4   18    -    failure
(9)        2  0       8.3    2.6    2    2    transformation and orthogonalization

In Figure 3.7, including the accompanying table, a few iterations of the Rosenbrock strategy for n = 2 are represented geometrically. At the starting point x^(0,0) the search directions are the same as the unit vectors. After three runs through (6 trials), the trial steps in each direction have led to a success followed by a failure. At the best condition thus attained, (4) at x^(0,4) = x^(1,0), new direction vectors v_1^(1) and v_2^(1) are generated. Five further trials lead to the best point, (9) at x^(1,3) = x^(2,0), of the second iteration, at which a new choice of directions is again made. The complete sequence of steps can be followed, if desired, with the help of the accompanying table.

Numerical experiments show that within a few iterations the rotating coordinates become oriented such that one of the axes points along the gradient direction. The strategy thus allows sharp valleys in the topology of the objective function to be followed. Like the method of Hooke and Jeeves, Rosenbrock's procedure needs no information about partial derivatives and uses no line search method for the exact location of relative minima. This makes it very robust. It has, however, one disadvantage compared to the direct pattern search: the orthogonalization procedure of Gram and Schmidt is very costly. It requires storage space of order O(n²) for the matrices A = {a_ij} and V = {v_ij}, and the number of computational operations even increases with O(n³). At least in cases where the objective function call costs relatively little, the computation time for the orthogonalization becomes highly significant when there are many variables. Besides this, the number of parameters is in any case limited by the high storage space requirement.

If there are constraints, care must be taken to ensure that the starting point is inside the allowed region and sufficiently far from the boundaries. Examples of the application of


Rosenbrock's strategy can be found in Storey (1962) and in Storey and Rosenbrock (1964). Among them is also a discretized functional optimization problem. For unconstrained problems there exists the code of Machura and Mulawa (1973). The Gram-Schmidt orthogonalization has been programmed, for example, by Clayton (1971).

Lange-Nielsen and Lance (1972) have proposed, on the basis of numerical experiments, two improvements in the Rosenbrock strategy. The first involves not setting constant step lengths at the beginning of a cycle or after each orthogonalization, but rather modifying them and simultaneously scaling them according to the successes and failures during the preceding cycle. The second improvement concerns the termination criterion. Rosenbrock's original version is replaced by the simpler condition that, according to the achievable computational accuracy, several consecutive trials yield the same value of the objective function.

3.2.1.4 Strategy of Davies, Swann, and Campey (DSC)

A combination of the Rosenbrock idea of rotating coordinates with one dimensional search methods is due to Swann (1964). It has become known under the name Davies-Swann-Campey (abbreviated DSC) strategy. The description of the procedure given by Box, Davies, and Swann (1969) differs somewhat from that in Swann, and so several versions of the strategy have arisen in the subsequent literature. Preference is given here to the original concept of Swann, which exhibits some features in common with the method of conjugate directions of Smith (1962) (see also Sect. 3.2.2). Starting from x^(0,0), a line search is made in each of the unit directions v_i^(0) = e_i for all i = 1(1)n. This process is followed by a one dimensional minimization in the direction of the overall success so far achieved,

$$v_{n+1}^{(0)} = \frac{x^{(0,n)} - x^{(0,0)}}{\left\|x^{(0,n)} - x^{(0,0)}\right\|}$$

with the result x^(0,n+1).

The orthogonalization follows this, e.g., by the Gram-Schmidt method. If one of the line searches was unsuccessful, the new set of directions would no longer span the complete parameter space. Therefore only those old direction vectors along which a prescribed minimum distance has been moved are included in the orthogonalization process. The other directions remain unchanged. The DSC method, however, places a further hurdle before the coordinate rotation. If the distance covered in one iteration is smaller than the step length used in the line search, the latter is reduced by a factor of 10, and the next iteration is carried out with the old set of directions.

After an orthogonalization, one of the new directions (the first) coincides with that of the (n+1)-th line search of the previous step. This can therefore also be interpreted as the first minimization in the new coordinate system. Only n more one dimensional searches need be made to finish the iteration. As a termination criterion the DSC strategy uses the length of the total vector between the starting point and end point of an iteration. The search is ended when it is less than a prescribed accuracy bound.


The algorithm runs as follows:

Step 0: (Initialization)
Specify a starting point x^(0,0) and an initial step length s^(0) (the same for all directions).
Define an accuracy requirement ε > 0.
Choose as a first set of directions v_i^(0) = e_i for all i = 1(1)n.
Set k = 0 and i = 1.

Step 1: (Line search)
Starting from x^(k,i−1), seek the relative minimum x^(k,i) in the direction v_i^(k) such that

$$F\bigl(x^{(k,i)}\bigr) = F\bigl(x^{(k,i-1)} + d_i^{(k)} v_i^{(k)}\bigr) = \min_d \left\{ F\bigl(x^{(k,i-1)} + d\, v_i^{(k)}\bigr) \right\}$$

Step 2: (Main loop)
If i < n, increase i ← i + 1 and go to step 1;
if i = n, go to step 3;
if i = n + 1, go to step 4.

Step 3: (Eventually one more line search)
Construct z = x^(k,n) − x^(k,0).
If ‖z‖ > 0, set v_{n+1}^(k) = z/‖z‖ and i = n + 1, and go to step 1;
otherwise set x^(k,n+1) = x^(k,n) and d_{n+1}^(k) = 0, and go to step 5.

Step 4: (Check appropriateness of step length)
If ‖x^(k,n+1) − x^(k,0)‖ ≥ s^(k), go to step 6.

Step 5: (Termination criterion)
Set s^(k+1) = 0.1 s^(k).
If s^(k+1) ≤ ε, end the search;
otherwise set x^(k+1,0) = x^(k,n+1), increase k ← k + 1, set i = 1, and go to step 1.

Step 6: (Check appropriateness of orthogonalization)
Reorder the directions v_i^(k) and the associated distances d_i^(k) such that
|d_i^(k)| > ε for all i = 1(1)p and |d_i^(k)| ≤ ε for all i = p + 1(1)n.
If p ...


No geometric representation has been attempted here, since the fine deviations from the Rosenbrock method would hardly be apparent on a simple diagram.

The line search procedure of the DSC method has been described in detail by Box, Davies, and Swann (1969). It boxes in the minimum in the chosen direction using three equidistant points and then applies a single Lagrangian quadratic interpolation. The authors state that, in their experience, this is more economical with regard to the number of objective function calls than an exact line search with a sequence of interpolations.

The algorithm of the line search is:

Step 0: (Initialization)
Specify a starting point x_0, a step length s, and a direction v (all given from the main program).

Step 1: (Step forward)
Construct x = x_0 + s v.
If F(x) ≤ F(x_0), go to step 3.

Step 2: (Step backward)
Replace x ← x − 2 s v and s ← −s.
If F(x) ≤ F(x_0), go to step 3;
otherwise (both first trials without success) go to step 5.

Step 3: (Further steps)
Replace s ← 2 s and set x_0 = x.
Construct x = x_0 + s v.
If F(x) ≤ F(x_0), repeat step 3.

Step 4: (Prepare interpolation)
Replace s ← 0.5 s.
Construct x = x_0 + s v.
Of the four points just generated, x_0 − s, x_0, x_0 + s, and x_0 + 2s, reject the one which is furthest from the point that has the smallest value of the objective function.

Step 5: (Interpolation)
Define the three available equidistant points x_1 ...
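In condensed form the line search might be sketched as follows; the four-point bookkeeping of step 4 is simplified here to the three equidistant points around the last success, and all names are illustrative.

```python
def dsc_line_search(F, x0, s, v):
    """Sketch of the DSC line search along direction v from x0: box in
    the minimum with doubling steps, then one quadratic interpolation
    over three equidistant points (Equation 3.16)."""
    f = lambda t: F([xi + t * vi for xi, vi in zip(x0, v)])
    if f(s) > f(0.0):
        s = -s                        # forward trial failed: step backward
    t0 = 0.0
    while f(t0 + s) <= f(t0):         # double the step while successful
        t0 += s
        s *= 2.0
    h = 0.5 * abs(s)                  # halve: three equidistant points
    a, b, c = t0 - h, t0, t0 + h
    fa, fb, fc = f(a), f(b), f(c)
    den = (c - b) * fa + (a - c) * fb + (b - a) * fc
    if den > 0:                       # parabola opens upward: interpolate
        t_min = 0.5 * ((c*c - b*b) * fa + (a*a - c*c) * fb
                       + (b*b - a*a) * fc) / den
    else:
        t_min = b                     # fall back to the best point found
    return [xi + t_min * vi for xi, vi in zip(x0, v)]
```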


A numerical strategy comparison by M. J. Box (1966) shows the method to be a very effective optimization procedure, in general superior both to the Hooke and Jeeves and to the Rosenbrock methods. However, the tests only refer to smooth objective functions with few variables. If the number of parameters is large, the costly orthogonalization process makes its inconvenient presence felt in the DSC strategy as well.

Several suggestions have been made to date as to how to simplify the Gram-Schmidt procedure and to reduce its susceptibility to numerical rounding error (Rice, 1966; Powell, 1968a; Palmer, 1969; Golub and Saunders, 1970, Householder method).

Palmer replaces the conditions of Equation (3.21) by:

$$v_i^{(k+1)} =
\begin{cases}
\displaystyle \left.\left(\sum_{j=1}^{n} d_j\, v_j^{(k)}\right) \middle/ \sqrt{\sum_{j=1}^{n} d_j^2}\right. & \text{for } i = 1\\[3ex]
\displaystyle \left.\left(d_{i-1} \sum_{j=i}^{n} d_j\, v_j^{(k)} - v_{i-1}^{(k)} \sum_{j=i}^{n} d_j^2\right) \middle/ \sqrt{\sum_{j=i}^{n} d_j^2 \;\sum_{j=i-1}^{n} d_j^2}\right. & \text{for } i = 2(1)n \text{ and } \sum\limits_{j=i}^{n} d_j^2 \neq 0\\[3ex]
v_i^{(k)} & \text{otherwise, i.e., if } \sum\limits_{j=i}^{n} d_j^2 = 0
\end{cases}$$

He shows that even if no success was obtained in one of the directions v_i^(k), that is, d_i = 0, the new vectors v_i^(k+1) for all i = 1(1)n still span the complete parameter space, because v_{i+1}^(k+1) is then set equal to −v_i^(k). Thus the algorithm does not need to be restricted to directions for which d_i > ε, as happens in the algorithm with Gram-Schmidt orthogonalization.

The significant advantage of the revised procedure lies in the fact that the number of computational operations remains only of the order O(n²). The storage requirement is also somewhat less, since one n × n matrix as an intermediate storage area is omitted. For problems with linear constraints (equalities and inequalities), Box, Davies, and Swann (1969) recommend a modification of the orthogonalization procedure that works in a similar way to the method of projected gradients of Rosen (1960, 1961) (see also Davies, 1968). Non-linear constraints (inequalities) can be handled with the created response surface technique devised by Carroll (1961), which is one of the penalty function methods.

Further publications on the DSC strategy, also with comparison tests, are those of Swann (1969), Davies and Swann (1969), Davies (1970), and Swann (1972). Hoshino (1971) observes that in a narrow valley the search causes zigzag movements. His remedy for this is to add a further search, again in direction v_1^(k), after each set of n line searches. With the help of two examples, for n = 2 and n = 3, he shows the accelerating effect of this measure.

3.2.1.5 Simplex Strategy of Nelder and Mead

There is a group of methods called simplex strategies that work quite differently from the direct search methods described so far. In spite of their common name, they have nothing to do with the simplex method of linear programming of Dantzig (1966). The idea (Spendley, Hext, and Himsworth, 1962) originates in an attempt to reduce, as much as possible, the number of simultaneous trials in the experimental identification procedure of factorial design (see for example Davies, 1954). The minimum number according to Brooks and Mickey (1961) is n + 1. Thus instead of a single starting point, n + 1 vertices are used. They are arranged so as to be equidistant from each other: for n = 2 in an equilateral triangle; for n = 3 a tetrahedron; and in general a polyhedron, also referred to as a simplex. The objective function is evaluated at all the vertices. The iteration rule is: Replace the vertex with the largest objective function value by a new one situated at its reflection in the midpoint of the other n vertices. This rule aims to locate the new point at an especially promising place. If one lands near a minimum, the newest vertex can also be the worst. In this case the second worst vertex should be reflected. If the edge length of the polyhedron is not changed, the search eventually stagnates. The polyhedra rotate about the vertex with the best objective function value. A closer approximation to the optimum can only be achieved by halving the edge lengths of the simplex. Spendley, Hext, and Himsworth suggest doing this whenever a vertex is common to more than 1.65 n + 0.05 n² consecutive polyhedra. Himsworth (1962) holds that this strategy is especially advantageous when the number of variables is large and the determination of the objective function prone to error.

To this basic procedure, various modifications have been proposed by, among others, Nelder and Mead (1965), Box (1965), Ward, Nag, and Dixon (1969), and Dambrauskas (1970, 1972). Richardson and Kuester (1973) have provided a complete program. The most common version is that of Nelder and Mead, in which the main difference from the basic procedure is that the size and shape of the simplex are modified during the run to suit the conditions at each stage.

The algorithm, with an extension by O'Neill (1971), runs as follows:

Step 0: (Initialization)
Choose a starting point x^(0,0), initial step lengths s_i^(0) for all i = 1(1)n (if no better scaling is known, s_i^(0) = 1), and an accuracy parameter ε > 0 (e.g., ε = 10^−8). Set c = 1 and k = 0.

Step 1: (Establish the initial simplex)
Set x^(k,ν) = x^(k,0) + c s_ν^(0) e_ν for all ν = 1(1)n.

Step 2: (Determine worst and best points for the normal reflection)
Determine the indices w (worst point) and b (best point) such that
F(x^(k,w)) = max{F(x^(k,ν)), ν = 0(1)n},
F(x^(k,b)) = min{F(x^(k,ν)), ν = 0(1)n}.
Construct the centroid of all vertices except the worst, x̄ = (1/n) Σ_{ν=0, ν≠w} x^(k,ν), and the reflected point x′ = 2 x̄ − x^(k,w).

Step 3: (Check success of the reflection)
If F(x′) < F(x^(k,b)), go to step 4;
otherwise, if the number of vertices ν with F(x′) < F(x^(k,ν)) is
> 1, set x^(k+1,w) = x′ and go to step 8;
= 1, go to step 5;
= 0, go to step 6.

Step 4: (Expansion)
Construct x″ = 2 x′ − x̄.
If F(x″) ...


[Figure 3.8: Simplex strategy of Nelder and Mead; sequence of simplices from the first to the last, with the starting point and the vertex points (1) to (17)]


Iteration   Simplex vertices
index       (worst ... best)     Remarks
0           1  2  3              start simplex
            2  3  4              reflection
            2  3  5              expansion (successful)
1           2  3  5
            3  6  5              reflection
2           3  6  5
            6  5  7              reflection
            6  8  5              expansion (unsuccessful)
3           6  5  7
            5  9  7              reflection
4           5  9  7
            10 9  7              reflection
            9  11 7              partial outside contraction
5           9  11 7
            11 7  12             reflection
            11 13 7              expansion (unsuccessful)
6           11 7  12
            14 7  12             reflection
            15 7  12             partial inside contraction
            17 16 12             total contraction
7           17 16 12

The main difference between this program and the original strategy of Nelder and Mead is that after a normal ending of the minimization there is an attempt to construct a new starting simplex. To this end, small trial steps are taken in each coordinate direction. If just one of these tests is successful, the search is started again, but with a simplex of considerably reduced edge lengths. This restart procedure recommends itself because, especially for a large number of variables, the simplex tends to collapse, i.e., to no longer span the complete parameter space, without having reached the minimum.

For few variables the simplex method is known to be robust and reliable, but also relatively costly. There are n + 1 parameter vectors to be stored, and the reflection requires a number of computational operations of order O(n²). According to Nelder and Mead, the number of function calls increases approximately as O(n^2.11); however, this empirical value is based only on test results with up to 10 variables. Parkinson and Hutchinson (1972a,b) describe a variant of the strategy in which the real storage requirement can be reduced by about half (see also Spendley, 1969). Masters and Drucker (1971) recommend altering the expansion or contraction factor after consecutive successes or failures respectively.
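The following Python sketch condenses the polyhedron iteration to its usual textbook form: reflection, expansion, contraction, and total contraction towards the best vertex. The acceptance tests are simplified relative to the step list above, and O'Neill's restart extension is omitted; all names are illustrative.

```python
def nelder_mead(F, x0, s=1.0, eps=1e-8, max_iter=500):
    """Simplified simplex strategy with a variable-shape polyhedron."""
    n = len(x0)
    simplex = [list(x0)] + [
        [x0[j] + (s if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    for _ in range(max_iter):
        simplex.sort(key=F)                  # best first, worst last
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        refl = [2 * centroid[j] - worst[j] for j in range(n)]  # reflection
        if F(refl) < F(best):
            expa = [2 * refl[j] - centroid[j] for j in range(n)]
            simplex[-1] = expa if F(expa) < F(refl) else refl  # expansion
        elif F(refl) < F(worst):
            simplex[-1] = refl
        else:
            cont = [0.5 * (worst[j] + centroid[j]) for j in range(n)]
            if F(cont) < F(worst):
                simplex[-1] = cont           # contraction
            else:                            # total contraction to best
                simplex = [[0.5 * (v[j] + best[j]) for j in range(n)]
                           for v in simplex]
        if max(abs(F(v) - F(best)) for v in simplex) < eps:
            break
    return min(simplex, key=F)

x_min = nelder_mead(lambda v: (v[0] - 3)**2 + 2 * (v[1] + 1)**2, [0.0, 9.0])
```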

3.2.1.6 Complex Strategy of Box

M. J. Box (1965) calls his modification of the polyhedron strategy the complex method, an abbreviation for constrained simplex, since he conceived it also for problems with inequality constraints. The starting point of the search does not need to lie in the feasible region. For this case Box suggests locating an allowed point by minimizing the function

$$\tilde F(x) = -\sum_{j=1}^{m} G_j(x)\,\delta_j(x)$$

with

$$\delta_j(x) = \begin{cases} 0 & \text{if } G_j(x) \ge 0\\ 1 & \text{otherwise} \end{cases} \qquad (3.23)$$

until F̃(x) = 0.
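As a small illustration, the substitute objective (3.23) in Python; the helper name and the example constraints are assumptions added here.

```python
def feasibility_objective(G_list):
    """Substitute objective (3.23) for locating a feasible point:
    the negated sum of all violated constraint values, which is
    non-negative and zero exactly on the feasible region."""
    def F_tilde(x):
        return -sum(G(x) for G in G_list if G(x) < 0)
    return F_tilde

# Example: feasible region x1 >= 1 and x2 <= 4
G = [lambda x: x[0] - 1.0, lambda x: 4.0 - x[1]]
F_tilde = feasibility_objective(G)
# Any point driving F_tilde to zero serves as an allowed starting point.
```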

The two most important differences from the Nelder-Mead strategy are the use of more vertices and the expansion of the polyhedron at each normal reflection. Both measures are intended to prevent the complex from eventually spanning only a subspace of reduced dimensionality, especially at active constraints. If an allowed starting point is given or has been found, it defines one of the n + 1 ≤ N ≤ 2n vertices of the polyhedron. The remaining vertex points are fixed by a random process in which each vector inside the closed region defined by the explicit constraints has an equal probability of selection. If an implicit constraint is violated, the new point is displaced stepwise towards the midpoint of the allowed vertices that have already been defined, until it satisfies all the constraints. Implicit constraints G_j(x) ≥ 0 are dealt with similarly during the course of the minimum search. If an explicit boundary is crossed, x_i < a_i say, the offending variable is simply set back in the allowed region to a value near the boundary.

The details of the algorithm are as follows:

Step 0: (Initialization)
Choose a starting point x^(0) and a number of vertices N ≥ n + 1 (e.g., N = 2n). Number the constraints such that the first j ≤ m_1 each depend only on one variable x_{ℓj} (G_j(x_{ℓj}), explicit form).
Test whether x^(0) satisfies all the constraints. If not, construct a substitute objective function according to Equation (3.23).
Set up the initial complex as follows:
x^(0,1) = x^(0) and x^(0,ν) = x^(0) + Σ_{i=1}^{n} z_i e_i for ν = 2(1)N,
where the z_i are uniformly distributed random numbers from the range [a_i, b_i] if constraints are given in the form a_i ≤ x_i ≤ b_i, and otherwise from [x_i^(0) − 0.5 s, x_i^(0) + 0.5 s], where, e.g., s = 1.
If G_j(x^(0,ν)) < 0 for any j ≤ m_1, replace x_{ℓj}^(0,ν) ← 2 x_{ℓj}^(0,1) − x_{ℓj}^(0,ν).
If G_j(x^(0,ν)) < 0 for any j with m_1 < j ≤ m, replace x^(0,ν) ← 0.5 [x^(0,ν) + (1/(ν−1)) Σ_{μ=1}^{ν−1} x^(0,μ)].


Multidimensional Strategies 63<br />

(If necessary repeat this process until Gj(x (0 ) ) 0 for all j =1(1)m.)<br />

Set k =0.<br />

Step 1: (Reflection)
    Determine the index w (worst vertex) such that
        F(x^(k,w)) = max{F(x^(k,ν)), ν = 1(1)N}.
    Construct the midpoint of the remaining vertices,
        x̄ = (1/(N−1)) Σ_{ν=1, ν≠w}^{N} x^(k,ν),
    and x' = x̄ + α (x̄ − x^(k,w)) (over-reflection factor α = 1.3).

Step 2: (Check for constraints)
    If m = 0, go to step 7; otherwise set j = 1.
    If m_1 = 0, go to step 5.

Step 3: (Set vertex back into bounds for explicit constraints)
    Obtain g = G_j(x') = G_j(x'_{ℓj}).
    If g ≥ 0, go to step 4;
    otherwise replace x'_{ℓj} ← x'_{ℓj} + g + ε (backwards length ε = 10^{-6}).
    If G_j(x') < 0, replace x'_{ℓj} ← x'_{ℓj} − 2 (g + ε).

Step 4: (Explicit constraints loop)
    Increase j ← j + 1.
    If j ≤ m_1, go to step 3;
    if j > m_1 = m, go to step 7.

Step 5: (Check implicit constraints)
    If G_j(x') ≥ 0, go to step 6;
    otherwise go to step 8, unless the same constraint caused a failure five times in a row without its function value G_j(x') being changed; in this case go to step 9.

Step 6: (Implicit constraints loop)
    If j < m, increase j ← j + 1 and go to step 5.

Step 7: (Check for improvement)
    If F(x') < F(x^(k,w)), replace the worst vertex x^(k,w) by x', retain all other vertices, increase k ← k + 1, and go to step 1;
    otherwise go to step 8, unless several consecutive values of the objective function agree to within the computational accuracy; in this case go to step 9.

Step 8: (Retraction)
    Replace x' ← 0.5 (x' + x̄), the point halfway towards the midpoint of the remaining vertices, and go to step 2.

Step 9: (Termination)
    Determine the index b (best vertex) such that
        F(x^(k,b)) = min{F(x^(k,ν)), ν = 1(1)N}.
    End the search with the result x^(k,b) and F(x^(k,b)).
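Leaving aside the constraint bookkeeping of steps 2 to 6, the kernel of one iteration, reflection with over-expansion and stepwise retraction towards the midpoint, can be sketched as follows (Python; the row-wise vertex array and the bounded retraction loop are simplifying assumptions):

    import numpy as np

    def complex_step(F, X, alpha=1.3):
        # One reflection of the worst of the N vertices (rows of X) at
        # the centroid of the remaining ones, with over-reflection
        # factor alpha = 1.3; constraint handling is omitted here.
        w = int(np.argmax([F(x) for x in X]))
        centroid = (X.sum(axis=0) - X[w]) / (len(X) - 1)
        x_new = centroid + alpha * (centroid - X[w])
        for _ in range(5):                  # bounded retraction on
            if F(x_new) < F(X[w]):          # failure, cf. step 8
                break
            x_new = 0.5 * (x_new + centroid)
        X[w] = x_new
        return X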

Box himself reports that in numerical tests his complex strategy gives similar results to the simplex method of Nelder and Mead, but both are inferior to the method of Rosenbrock with regard to the number of objective function calls. He actually uses his own modification of the Rosenbrock method. Investigation of the effect of the number of vertices of the complex and the expansion factor (in this case 2n and 1.3 respectively) led him to the conclusion that neither value has a significant effect on the efficiency of the strategy. For n > 5 he considers that a number of vertices N = 2n is unnecessarily high, especially when there are no constraints.

The convergence criterion appears very reliable. While Nelder and Mead require that the standard deviation of all objective function values at the polyhedron vertices, referred to its midpoint, must be less than a prescribed size, the complex search is only ended when several consecutive values of the objective function are the same to computational accuracy.

Because of the larger number of polyhedron vertices the complex method needs even more storage space than the simplex strategy. The order of magnitude, O(n^2), remains the same. No investigations are known of the computational effort in the case of many variables. Modifications of the strategy are due to Guin (1968), Mitchell and Kaplan (1968), and Dambrauskas (1970, 1972). Guin defines a contraction rule with which an allowed point can be generated even if the allowed region is not convex. This is not always the case in the original method, because the midpoint to which the worst vertex is reflected is not tested for feasibility.

Mitchell finds that the initial configuration of the complex influences the results obtained. It is therefore better to place the vertices in a deterministic way rather than to make a random choice. Dambrauskas combines the complex method with the step length rule of the stochastic approximation. He requires that the step lengths or edge lengths of the polyhedron go to zero in the limit of an infinite number of iterations, while their sum tends to infinity. This measure may well increase the reliability of convergence; however, it also increases the cost. Beveridge and Schechter (1970) describe how the iteration rules must be changed if the variables can take only discrete values. A practical application, in which a process has to be optimized dynamically, is described by Tazaki, Shindo, and Umeda (1970); this is the original problem for which Spendley, Hext, and Himsworth (1962) conceived their simplex EVOP (evolutionary operation) procedure.

Compared to other numerical optimization procedures the polyhedra strategies have the disadvantage that in the closing phase, near the optimum, they converge rather slowly and sometimes even stagnate. The direction of progress selected by the reflection then no longer coincides at all with the gradient direction. To remove this difficulty it has been suggested that information about the topology of the objective function, as given by function values at the vertices of the polyhedron, be exploited to carry out a quadratic interpolation. Such surface fitting is familiar from the related methods of test planning and evaluation (lattice search, factorial design), in which the task is to set up mathematical models of physical or other processes. This territory is entered for example by G. E. P. Box (1957), Box and Wilson (1951), Box and Hunter (1957), Box and Behnken (1960), Box and Draper (1969, 1987), Box et al. (1973), and Beveridge and Schechter (1970). It will not be covered in any more detail here.

3.2.2 Gradient Strategies

The Gauss-Seidel strategy very straightforwardly uses only directions parallel to the coordinate axes to successively improve the objective function value. All other direct search methods strive to advance more rapidly by taking steps in other directions. To do so they exploit the knowledge about the topology of the objective function gleaned from the successes and failures of previous iterations. Directions are viewed as most promising in which the objective function decreases rapidly (for minimization) or increases rapidly (for maximization). Southwell (1946), for example, improves the relaxation by choosing the coordinate directions, not cyclically, but in order of the size of the local gradient in them. If the restriction of parallel axes is removed, the local best direction is given by the (negative) gradient vector

    ∇F(x) = (F_{x1}(x), F_{x2}(x), ..., F_{xn}(x))^T

with

    F_{xi}(x) = ∂F/∂x_i (x)   for all i = 1(1)n

at the point x^(0). All hill climbing procedures that orient their choice of search directions v^(0) according to the first partial derivatives of the objective function are called gradient strategies. They can be thought of as analogues of the total step procedure of Jacobi for solving systems of linear equations (see Schwarz, Rutishauser, and Stiefel, 1968).

So great is the number of methods of this type which have been suggested or applied up to the present that merely to list them all would be difficult. The reason lies in the fact that the gradient represents a local property of a function. To follow the path of the gradient exactly would mean determining, in general, a curved trajectory in the n-dimensional space. This problem is only approximately soluble numerically and is more difficult than the original optimization problem. With the help of analogue computers continuous gradient methods have actually been implemented (Bekey and McGhee, 1964; Levine, 1964). They consider the trajectory x(t) as a function of time and obtain it as the solution of a system of first order differential equations.

All the numerical variants of the gradient method differ in the lengths of the discrete steps and thereby also with regard to how exactly they follow the gradient trajectory. The iteration rule is generally

    x^(k+1) = x^(k) − s^(k) ∇F(x^(k)) / ‖∇F(x^(k))‖

It assumes that the partial derivatives everywhere exist and are unique. If F(x) is continuously differentiable, then the partial derivatives exist and F(x) is continuous.
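A single application of this iteration rule takes only a few lines (Python sketch; grad is an assumed callable returning the gradient vector ∇F(x)):

    import numpy as np

    def gradient_step(x, grad, s):
        # One step of length s against the normalized gradient,
        # as in the general iteration rule above.
        g = grad(x)
        return x - s * g / np.linalg.norm(g)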


A distinction is sometimes drawn between short step methods, which evaluate the gradients again after a small step in the direction ∇F(x^(k)) (for maximization) or −∇F(x^(k)) (for minimization), and their equivalent long step methods. Since for finite step lengths s^(k) it is not certain whether the new variable vector is really better than the old, after the step the value of the objective function must be tested again. Working with small steps increases the number of objective function calls and gradient evaluations. Besides F(x), n partial derivatives must be evaluated. Even if the slopes can be obtained analytically and can be specified as functions, there is no reason to suppose that the number of computational operations per function call is much less than for the objective function itself. Except in special cases, the total cost therefore increases roughly as the product of the weighting factor (n + 1) and the number of objective function calls. This also holds if the partial derivatives are approximated by differential quotients obtained by means of trial steps:

    F_{xi}(x) = ∂F(x)/∂x_i = [F(x + δ e_i) − F(x)] / δ + O(δ)   for all i = 1(1)n

Additional difficulties arise here, since for values of δ that are too small the subtraction is subject to rounding error, while for trial steps that are too large the neglect of the O(δ^2) terms of the Taylor series leads to incorrect values. The choice of suitable deviations δ requires special care in all cases (Hildebrand, 1956; Curtis and Reid, 1974).
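The two opposing error sources can be demonstrated with a one dimensional experiment (a sketch; the test function and the step sizes are arbitrary choices):

    # Forward-difference slope of F(x) = x**3 at x = 1; exact value 3.
    F = lambda x: x**3
    for delta in (1e-1, 1e-5, 1e-13):
        approx = (F(1.0 + delta) - F(1.0)) / delta
        print(delta, abs(approx - 3.0))
    # Large delta: truncation error from the neglected Taylor terms
    # dominates; tiny delta: rounding error in the subtraction wins.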

Cauchy (1847), Kantorovich (1940, 1945), Levenberg (1944), and Curry (1944) are the originators of the gradient strategy, which started life as a method of solving equations and systems of equations. It is first referred to as an aid to solving variational problems by Hadamard (1908) and Courant (1943). Whereas Cauchy works with fixed step lengths s^(k), Curry tries to determine the distance covered in the (not normalized) direction v^(k) = −∇F(x^(k)) so as to reach a relative minimum (see also Brown, 1959). In principle, any one of the one dimensional search methods of Section 3.1 can be called upon to find the optimal value for s^(k):

    F(x^(k) + s^(k) v^(k)) = min_s {F(x^(k) + s v^(k))}

This variant of the basic strategy could thus be called a longest step procedure. It is better known, however, under the name optimum gradient method, or method of steepest descent (for maximization, ascent). Theoretical investigations of convergence and rate of convergence of the method can be found, e.g., in Akaike (1960), Goldstein (1962), Ostrowski (1967), Forsythe (1968), Elkin (1968), Zangwill (1969), and Wolfe (1969, 1970, 1971). Zangwill proves convergence based on the assumptions that the line searches are exact and the objective function is continuously twice differentiable. Exactness of the one dimensional minimization is not, however, a necessary assumption (Wolfe, 1969). It is significant that one can only establish theoretically that a stationary point will be reached (∇F(x) = 0) or approached (‖∇F(x)‖ < ε, ε > 0). The stationary point is a minimum only if F(x) is convex and three times differentiable (Akaike, 1960). Zellnik, Sondak, and Davis (1962), however, show that saddle points are in practice an obstacle only if the search is started at one, or on a straight gradient trajectory passing through one. In other cases numerical rounding errors ensure that the path to a saddle point is unstable.


The gradient strategy, however, cannot distinguish global from local minima. The optimum at which it aims depends only on the choice of the starting point for the search. The only chance of finding absolute extrema is to start sufficiently often from various initial values of the variables and to iterate each time until the convergence criterion is satisfied (Jacoby, Kowalik, and Pizzo, 1972). The termination rules usually recommended for gradient methods are that the absolute value of the gradient vector,

    ‖∇F(x^(k))‖ < ε

or the difference between the objective function values of successive iterations,

    F(x^(k−1)) − F(x^(k)) < ε

becomes smaller than a prescribed small positive quantity ε. A forerunner of later developments is the Partan (parallel tangents) method, in which line searches are carried out alternately along the simple gradient directions

    v^(k) = −∇F(x^(k))   for k = 0 and k odd

and those derived from previous iteration points

    v^(k) = x^(k) − x^(k−3)   for k ≥ 2 even (with x^(−1) = x^(0))

For quadratic functions the minimum is reached after at most 2n − 1 line searches (Shah, Buehler, and Kempthorne, 1964). This desirable property of converging after a finite number of iterations, which is also called quadratic convergence, is only shared by strategies that apply conjugate gradients, of which the Partan methods can be regarded as forerunners (Pierre, 1969; Sorenson, 1969).

In the fifties, simple gradient strategies were very popular, especially the method of steepest descent. Today they are usually only to be found as components of program packages together with other hill climbing methods, e.g., in GROPE of Flood and Leon (1966), in AID of Casey and Rustay (1966), in AESOP of Hague and Glatt (1968), and in GOSPEL of Huelsman (1968). McGhee (1967) presents a detailed flow diagram. Wasscher (1963a,b) has published two ALGOL codings (see also Haubrich, 1963; Wallack, 1964; Varah, 1965; Wasscher, 1965). The partial derivatives are obtained numerically. A comprehensive bibliography by Leon (1966b) names most of the older versions of strategies and gives many examples of their application. Numerical comparison tests have been carried out by Fletcher (1965), Box (1966), Leon (1966a), Colville (1968, 1970), and Kowalik and Osborne (1968). They show the superiority of first (and second) order methods over direct search strategies for objective functions with smooth topology. Gradient methods for solving systems of differential equations are described for example by Talkin (1964). For such problems, as well as for functional optimization problems, analogue and hybrid computers have often been applied (Rybashov, 1965a,b, 1969; Sydow, 1968; Fogarty and Howe, 1968, 1970). A literature survey on this subject has been compiled by Gilbert (1967). For the treatment of variational problems see Kelley (1962), Altman (1966), Miele (1969), Bryson and Ho (1969), Cea (1971), Daniel (1971), and Tolle (1971).

In the experimental field, there are considerable difficulties in determining the partial derivatives. Errors in the values of the objective function can cause the predicted direction of steepest descent to lie almost perpendicular to the true gradient vector (Kowalik and Osborne, 1968). Box and Wilson (1951) attempt to compensate for the perturbations by repeating the trial steps or increasing their number above the necessary minimum of (n + 1). With 2^n trials, for example, a complete factorial design can be constructed (e.g., Davies, 1954). The slope in one direction is obtained by averaging the function value differences over 2^{n−1} pairs of points (Lapidus et al., 1961). Another possibility is to determine the coefficients of a linear polynomial such that the sum of the squares of the errors between measured and model function values at N ≥ n + 1 points is a minimum. The linear function then represents the tangent plane of the objective function at the point under consideration. The cost of obtaining the gradients when there are many variables is too great for practical application, and only justified if the aim is rather to set up a mathematical model of the system than simply to perform the optimization.

In the EVOP (acronym for evolutionary operation) scheme, G. E. P. Box (1957) has presented a practical simplification of this gradient method. It actually counts as a direct search strategy, because it does not obtain the direction of the gradient but only one of a finite number of especially good directions. Spendley, Hext, and Himsworth (1962) have devised a variant of the procedure (see also Sections 3.2.1.5 and 3.2.1.6). Lowe (1964) has gathered together the various schemes of trial steps for the EVOP strategy. The philosophy of the EVOP strategy is treated in detail by Box and Draper (1969). Some examples of applications are given by Kenworthy (1967). The efficiency of methods of determining the gradient in the case of stochastic perturbations is dealt with by Mlynski (1964a,b, 1966a,b), Sergiyevskiy and Ter-Saakov (1970), and others.

3.2.2.1 Strategy of Powell: Conjugate Directions

The most important idea for overcoming the convergence difficulties of the gradient strategy is due to Hestenes and Stiefel (1952), and again comes from the field of linear algebra (see also Ginsburg, 1963; Beckman, 1967). It trades under the names conjugate directions or conjugate gradients. The directions {v_i, i = 1(1)n} are said to be conjugate with respect to a positive definite n × n matrix A if (Hestenes, 1956)

    v_i^T A v_j = 0   for all i, j = 1(1)n, i ≠ j

A further property of conjugate directions is their linear independence, i.e.,

    Σ_{i=1}^{n} α_i v_i = 0

only holds if all the constants {α_i, i = 1(1)n} are zero. If A is replaced by the unit matrix, A = I, then the v_i are mutually orthogonal. With A = ∇²F(x) (Hessian matrix) the minimum of a quadratic function is obtained exactly in n line searches in the directions v_i. This is a factor two better than the gradient Partan method. For general non-linear problems the convergence rate cannot be specified. As it is frequently assumed, however, that many problems behave roughly quadratically near the optimum, it seems worthwhile to use conjugate directions. The quadratic convergence of the search with conjugate directions comes about because second order properties of the objective function are taken into account. In this respect it is not, in fact, a first order gradient method, but a second order procedure. If all the n first and n(n+1)/2 second partial derivatives are available, the conjugate directions can be generated in one process corresponding to the Gram-Schmidt orthogonalization (Kowalik and Osborne, 1968). It calls for expensive matrix operations. Conjugate directions can, however, be constructed without knowledge of the second derivatives: for example, from the changes in the gradient vector in the course of the iterations (Fletcher and Reeves, 1964). Because of this implicit exploitation of second order properties, the method of conjugate directions has been classified as a gradient method.

The conjugate gradients method of Fletcher and Reeves consists of a sequence of line searches with Hermitian interpolation (see Sect. 3.1.2.3.4). As a first search direction v^(0) at the starting point x^(0), the simple gradient direction

    v^(0) = −∇F(x^(0))

is used. The recursion formula for the subsequent iterations is

    v^(k) = β^(k) v^(k−1) − ∇F(x^(k))   for all k = 1(1)n        (3.25)

with the correction factor

    β^(k) = [∇F(x^(k))^T ∇F(x^(k))] / [∇F(x^(k−1))^T ∇F(x^(k−1))]

For a quadratic objective function with a positive definite Hessian matrix, conjugate directions are generated in this way and the minimum is found with n line searches. Since at any time only the last direction needs to be stored, the storage requirement increases linearly with the number of variables. This often signifies a great advantage over other strategies. In the general, non-linear, non-quadratic case more than n iterations must be carried out, for which the method of Fletcher and Reeves must be modified. Continued application of the recursion formula (Equation (3.25)) can lead to linear dependence of the search directions. For this reason it seems necessary to forget from time to time the accumulated information and to start afresh with the simple gradient direction (Crowder and Wolfe, 1972). Various suggestions have been made for the frequency of this restart rule (Fletcher, 1972a). Absolute reliability of convergence in the general case is still not guaranteed by this approach. If the Hessian matrix of second partial derivatives has points of singularity, then the conjugate gradient strategy can fail. The exactness of the line searches also has an important effect on the convergence rate (Kawamura and Volz, 1973). Polak (1971) defines conditions under which the method of Fletcher and Reeves achieves greater than linear convergence. Fletcher (1972c) himself has written a FORTRAN program.
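Equation (3.25) together with a restart rule can be sketched as follows (Python; line_search is a hypothetical helper returning the relative minimum along the given direction, and restarting after every n steps is only one of the suggested frequencies):

    import numpy as np

    def fletcher_reeves(F, grad, x, line_search, iters=100):
        g = grad(x)
        v = -g                                 # simple gradient start
        for k in range(1, iters + 1):
            x = line_search(F, x, v)           # one dimensional search
            g_new = grad(x)
            if k % len(x) == 0:
                v = -g_new                     # restart: forget history
            else:
                beta = (g_new @ g_new) / (g @ g)
                v = beta * v - g_new           # Equation (3.25)
            g = g_new
        return x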

Other conjugate gradient methods have been proposed by Powell (1962), Polak and Ribiere (1969) (see also Klessig and Polak, 1972), Hestenes (1969), and Zoutendijk (1970). Schley (1968) has published a complete FORTRAN program. Conjugate directions are also produced by the projected gradient methods (Myers, 1968; Pearson, 1969; Sorenson, 1969; Cornick and Michel, 1972) and the memory gradient methods (Miele and Cantrell, 1969, 1970; see also Cantrell, 1969; Cragg and Levy, 1969; Miele, 1969; Miele, Huang, and Heidemann, 1969; Miele, Levy, and Cragg, 1971; Miele, Tietze, and Levy, 1972; Miele et al., 1974). Relevant theoretical investigations have been made by, among others, Greenstadt (1967a), Daniel (1967a, 1970, 1973), Huang (1970), Beale (1972), and Cohen (1972).

Conjugate gradient methods are encountered especially frequently in the fields of functional optimization and optimal control problems (Daniel, 1967b, 1971; Pagurek and Woodside, 1968; Nenonen and Pagurek, 1969; Roberts and Davis, 1969; Polyak, 1969; Lasdon, 1970; Kelley and Speyer, 1970; Kelley and Myers, 1971; Speyer et al., 1971; Kammerer and Nashed, 1972; Szego and Treccani, 1972; Polak, 1972; McCormick and Ritter, 1974). Variable metric strategies are also sometimes classified as conjugate gradient procedures, but more usually as quasi-Newton methods. For quadratic objective functions they generate the same sequence of points as the Fletcher-Reeves strategy and its modifications (Myers, 1968; Huang, 1970). In the non-quadratic case, however, the search directions are different. With the variable metric, but not with conjugate directions, Newton directions are approximated.

For many practical problems it is very difficult, if not impossible, to specify the partial derivatives as functions. The sensitivity of most conjugate gradient methods to imprecise specification of the gradient directions makes it seem inadvisable to apply finite difference methods to approximate the slopes of the objective function. This is taken into account by some procedures that attempt to construct conjugate directions without knowledge of the derivatives. The oldest of these was devised by Smith (1962). On the basis of numerical tests by Fletcher (1965), however, the version of Powell (1964) has proved to be better. It will be briefly presented here. It is arguable whether it should be counted as a gradient strategy. Its intermediate position between direct search methods that only use function values, and Newton methods that make use of second order properties of the objective function (if only implicitly), nevertheless makes it come close to this category.

The strategy of conjugate directions is based on the observation that a line through the minimum of a quadratic objective function cuts all contours at the same angle. Powell's idea is then to construct such special directions by a sequence of line searches. The unit vectors are taken as initial directions for the first n line searches. After these, a minimization is carried out in the direction of the overall result. Then the first of the old direction vectors is eliminated, the indices of the remainder are reduced by one, and the direction that was generated and used last is put in the place freed by the nth vector. As shown by Powell, after n cycles, each of n + 1 line searches, a set of conjugate directions is obtained, provided the objective function is quadratic and the line searches are carried out exactly.

Zangwill (1967) indicates how this scheme might fail. If no success is obtained in one of the search directions, i.e., the distance covered becomes zero, then the direction vectors are linearly dependent and no longer span the complete parameter space. The same phenomenon can be provoked by computational inaccuracy. To prevent this, Powell has modified the basic algorithm. First of all, he designs the scheme of exchanging directions to be more flexible, actually by maximizing the determinant of the normalized direction vectors. It can be shown that, assuming a quadratic objective function, it is most favorable to eliminate the direction in which the largest distance was covered (see Dixon, 1972a). Powell would also sometimes leave the set of directions unchanged. This depends on how the value of the determinant would change under exchange of the search directions. The objective function is here tested at the position given by doubling the distance covered in the cycle just completed. Powell makes the termination of the search depend on all variables having changed by less than 0.1 ε within an iteration, where ε represents the required accuracy. Besides this first convergence criterion, he offers a second, stricter one, according to which the state reached at the end of the normal procedure is slightly displaced and the minimization repeated until the termination conditions are again fulfilled. This is followed by a line search in the direction of the difference vector between the last two endpoints. The optimization is only finally ended when the result agrees with those previously obtained to within the allowed deviation of 0.1 ε for each component.

The algorithm of Powell runs as follows:

Step 0: (Initialization)
    Specify a starting point x^(0) and accuracy requirements ε_i > 0 for all i = 1(1)n.

Step 1: (Specify first set of directions)
    Set v_i^(0) = e_i for all i = 1(1)n and set k = 0.

Step 2: (Start outer loop)
    Set x^(k,0) = x^(k) and i = 1.

Step 3: (Line search)
    Determine x^(k,i) such that
        F(x^(k,i)) = min_s {F(x^(k,i−1) + s v_i^(k))}.

Step 4: (Inner loop)
    If i < n, increase i ← i + 1 and go to step 3.

Step 5: (First convergence criterion; trial step)
    If no variable has changed by more than 0.1 ε_i during the loop, go to step 8;
    otherwise test the objective function at the point 2 x^(k,n) − x^(k,0), given by doubling the distance covered in the cycle just completed.

Step 6: (Exchange of search directions)
    According to how the value of the determinant of the normalized direction vectors would change under exchange of the search directions, either leave the set of directions unchanged, v_i^(k+1) = v_i^(k) for all i = 1(1)n, or eliminate the direction in which the largest distance was covered, reduce the indices of the remainder by one, and put the overall direction of the cycle, x^(k,n) − x^(k,0), in the place freed by the nth vector; in the latter case carry out a further line search along the new direction.

Step 7: (Outer loop)
    Set x^(k+1) to the best position found so far, increase k ← k + 1, and go to step 2.

Step 8: (Second, stricter convergence criterion)
    Call the point reached y^(1). Displace it slightly, repeat the minimization (steps 2 to 7) until the first criterion is again fulfilled, and call the result y^(2). Carry out a line search in the direction y^(2) − y^(1), which yields y^(3).

Step 9: (Termination or restart)
    If y^(3) agrees with y^(1) and y^(2) to within 0.1 ε_i for each component, end the search with the result y^(3);
    otherwise set x^(0) = y^(3),
        v_1^(0) = y^(3) − y^(1),
        v_i^(0) = v_i^(k) for i = 2(1)n,
    set k = 0, and go to step 2.
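One cycle of the unmodified basic scheme, without the determinant test and the convergence refinements of steps 5 to 9, might be sketched as follows (Python; line_search is again a hypothetical helper, and the unconditional exchange corresponds to Powell's basic algorithm, not to the safeguarded version above):

    import numpy as np

    def powell_cycle(F, x0, V, line_search):
        # V holds the n search directions row-wise: n line searches,
        # one extra search along the overall result, then exchange.
        x = x0.copy()
        for v in V:
            x = line_search(F, x, v)
        overall = x - x0                 # direction of overall result
        x = line_search(F, x, overall)
        V = np.vstack([V[1:], overall])  # drop the oldest direction
        return x, V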

Figure 3.9 illustrates a few iterations for a hypothetical two parameter function. Each of the first loops consists of n + 1 = 3 line searches and leads to the adoption of a new search direction. If the objective function had been of second order, the minimum would certainly have been found by the last line search of the second loop. In the third and fourth loops it has been assumed that the trial steps have led to a decision not to exchange directions; thus the old direction vectors, numbered v_3 and v_4, are retained. Further loops, e.g., according to step 9, are omitted.
e.g., according to step 9, are omitted.<br />

The quality of the line searches has a strong in uence on the construction of the<br />

conjugate directions. Powell uses a sequence of Lagrangian quadratic interpolations. It is<br />

terminated as soon as the required accuracy is reached. For the rst minimization within<br />

an iteration three points <strong>and</strong> Equation (3.16) are used. The argument values taken in<br />

direction vi are: x (the starting point), x + si vi, <strong>and</strong> either x +2si vi or x ; si vi,<br />

according to whether F (x + si vi)


74 Hill climbing Strategies<br />

i = @2<br />

@s 2(P (x + svi)) = ;2 (b ; c) Fa +(c ; a) Fb +(a ; b) Fc<br />

(b ; c)(c ; a)(a ; b)<br />

Powell uses this quantity γ_i for all subsequent interpolations in the direction v_i as a scale for the second partial derivative of the objective function. He scales the directions v_i, which in his case are not normalized, by 1/√γ_i. This allows the possibility of subsequently carrying out a simplified interpolation with only two argument values, x and x + s_i v_i. It is a worthwhile procedure, since each direction is used several times. The predicted minimum, assuming that the second partial derivatives have value unity, is then

    x' = x + (0.5 s_i − (1/s_i) [F(x + s_i v_i) − F(x)]) v_i

For the trial step lengths s_i, Powell uses the empirical recursion formula

    s_i^(k) = 0.4 √(F(x^(k−1)) − F(x^(k)))

Because of the scaling, all the step lengths actually become the same. A more detailed justification can be found in Hoffmann and Hofmann (1970).
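Numerically, the curvature estimate and the subsequent two-point prediction read as follows (a sketch; x and v are NumPy arrays, and the function names are ad hoc):

    def curvature(a, b, c, Fa, Fb, Fc):
        # Second derivative of the interpolation parabola through
        # (a, Fa), (b, Fb), (c, Fc), as in the formula above.
        return (-2.0 * ((b - c) * Fa + (c - a) * Fb + (a - b) * Fc)
                / ((b - c) * (c - a) * (a - b)))

    def predicted_minimum(F, x, v, s):
        # Two-point prediction along a direction v already scaled so
        # that the second derivative along it has the value unity.
        return x + (0.5 * s - (F(x + s * v) - F(x)) / s) * v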

Contrary to most other optimization procedures, Powell's strategy is available as a precise algorithm in a tested code (Powell, 1970f). As Fletcher (1965) reports, this method of conjugate directions is superior, for the case of a few variables, both to the DSC method and to a strategy of Smith, especially in the neighborhood of minima. For many variables, however, the strict criterion for adopting a new direction more frequently causes the old set of directions to be retained, and the procedure then converges slowly. A problem which had a singular Hessian matrix at the minimum made the DSC strategy look better. In a later article, Fletcher (1972a) defines a limit of n = 10 to 20, above which the Powell strategy should no longer be applied. This is confirmed by the test results presented in Chapter 6. Zangwill (1967) combines the basic idea of Powell with relaxation steps in order to avoid linear dependence of the search directions. Some results of Rhead (1971) lead to the conclusion that Powell's improved concept is superior to Zangwill's. Brent (1973) also presents a variant of the strategy without derivatives, derived from Powell's basic idea, which is designed to prevent the occurrence of linear dependence of the search directions without endangering the quadratic convergence. After every n + 1 iterations the set of directions is replaced by an orthogonal set of vectors. So as not to lose all the information, however, the unit vectors are not chosen. For quadratic objective functions the new directions remain conjugate to each other. This procedure requires O(n^3) computational operations to determine orthogonal eigenvectors. As, however, they are only performed every O(n^2) line searches, the extra cost is O(n) per function call and is thus of the same order as the cost of evaluating the objective function itself. Results of tests by Brent confirm the usefulness of the strategy.

3.2.3 Newton Strategies

Newton strategies exploit the fact that, if a function can be differentiated any number of times, its value at the point x^(k+1) can be represented by a series of terms constructed at another point x^(k):

    F(x^(k+1)) = F(x^(k)) + h^T ∇F(x^(k)) + (1/2) h^T ∇²F(x^(k)) h + ...        (3.26)

where

    h = x^(k+1) − x^(k)

In this Taylor series, as it is called, all the terms of higher than second order are zero if F(x) is quadratic. Differentiating Equation (3.26) with respect to h and setting the derivative equal to zero, one obtains a condition for the stationary points of a second order function:

    ∇F(x^(k+1)) = ∇F(x^(k)) + ∇²F(x^(k)) (x^(k+1) − x^(k)) = 0

or

    x^(k+1) = x^(k) − [∇²F(x^(k))]^{−1} ∇F(x^(k))        (3.27)

If F(x) is quadratic and ∇²F(x^(0)) is positive-definite, Equation (3.27) yields the solution x^(1) in a single step from any starting point x^(0) without needing a line search. If Equation (3.27) is taken as the iteration rule in the general case, it represents the extension of the Newton-Raphson method to functions of several variables (Householder, 1953). It is also sometimes called a second order gradient method with the choice of direction and step length (Crockett and Chernoff, 1955)

    v^(k) = −[∇²F(x^(k))]^{−1} ∇F(x^(k))
    s^(k) = 1        (3.28)

The real length of the iteration step is hidden in the non-normalized Newton direction v^(k). Since no explicit value of the objective function is required, but only its derivatives, the Newton-Raphson strategy is classified as an indirect or analytic optimization method. Its ability to predict the minimum of a quadratic function in a single calculation at first sight looks very attractive. This single step, however, requires a considerable effort. Apart from the necessity of evaluating n first and n(n+1)/2 second partial derivatives, the Hessian matrix ∇²F(x^(k)) must be inverted. This corresponds to the problem of solving a system of linear equations

    ∇²F(x^(k)) Δx^(k) = −∇F(x^(k))        (3.29)

for the unknown quantities Δx^(k). All the standard methods of linear algebra, e.g., Gaussian elimination (Brown and Dennis, 1968; Brown, 1969) and the matrix decomposition method of Cholesky (Wilkinson, 1965), need O(n^3) computational operations for this (see Schwarz, Rutishauser, and Stiefel, 1968). For the same cost, the strategies of conjugate directions and conjugate gradients can execute O(n) steps. Thus, in principle, the Newton-Raphson iteration offers no advantage in the quadratic case.
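In floating point practice one therefore solves the linear system (3.29) rather than inverting the Hessian explicitly (Python/NumPy sketch; grad and hess are assumed callables):

    import numpy as np

    def newton_step(x, grad, hess):
        # One step of Equation (3.27): solve  H dx = -grad  instead of
        # forming the inverse of the Hessian matrix H.
        dx = np.linalg.solve(hess(x), -grad(x))
        return x + dx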

If the objective function is not quadratic, then

    - v^(0) does not in general point towards a minimum. The iteration rule (Equation (3.27)) must be applied repeatedly.
    - s^(k) = 1 may lead to a point with a worse value of the objective function. The search diverges, e.g., when ∇²F(x^(k)) is not positive-definite.
    - It can happen that ∇²F(x^(k)) is singular or almost singular. The Hessian matrix cannot then be inverted.

Furthermore, it depends on the starting point x^(0) whether a minimum, a maximum, or a saddle point is approached, or the whole iteration diverges. The strategy itself does not distinguish the stationary points with regard to type.

If the method does converge, then the convergence is better than of linear order (Goldstein, 1965). Under certain, very strict conditions on the structure of the objective function and its derivatives even second order convergence can be achieved (e.g., Polak, 1971); that is, the number of exact significant figures in the approximation to the minimum solution doubles from iteration to iteration. This phenomenon is exhibited in the solution of some test problems, particularly in the neighborhood of the desired extremum.

All the variations of the basic procedure to be described are aimed at increasing the reliability of the Newton iteration, without sacrificing the high convergence rate. A distinction is made here between quasi-Newton strategies, which do not evaluate the Hessian matrix explicitly, and modified Newton methods, for which first and second derivatives must be provided at each point. The only strategy presently known which makes use of higher than second order properties of the objective function is due to Biggs (1971, 1973).

The simplest modification of the Newton-Raphson scheme consists of determining the step length s^(k) by a line search in the Newton direction v^(k) (Equation (3.28)) until the relative optimum is reached (e.g., Dixon, 1972a):

    F(x^(k) + s^(k) v^(k)) = min_s {F(x^(k) + s v^(k))}        (3.30)

To save computational operations, the second partial derivatives can be redetermined less frequently and used for several iterations. Care must always be taken, however, that v^(k) always points "downhill," i.e., that the angle between v^(k) and −∇F(x^(k)) is less than 90°. The Hessian matrix must also be positive-definite. If the eigenvalues of the matrix are calculated when it is inverted, their signs show whether this condition is fulfilled. If a negative eigenvalue appears, Pearson (1969) suggests proceeding in the direction of the associated eigenvector until a point is reached with positive-definite ∇²F(x). Greenstadt (1967a) simply replaces negative eigenvalues by their absolute value and vanishing eigenvalues by unity. Other proposals have been made to keep the Hessian matrix positive-definite by addition of a correction matrix (Goldfeld, Quandt, and Trotter, 1966, 1968; Shanno, 1970a) or to include simple gradient steps in the iteration scheme (Dixon and Biggs, 1972). Further modifications, which operate on the matrix inversion procedure itself, have been suggested by Goldstein and Price (1967), Fiacco and McCormick (1968), and Matthews and Davies (1971). A good survey has been given by Murray (1972b).

Very few algorithms exist that determine the first and second partial derivatives numerically from trial step operations (Whitley, 1962; see also Wasscher, 1963c; Wegge, 1966). The inevitable approximation errors too easily cancel out the advantages of the Newton directions.


3.2.3.1 DFP: Davidon-Fletcher-Powell Method
(Quasi-Newton Strategy, Variable Metric Strategy)

Much greater interest has been shown for a group of second order gradient methods that attempt to approximate the Hessian matrix and its inverse during the iterations only from first order data. This now extensive class of quasi-Newton strategies has grown out of the work of Davidon (1959). Fletcher and Powell (1963) improved and translated it into a practical procedure. The Davidon-Fletcher-Powell or DFP method and some variants of it are also known as variable metric strategies. They are sometimes also regarded as conjugate gradient methods, because in the quadratic case they generate conjugate directions. For higher order objective functions this is no longer so. Whereas the variable metric concept is to approximate Newton directions, this is not the case for conjugate gradient methods. The basic recursion formula for the DFP method is

    x^(k+1) = x^(k) + s^(k) v^(k)

with

    v^(k) = −H^(k)T ∇F(x^(k))

and

    H^(0) = I
    H^(k+1) = H^(k) + A^(k)

The correction A^(k) to the approximation for the inverse Hessian matrix, H^(k), is derived from information collected during the last iteration, namely from the change in the variable vector,

    y^(k) = x^(k+1) − x^(k) = s^(k) v^(k)

and the change in the gradient vector,

    z^(k) = ∇F(x^(k+1)) − ∇F(x^(k))

It is given by

    A^(k) = [y^(k) y^(k)T] / [y^(k)T z^(k)] − [H^(k) z^(k) (H^(k) z^(k))^T] / [z^(k)T H^(k) z^(k)]        (3.31)
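The rank two correction (3.31) translates directly into matrix arithmetic (a sketch; np.outer forms the dyadic products y y^T and (Hz)(Hz)^T):

    import numpy as np

    def dfp_update(H, y, z):
        # Equation (3.31): correct the approximation H of the inverse
        # Hessian from the last variable change y and gradient change z.
        Hz = H @ z
        return H + np.outer(y, y) / (y @ z) - np.outer(Hz, Hz) / (z @ Hz)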

The step length s^(k) is obtained by a line search along v^(k) (Equation (3.30)). Since the first partial derivatives are needed in any case, they can be made use of in the one dimensional minimization. Fletcher and Powell do so in the context of a cubic Hermitian interpolation (see Sect. 3.1.2.3.4). A corresponding ALGOL program has been published by Wells (1965) (for corrections see Fletcher, 1966; Hamilton and Boothroyd, 1969; House, 1971). The first derivatives must be specified as functions, which is usually inconvenient and often impossible. The convergence properties of the DFP method have been thoroughly investigated, e.g., by Broyden (1970b,c), Adachi (1971), Polak (1971), and Powell (1971, 1972a,b,c). Numerous suggestions have thereby been made for improvements. Convergence is achieved if F(x) is convex. Under stricter conditions it can be proved that the convergence rate is greater than linear and the sequence of iterations converges quadratically, i.e., after a finite number (maximum n) of steps the minimum of a quadratic function is located. Myers (1968) and Huang (1970) show that, if the same starting point is chosen and the objective function is of second order, the DFP algorithm generates the same iteration points as the conjugate gradient method of Fletcher and Reeves.

All these observations are based on the assumption that the computational operations, including the line searches, are carried out exactly. Then H^(k) always remains positive-definite if H^(0) was positive-definite, and the minimum search is stable, i.e., the objective function is improved at each iteration. Numerical tests (e.g., Pearson, 1969; Tabak, 1969; Huang and Levy, 1970; Murtagh and Sargent, 1970; Himmelblau, 1972a,b) and theoretical considerations (Bard, 1968; Dixon, 1972b) show that rounding errors and especially inaccuracies in the one dimensional minimization frequently cause stability problems; the matrix H^(k) can easily lose its positive-definiteness without this being due to a singularity in the inverse Hessian matrix. The simplest remedy for a singular matrix H^(k), or one of reduced rank, is to forget from time to time all the experience stored within H^(k) and to begin again with the unit matrix and simple gradient directions (Bard, 1968; McCormick and Pearson, 1969). To do so certainly increases the number of necessary iterations, but in optimization as in other activities it is wise to put safety before speed. Stewart (1967) makes use of this procedure. His algorithm is of very great practical interest, since he obtains the required information about the first partial derivatives from function values alone by means of a cleverly constructed difference scheme.

3.2.3.2 Strategy of Stewart:
Derivative-free Variable Metric Method

Stewart (1967) focuses his attention on choosing the length of the trial step d_i^(k) for the approximation

    g_i^(k) ≃ F_{xi}(x^(k)) = ∂F(x)/∂x_i at x^(k)

to the first partial derivatives in such a way as to minimize the influence of rounding errors on the actual iteration process. Two difference schemes are available:

    g_i^(k) = (1/d_i^(k)) [F(x^(k) + d_i^(k) e_i) − F(x^(k))]   (forward difference)        (3.32)

and

    g_i^(k) = (1/(2 d_i^(k))) [F(x^(k) + d_i^(k) e_i) − F(x^(k) − d_i^(k) e_i)]   (central difference)        (3.33)

Application of the one sided (forward) difference (Equation (3.32)) is preferred, since it only involves one extra function evaluation. To simplify the computation, Stewart introduces the vector h^(k), which contains the diagonal elements of the matrix (H^(k))^{−1}, representing information about the curvature of the objective function in the coordinate directions.
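The intent of the two schemes can be illustrated as follows (a rough Python sketch; the Boolean vector use_central is an assumed input here, whereas Stewart derives the choice from the error analysis outlined below):

    import numpy as np

    def gradient_estimate(F, x, d, use_central):
        # Forward difference (3.32): one extra function call per
        # component; central difference (3.33): two calls, but less
        # cancellation error where a gradient component is small.
        g = np.zeros(len(x))
        for i in range(len(x)):
            e = np.zeros(len(x)); e[i] = d[i]
            if use_central[i]:
                g[i] = (F(x + e) - F(x - e)) / (2.0 * d[i])
            else:
                g[i] = (F(x + e) - F(x)) / d[i]
        return g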

The algorithm for determining the g_i^(k), i = 1(1)n, can only be outlined here, since it rests on a series of detailed error estimates. From ε_b, an estimate of the absolute error in the computation of F(x) (Stewart sets ε_b = 10^{-10}), and a machine accuracy constant ε_c = 5 · 10^{-13}, a bound on the relative error of the function values at x^(k) is first derived. With the help of the stored curvature values h_i^(k−1) and the last gradient approximations g_i^(k−1), the trial step length d_i^(k) is then chosen for each coordinate direction so as to balance the cancellation error of the difference quotient, which grows as the step length decreases, against the truncation error, which grows as the step length increases; if this rule yields a degenerate value d_i'^(k) = 0, the previous step length d_i^(k−1) is retained. Finally, as long as the approximated gradient component is not too small in relation to the curvature term, the cheap forward difference (Equation (3.32)) is used; otherwise the trial step is suitably enlarged and the central difference (Equation (3.33)) is applied. (For the threshold parameter of this switching rule Stewart chooses the value 2.)

Stewart's main algorithm takes the following form:

Step 0: (Initialization)
    Choose an initial value x^(0), accuracy requirements ε_a,i > 0, i = 1(1)n, and initial step lengths d_i^(0) for the gradient determination, e.g.,
        d_i^(0) = 0.05 x_i^(0), if x_i^(0) ≠ 0,
        d_i^(0) = 0.05, if x_i^(0) = 0.
    Calculate the vector g^(0) from Equation (3.32) using the step lengths d_i^(0).
    Set H^(0) = I, h_i^(0) = 1 for all i = 1(1)n, and k = 0.

Step 1: (Prepare for line search)
    Determine v^(k) = −H^(k) g^(k).
    If k = 0, go to step 3.
    If g^(k)T v^(k) < 0, go to step 3.
    If h_i^(k) > 0 for all i = 1(1)n, go to step 3.


Step 2: (Forget second order information)
    Replace H^(k) ← H^(0) = I,
    h_i^(k) ← h_i^(0) = 1 for all i = 1(1)n,
    and v^(k) ← −g^(k).

Step 3: (Line search and eventual break-off)
    Determine x^(k+1) such that
        F(x^(k+1)) = min_s {F(x^(k) + s v^(k))}.
    If F(x^(k+1)) ≥ F(x^(k)), end the search with result x^(k) and F(x^(k)).

Step 4: (Prepare for inverse Hessian update)
    Determine g^(k+1) by the above difference scheme.
    Construct y^(k) = x^(k+1) − x^(k) and z^(k) = g^(k+1) − g^(k).
    If k > n and |v_i^(k)| ≤ ε_a,i for all i = 1(1)n, end the search with result x^(k+1) and F(x^(k+1));
    otherwise update H^(k+1) together with the curvature estimates h^(k+1) according to Equation (3.31), increase k ← k + 1, and go to step 1.


Brown and Dennis (1972) and Gill and Murray (1972) have suggested other schemes for obtaining the partial derivatives numerically from values of the objective function. Stewart himself reports tests that show the usefulness of his rules, insofar as the results are completely comparable to others obtained with the help of analytically specified derivatives. This may be simply because rounding errors are in any case more significant here, due to the matrix operations, than for example in conjugate gradient methods. Kelley and Myers (1971), therefore, recommend carrying out the matrix operations with double precision.

3.2.3.3 Further Extensions

The ability of the quasi-Newton strategy of Davidon, Fletcher, and Powell (DFP) to construct Newton directions without needing explicit second partial derivatives makes it very attractive from a computational point of view. All efforts in the further rapid and intensive development of the concept have been directed to modifying the correction Equation (3.31) so as to reduce the tendency to instability because of rounding errors and inexact line searches, while retaining as far as possible the quadratic convergence. There has been a spate of corresponding proposals and both theoretical and experimental investigations on the subject up to about 1973, for example:

Adachi (1973a,b)
Bass (1972)
Broyden (1967, 1970a,b,c, 1972)
Broyden, Dennis, and More (1973)
Broyden and Johnson (1972)
Davidon (1968, 1969)
Dennis (1970)
Dixon (1972a,b,c, 1973)
Fiacco and McCormick (1968)
Fletcher (1969a,b, 1970b, 1972b,d)
Gill and Murray (1972)
Goldfarb (1969, 1970)
Goldstein and Price (1967)
Greenstadt (1970)
Hestenes (1969)
Himmelblau (1972a,b)
Hoshino (1971)
Huang (1970, 1974)
Huang and Chambliss (1973, 1974)
Huang and Levy (1970)
Jones (1973)
Lootsma (1972a,b)
Mamen and Mayne (1972)
Matthews and Davies (1971)
McCormick and Pearson (1969)
McCormick and Ritter (1972, 1974)
Murray (1972a,b)
Murtagh (1970)
Murtagh and Sargent (1970)
Oi, Sayama, and Takamatsu (1973)
Oren (1973)
Ortega and Rheinboldt (1972)
Pierson and Rajtora (1970)
Powell (1969, 1970a,b,c,g, 1971, 1972a,b,c,d)
Rauch (1973)
Ribiere (1970)
Sargent and Sebastian (1972, 1973)
Shanno and Kettler (1970a,b)
Spedicato (1973)
Tabak (1969)
Tokumaru, Adachi, and Goto (1970)
Werner (1974)
Wolfe (1967, 1969, 1971)
Many of the di erently sophisticated strategies, e.g., the classes or families of similar<br />

methods de ned by Broyden (1970b,c) <strong>and</strong> Huang (1970), are theoretically equivalent.<br />

They generate the same conjugate directions v (k) <strong>and</strong>, with an exact line search, the same<br />

sequence x (k) of iteration points if F (x) is quadratic. Dixon (1972c) even proves this<br />

identity for more general objective functions under the condition that no term of the<br />

sequence H (k) is singular.<br />

The important nding that under certain assumptions convergence can also be achieved<br />

without line searches is attributed to Wolfe (1967). A recursion formula satisfying these<br />

conditions is as follows:<br />

H (k+1) = H (k) + B (k)<br />

where<br />

B (k) = (y(k) ; H (k) z (k) )(y (k) ; H (k) z (k) ) T<br />

(y (k) ; H (k) z (k) ) z (k)T<br />

(3.33)<br />
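In code this rank one correction is even simpler than the DFP formula (sketch):

    import numpy as np

    def rank_one_update(H, y, z):
        # Equation (3.34): symmetric rank one correction; note that
        # positive-definiteness of H is not guaranteed and that the
        # denominator may approach zero.
        r = y - H @ z
        return H + np.outer(r, r) / (r @ z)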

The formula was proposed independently by Broyden (1967), Davidon (1968, 1969), Pearson (1969), and Murtagh and Sargent (1970) (see Powell, 1970a). The correction matrix B^(k) has rank one, while A^(k) in Equation (3.31) is of rank two. Rank one methods, also called variance methods by Davidon, cannot guarantee that H^(k) remains positive-definite. It can also happen, even in the quadratic case, that H^(k) becomes singular or B^(k) increases without bound. Hence, in order to make methods of this type useful in practice, a number of additional precautions must be taken (Powell, 1970a; Murray, 1972c). The following compromise proposal,

    H^(k+1) = H^(k) + A^(k) + θ^(k) B^(k)        (3.35)

in which the scalar parameter θ^(k) > 0 can be freely chosen, is intended to exploit the advantages of both concepts while avoiding their disadvantages (e.g., Fletcher, 1970b).


Multidimensional Strategies 83<br />

Broyden (1970b,c), Shanno (1970a,b), and Shanno and Kettler (1970) give criteria for choosing a suitable $\vartheta^{(k)}$. However, the mixed correction, also known as the BFS or Broyden-Fletcher-Shanno formula, cannot guarantee quadratic convergence either, unless line searches are carried out. It can be proved that there will merely be a monotonic decrease in the eigenvalues of the matrix $H^{(k)}$. From numerical tests, however, it turns out that the increased number of iterations is usually more than compensated for by the saving in function calls made by dropping the one-dimensional optimizations (Fletcher, 1970a). Fielding (1970) has designed an ALGOL program following Broyden's work with line searches (Broyden, 1965). With regard to the number of function calls it is usually inferior to the DFP method, but it sometimes also converges where the variable metric method fails. Dixon (1973) defines a correction to the chosen directions,

$$v^{(k)} = -H^{(k)} \nabla F(x^{(k)}) + w^{(k)}$$

where $w^{(0)} = 0$ and

$$w^{(k+1)} = w^{(k)} + \frac{\left(x^{(k+1)} - x^{(k)}\right)^T \nabla F(x^{(k+1)})}{\left(x^{(k+1)} - x^{(k)}\right)^T z^{(k)}} \left(x^{(k+1)} - x^{(k)}\right)$$

by which, together with a matrix correction as given by Equation (3.35), quadratic convergence can be achieved without line searches. He shows that at most $n + 2$ function calls and gradient calculations are required each time if, after arriving at $v^{(k)} = 0$, an iteration

$$x^{(k+1)} = x^{(k)} - H^{(k)} \nabla F(x^{(k)})$$

is included. Nearly all the procedures described assume that at least the first partial derivatives are specified as functions of the variables and are therefore exact to the significant figure accuracy of the computer used. The more costly matrix computations should wherever possible be executed in double precision, in order to keep down the effect of rounding errors.
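To make the mechanics of Equation (3.33) concrete, the following sketch applies the rank one correction to an inverse Hessian approximation H (a minimal illustration in Python with NumPy; the function name and the safeguard threshold eps are assumptions of this sketch, not part of the original formula):

    import numpy as np

    def rank_one_update(H, y, z, eps=1e-8):
        # Equation (3.33): H <- H + B with
        # B = (y - H z)(y - H z)^T / ((y - H z)^T z),
        # where y is the last change of x and z the last change of the gradient.
        u = y - H @ z                     # residual of the secant condition H z = y
        denom = u @ z                     # scalar (y - H z)^T z
        if abs(denom) < eps * np.linalg.norm(u) * np.linalg.norm(z):
            return H                      # precaution: skip a nearly singular update
        return H + np.outer(u, u) / denom

The guard against a vanishing denominator corresponds to one of the "additional precautions" mentioned above; without it, $B^{(k)}$ can increase without bound.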

Just two more suggestions for derivative-free quasi-Newton methods will be mentioned here: those of Greenstadt (1972) and of Cullum (1972). While Cullum's algorithm, like Stewart's, approximates the gradient vector by function value differences, Greenstadt attempts to get away from this. Analogously to Davidon's idea of approximating the Hessian matrix during the course of the iterations from knowledge of the gradients, Greenstadt proposes approximating the gradients by using information from objective function values over several subiterations. Only at the starting point must a difference scheme for the first partial derivatives be applied. Another interesting variable metric technique, described by Elliott and Sworder (1969a,b, 1970), combines the concept of stochastic approximation for the sequence of step lengths with the direction algorithms of the quasi-Newton strategy.

Quasi-Newton strategies of degree one are especially suitable if the objective function is a sum of squares (Bard, 1970). Problems of minimizing a sum of squares arise, for example, from the problem of solving systems of simultaneous non-linear equations, or from determining the parameters of a mathematical model from experimental data (non-linear regression and curve fitting). Such objective functions are easier to handle because Newton directions can be constructed straight away without second partial derivatives, as long as the Jacobian matrix of first derivatives of each term of the objective function is given. The oldest iteration procedure constructed on this basis is variously known as the Gauss-Newton (Gauss, 1809) method, generalized least squares method, or Taylor series method. It has all the advantages and disadvantages of the Newton-Raphson strategy.
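For clarity, the construction can be stated explicitly (a standard formulation, not spelled out in the text): with residuals $f_i$ and their Jacobian $J$, a Newton-like direction $v$ follows from first derivatives alone,

$$F(x) = \sum_{i=1}^{m} f_i(x)^2, \qquad J_{ij} = \frac{\partial f_i}{\partial x_j}, \qquad J^T J \, v = -J^T f$$

Replacing $J^T J$ by $J^T J + \lambda I$ with $\lambda > 0$ gives the stabilized variants of Levenberg and Marquardt cited below.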

Improvements on the basic procedure are given by Levenberg (1944) and Marquardt (1963). Wolfe's secant method (Wolfe, 1959b; see also Jeeves, 1958) is the forerunner of many variants which do not require the Jacobian matrix to be specified at the start but construct it in the course of the iterations. Further details will not be described here; the reader is referred to the specialist literature, again up to 1973:

Barnes, J.G.P. (1965)
Bauer, F.L. (1965)
Beale (1970)
Brown and Dennis (1972)
Broyden (1965, 1969, 1971)
Davies and Whitting (1972)
Dennis (1971, 1972)
Fletcher (1968, 1971)
Golub (1965)
Jarratt (1970)
Jones (1970)
Kowalik and Osborne (1968)
Morrison (1968)
Ortega and Rheinboldt (1970)
Osborne (1972)
Peckham (1970)
Powell (1965, 1966, 1968b, 1970d,e, 1972a)
Powell and MacDonald (1972)
Rabinowitz (1970)
Ross (1971)
Smith and Shanno (1971)
Spath (1967) (see also Silverman, 1969)
Stewart (1973)
Vitale and Taylor (1968)
Zeleznik (1968)

Brent (1973) gives further references. Peckham's strategy is perhaps of particular interest. It represents a modification of the simplex method of Nelder and Mead (1965) and Spendley (1969), and in tests it proves superior to Powell's strategy (1965) with regard to the number of function calls. It should at least be mentioned here that non-linear regression, in which parameters that enter a model in a non-linear way (e.g., as exponents) have to be estimated, in general requires a global optimization method, because the squared sum of residuals defines a multimodal function.

Reference has been made to a number of publications in this and the preceding chapter in which strategies are described that can hardly be called genuine hill climbing methods; they would fall more naturally under the headings of mathematical programming or functional optimization. It was not, however, the intention to give an introduction to the basic principles of these two very wide subjects. The interested reader will easily find out that, although a nearly exponentially increasing number of new books and journals has become available during the last three decades, she or he will look in vain for new direct search strategies in that realm. Such methods form the core of this book.




Chapter 4

Random Strategies

One group of optimization methods has been completely ignored in Chapter 3: methods in which the parameters are varied according to probabilistic instead of deterministic rules; even the methods of stochastic approximation are deterministic. As indicated by the title, there is not one random strategy but many, some of which differ considerably from each other.

It is common to resort to random decisions in optimization whenever deterministic rules do not have the desired success or lead to a dead end; on the other hand, random strategies are often supposed to be essentially more costly. The opinion is widely held that, with careful thought leading to cleverly constructed deterministic rules, better results can always be achieved than with decisions that are in some way made randomly. The strategies that follow should show that randomness is not, however, the same as arbitrariness, but can also be made to obey very refined rules. Sometimes only this kind of method solves a problem effectively.

Profound considerations do not underlie all the procedures used in hill climbing strategies. The cyclic choice of coordinate directions in the Gauss-Seidel strategy could just as well be replaced by a random sequence. One can also consider increasing the number of directions used. Since there is no good reason for preferring to search for the optimum along directions parallel to the axes, one could also use, instead of only n different unit vectors, any number of randomly chosen direction vectors. In fact, suggestions along these lines have been made (Brooks, 1958) in order to avoid a premature termination of the minimum search in narrow oblique valleys (compare Chap. 3, Sect. 3.2.1.1). Similar concepts have been developed, for example, by O'Hagan and Moler (after Wilde and Beightler, 1967), Emery and O'Hagan (1966), Lawrence and Steiglitz (1972), and Beltrami and Indusi (1972), to improve the pattern search of Hooke and Jeeves (1961; see Chap. 3, Sect. 3.2.1.2). The limitation to a finite number of search directions is not only a disadvantage in narrow oblique valleys but also at the border of the feasible region as determined by inequality constraints. All the deterministic remedies against prematurely ending the iteration sequence assume that more information can be gathered, for example in the form of partial derivatives of the constraint functions (see Klingman and Himmelblau, 1964; Glass and Cooper, 1965; Paviani and Himmelblau, 1969). Providing this information usually means a high extra cost and is sometimes not possible at all.



Random directions that are not oriented with respect to the structure of the objective function and the allowed region also imply a higher cost, because they do not take optimal single steps. They can, however, be applied in every case.

Many deterministic optimization methods, especially those which are guided by the gradient of the objective function, have convergence difficulties at points where the partial derivatives are discontinuous. On the contour diagram of a two parameter objective function of which the maximum is sought, such positions correspond to sharp ridges leading to the summit (e.g., Zwart, 1970). A narrow valley (the geometric picture in the case of minimization) leads to the same problem if the finite step lengths are greater than its width. Then all attempts fail to make improvements in the coordinate directions or, from trial steps in these directions, to predict a locally best direction in which to continue (gradient direction). The same phenomenon can also occur when the partial derivatives are specified analytically, because of the rounding errors involved in computing with a finite number of significant figures. To avoid premature termination of a search in such cases, Norkin (1961) has suggested the following procedure: when the optimization according to the conventional scheme has ended, a step is taken away from the supposed optimum in an arbitrary coordinate direction. The extremum is sought again, excluding this one variable, and the search is only finally ended when deviations in all directions have led back to the same point. This rule should also prevent stagnation at saddle points.

Even the simplex method of linear programming makes random decisions if the search for the extremum threatens to be endless because the problem is degenerate. Then, following Dantzig's suggestion (1966), the iteration scheme should be interrupted in favor of a random exchange step. A problem is only degenerate, however, because the general rules do not cover the special case (see also Chap. 6, Sect. 6.2). A further example of resorting to chance when a dead end has been reached is Brent's modification of the strategy with conjugate directions (Brent, 1973). Powell's algorithm (Powell, 1964), when applied to problems in many dimensions, tends to generate linearly dependent directions and then to proceed within a subspace of $\mathbb{R}^n$. For this reason Brent now and then interrupts the line searches with steps in randomly chosen directions (see also Chap. 3, Sect. 3.2.2.1).

One very frequently comes across proposals to let chance take control when the problem is to find global minima of multimodal objective functions. Such problems frequently crop up in process design (Motskus, 1967; Mockus, 1971) but can also be the result of recasting discrete problems into continuous form (Katkovnik and Shimelevich, 1972). Practically all sequential search procedures can only lead to a local optimum, as a rule the one nearest to the starting point. There are a few proposals for ensuring global convergence of sequential optimization methods (e.g., Motskus and Feldbaum, 1963; Chichinadze, 1967, 1969; Goldstein and Price, 1971; Ueing, 1971, 1972; Branin and Hoo, 1972; McCormick, 1972; Sutti, Trabattoni, and Brughiera, 1972; Treccani, Trabattoni, and Szego, 1972; Brent, 1973; Hesse, 1973; Opacic, 1973; Ritter and Tui, as mentioned by Zwart, 1973). They are often in the form of additional, heuristic rules. Gran (1973), for example, considers gradient methods that are supposed to achieve global convergence by the addition of a random process to the deterministic changes. Hill (1964; see also Hill and Gibson, 1965) suggests subdividing the interval to be explored and gathering sufficient information in each section to carry out a cubic interpolation. The best of the results for the
in each section to carry out a cubic interpolation. The best of the results for the


parts is taken as an approximation to the global optimum. However, for n-dimensional interpolations the cost increases rapidly with n; this scheme thus looks impractical for more than two variables. To work with several randomly chosen starting points and to compare each of the local minima (or maxima) obtained is usually regarded as the only course of action for determining the global optimum with at least a certain probability (so-called multistart techniques). Proposals along these lines have been made by, among others, Gelfand and Tsetlin (1961), Bromberg (1962), Bocharov and Feldbaum (1962), Zellnik, Sondak, and Davis (1962), Krasovskii (1962), Gurin and Lobac (1963), Flood and Leon (1964, 1966), Kwakernaak (1965), Casey and Rustay (1966), Weisman and Wood (1966), Pugh (1966), McGhee (1967), Crippen and Scheraga (1971), and Brent (1973).
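As a concrete illustration of the multistart idea, the following sketch restarts a local minimizer from random points; it assumes SciPy's general-purpose minimize as a stand-in for any of the local search strategies of Chapter 3, and the bimodal test function is an arbitrary example, not one from the text:

    import numpy as np
    from scipy.optimize import minimize

    def multistart(f, lower, upper, n_starts=20, seed=1):
        # Run a local search from several random starting points and
        # keep the best local minimum found.
        rng = np.random.default_rng(seed)
        best = None
        for _ in range(n_starts):
            x0 = rng.uniform(lower, upper)          # random starting point
            result = minimize(f, x0)                # any local strategy
            if best is None or result.fun < best.fun:
                best = result
        return best

    f = lambda x: (x[0]**2 - 4)**2 + x[1]**2        # two local minima, x = (+-2, 0)
    print(multistart(f, lower=[-5, -5], upper=[5, 5]).x)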

A further problem faces deterministic strategies if the calculated or measured values of the objective function are subject to stochastic perturbations. In the experimental field, for example in the on-line optimum search, or for control of the optimal conditions in processes, perturbations must be taken into account from the start (e.g., Tovstucha, 1960; Feldbaum, 1960, 1962; Krasovskii, 1963; Medvedev, 1963, 1968; Kwakernaak, 1966; Zypkin, 1967). However, in computational optimization too, where the objective function is analytically specified, a similar effect arises because of rounding errors (Brent, 1973), especially if one uses hybrid analogue computers for solving functional optimization problems (e.g., Gilbert, 1967; Korn and Korn, 1964; Bekey and Karplus, 1971). A simple, if expensive (in the sense of cost in computations or trials) way of dealing with this is the repetition of measurements until a definite conclusion is possible. This is the procedure adopted by Box and Wilson (1951) in the experimental gradient method, and by Box (1957) in his EVOP strategy. Instead of a fixed number of repetitions, which while on the safe side may be unnecessarily high, one can follow the concept of the sequential analysis of statistical data (Wald, 1966; see also Zigangirov, 1965; Schumer, 1969; Kivelidi and Khurgin, 1970; Langguth, 1972), which is to make only as many trials as the trial results seem to make absolutely necessary. More detailed investigations on this subject have been made, for example, by Mlynski (1964a,b, 1966a,b).

As opposed to attempting to improve the decisive data, Brooks and Mickey (1961) have found that one should work with the minimum number of n + 1 comparison points in order to determine a gradient direction, even if this is a perturbed one. One must, however, depart from the requirement that each step should yield a success, or even the locally greatest success. The motto that following locally the best possible route seldom leads to the best overall result is true not only for first order gradient strategies but also for Newton and quasi-Newton methods. Harkins (1964), for example, maintains that inexact line searches not only do not worsen the convergence of a minimization procedure but in some cases actually improve it. Similar experiences led Davies, Swann, and Campey in their strategy (see Chap. 3, Sect. 3.2.1.4) to make only one quadratic interpolation in each direction. Also Spendley, Hext, and Himsworth (1962), in the formulation of their simplex method, which generates only near-optimal directions, work on the assumption that random decisions are not necessarily a total disadvantage (see also Himsworth, 1962). Based on similar arguments, the modification of this strategy by M. J. Box (1965) sets up the initial simplex or complex by means of random numbers. Imamura et al. (1970) even go so far as to superimpose artificial stochastic variations on an objective function in order to prevent convergence to inferior local optima.

The rigidity of an algorithm based on a fixed internal model of the objective function, with which the information gathered during the iterations is interpreted, is advantageous if the objective function corresponds closely enough to the model. If this is not the case, the advantage disappears and may even turn into a disadvantage. Second order methods with quadratic models seem more sensitive in this respect than first order methods with only linear models. Even more robust are the direct search strategies that work without an explicit model, such as the strategy of Hooke and Jeeves (1961): it makes no use of the sizes of the changes in the objective function values, but only of their signs.

A method that uses a kind of minimal model of the objective function is the stochastic approximation (Schmetterer, 1961; see also Chap. 2, Sect. 2.3). This purely deterministic method assumes that the measured or calculated function values are samples of a normally distributed random quantity, of which the expectation value is to be minimized or maximized. The method feels its way to the optimum with alternating exploratory and work steps, whose lengths form convergent series with prescribed bounds and sums. In the multidimensional case this standard concept can be the basis of various strategies for choosing the directions of the work steps (Fabian, 1968). Usually gradient methods show themselves to best advantage here. The stochastic approximation itself is very versatile. Constraints can be taken into account (Kaplinskii and Propoi, 1970), and problems of functional optimization can be treated (Gersht and Kaplinskii, 1971), as well as dynamic problems of maintaining or seeking optima (Chang, 1968). Tsypkin (1968a,b,c, 1970a,b; see also Zypkin, 1966, 1967, 1970) discusses these topics very thoroughly. There are also, however, arguments against the reliability of convergence for certain types of objective function (Aizerman, Braverman, and Rozonoer, 1965). The usefulness of the strategy in the multidimensional case is limited by its high cost. Hence there has been no shortage of attempts to accelerate the convergence (Fabian, 1967; Berlin, 1969; Saridis, 1968, 1970; Saridis and Gilbert, 1970; Janac, 1971; Kwatny, 1972; see also Chap. 2, Sect. 2.3). Ideas for using random directions look especially promising; some of the many investigations of this topic which have been published are Loginov (1966), Stratonovich (1968, 1970), Schmitt (1969), Ermoliev (1970), Svechinskii (1971), Tsypkin (1971), Antonov and Katkovnik (1972), Berlin (1972), Katkovnik and Kulchitskii (1972), Kulchitskii (1972), Poznyak (1972), and Tsypkin and Poznyak (1972).

The original method is not able to determine global extrema reliably. Extensions of the strategy in this direction are due to Kushner (1963, 1972) and Vaysbord and Yudin (1968). The sequence of work steps is so designed that the probability of the following state being the global optimum is maximized. In contrast to the gradient concept, the information gathered is not interpreted in terms of local but of global properties of the objective function. In the case of two local minima, the effort of the search is gradually concentrated in their neighborhood, and only when one of them is significantly better is the other abandoned in favor of the one that is also a global minimum. In terms of the cost of the strategy, the acceleration of the local search and the reliability of the global search are diametrically opposed. Hill and Gibson (1965) show that their global strategy is superior to Kushner's, as well as to one of Bocharov and Feldbaum. However, they only treat cases with $n \le 2$ parameters. More recent research results have been presented by Pardalos and Rosen (1987), Torn and Zilinskas (1989), Floudas and Pardalos (1990), Zhigljavsky (1991), and Rudolph (1991, 1992b). Now there are even specialized journals established in the field; see Horst (1991).

All the strategies mentioned so far are fundamentally deterministic. They only resort to chance in dead-end situations, or they operate on the assumption that the objective function is stochastically perturbed. Jarvis (1968), who compares deterministic and probabilistic optimization methods, finds that random methods that do not stick to any particular model are most suitable when an optimum must be located under particularly difficult conditions, such as a perturbed objective function or a "pathological" problem structure with several extrema, discontinuities, plateaus, forbidden regions, etc. The homeostat of Ashby (1960) is probably the oldest example of the application of a random strategy. Its objective is to maintain a condition of equilibrium against stochastic disturbances. It may happen that no optimum is sought, but only a point in an allowed region (today one calls such a task a constraint satisfaction problem, or CSP). Nevertheless, corresponding solution methods are closely tied to optimization, and a series of various heuristic planning methods is available (e.g., Weinberg and Zehnder, 1969). Ashby's strategy, which he calls a blind homeostatic process, becomes active whenever the apparatus strays from equilibrium. Then the controllable parameters are randomly varied until the desired condition is restored. The finite number (in this case) of discrete settings of the variables all enter the search process with equal probability. Chichinadze (1960) later constructed an electronic model on the same principle and used it for synthesizing simple optimal control systems.

Brooks (1958), probably stimulated by R. L. Anderson (1953), is generally regarded as the initiator of the use of random strategies for optimization problems. He describes the simple, later also called blind or pure random search for finding a minimum or maximum in the experimental field. In a closed interval $a \le x \le b$ several points are chosen at random. The probability density $w(x)$ is constant everywhere within the region and zero outside:

$$w(x) = \begin{cases} 1/V & \text{for all } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$

$V$, the volume of the cube with corners $a_i$ and $b_i$ for $i = 1(1)n$, is given by

$$V = \prod_{i=1}^{n} (b_i - a_i)$$

The value of the objective function must be determined at all selected points. The point that has the lowest or highest function value is taken as optimum. How well the true extremum is approximated depends on the number of trials as well as on the actual random results. Thus one can only give a probability $p$ that the optimum will be found within a given number $N$ of trials with a prescribed accuracy:

$$p = 1 - \left(1 - \frac{v}{V}\right)^N \tag{4.1}$$

The volume $v < V$ contains all points that satisfy the accuracy requirement.

By rearranging Equation (4.1), one obtains the number of trials

$$N = \frac{\ln (1-p)}{\ln \left(1 - \frac{v}{V}\right)} \tag{4.2}$$

that is required in order to place, with probability $p$, at least one trial in the volume $v$. Brooks concludes from this that the cost is independent of the number of variables. In their criticism, Hooke and Jeeves (1958) point out that it is not feasible to consider the accuracy in terms of the volume ratio for problems with many variables. For $n = 100$ parameters, a volume ratio of $v/V = 0.1$ corresponds to a ratio of the side lengths $D$ of $V$ and $d$ of $v$ of

$$\frac{d}{D} = \sqrt[n]{\frac{v}{V}} \simeq 0.98$$

This means that the uncertainty in the variables $x_i$ is 98% of the original interval $[a_i, b_i]$, although the volume containing the optimum has been reduced to one tenth of the original. Shimizu (1969) makes the same mistake as Brooks and attempts to implement the strategy for problems with more general constraints.
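The blind random search and the counting argument of Equations (4.1) and (4.2) fit in a few lines of Python (an illustrative sketch; the function names and the example numbers are assumptions, not taken from the text):

    import math
    import numpy as np

    def pure_random_search(f, a, b, N, seed=0):
        # N uniform samples in the box [a, b]; w(x) = 1/V inside, 0 outside.
        rng = np.random.default_rng(seed)
        X = rng.uniform(a, b, size=(N, len(a)))
        values = np.array([f(x) for x in X])
        k = values.argmin()                 # best point found
        return X[k], values[k]

    def trials_needed(p, v_over_V):
        # Equation (4.2): trials needed to hit the target volume v
        # at least once with probability p.
        return math.log(1 - p) / math.log(1 - v_over_V)

    print(trials_needed(0.9, 0.1))          # about 22 trials in the volume sense
    print(0.1 ** (1 / 100))                 # but d/D is still about 0.98 for n = 100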

A comparison of the pure random search with the deterministic search methods known at the time for experimental optimization problems (Brooks, 1959) also shows no advantage of the stochastic strategy. The test covers only four different objective functions, each with two variables. Brooks then recommends applying his random method if the number of parameters is large or if the determination of objective function values is subject to large perturbations. McArthur (1961) concludes on the basis of numerical experiments that the random strategy is also preferable for complicated problem structures. Just this circumstance has led to the use, even today, of the pure random search, often called the Monte-Carlo method, for example in the computer optimization of building construction (Golinski and Lesniak, 1966; Lesniak, 1970; Hupfer, 1970).

In principle, all the trials of the simple random strategy can be made simultaneously. It is thus numbered among the simultaneous optimization methods. The decision to choose a particular state vector of variables does not depend on the results of preceding trials, since the probability of scoring according to the uniform distribution is the same at all times. However, in applications on traditional, serially operating computers, the trials must be made sequentially. This can be used to advantage by storing the current best value of the objective function and its associated variable values. In Chapter 3, Sections 3.1.1 and 3.2, the grid or tabulation method was referred to as optimal in the minimax sense. The blind random strategy should thus not be any better. Defining the interval length $D_i = b_i - a_i$ for the variable $x_i$, with required accuracy $d_i$, and assuming that all the $D_i = D$ and $d_i = d$ for $i = 1(1)n$, one obtains for the volume ratio in Equations (4.1) and (4.2)

$$\frac{v}{V} = \left(\frac{d}{D}\right)^n$$

If $v/V$ is small, which must be the case when there are many variables, one can use the approximation $\ln (1 + y) \simeq y$ for $|y| \ll 1$


to write the number of required trials as

$$N \simeq -\ln (1-p) \left(\frac{D}{d}\right)^n$$

Assuming that $D/d$ is an integer, the grid method requires

$$N = \left(\frac{D}{d}\right)^n$$

trials (compare Chap. 3, Sect. 3.2, Equation (3.19)). The value is the same for both procedures if $p \simeq 0.63$; indeed, with $N = (D/d)^n$ and $v/V = (d/D)^n$, Equation (4.1) gives $p = 1 - (1 - 1/N)^N \simeq 1 - e^{-1} \simeq 0.63$. Supposing that the probability of at least one score of the required accuracy is $p = 0.90$, the random strategy results in

$$N \simeq 2.3 \left(\frac{D}{d}\right)^n$$

which is clearly worse than the grid strategy (Spang, 1962). The reason for the extra cost, however, should not be attributed to the randomness of decisions itself, but to the fact that for an equiprobable, continuous selection of variables the trials can lie very close together or, in the discrete case, can repeat themselves. If one could avoid that, the disadvantage would no longer exist. A randomized sequence of trials might even hit upon the optimal result earlier than an ordered one. Nevertheless, Spang's proof has for some time brought all random methods, not only the simple Monte-Carlo strategy, into disrepute.

Nowadays the term Monte-Carlo methods is understood to cover, in general, simulation methods that have to do with stochastic events. They are applied effectively to solving difficult differential equations (Little, 1966) or to evaluating integrals (Cowdrey and Reeves, 1963; McGhee and Walford, 1968). Besides the simple hit-or-miss scheme, however, greatly improved variants have been developed (e.g., W. F. Bauer, 1958; Hammersley and Handscomb, 1964; Korn, 1966, 1968; Hull, 1967; Brandl, 1969). Amann (1968a,b) reports a Monte-Carlo method with information storage and a sequential extension for the solution of a linear boundary value problem, and Curtiss (1956) describes a Monte-Carlo procedure for solving systems of linear equations. Both are supposed to be less costly than comparable deterministic strategies. Pinkham (1964) and Pincus (1970) describe modifications for the problems of finding zeros of a non-linear function and of constrained optimization. Since only relatively few publications treat random optimization methods in any depth (Karnopp, 1961, 1963; Idelsohn, 1964; Dickinson, 1964; Rastrigin, 1963, 1965a,b, 1966, 1967, 1968, 1969, 1972; Lavi and Vogl, 1966; Schumer, 1967; Jarvis, 1968; Heydt, 1970; Cockrell, 1970; White, 1970, 1971; Aoki, 1971; Kregting and White, 1971), the improved strategies will be briefly presented here. They all operate with sequential, and sometimes both simultaneous and sequential, random trials, and in one way or another exploit the information from preceding trials to accelerate the convergence. Brooks himself already suggests several improvements. Thus, to exclude repetitions or closely situated trials, the volume to be investigated can be subdivided into, for example, cubic subspaces, into each of which only one random trial is placed. According to one's


knowledge of the approximate position of the optimum, the subspaces are assigned different sizes (Idelsohn, 1964). The original uniform distribution is thereby replaced by one with a greater density in the neighborhood of the expected optimum. Karnopp (1961, 1963, 1966) has treated this problem in detail without, however, giving any practical procedure. Mathematically based investigations of the same topic are due to Motskus (1965), Hupfer (1970), Pluznikov, Andreyev, and Klimenko (1971), Yudin (1965, 1966, 1972), Vaysbord (1967, 1968, 1969), Taran (1968a,b), Karumidze (1969), and Meerkov (1972). If after several (simultaneous) samples the search is continued in an especially promising looking subregion, the procedure becomes sequential in character. Suggestions of this kind have been made, for example, by McArthur (1961), Motskus (1965), and Hupfer (1970) (shrinkage random search). Zakharov (1969, 1970) applies the stochastic approximation for the successive shrinkage of the region in which Monte-Carlo samples are placed. The most thoroughly worked out strategy is that of McMurtry and Fu (1966, probabilistic automaton; see also McMurtry, 1965). The problem considered is to adjust the variable parameters of a control system for a dynamic process in such a way that the optimum of the system is found and maintained despite perturbations and (slow) drift (Hill, McMurtry, and Fu, 1964; Hill and Fu, 1965). Initially the probabilities are equal for all subregions, at the centers of which the (stochastically perturbed) function values are measured. In the course of the iterations the probability matrix is altered so that regions with better objective function values are tested more often than others. The search ends when only one subregion remains: the one with the highest probability of containing the global optimum. McMurtry and Fu use a so-called linear intensification to adjust the probability matrix. Suggestions for further improving the convergence rate have been made by Nikolic and Fu (1966), Fu and Nikolic (1966), Shapiro and Narendra (1969), Asai and Kitajima (1972), Viswanathan and Narendra (1972), and Witten (1972). Strongin (1970, 1971) treats the same problem from the point of view of decision theory.
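The general reinforcement scheme described above can be sketched as follows (a loose illustration of the subregion idea with a linear reward rule; the update constant beta, the noise model, and the termination by step count are assumptions of the sketch, not details taken from McMurtry and Fu):

    import numpy as np

    def subregion_search(f, centers, noise=0.1, beta=0.02, steps=500, seed=0):
        # Sample one of m subregion centers according to the current
        # probability vector, measure the perturbed objective there, and
        # shift probability mass linearly toward the region with the best
        # average measurement so far.
        rng = np.random.default_rng(seed)
        m = len(centers)
        prob = np.full(m, 1.0 / m)          # equal probabilities at the start
        sums, counts = np.zeros(m), np.zeros(m)
        for _ in range(steps):
            i = rng.choice(m, p=prob)
            sums[i] += f(centers[i]) + noise * rng.standard_normal()
            counts[i] += 1
            means = np.where(counts > 0, sums / np.maximum(counts, 1), np.inf)
            best = means.argmin()           # region with best average value
            prob = (1 - beta) * prob        # linear intensification step
            prob[best] += beta              # toward the currently best region
        return centers[prob.argmax()]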

All these methods lay great emphasis on the reliability of global convergence. The quality of the approximation depends to a large extent on the number of subdivisions of the n-dimensional region under investigation. High accuracy requirements cannot be met for many variables since, at least initially, the number of subregions to investigate rises exponentially with the number of parameters. To improve the local convergence properties, there are suggestions for replacing the midpoint tests in a subvolume by the result of an extreme value search. This could be done with one of the familiar search strategies, such as a gradient method (Hill, 1969), or with any other purely sequential random search method (Jarvis, 1968, 1970) of high convergence rate, even if it were only guaranteed to converge locally. Application, however, is reported to be limited to problems with at most seven or eight variables.

Another possibility for giving a sequential character to random methods consists of gradually shifting the expectation value of a random variable with a restricted probability density distribution. Brooks (1958) calls his proposal of this type the creeping random search. Suitable random numbers are provided, for example, by a Gaussian distribution with expectation value $\xi$ and standard deviation $\sigma$. Starting from a chosen initial condition $x^{(0)}$, several simultaneous trials are made, which most likely fall in the neighborhood of the starting point ($\xi = x^{(0)}$). The coordinates of the point with the best function value form


the expectation value for the next set of random trials. In contrast to other procedures, the data from the other trials are not exploited to construct a linear or even quadratic model from which to calculate a best possible step (e.g., Brooks and Mickey, 1961; Aleksandrov, Sysoyev, and Shemeneva, 1968; Pugachev, 1970). For small $\sigma$ and a large number of samples, the best value will in any case fall in the locally most favorable direction. In order to approach the solution with high accuracy, the variance $\sigma^2$ must be successively reduced. Brooks, however, gives no practical rule for this adjustment. Many algorithms have since been published that are extensions of Brooks' basic concept of the creeping random search. Most of them no longer choose the best of several trials; they accept each improvement and reject each worsening (Favreau and Franks, 1958; Munson and Rubin, 1959; Wheeling, 1960).

The iteration rule of a creeping random search is, for the minimum search:

$$x^{(k+1)} = \begin{cases} x^{(k)} + z^{(k)} & \text{if } F(x^{(k)} + z^{(k)}) \le F(x^{(k)}) \text{ (success)} \\ x^{(k)} & \text{otherwise (failure)} \end{cases}$$

The random vector $z^{(k)}$, which in this notation effects the change in the state vector $x$, belongs to an $n$-dimensional $(0, \sigma^2)$ normal distribution with expectation value $\xi = 0$ and variance $\sigma^2$, which in the simplest case is the same for all components. One can thus regard $\sigma$, or better $\sigma \sqrt{n}$, as a kind of average step length. The direction of $z^{(k)}$ is uniformly distributed in $\mathbb{R}^n$, i.e., purely random.
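In code the rule reads as follows (a minimal sketch, assuming minimization of a given function f with a fixed standard deviation sigma; the parameter names are illustrative):

    import numpy as np

    def creeping_random_search(f, x0, sigma=0.1, steps=10000, seed=0):
        # (1+1)-style creeping random search: add a normally distributed
        # random vector z and accept the new point only in case of success.
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float)
        fx = f(x)
        for _ in range(steps):
            z = sigma * rng.standard_normal(x.size)   # z ~ N(0, sigma^2 I)
            trial = x + z
            f_trial = f(trial)
            if f_trial <= fx:                         # success: accept
                x, fx = trial, f_trial                # failure: keep x as is
        return x, fx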

Gaussian distributions for the increments are also used by Bekey et al. (1966), Stewart, Kavanaugh, and Brocker (1967), and De Graag (1970). Gonzalez (1970) and White (1970) use, instead of a normal distribution, a uniform distribution that covers a small region in the form of an n-dimensional cube centered on the starting point. This clearly favors the diagonal directions, in which the total step lengths are on average a factor $\sqrt{n}$ greater than in the coordinate directions. Pierre (1969) therefore restricts the uniformly distributed random probe to an n-dimensional

hypersphere of fixed radius. Rastrigin (1960-1972) gives the total step length

$$s = \sqrt{\sum_{i=1}^{n} z_i^2}$$

a fixed value. Instead of the normal distribution he thus obtains a circumferential or hypersphere-surface distribution. In addition, he repeats the evaluation of the objective function when there is a failure, in order to reduce the effect of stochastic perturbations. Taking two model functions

$$F_1(x) = F_1(x_1, \ldots, x_n) = \sum_{i=1}^{n} x_i \qquad \text{(inclined plane)}$$

$$F_2(x) = F_2(x_1, \ldots, x_n) = \sqrt{\sum_{i=1}^{n} x_i^2} \qquad \text{(hypercone)}$$

he investigates the average convergence rate of his strategy and compares it with that of an experimental gradient method in which the partial derivatives are approximated by quotients of differences obtained from exploratory steps. He shows that for a linear


problem structure like $F_1$ the random strategy needs only $O(\sqrt{n})$ trials, whereas the gradient strategy needs $O(n)$ trials to cover a prescribed distance. For $n > 3$, the random strategy is always superior to the deterministic method. Whereas Rastrigin shows that the random search always does better than the gradient search in the spherically symmetric field $F_2$, Movshovich (1966) maintains the opposite. The discrepancy can be traced to differing assumptions about the choice of step length (see also Yvon, 1972; Gaviano and Fagiuoli, 1972).
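A circumferential (hypersphere-surface) mutation of fixed total step length s is easy to generate by normalizing a Gaussian vector (a standard construction, shown here as an illustrative sketch rather than as Rastrigin's own recipe):

    import numpy as np

    def surface_step(n, s, rng):
        # Random vector of fixed total length s whose direction is
        # uniformly distributed on the n-dimensional hypersphere surface.
        z = rng.standard_normal(n)
        return s * z / np.linalg.norm(z)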

To choose suitable step lengths or variances poses the same problems for sequential random searches as are familiar from deterministic strategies. Here too, a closely related problem is to achieve global convergence, with reference to a suitable termination rule (the convergence criterion) and with a degree of reliability. Khovanov (1967) has conceived an individual manner of controlling the random step lengths. He accepts every random change, irrespective of success or failure, increases the variance at each failure, and reduces it otherwise. The objective is to increase the probability of lingering in the more promising regions and to abandon states that are irrelevant to the optimum search. No applications of the strategy are known to the author. Favreau and Franks (1958), Bekey et al. (1966), and Adams and Lew (1966) use a constant ratio between $\sigma_i$ and $x_i$ for $i = 1(1)n$. This measure does have the effect of continuously altering the "step lengths," but its merit is not obvious. Just because a variable value $x_i$ is small in no way indicates that it is near to the extreme position being sought. Karnopp (1961) was the first to propose a step length rule based on the number of successes or failures, according to which the $\sigma_i$ or $s$ are all uniformly reduced or enlarged such that a success always occurs after two or three trials. Schumer (1967), and Schumer and Steiglitz (1968), submit Rastrigin's circumferential random direction method to a thorough examination by probability theory. For the model

$$F_3(x) = \sum_{i=1}^{n} x_i^2 = r^2$$

with the condition $n \gg 1$ and the continuously optimal step length

$$s \simeq 1.225\, \frac{r}{\sqrt{n}}$$

they obtain a rate of progress $\varphi$, which is the average distance covered in the direction of the objective (minimum) per random step,

$$\varphi \simeq 0.203\, \frac{r}{n}$$

and a success rate $w_s$, which is the average number of successes per trial,

$$w_s \simeq 0.270$$
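These constants are easy to check empirically. The sketch below estimates the success rate on the hypersphere model $F_3$ for large n by sampling fixed-length steps of size $s = 1.225\, r/\sqrt{n}$ from the surface distribution (purely an illustrative experiment; nothing in it comes from the original derivation):

    import numpy as np

    rng = np.random.default_rng(0)
    n, r, trials = 1000, 1.0, 20000
    s = 1.225 * r / np.sqrt(n)            # continuously optimal step length
    x = np.zeros(n)
    x[0] = r                              # current point at distance r
    hits = 0
    for _ in range(trials):
        z = rng.standard_normal(n)
        z *= s / np.linalg.norm(z)        # fixed total step length s
        hits += np.linalg.norm(x + z) < r # success: closer to the minimum
    print(hits / trials)                  # close to w_s = 0.27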

They are only able to treat the general quadratic case theoretically for $n = 2$. Their result can be interpreted in the sense that $\varphi$ depends on the smallest radius of curvature of the elliptic contour passing through $r$. Since neither $r$ nor $s$ can be assumed to be known in advance, it is not clear how to keep to the optimal step length. Schumer and Steiglitz (1968) give an adaptive method with which the correct size of $s$ can be maintained at least approximately during the course of the iterations. At the starting point $x^{(0)}$ two random changes are made, with step lengths $s^{(0)}$ and $s^{(0)}(1 + a)$, where $0 < a \ll 1$. If both samples are successful, $s^{(1)} = s^{(0)}(1 + a)$, i.e., the greater value, is taken for the next iteration. If only one sample yields an improvement in the objective function, its step length is taken; finally, if no success is scored, $s^{(1)}$ remains equal to $s^{(0)}$. A reduction in $s$ is only made if several consecutive trials are unsuccessful. This is also the procedure of Maybach (1966). This adjustment to the local conditions assists the strategy in achieving high convergence rates, but reduces the chances of locating global optima among several local ones. For this reason a sample with a significantly larger step length ($a > 1$) should be included from time to time. Numerical tests show that the computation cost, or number of trials, actually increases only linearly with the number of variables. Schumer and Steiglitz have tested this using the model functions $F_3$ and

$$F_4(x) = \sum_{i=1}^{n} x_i^4$$

A comparison with a Newton-Raphson strategy, in which the first and second partial derivatives are determined numerically and the cost increases as $O(n^2)$, favors the random method when $n > 78$ for $F_3$ and when $n > 2$ for $F_4$. For the second, biquadratic model function, Nelder and Mead (1965) state that the number of trials or function evaluations in their simplex strategy grows as $O(n^{2.11})$, so that the sequential random method is superior from $n > 10$. White and Day (1971) report numerical tests in which the cost in iterations with Schumer's strategy increases more sharply than linearly with $n$, whereas a modification by White (1970) shows an exactly linear dependence. A comparison with the strategy of Fletcher and Powell (1963) favors the latter, especially for truly quadratic functions.
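A sketch of the adaptive step length rule just described (the failure threshold for shrinking s and the shrink factor are assumptions of this sketch; the text only says that s is reduced after several consecutive failures):

    import numpy as np

    def adaptive_random_search(f, x0, s0=1.0, a=0.1, max_fail=10,
                               shrink=0.5, steps=5000, seed=0):
        # Creeping random search with the step length adaptation of
        # Schumer and Steiglitz: probe with lengths s and s*(1+a), keep
        # the length of the successful probe, shrink s after failures.
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float)
        fx, s, fails = f(x), s0, 0
        for _ in range(steps):
            best = None
            for length in (s, s * (1 + a)):
                u = rng.standard_normal(x.size)
                y = x + length * u / np.linalg.norm(u)   # fixed-length probe
                fy = f(y)
                if fy <= fx and (best is None or fy < best[0]):
                    best = (fy, y, length)
            if best is not None:                 # keep the better probe
                fx, x, s = best[0], best[1], best[2]
                fails = 0
            else:
                fails += 1
                if fails >= max_fail:            # several consecutive failures:
                    s *= shrink                  # reduce the step length
                    fails = 0
        return x, fx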

Rechenberg (1973), with an n-dimensional normal distribution (see Chap. 5, Sect. 5.1), reaches almost the same theoretical results as Schumer for the circumferential distribution, if one notes that the overall step length

$$\sigma_{\mathrm{tot}} = \sqrt{\sum_{i=1}^{n} \sigma_i^2} = \sigma \sqrt{n}$$

for equal variances $\sigma_i^2 = \sigma^2$ in each random component $z_i$ is proportional to the square

root of the number of variables. The reason for this lies in the property of Euclidean space that, as the number of dimensions increases, the volume of a hypersphere becomes concentrated more and more in the boundary region near the surface. Rechenberg's adaptation rule is founded on the relation between optimal variance and probability of success, derived from two essentially different models of the objective function. The adaptation rule which is thereby formulated makes the frequency and size of the increments respectively dependent on the number of variables and independent of the structure of the objective function. This will be discussed in more detail in Chapter 5, Section 5.1.

Convergence proofs for the sequential random strategy have been given by Matyas (1965, 1967) and Rechenberg (1973), though only for the case of constant variance $\sigma^2$. Gurin (1966) has proved convergence also for stochastically perturbed objective functions. The convergence rate is still reduced by perturbations (Gurin and Rastrigin, 1965), but not as much as in gradient methods. Global convergence can be achieved if the reference value of the objective function is measured more than once at the comparison point (Saridis and Gilbert, 1970). As soon as any attempt is made to achieve higher rates of convergence by adjusting the variances or step lengths, the chance of finding a global optimum diminishes. The random strategy itself then becomes a path-oriented instead of a volume-oriented strategy. The probability of global convergence still always remains finite; it may simply become very small, especially in the case of many dimensions.

Apart from adjusting the step lengths, one can consider modifying the directions. Several proposals of this kind have been published: Satterthwaite (1959a; following McArthur, 1961), Wheeling (1960), Smith and Rudd (1964; following Dickinson, 1964), Matyas (1965, 1967), Bekey et al. (1966), Stewart, Kavanaugh, and Brocker (1967), De Graag (1970), and Lawrence and Emad (1973). They are all heuristic in nature. In the simplest case of a directed random search, a successful random direction is maintained until a failure occurs (Satterthwaite). Bekey, Lawrence, and Rastrigin actually make use of each random direction. If the first step leads to a failure, they use the opposite direction (positive and negative absolute biasing). Smith and Rudd store the two currently best points from a larger series of samples and obtain from their separation a step length for continuing the optimization. Wheeling's history vector method adds to each random increment a deterministic portion derived from experience. This additional vector is initially zero. It is increased at each success by a fraction of the increment vector, and correspondingly decreased at each failure. Such a learning and forgetting process also forms the basis of the algorithms of De Graag and Matyas. The latter has received the most attention, in spite of the fact that it gives no precise guidance on how to choose the variances. Schrack and Borowski (1972), who apply their own step length rule in Matyas' strategy, were able to show by numerical tests that the simple algorithm of Schumer and Steiglitz, without direction orientation, is at least as good as Matyas' for unperturbed as well as perturbed measurements of the objective function. A quite different kind of method, due to Kjellstrom (1965), in which the random search takes place in varying three-dimensional subspaces of the space $\mathbb{R}^n$, here shows itself to be very much worse.

Another method that sets out to accept only especially favorable directions is the threshold strategy of Stewart, Kavanaugh, and Brocker (1967), in which only those random changes are accepted that result in a specified minimum improvement in the objective function value. A more recent version of the same idea has been given by Dueck and Scheuer (1990). The simultaneous adjustment of step lengths and directions has seldom been attempted. The suggestions of Favreau and Franks (1958) and Matyas (1965, 1967) remain too imprecise to be practicable. Gaidukov (1966; see also Hupfer, 1970) and Furst, Muller, and Nollau (1968) provide more exact information for this purpose, based on either the concepts of Rastrigin or Matyas. Modification of the expectation values and variances of the random vectors is made according to the success or failure of iterations. No applications of the strategy are known, however, so that for the time being the observation of Schrack and Borowski (1972) still stands, namely that a careful choice of the step lengths is the most important prerequisite for the rapid convergence of a random method.
is the most important prerequisite for the rapid convergence of a r<strong>and</strong>om method.<br />

A method devised by Rastrigin (1965a,b, 1968) <strong>and</strong> developed further by Heydt (1970)


R<strong>and</strong>om Strategies 99<br />

works entirely with a restricted choice of directions. With a xed step length, a direction<br />

can be r<strong>and</strong>omly selected only from within an n-dimensional hypercone. The angle subtended<br />

by the cone <strong>and</strong> its height (<strong>and</strong>thus the overall step length) are controlled in an<br />

adaptiveway. For a spherical objective function, e.g., the model functions F2 (hypercone),<br />

F3 (hypersphere), or F4 (something intermediate between hypersphere <strong>and</strong> hypercube),<br />

there is no improvement in the convergence behavior. Advantages can only be gained<br />

if the search has to follow a particular direction for a long time along a narrow valley.<br />

Sudden changes in direction present a problem, however, which leads Heydt to consider<br />

substituting for the cone con guration a hyper-parabolic or hyper-hyperbolic distribution,<br />

with which at least small step lengths would retain su cient freedom of direction.<br />

In every case the striving for rapid convergence is directly opposed to the reliability of global convergence. This has led Jarvis (1968, 1970) to investigate a combination of the method of Matyas (1965, 1967) with that of McMurtry and Fu (1966). Numerical tests by Cockrell (1969, 1970; see also Fu and Cockrell, 1970) show that even here the basic strategy of Matyas (1965) or Schumer and Steiglitz (1967) is clearly the better alternative. It offers high convergence rates besides a fair chance of locating global optima, at least for a small number of variables. In the case of many dimensions, every attempt to reach global reliability is thwarted by the excessive cost. This leaves the globally convergent stochastic approximation method of Vaysbord and Yudin (1968) far behind the rest of the field. Furthermore, the sequential or creeping random search is the least susceptible if perturbations act on the objective function.

Users of random strategies always draw attention to their simplicity, flexibility, and resistance to perturbations. These properties are especially important if one wishes to construct automatic optimalizers (e.g., Feldbaum, 1958; Herschel, 1961; Medvedev and Ruban, 1967; Krasnushkin, 1970). Rastrigin actually built the first optimalizer with a random search strategy, which was designed for automatic frequency control of an electric motor. Mitchell (1964) describes an extreme value controller that consists of an analogue computer with a permanently wired-in digital part. The digital part serves for storage and flow control, while the analogue part evaluates the objective function. The development of hybrid analogue computers, in which the computational inaccuracy is determined by the system, has helped to bring random methods, especially of the sequential type, into more general use. For examples of applications besides those of the authors mentioned above, the following publications can be referred to: Meissinger (1964), Meissinger and Bekey (1966), Kavanaugh, Stewart, and Brocker (1968), Korn and Kosako (1970), Johannsen (1970, 1973), and Chatterji and Chatterjee (1971). Hybrid computers can be applied to best advantage for problems of optimal control and parameter identification, because they are able to carry out integrations and differentiations more rapidly than digital computers. Mutseniyeks and Rastrigin (1964) have devised a special algorithm for the dynamic control problem of keeping an optimum. Instead of the variable position vector $x$, a velocity vector with components $\partial x_i / \partial t$ is varied. A randomly chosen combination is retained as long as the objective function is decreasing in value (for minimization, $\partial F / \partial t < 0$). As soon as it begins to increase again, a new velocity vector is chosen at random.
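This optimum-holding scheme is easy to sketch in code. The following Python fragment is only an illustration of the principle, not Mutseniyeks and Rastrigin's original algorithm; the time step dt, the fixed speed, and all identifiers are assumptions made here for the example.

    import random

    def velocity_random_search(f, x, speed=0.1, dt=0.01, steps=10000):
        # Vary a velocity vector instead of the position vector: keep the
        # current velocity while F decreases, re-randomize it otherwise.
        n = len(x)

        def random_velocity():
            v = [random.gauss(0.0, 1.0) for _ in range(n)]
            norm = sum(vi * vi for vi in v) ** 0.5
            return [speed * vi / norm for vi in v]  # random direction, fixed speed

        v = random_velocity()
        f_old = f(x)
        for _ in range(steps):
            x = [xi + vi * dt for xi, vi in zip(x, v)]  # integrate dx_i/dt = v_i
            f_new = f(x)
            if f_new > f_old:              # dF/dt > 0: F has started to increase,
                v = random_velocity()      # so choose a new velocity at random
            f_old = f_new
        return x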

It is always striking, if one observes living beings, how well adapted they are in shape, function, and lifestyle. In many cases, biological structures, processes, and systems even surpass the capabilities of highly developed technical systems. Recognition of this has for years led many authors to suspect that nature is in possession of optimal solutions to her problems. In some cases the optimality of biological subsystems can even be demonstrated mathematically, for example for the ratios of diameters in branching arteries (Cohn, 1954), for the hematocrit value (the volume fraction of solid particles in the blood; Lew, 1972), and for the position of branch points in a level system of blood vessels (Kamiya and Togawa, 1972; see also Grassmann, 1967, 1968; Rosen, 1967; Rein and Schneider, 1971).

According to the theory of the descent of the species, all organisms that exist today are the (intermediate) result of a long process of development: evolution. Based on the multitude of finds of transitional species that have since become extinct, paleontology is providing a gradually more complete picture of this development. Leaving aside supernatural explanations, one must assume that the development of optimal or at least very good structures is a property of evolution, i.e., evolution is, or possesses, an optimization (or better, meliorization) strategy.

In evolution, the mechanism of variation is the occurrence of random exchanges, even "errors," in the transfer of genetic information from one generation to the next. The selection criterion favors the better suited individuals in the so-called struggle for existence. The similarity of variation and selection to the iteration rules of direct optimization methods is, in fact, striking. This analogy is most often drawn for random strategies, since mutations can best be interpreted as random changes. Thus Ashby (1960) regards as mutations the stochastic parameter variations in his blind homeostatic process. For many variables, however, the pure or blind random search requires so many trials that it offers no acceptable explanation of the capabilities of natural structures, processes, and systems. With the highest possible physical rate of transfer of information, as given by Bremermann (1962; see also Ashby, 1965, 1968), of $10^{47}$ bits per second and gram of computer mass, the mass of the earth and the extent of its lifetime up to now would not be sufficient to solve even simple combinatorial problems by complete enumeration or a blind random search, never mind to determine the optimal configuration of the $10^4$ to $10^5$ genes with their information content of around $10^{10}$ bits (Bremermann, 1963). Evolution must rather be considered as a sequential process that exploits the information from preceding successes and failures in order to follow a trajectory, although not a completely deterministic one, in the n-dimensional parameter space. Brooks (1958) and Favreau and Franks (1958) are therefore right to compare their creeping random search with biological evolution. Yet it is also certainly a very much simplified imitation of the natural process of development. In the 1960s, two proposals that consciously take higher evolution principles as optimization rules to be simulated are due to Rechenberg (1964, 1973) and Bremermann (1962, 1963, 1967, 1968a,b,c, 1970, 1971, 1973a,b; see also Bremermann, Rogson, and Salaff, 1965, 1966; Bremermann and Lam, 1970). Bremermann reasons from the (nowadays!) low mutation rates observed in nature that only one component of the variable vector should be varied at a time. With this scheme he then encounters the same difficulties as arise in the coordinate method. On the basis of his failure with the mutation-selection scheme, for example on linear programming problems, he comes to the conclusion that ecological niches are actually only stagnation points in development, and do not represent optimal states of adaptation.

None of his many attempts to invoke the principles of population, sexual inheritance, recombination, dominance, and recessiveness to improve the convergence behavior yields the hoped-for breakthrough. He thus eventually resigns himself to a largely deterministic strategy. In the linear programming problem, he chooses from the starting point several random directions and follows these in turn up to the boundary of the feasible region. The best states on the individual bounding hyperplanes are used to determine a new starting point by taking the arithmetic mean of the component parameters. Because of the convexity of the feasible region, the new starting point is always within it. The simultaneous choice of several search directions is supposed to be the analogue of the population principle, and the construction of the average the analogue of recombination in sexual propagation. To tackle the problem of finding the minimum or maximum of an unconstrained, non-linear function, Bremermann even applies a five point Lagrangian interpolation to determine relative extrema in the random directions.

Rechenberg's evolution strategy changes all the components of the variable vector at each mutation. In his case, the low mutation rate for many dimensions is expressed by choosing small values for the step lengths, or the spread in the random changes. On the basis of theoretical work with two model functions he finds that the standard deviations of the random components are set optimally when they are inversely proportional to the number of parameters. His two membered evolution strategy resembles the scheme of Schumer and Steiglitz (1968), which is acknowledged to be particularly good, except that a $(0, \sigma^2)$ normally distributed random quantity replaces the fixed step length $s$. He has also added to it a step length modification rule, again derived from theory, which makes this look a very promising search method. It is refined in Chapter 5, Section 5.1 to meet the requirements of numerical optimization with digital computers. A multimembered strategy is treated in Section 5.2, which follows the same basic concept; however, by imitating the principles of population and recombination, it can operate without external control of the step lengths. Incorporating more than one descendant at a time and forgetting "parental wisdom" at the end of each iteration loop has provoked fierce objections against a more natural evolution strategy.

Box (1957) also considers that his EVOP (evolutionary operation) strategy resembles the biological mutation-selection process. He regards the vertices of his pattern of trial points, of which the best becomes the center of the next pattern, as individuals of a population, of which only the best "survives." The "offspring" are, however, generated by purely deterministic rules. Random decisions, as used by Satterthwaite (1959a; after Lowe, 1964) in his REVOP (random evolutionary operation) variant, are actually explicitly rejected by Box (see Youden et al., 1959; Satterthwaite, 1959b; Budne, 1959; Anscombe, 1959).

From a biological or cybernetic point of view, Pask (1962, 1971), Schmalhausen (1964), Berg and Timofejew-Ressowski (1964), Dobzhansky (1965), Moran (1967), and Kussul and Luk (1971), among others, have examined the analogy between optimization and evolution. The fact that no practical algorithms have come out of this is no doubt because the evolutionary processes are described only verbally. Although they sometimes even include their more subtle effects, they have so far not produced a really quantitative, predictive theory. Exceptions, such as the work of Eigen (1971; see also Schuster, 1972), Merzenich (1972), and Papentin (1972), are so different in emphasis that they are not applicable to the kind of problems considered here. The ways in which a process of mathematization can be implemented in theoretical biology are documented, for example, in the books by Waddington (1968) and Locker (1973), which contain a number of contributions of interest from the optimization point of view, as well as in many articles in the journal Mathematical Biosciences, published by R. W. Bellman since 1967, and in some papers from two Berkeley symposia (LeCam and Neyman, 1967; LeCam, Neyman, and Scott, 1972). Whereas many modern books on biology, such as Riedl (1976) and Roughgarden (1979), still give mainly verbal explanations of organic evolution, in general this is no longer the case. Physicists like Ebeling and Feistel (see Feistel and Ebeling, 1989) and biologists like Maynard Smith (1982, 1989) have meanwhile contributed mathematical models. The following two paragraphs thus no longer represent the actual situation, but before we add some new aspects they will be presented, nevertheless, to characterize the situation as perceived by the author in the early 1970s (Schwefel, 1975a):

Relationships have been seen between random strategies and biological evolution on the one hand and the psychology of recognition processes on the other, for example, by Campbell (1960) and Khovanov (1967). The imitation of such processes (the catch phrase is artificial intelligence) always leads to the problem of choosing or designing a suitable search algorithm, which should be heuristic rather than strictly deterministic. Their simplicity, reliability (even in extreme, unfamiliar situations), and flexibility give the random strategies a special role in this field. The topic will not be discussed more fully here, except to mention some publications that explicitly deal with the relationship to optimization strategies: Friedberg (1958), Friedberg, Dunham, and North (1959), Minsky (1961), Samuel (1963), J. L. Barnes (1965), Vagin and Rudelson (1968), Thom (1969), Minot (1969), Ivakhnenko (1970), Michie (1971), and Slagle (1972). A particularly impressive example is given by the work of Fogel, Owens, and Walsh (1965, 1966a,b), in which imitation of the biological evolutionary principles of mutation and selection gives a (mathematical) automaton the ability to recognize prescribed sequences of numbers.

It may be that in order to match the capabilities of the human brain (and to understand them) there must be a move away from the digital methods of present serial computers to quite different kinds of switching elements and coupling principles. Such concepts, as pursued in neurocybernetics and neurobionics, are described, for example, by Brajnes and Svecinskij (1971). The development of the perceptron by Rosenblatt (1958) can be seen as a first step in this direction.

Two research teams that have emphasized the adaptive capacity of evolutionary procedures, and who have shown interesting computer simulations, are Allen and McGlade (1986) and Galar, Kwasnicka, and Kwasnicki (see Galar, Kwasnicka, and Kwasnicki, 1980; Galar, 1994). In terms of the optimization tasks looked at throughout this book, one might call their point of view dynamic or on-line optimization, including optimum holding against environmental changes. As Schwefel and Kursawe (1992) have pointed out, a limited life span of all individuals is an important ingredient in such cases (the principle of forgetting).

Two others who have tried to explain brain processes on an evolutionary, at least selectionist, basis are Edelman (1987) and Conrad (1988). Though their approach has not yet been embraced by the mainstream of neural network research, this might happen in the near future (e.g., Banzhaf and Schmutz, 1992).

An even more general paradigm shift in the field of artificial intelligence (AI) has emerged under the label artificial life (AL; see Langton, 1989, 1994a,b; Langton et al., 1992; Varela and Bourgine, 1992). Whereas Lindenmayer (see Prusinkiewicz and Lindenmayer, 1990) demonstrates the possibility of (re-)creating plant forms by means of rather simple computer algorithms, the AL community tries to imitate animal behavior computationally. In most cases the goal is to design "intelligent" robots, sometimes called knowbots or animats (Meyer and Wilson, 1991; Meyer, 1992; Meyer, Roitblat, and Wilson, 1993).

The attraction of even simple evolutionary models (re-)producing fairly complex behavior of multi-individual systems simulated on computers is already spreading beyond the narrow bounds of computer science as such. New ideas are emerging from evolutionary computation, reaching not only into the organization of software development (Huberman, 1988), but also into the field of economics (e.g., Witt, 1992; Nissen, 1993, 1994) and even beyond (Schwefel, 1988; Haefner, 1992). It may be questionable whether worthwhile conclusions from the new findings can reach as far as that, but ecology at least should be a field that could benefit from a consistent use of evolutionary thinking (see Wolff, Soeder, and Drepper, 1988).

Computers have opened a third way of systems analysis aside from the classical mathematical/analytical and experimental/empirical main roads: numerical and/or symbolical simulation experiments. There is some hope that we may learn this way quickly enough so that we can maintain life on earth before we more or less unconsciously destroy it. Real evolution always had to deal with unpredictable environmental changes, not only those resulting from exogenous influences, but also self-induced endogenous ones. The landscape is some kind of n-dimensional trampoline, and every good imitation of organic evolution, whether it be called adaptive or meliorizing, must be able to work properly under such hard conditions. The multimembered evolution strategy (see Chap. 5, Sect. 5.2) with limited life span of the individuals fulfills that requirement to some extent.




Chapter 5

Evolution Strategies for Numerical Optimization

The task of mimicking biological structures and processes with the object of solving technical problems is as old as engineering itself. Mimicry itself, as a natural "strategy," is even older than mankind. The legend of Daedalus and Icarus bears early witness to such human endeavor. A sign of its scientific coming of age is the formation of the distinct branch of science known as bionics (e.g., Hertel, 1963; Gerardin, 1968; Beier and Glaß, 1968; Nachtigall, 1971; Heynert, 1972; Zerbst, 1987), which is concerned with the recognition of existing biological solutions to problems that also happen to arise in engineering, and with the adequate emulation of these examples. It is always thereby supposed that evolution has found particularly good, perhaps even optimal solutions. This assumption has often proved to be correct, or at any rate useful. Only a few attempts to imitate the actual methods of natural development are known (Ashby, 1960; Bremermann, 1962-1973; Rechenberg, 1964, 1973; Fogel, Owens, and Walsh, 1965, 1966a,b; Holland, 1975; see also Chap. 4), since they are curiously regarded a priori as being especially bad, meaning costly.

Rechenberg proposed the hypothesis "that the method of organic evolution represents an optimal strategy for the adaptation of living things to their environment," and he concludes that "it should therefore be worthwhile to take over the principles of biological evolution for the optimization of technical systems."

5.1 The Two Membered Evolution Strategy

Rechenberg's two membered evolution scheme, suggested in similar form by other authors as a random strategy (see Chap. 4), will be expressed in this chapter as an algorithm for solving non-discrete, non-stochastic, parameter optimization problems. As in Chapter 3, the problem is

$$F(x) \to \min, \qquad x \in \mathbb{R}^n$$

In the constrained case $x$ has to lie in a feasible region $G \subseteq \mathbb{R}^n$, where

$$G = \{\, x \in \mathbb{R}^n \mid G_j(x) \ge 0,\; j = 1(1)m,\; G_j \text{ restriction functions} \,\}$$

In this, as in all direct search methods, it is not possible to deal with constraints in the form of equalities.

5.1.1 The Basic Algorithm

The two membered scheme is the minimal concept for an imitation of organic evolution. The two principles of mutation and selection, which Darwin (1859) recognized to be most important, are taken as rules for variation of the parameters and for filtering during the iteration sequence respectively.

In the language of biology, the rules are as follows:

Step 0: (Initialization)
A given population consists of two individuals, one parent and one descendant. They are each identified by their genotype according to a set of $n$ genes. Only the parental genotype has to be specified as a starting point.

Step 1: (Mutation)
The parent $E^{(g)}$ of generation $g$ produces a descendant $N^{(g)}$, whose genotype is slightly different from that of the parent. The deviations refer to the individual genes and are random and independent of each other.

Step 2: (Selection)
Because of their different genotypes, the two individuals have a different capacity for survival (in the same environment). Only one of them can produce further descendants in the next generation, namely the one which represents the higher survival value. It becomes the parent $E^{(g+1)}$ of generation $g + 1$.

Thus the simplest possible assumptions are made:

- The population size remains constant.
- An individual has in principle an infinitely long life span and capacity for producing descendants (asexually).
- No difference exists between genotype (encoding) and phenotype (appearance), or rather the one is unambiguously and reproducibly associated with the other.
- Only point mutations occur, independently of each other at all single parameter locations.
- The environment, and thus the criterion of survival, is constant over time.

This minimal concept takes no account of the evolutionary factors familiar to the modern synthetic evolution theory (e.g., Stebbins, 1968; Cizek and Hodanova, 1971; Osche, 1972), such as chromosome mutations, bisexuality, recombination, diploidy, dominance and recessiveness, population size, niching, isolation, migration, etc. Even the concepts of mutation and selection are not applied here with their full biological meaning. Natural selection does not simply mean the struggle between just two individuals in which the better survives, but far more accurately that an individual with more favorable properties produces on average more descendants than one less well adapted to the environment. Neither does the present work go more deeply into the connections between cause and effect in the transmission of inherited information, of which so much has been revealed by molecular biology. Mutation is used in the widest biological sense as a synonym for all types of alteration of the substance of inheritance. In his book Evolutionsstrategie, Rechenberg (1973) examines in more detail the analogy between natural evolution and technical optimization. He compares in particular the biological with the technical parameter space, and interprets mutations as steps in the nucleotide space.

Expressed in mathematical language, the rules are as follows:

Step 0: (Initialization)
There should be storage allocated in a (digital) computer for two points of an $n$-dimensional Euclidean space. Each point is characterized by a position vector consisting of a set of $n$ components.

Step 1: (Variation)
Starting from point $E^{(g)}$, with position vector $x_E^{(g)}$, in iteration $g$, a second point $N^{(g)}$, with position vector $x_N^{(g)}$, is generated, the components $x_{N_i}^{(g)}$ of which differ only slightly from the $x_{E_i}^{(g)}$. The differences come about by the addition of (pseudo)random numbers $z_i^{(g)}$, which are mutually independent.

Step 2: (Filtering)
The two points or vectors are associated with different values of an objective function $F(x)$. Only one of them serves as a starting point for the new variation in the next iteration $g + 1$: namely, the one with the better (for minimization, smaller) value of the objective function.

Taking account of constraints in the form of a barrier penalty function, this algorithm can be formalized as follows:

Step 0: (Initialization)
Define $x_E^{(0)} = \{x_{E_i}^{(0)},\; i = 1(1)n\}^T$ such that $G_j(x_E^{(0)}) \ge 0$ for all $j = 1(1)m$. Set $g = 0$.

Step 1: (Mutation)
Construct $x_N^{(g)} = x_E^{(g)} + z^{(g)}$ with components $x_{N_i}^{(g)} = x_{E_i}^{(g)} + z_i^{(g)}$ for all $i = 1(1)n$.

Step 2: (Selection)
Decide

$$x_E^{(g+1)} = \begin{cases} x_N^{(g)} & \text{if } F(x_N^{(g)}) \le F(x_E^{(g)}) \text{ and } G_j(x_N^{(g)}) \ge 0 \text{ for all } j = 1(1)m \\ x_E^{(g)} & \text{otherwise} \end{cases}$$

Increase $g \leftarrow g + 1$ and go to step 1 as long as the termination criterion does not hold.
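To make the iteration rules concrete, here is a minimal sketch of this two membered scheme in Python. It is only an illustration under stated assumptions (a common standard deviation sigma for all components, constraints given as functions with non-negative values when satisfied); the book's reference implementation is the FORTRAN subroutine EVOL of Appendix B, and all identifiers below are ours.

    import random

    def two_membered_es(f, x_start, sigma, constraints=(), generations=10000):
        # Step 0 (initialization): the feasible parental genotype
        parent = list(x_start)
        f_parent = f(parent)
        for _ in range(generations):
            # Step 1 (mutation): add independent N(0, sigma^2) random
            # numbers to all components of the parent
            descendant = [x + random.gauss(0.0, sigma) for x in parent]
            # Step 2 (selection): accept the descendant only if it is
            # feasible and not worse (minimization)
            if all(G(descendant) >= 0 for G in constraints):
                f_descendant = f(descendant)
                if f_descendant <= f_parent:
                    parent, f_parent = descendant, f_descendant
        return parent, f_parent

    # Example: the sphere function F(x) = x_1^2 + x_2^2
    x_best, f_best = two_membered_es(lambda x: x[0]**2 + x[1]**2,
                                     x_start=[5.0, -3.0], sigma=0.5)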



The question remains of how to choose the random vectors $z^{(g)}$. This choice has the role of mutation. Mutations are understood nowadays to be random, purposeless events, which furthermore occur only very rarely. If one interprets them, as is done here, as the sum of many individual events, it is natural to choose a probability distribution according to which small changes occur frequently, but large ones only rarely (the central limit theorem of statistics). For discrete variations one can use, for example, a binomial distribution; for continuous variations, a Gaussian or normal distribution.

Two requirements then arise together by analogy with natural evolution:

- That the expectation value $\xi_i$ for a component $z_i$ has the value zero
- That the variance $\sigma_i^2$, the average squared deviation from the mean, is small

The probability density function for normally distributed random events is (e.g., Heinhold and Gaede, 1972):

$$w(z_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left( -\frac{(z_i - \xi_i)^2}{2\,\sigma_i^2} \right) \qquad (5.1)$$

If $\xi_i = 0$, one obtains a so-called $(0, \sigma_i^2)$ normal distribution. There are still, however, a total of $n$ free parameters $\{\sigma_i,\; i = 1(1)n\}$ with which to specify the standard deviations of the individual random components. By analogy with other, deterministic search strategies, the $\sigma_i$ can be called step lengths, in the sense that they represent average values of the lengths of the random steps.

For the occurrence of a particular random vector $z = \{z_i,\; i = 1(1)n\}$, with independent, $(0, \sigma_i^2)$ distributed components $z_i$, the probability density function is

$$w(z_1, z_2, \ldots, z_n) = \prod_{i=1}^{n} w(z_i) = \frac{1}{(2\pi)^{n/2} \prod_{i=1}^{n} \sigma_i} \exp\left( -\frac{1}{2} \sum_{i=1}^{n} \left( \frac{z_i}{\sigma_i} \right)^2 \right) \qquad (5.2)$$

or, more compactly, if $\sigma_i = \sigma$ for all $i = 1(1)n$,

$$w(z) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^n \exp\left( -\frac{z\,z^T}{2\,\sigma^2} \right) \qquad (5.3)$$

For the length of the overall random vector, $S = \sqrt{\sum_{i=1}^{n} z_i^2}$, a $\chi$ distribution is obtained. The $\chi$ distribution with $n$ degrees of freedom approximates, for large $n$, to a $\left( \sigma \sqrt{n - \frac{1}{2}},\; \frac{\sigma}{\sqrt{2}} \right)$ normal distribution. Thus the expectation value for the total length of the random vector for many variables is $E(S) = \sigma \sqrt{n}$, the variance is $D^2(S) = E\left( (S - E(S))^2 \right) = \frac{\sigma^2}{2}$, and the coefficient of variation is

$$\frac{D(S)}{E(S)} = \frac{1}{\sqrt{2n}}$$

This means that the most probable value for the length of the random vector at constant $\sigma$ increases as the square root of the number of variables, while the relative width of the variation decreases with the reciprocal square root of the number of parameters.
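These statements about the length $S$ are easy to check empirically. The short Monte-Carlo experiment below is only an illustration, with freely chosen values of n, sigma, and the sample size; it reproduces $E(S) \approx \sigma \sqrt{n}$ and $D^2(S) \approx \sigma^2 / 2$.

    import random

    n, sigma, trials = 100, 0.1, 10000
    lengths = []
    for _ in range(trials):
        z = [random.gauss(0.0, sigma) for _ in range(n)]
        lengths.append(sum(zi * zi for zi in z) ** 0.5)  # S = sqrt(sum z_i^2)

    mean_S = sum(lengths) / trials
    var_S = sum((s - mean_S) ** 2 for s in lengths) / trials
    print(mean_S, sigma * n ** 0.5)   # both close to 1.0 = sigma * sqrt(n)
    print(var_S, sigma ** 2 / 2)      # both close to 0.005 = sigma^2 / 2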



[Figure 5.1: Two membered evolution strategy. Contour diagram with contours $F(x) = \mathrm{const.}$ over $x_1, x_2$ and lines of equal probability density around the parents. E: parent; N: descendant; (g): generation index. Two iteration steps are shown: the mutation $z^{(g)}$ fails, so that $E^{(g+1)} = E^{(g)}$, while the mutation $z^{(g+1)}$ succeeds, so that $E^{(g+2)} = N^{(g+1)}$, moving towards the optimum.]

The geometric locus of equally likely changes in the variation of the variables can be derived immediately from the probability density function, Equation (5.2). It is an $n$-dimensional hyperellipsoid ($n$-fold variance ellipse) with the equation

$$\sum_{i=1}^{n} \left( \frac{z_i}{\sigma_i} \right)^2 = \mathrm{const.}$$

referred to its center, which is the starting point $x_E^{(g)}$. In the multidimensional case, the random changes can be regarded as a vector ending on the surface of a hyperellipsoid with the semi-axes $\sigma_i$, or, if $\sigma_i = \sigma$ for all $i = 1(1)n$, in the language of two dimensions, distributed circumferentially. Figure 5.1 serves to illustrate two iteration steps of the evolution strategy on a two dimensional contour diagram. Whereas in other, fully deterministic search strategies both the direction and the length of the search step are determined in a fixed manner, or on the basis of previously gathered information and plausible assumptions about the topology of the objective function, in the evolution strategy the direction is purely random and the step length (except for a small number of variables) is practically fixed. This should be emphasized again to distinguish this random method from Monte-Carlo procedures, in which the selected trial point is always fully independent of the previous choice and its outcome. Darwin (1874) himself emphasized that the evolution of living things is not a purely random process. Yet against his theory of descendancy a polemic is still waged in which the impossibility is demonstrated that life could arise by a purely random process (e.g., Jordan, 1970). Even at the level of the simplest imitation of organic evolution, a suitable choice of the step lengths or variances turns out to be of fundamental significance.
lengths or variances turns out to be of fundamental signi cance.



5.1.2 The Step Length Control

In experimental optimization, the appropriate step lengths can frequently be predicted. The values of the variables usually have to be determined exactly at only a few points. Thus constant values of the variances are often all that is required to complete an extreme value search. It is a matter of fact that in most experimental applications of the simple evolution strategy, fixed (and discrete) distributions of mutations have been used.

By contrast, in mathematically formulated problems that are to be solved on a digital computer, the variables often run over much of the number range of the computer, which corresponds to many powers of 10. In a numerical optimum search the step lengths must be continuously modified if the algorithm is to be efficient: a problem reminiscent of steering safely between Scylla and Charybdis, for if the step length is too small the search takes an unnecessarily large number of iterations; if it is too large, on the other hand, the optimum can only be crudely approached and the search can even get stuck far from the optimum, for example, if the route to the minimum passes along a narrow valley. Thus in all optimization strategies the step length control is the most important part of the algorithm after the recursion formula, and it is furthermore closely linked to the convergence behavior.

The corresponding remarks hold for the evolution strategy, with the following difference: in place of a predetermined step length for a parameter of the objective function there is the variance of the random changes in this parameter, and instead of the statement that an improvement will or will not be made in a given direction with a specified step length, there can only be a statement of the probability of success or failure for a chosen variance.

In his theoretical investigations of the two membered evolution strategy, Rechenberg discovered, using two basically different model objective functions (sphere model = Problem 1.1, corridor model = Problem 3.8 of the problem catalogue; see Appendix A), that the maximal rate of convergence corresponds to a particular value of the probability of a success, i.e., of an improvement in the objective function value. He was thus led to formulate the following rule for controlling the size of the random changes:

The 1/5 success rule:
From time to time during the optimum search, obtain the frequency of successes, i.e., the ratio of the number of successes to the total number of trials (mutations). If the ratio is greater than 1/5, increase the variance; if it is less than 1/5, decrease the variance.

In many problems this rule proves to be extremely effective in maintaining approximately the highest possible rate of progress towards the optimum. While in the right-angled corridor model the variances are adjusted once and for all in accordance with this rule and subsequently remain constant, in the sphere model they must steadily become smaller. The question then arises as to how often the success criterion should be tested and by what factor the variances are most effectively reduced or increased.

This question will be answered with reference to the sphere model introduced by Rechenberg, since this is the simplest non-linear model objective function and requires the greatest and most frequent changes in the step lengths. The following results of Rechenberg's theory can be used here. The maximum rate of progress is

$$\varphi_{\max} = k_1\, \frac{r}{n}\,, \qquad k_1 \simeq 0.2025 \qquad (5.4)$$

with a common variance $\sigma^2$, which is always optimally given by

$$\sigma_{\mathrm{opt}} = k_2\, \frac{r}{n}\,, \qquad k_2 \simeq 1.224 \qquad (5.5)$$

for all components $z_i$ of the random vector $z$. In these expressions $r$ is the current distance from the goal (optimum) and $n$ is the number of variables. The rate of progress is defined as the expectation value of the radial difference covered per trial (mutation), as illustrated in Figure 5.2:

$$\varphi^{(g)} = r^{(g)} - r^{(g+1)} \qquad (5.6)$$

[Figure 5.2: The rate of progress for the sphere model. Contours $F(x) = x_1^2 + x_2^2 = \mathrm{const.}$; the rate of progress $\varphi^{(g)}$ is the difference between the distances $r^{(g)}$ and $r^{(g+1)}$ of $E^{(g)}$ and $E^{(g+1)}$ from the optimum; a line of constant probability density surrounds $E^{(g)}$.]

From Equations (5.4) to (5.6) one obtains a relation for the change in the variance after a generation (iteration, or mutation) under the condition of maximum convergence rate:



$$\frac{\sigma_{\mathrm{opt}}^{(g+1)}}{\sigma_{\mathrm{opt}}^{(g)}} = \frac{r^{(g+1)}}{r^{(g)}} = 1 - \frac{k_1}{n}$$

or, after $n$ generations,

$$\frac{\sigma_{\mathrm{opt}}^{(g+n)}}{\sigma_{\mathrm{opt}}^{(g)}} = \left( 1 - \frac{k_1}{n} \right)^n$$

If $n$ is large compared to one, and the formulae derived by Rechenberg are only valid under this assumption, the step length factor tends to a constant:

$$\lim_{n \to \infty} \left( 1 - \frac{k_1}{n} \right)^n = e^{-k_1} \simeq 0.817 \simeq \frac{1}{1.224}$$

The same result is obtained by considering the rate of progress as a differential quotient $\varphi = -dr/dg$, in which $g$ represents the iteration number.

This matches the limiting case of very many variables because, according to Equation (5.4), the rate of progress is inversely proportional to the number of variables. The fact that the rate of progress $\varphi$ near its maximum is quite insensitive to small changes in the variances, together with the fact that the probability of success can only be determined from an average over several mutations, leads to the following more precise formulation of the 1/5 success rule for numerical optimization:

After every $n$ mutations, check how many successes have occurred over the preceding $10\,n$ mutations. If this number is less than $2\,n$, multiply the step lengths by the factor 0.85; divide them by 0.85 if more than $2\,n$ successes occurred.
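In code, this rule amounts to a small extension of the basic loop sketched in Section 5.1.1. The following Python fragment is again only an illustrative sketch (unconstrained, common step length, names chosen freely); the authoritative formulation is the subroutine EVOL in Appendix B.

    import random

    def es_one_fifth_rule(f, x_start, sigma, generations=20000):
        n = len(x_start)
        parent = list(x_start)
        f_parent = f(parent)
        outcomes = []                       # rolling success/failure record
        for g in range(1, generations + 1):
            child = [x + random.gauss(0.0, sigma) for x in parent]
            f_child = f(child)
            success = f_child <= f_parent
            if success:
                parent, f_parent = child, f_child
            outcomes.append(success)
            outcomes = outcomes[-10 * n:]   # keep only the last 10 n outcomes
            if g % n == 0 and len(outcomes) == 10 * n:
                hits = sum(outcomes)
                if hits < 2 * n:            # success rate below 1/5
                    sigma *= 0.85
                elif hits > 2 * n:          # success rate above 1/5
                    sigma /= 0.85
        return parent, f_parent, sigma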

The 1/5 success rule enables the step lengths or variances of the random variations to be controlled. One might do even better by looking for a control mechanism with additional differential and integral coefficients, to avoid the oscillatory behavior of a mere proportional feedback. However, the probability of success unfortunately gives no indication of how appropriate the ratios of the variances $\sigma_i^2$ are to each other. The step lengths can only be all reduced together, or all increased. One would sometimes rather like to build in a scaling of the variables, i.e., to determine the ratios of the step lengths to each other. This can be achieved by a suitable formulation of the objective function, in which new parameters are introduced in place of the original variables. The functional dependence can be freely chosen, and in the simplest case it is given by multiplicative factors. In the formulation of the numerical procedure for the two membered evolution strategy (Appendix B, Sect. B.1), the possibility is therefore included of specifying an initial step length for each individual variable. The ratios of the variances to each other remain constant during the optimum search, unless specified lower bounds on the step lengths come into operation at the same time.

All digital computers handle data only in the form of a finite number of units of information (bits). The number of significant figures and the range of numbers are thereby limited. If a quantity is repeatedly divided by a factor greater than one, the stored value of the quantity eventually becomes zero after a finite number of divisions. Every subsequent multiplication leaves the value at zero. If this happens to one of the standard deviations $\sigma_i$, the affected variable $x_i$ remains constant thereafter. The optimization continues only in a subspace of $\mathbb{R}^n$. To guard against this it must be required that $\sigma_i > 0$ for all $i = 1(1)n$. The random changes should furthermore be sufficiently large that at least the last stored place of a variable is altered. There are therefore two requirements, lower limits for the "step lengths":

$$\sigma_i^{(g)} \ge \varepsilon_a \quad \text{for all } i = 1(1)n$$

and

$$\sigma_i^{(g)} \ge \varepsilon_b\, |x_i^{(g)}| \quad \text{for all } i = 1(1)n$$

where $\varepsilon_a > 0$ and $1 + \varepsilon_b > 1$ according to the computational accuracy. It is thereby ensured that the random variations are always active and the region of the search stays spanned in all dimensions.

5.1.3 The Convergence Criterion

In experimental optimization it is usually decided heuristically when to terminate the series of trials: for example, when the trial results indicate that no further significant improvement can be gained. One always has an overall view of how the experiment is running. In numerical optimization, if the calculations are made by computer, one must build into the program a rule saying when the iteration sequence is to be terminated. For this purpose objective, quantitative criteria are needed that refer to the data available at any time. Sometimes, although not always, one will be concerned to obtain a solution as exactly as possible, i.e., accurate to the last stored digit. This requirement can relate to the variables or to the objective function. Remember that the optimum may be a weak one.

Towards the minimum, the step lengths and the distances covered normally become smaller and smaller. A frequently used convergence criterion consists of ending the search when the changes in the variables become zero (in which case no further improvement in the objective function is made), or when the step lengths have become zero. As a rule one sets the lower bound not to zero but to a sufficiently small, finite value. This procedure has, however, one disadvantage that can be serious. Small step lengths occur not only if the minimum is nearby, but also if the search is moving through a narrow valley. The optimization may then be practically halted long before the extreme value being sought is reached. In Equations (5.4) and (5.5), $r$ can equally well be thought of as the local radius of curvature. Neither $\varphi$, the distance covered, nor $\sigma$, the step length, is a measure of the closeness to the optimum. Rather, they convey information about the complexity of the minimum problem: the number of variables and the narrowness of the valleys encountered. The requirement $\sigma > \varepsilon$ or $\|x^{(g)} - x^{(g-1)}\| > \varepsilon$ for the continuation of the search is thus no guarantee of sufficient convergence.



Gradient methods, which seek a point with vanishing first derivatives, frequently also apply this necessary condition for the existence of an extremum as a termination criterion. Alternatively, the search can be continued until $\Delta F = F(x^{(k-1)}) - F(x^{(k)})$, the change in the objective function value in one iteration, goes to zero or falls below a prescribed limit. But this requirement can also be fulfilled far from the minimum if the valley in which the deepest point is sought happens to be very flat in shape. In this case the step length control of the two membered evolution strategy ensures that the variances become larger, and thus the function value differences between two successful trials also on average become larger. This is guaranteed even if the function values are equal (within computational accuracy), since a change in the variables is then always registered as a success. One thus has only to take care that $\Delta F$ is summed over a number of results in order to derive a termination criterion. Just as lower bounds are defined for the step lengths, an absolute and a relative bound can be specified here:

Termination rule:
End the search if

$$F\left( x_E^{(g - \Delta g)} \right) - F\left( x_E^{(g)} \right) \le \varepsilon_c$$

or

$$F\left( x_E^{(g - \Delta g)} \right) - F\left( x_E^{(g)} \right) \le \varepsilon_d \left| F\left( x_E^{(g)} \right) \right|$$

where $\Delta g \ge 20\,n$, and $\varepsilon_c > 0$, $1 + \varepsilon_d > 1$ according to the computational accuracy.

The condition $\Delta g \ge 20\,n$ is designed to ensure that in the extreme case the standard deviations are reduced or increased within the test period by at least the factor $(0.85)^{20} \simeq \frac{1}{25}$, in accordance with the 1/5 success rule. This will prevent the search being terminated only because the variances are forced to change suddenly. It is clear from Equation (5.4) that the more variables are involved in the problem, the slower is the rate of progress. Hence it does not make sense to test the convergence criterion very frequently. A recommended procedure is to make a test every $20\,n$ mutations. Only one additional function value then needs to be stored.
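As a sketch, the test itself reduces to a few lines of Python; f_old is meant to be the parent's objective function value from 20 n generations ago, f_new the current one, and the epsilon parameters correspond to those defined above (the names are ours).

    def should_terminate(f_old, f_new, eps_c, eps_d):
        # Absolute and relative versions of the termination rule
        return (f_old - f_new) <= eps_c or \
               (f_old - f_new) <= eps_d * abs(f_new)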

Another reason can be adduced for linking the termination of the search to the function value changes. While every success in an optimum search means, in the end, an economic profit, every iteration costs computer time and thus money. If the costs exceed the profit, the optimization may well provide useful information, but it is certainly not on the whole of any economic value. Thus someone who only wishes to optimize from an economic point of view can, by a suitable choice of values for the accuracy parameters, halt the search process as soon as it starts running into a loss.



5.1.4 The Treatment of Constraints

Inequality constraints $G_j(x) \ge 0$ for all $j = 1(1)m$ are quite acceptable. Sign conditions may be formulated in the same manner and do not receive any special treatment. In contrast to linear and non-linear programming, no sign conditions need to be set in order to keep within a bounded region. If a mutation falls in the forbidden region, it is assessed as a worsening (in the sense of a lethal mutation) and the variation of the variables is not accepted.

No particular penalty function, such as Rosenbrock chooses for his method of rotating coordinates, has been developed for the evolution strategy. The user is free to use the techniques, for example, of Carroll (1961), Fiacco and McCormick (1968), or Bandler and Charalambous (1974), to construct a suitable sequence of substitute objective functions and to solve the original constrained problem as a sequence of unconstrained problems. This, however, can be done outside the procedure.

It is sometimes difficult to specify an allowed initial vector of the variables. If one were to wait until by chance a mutation satisfied all the constraints, it could take a very long time. Besides, during this search period the success checks could not be carried out as described above. It would nevertheless be desirable to apply the normal search algorithm effectively to find an allowed state. Box (1965) has given, in the description of his complex method, a simple way of proceeding from a forbidden starting point. He constructs an auxiliary objective function from the sum of the constraint function values of the violated constraints:

$$\tilde F(x) = \sum_{j=1}^{m} G_j(x)\, \delta_j(x), \qquad \delta_j(x) = \begin{cases} -1 & \text{if } G_j(x) < 0 \\ 0 & \text{otherwise} \end{cases} \qquad (5.7)$$

Each decrease in the value of $\tilde F(x)$ represents an approach to the feasible region. When eventually $\tilde F(x) = 0$, then $x$ satisfies all the constraints and can serve as a starting vector for the optimization proper. This procedure can be taken over without modification for the evolution strategy.
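A sketch of this feasibility search in Python: the function below implements Equation (5.7), and any of the minimizers sketched in this section, applied without constraints, can drive it to zero. The constraint functions and all names in the example are illustrative assumptions.

    def aux_objective(x, constraints):
        # Equation (5.7): sum of the violated constraint values, negated,
        # so that F~(x) = 0 exactly when x satisfies all constraints
        total = 0.0
        for G in constraints:
            g = G(x)
            if g < 0:        # a violated constraint contributes |g|
                total -= g
        return total

    # Example with G_1(x) = x_1 >= 0 and G_2(x) = 1 - x_1 - x_2 >= 0:
    Gs = [lambda x: x[0], lambda x: 1.0 - x[0] - x[1]]
    print(aux_objective([-2.0, 5.0], Gs))   # 4.0: still infeasible
    print(aux_objective([0.2, 0.3], Gs))    # 0.0: feasible starting point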

5.1.5 Further Details of the Subroutine EVOL

In Appendix B, Section B.1 a complete FORTRAN listing is given of a subroutine corresponding to the two membered evolution scheme that has been described. Thus no detailed algorithm will be formulated here, but a few further programming details will be mentioned.

In nearly all digital computers there are library subroutines for generating uniformly distributed pseudorandom numbers. They work, as a rule, according to the multiplicative or additive congruence method (see Jöhnk, 1969; Niederreiter, 1992; Press et al., 1992). From any two numbers taken at random from a uniform distribution in the range $[0, 1]$, by using the transformation rules of Box and Muller (1958) one can generate two independent, normally distributed random numbers with expectation values zero and variances unity. The formulae are

$$Z_1' = \sqrt{-2 \ln Y_1}\, \sin\,(2\pi\, Y_2)$$
$$Z_2' = \sqrt{-2 \ln Y_1}\, \cos\,(2\pi\, Y_2) \qquad (5.8)$$

where the $Y_i$ are the uniformly distributed and the $Z_i'$ the $(0, 1)$ normally distributed random numbers respectively. To obtain a distribution with a variance different from unity, the $Z_i'$ must simply be multiplied by the desired standard deviation $\sigma_i$ (the "step length"):

$$Z_i = \sigma_i\, Z_i'$$

The transformation rules are contained in a function procedure separate from the actual subroutine. To make use of both Equations (5.8), a switch with two settings is defined, the condition of which must be preset in the subroutine once and for all. In spite of Neave's (1973) objection to the use of these rules with uniformly distributed random numbers that have been generated by a multiplicative congruence method, no significant differences could be observed in the behavior of the evolution strategy when other random generators were used. On the other hand, the trapezium method of Ahrens and Dieter (1972) is considerably faster.
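For illustration, a direct transcription of Equations (5.8) into Python; the guard against a zero argument of the logarithm is our addition, and a modern program would of course just call a library routine such as random.gauss.

    import math
    import random

    def box_muller():
        # Two uniform numbers from [0, 1) yield two independent
        # (0, 1) normally distributed numbers, Equations (5.8)
        y1 = 1.0 - random.random()          # shift into (0, 1] to avoid log(0)
        y2 = random.random()
        r = math.sqrt(-2.0 * math.log(y1))
        return (r * math.sin(2.0 * math.pi * y2),
                r * math.cos(2.0 * math.pi * y2))

    z1, z2 = box_muller()
    mutation_step = 0.5 * z1    # scale by the standard deviation sigma = 0.5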

Most algorithms for parameter optimization include a second termination rule, independent of the actual convergence criterion. They end the search after no more than a specified number of iterations, in order to avoid an infinite series of iterations in case the convergence criterion should fail. Such a rule is effectively a bound on the computation time. The program libraries of computers usually contain a procedure with which the CPU time used by the program can be determined. Thus instead of giving a maximum number of iterations one could specify a maximum computation time as a termination criterion. In the present program the latter option is adopted. After every $n$ iterations the elapsed CPU time is checked. As soon as the limit is reached, the search ends and output of the results can be initiated from the main program.

The 1/5 success rule assumes that there is always some combination of variances $\sigma_i > 0$ with which, on average, at least one improvement can be expected within five mutations. In Figure 5.3 two contour diagrams are shown for which this condition cannot always be met. At some points the probability of a success cannot exceed 1/5: for example, at points where the objective function has discontinuous first partial derivatives, or at the edge of the allowed region. Especially in the latter case, the selection principle progressively forces the sequence of iteration points closer up to the boundary, and the step lengths are continuously reduced in size, without the optimum being approached with comparable accuracy.

[Figure 5.3: Failure of the 1/5 success rule. Two contour diagrams over $x_1, x_2$; the circle is a line of equal probability density, the bold segment the fraction of it on which a success can be scored. In one diagram the contours form a sharp ridge on the way to the optimum; in the other the optimum lies on the boundary of a forbidden region.]

Even in the corridor model (Problem 3.8 of Appendix A, Sect. A.3) difficulties can arise. In this case the rate of progress and the probability of success depend on the current position relative to the edges of the corridor. Whereas the maximum probability of success in the middle of the corridor is 1/2, at the corners it is only $2^{-n}$. If one happens to be in the neighborhood of the edge of the corridor for several mutations, the probability of success
calculated by the above rule will be very different from that associated with the same step length if an average over the corridor cross section were taken. If now, on the basis of this low estimate of the success probability, the step length is further reduced, there is a corresponding decrease in the probability of escaping from the edge of the corridor. It would therefore be desirable in this special case to average the probability of success over a longer time period. Opposed to this, however, is the requirement from the sphere model that the step lengths should be adjusted to the topology as directly as possible. The present subroutine offers several means of dealing with the problem. For example, the lower bounds on the variances (variables EA, EB in the subprogram EVOL) can be chosen to be relatively large, or the number of mutations (the variable LS) after which the convergence criterion is tested can be altered by the user. The user besides has a free choice with regard to the required probability of success (variable LR) and the multiplier of the variance (variable SN). The rate of change of the step lengths, given by the factor 0.85 per $n$ mutations, was fixed on the basis of the sphere model. It is not ideal for all types of problems, but is rather in the nature of a lower bound. If it seems reasonable to operate with constant variances, the parameter in question should be set equal to one.
An indication of a suitable choice for the initial step lengths (variable array SM) can be<br />

obtained from Equation (5.4). Since r increases as the root of the number of parameters,<br />

one is led to set<br />

(0)<br />

i<br />

= 4xi<br />

pn<br />

in which 4xi is a rough measure of the expected distance from the optimum. This does<br />

not actually give the optimal step length because r is a kind of local scale of curvature of<br />

the contours of the objective function. However, it does no harm to start with variances<br />

that are too large they will quickly be reduced to a suitable size by the1=5 success rule.<br />

During this transition phase there is still a chance of escaping from the neighborhood<br />

of a merely local optimum but very little chance afterwards. The global convergence


118 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

property (see Rechenberg, 1973) of the evolution strategy can only be proved under the<br />

condition of constant step lengths. With the introduction of the success rule, it is lost,<br />

or to be more precise: the probability of nding the global minimum among several<br />

local minima decreases continuously as a local minimum is approached with continuous<br />

reduction in the step lengths. Rapid convergence <strong>and</strong> reliable global convergence behavior<br />

are two contradictory requirements. They cannot be reconciled if one has absolutely no<br />

knowledge of the topology of the objective function. The 1=5 success rule is aimed at high<br />

convergence rates. If several local optima are expected, it is thus advisable to keep the<br />

(0)<br />

variances large <strong>and</strong> constant, or at least to start with large i <strong>and</strong> perhaps to require a<br />

lower success probability than 1/5. This measure naturally costs extra computation time.<br />

Once one is sure of having located a point near the global extremum, the accuracy can be<br />

improved subsequently in a follow-up computation. For more sophisticated investigations<br />

of the global convergence see Born (1978), Rappl (1984), Scheel (1985), Back, Rudolph,<br />

<strong>and</strong> Schwefel (1993), <strong>and</strong> Beyer (1993).<br />

5.2 A Multimembered <strong>Evolution</strong> Strategy<br />

While the simple, two membered evolution strategy is successful in application to many<br />

optimization problems, it is not a satisfactory method of solving certain types of problem.<br />

As we have seen,by following the 1=5 success rule, the step lengths can be permanently<br />

reduced in size without thereby improving the rate of progress. This phenomenon occurs<br />

frequently if constraints become active during the search, <strong>and</strong> greatly reduce the size of<br />

the success scoring region. A possible remedy would be to alter the probability distribution<br />

of the r<strong>and</strong>om steps in such away astokeep the success probability su ciently<br />

large. To do so the st<strong>and</strong>ard deviations i would have to be individually adjustable.<br />

The contour surfaces of equal probability could then be stretched or contracted along<br />

the coordinate axes into ellipsoids. Further possibilities for adjustment would arise if the<br />

r<strong>and</strong>om components were allowed to depend on each other. For an arbitrary quadratic<br />

problem the rate of convergence of the sphere model could even be achieved if the r<strong>and</strong>om<br />

changes of the individual variables were correlated so as to make the regression line of<br />

the r<strong>and</strong>om vector run parallel to the concentric ellipsoids F (x) =const:, which now lie<br />

at some angle in the space. To put this into practice, information about the topology<br />

of the objective function would have to be gathered <strong>and</strong> analyzed during the optimum<br />

search. This would start to turn the evolution strategy into something resembling one<br />

of the familiar deterministic optimization methods, as Marti (1980) <strong>and</strong> recently again<br />

Ostermeier (1992) have done this is contrary to the line pursued here, which is to apply<br />

biological evolution principles to the numerical solution of optimization problems. Following<br />

Rechenberg's hypothesis, construction of an improved strategy should therefore be<br />

attempted by taking into account further evolution principles.<br />

5.2.1 The Basic Algorithm<br />

When the ground rules of the two membered evolution strategy were formulated in the<br />

language of biology, reference was to one parent <strong>and</strong> one o spring the basic population


A Multimembered <strong>Evolution</strong> Strategy 119<br />

thus consisted of two individuals. In order to reach a higher level of imitation of the<br />

evolutionary process, the number of individuals must be increased. This is precisely the<br />

concept behind the evolution strategy referred to in the following as multimembered. In<br />

his basic work (Rechenberg, 1973), Rechenberg already presented a scheme for a multimembered<br />

evolution. The one considered here is somewhat di erent. It turns out to<br />

be particularly useful with respect to the individual control of several step lengths to be<br />

described later. As yet, however, no detailed comparison of the two variants has been<br />

undertaken.<br />

It is useful to introduce at this point anomenclature for the di erent evolution strategies.<br />

We shall call the number of parents of a generation ,<strong>and</strong>thenumber of descendants<br />

, so that the selection takes place between + = 1+1 = 2 individuals in the two membered<br />

strategy. Wethus characterize the simplest imitation of evolution in abbreviated<br />

notation as the (1+1) strategy. Since the multimembered evolution scheme described by<br />

Rechenberg allows a selection between > 1 parents <strong>and</strong> = 1 o spring it should be<br />

called the ( +1) strategy. Accordingly a more general form, a ( + )evolution strategy,<br />

should be formulated in such away that a basic population of parents of generation g<br />

produces o spring. The process of selection only allows the best of all + individuals<br />

to proceed as parents of the following generation, be they o spring of generation<br />

g or their parents. In this model it could happen that a parent, because of its vitality,<br />

is far superior to the other parents in the same generation, \lives" for a very long time,<br />

<strong>and</strong> continues to produce further o spring. This is at variance to the biological fact of a<br />

limited lifespan, or more precisely a limited capacity for reproduction. Aging phenomena<br />

do not, as far as is known, a ect biological selection (see Savage, 1966 Osche, 1972). As<br />

a further conceptual model, therefore, let us introduce a population in which parents<br />

produce > o spring but the parents are not included in the selection. Rather<br />

the parents of the following generation should be selected only from the o spring. To<br />

preserve a constant population size, we require that each time the best of the o spring<br />

become parents of the following generation. We will refer to this scheme in what follows<br />

as the ( , ) strategy. As for the (1+1) strategy in Section 5.1.1, the algorithm of the<br />

multimembered ( , ) strategy will rst be formulated in the language of biology.<br />

Step 0: (Initialization)<br />

A given population consists of individuals. Each ischaracterized by its<br />

genotype consisting of n genes, which unambiguously determine the vitality,<br />

or tness for survival.<br />

Step 1: (Variation)<br />

Each individual parent produces = o spring on average, so that a total of<br />

new individuals are available. The genotype of a descendant di ers only<br />

slightly from that of its parents. The number of genes, however, remains to<br />

be n in the following, i.e., neither gene duplication nor gene deletion occurs.<br />

Step 2: (Filtering)<br />

Only the best of the o spring become parents of the following generation.<br />

In mathematical notation, taking constraints into account, the rules are as follows:


120 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

Step 0: (Initialization)<br />

De ne x (0)<br />

k<br />

x (0)<br />

k<br />

= x (0)<br />

Ek =(x(0)<br />

k1:::x (0)<br />

kn) T for all k = 1(1) :<br />

= x(0) Ek is the vector of the kth parent Ek, suchthat Gj(x (0)<br />

k ) 0 for all k =1(1) <strong>and</strong> all j = 1(1)m:<br />

Set the generation counter g =0:<br />

Step 1: (Mutation)<br />

Generate x (g+1)<br />

`<br />

= x (g+1)<br />

k + z (g + `) <br />

such thatGj(x (g+1)<br />

` ) 0 j =1(1)m ` =1(1)<br />

where k 2 [1 ]<br />

e.g., k =<br />

( if ` = p pinteger<br />

`(mod ) otherwise.<br />

x (g+1)<br />

` = x (g+1)<br />

N`<br />

=(x(g+1) `1 :::x (g+1)<br />

`n ) T is the vector of the `th o spring N`<br />

<strong>and</strong> z (g +`) is a normally distributed r<strong>and</strong>om vector with n components:<br />

Step 2: (Selection)<br />

Sort the x (g+1)<br />

` for all ` = 1(1) so that<br />

F (x (g+1)<br />

`1 ) F (x (g+1)<br />

`2 ) for all `1 = 1(1) `2 = + 1(1)<br />

Assign x (g+2)<br />

k<br />

= x (g+1)<br />

`1 for all k`1 = 1(1) :<br />

Increase the generation counter g g +1:<br />

Go to step 1, unless some termination criterion is ful lled.<br />

What happens in one generation for a (2 , 4) evolution strategy is shown schematically on<br />

the two dimensional contour diagram of a non-linear optimization problem in Figure 5.4.<br />

5.2.2 The Rate of Progress of the (1 , )<strong>Evolution</strong> Strategy<br />

In this section we attempt to obtain approximately the rate of progress of the multimembered,<br />

or ( , ) strategy{at least for = 1. For this purpose the n-dimensional<br />

sphere <strong>and</strong> corridor models, as used by Rechenberg (1973), are employed for calculating<br />

the progress for the (1+1) strategy.<br />

In the two membered evolution strategy ' was the expectation value of the useful<br />

distance covered in each mutation. It is convenient here to de ne the rate in terms of the<br />

number of generations.<br />

' = expectation value k^x ; x (g) k;k^x ; x (g;1) k<br />

where ^x is the vector of the optimum <strong>and</strong> x (g) is the average vector of the parents of<br />

generation g.<br />

From the chosen n-dimensional normal distribution of the r<strong>and</strong>om vector, which has<br />

expectation value zero <strong>and</strong> variance 2 for all independent vector components, the probability<br />

density for going from a point E with vector xE = (xE1:::xEn) T to another


A Multimembered <strong>Evolution</strong> Strategy 121<br />

x 2<br />

Circles : lines of<br />

constant probability<br />

density<br />

(g)<br />

E<br />

1<br />

(g) (g+1)<br />

N = E<br />

2 2<br />

(g)<br />

N<br />

1<br />

(g)<br />

N<br />

4<br />

Opt.<br />

(g) (g+1)<br />

N = E<br />

3 1<br />

(g)<br />

E<br />

2<br />

x<br />

1<br />

E : Parents<br />

k<br />

N : Offspring<br />

Figure 5.4: Multimembered (2 , 4) evolution strategy<br />

point N with vector xN =(xN1:::xNn) T is<br />

w(E ! N) =<br />

1<br />

p2<br />

! n<br />

exp<br />

The distance kxE ; xNk between xE <strong>and</strong> xN is<br />

kxE ; xNk =<br />

vu<br />

u<br />

t nX<br />

i=1<br />

; 1<br />

2 2<br />

nX<br />

i=1<br />

(xEi ; xNi) 2<br />

(g) : Generation index<br />

(xEi ; xNi) 2<br />

!<br />

(5.9)<br />

But of this, only a part, s = f(xExN), is useful in the sense of approaching the objective.<br />

To discover the total probability density forcovering a useful distance s, anintegration<br />

must be performed over the locus of points for which the useful distance is s, measured<br />

from the starting point xE. This locus is the surface of a nite region in n-dimensional<br />

space:<br />

Z Z<br />

p(s) =<br />

w(E ! N) dxN1 dxN2 ::: dxNn (5.10)<br />

f(xExN) =s<br />

The result of the integration depends on the weighting function f(xExN) <strong>and</strong>thus on<br />

the topology of the objective function F (x).<br />

So far only one r<strong>and</strong>om change has been considered. In the multimembered evolution<br />

strategy, however, the average over the best of the o spring must be taken, in which<br />

each of the o spring is to be associated with its own distance s`. We rst have to nd the<br />

probability density w (s 0 ) for the th best descendant of a generation to cover the useful<br />

distance s 0 .Itisacombinatorial product of


122 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

The probability density w(s`1 = s 0 ) that a particular descendant N`1 gets exactly<br />

s 0 closer to the objective<br />

The probability p(s`2 >s 0 ) that a descendant N`2 advances further than s 0<br />

The probability p(s`3 s 0 )<br />

Y<br />

` +1 =1<br />

` +162f`1`2:::` g<br />

; X+2<br />

` 2 =1<br />

`26=`1<br />

(<br />

p(s`2 >s 0 )<br />

X<br />

` =` ;1 +1<br />

` 62f`1`2:::` ;2g<br />

p(s` +1 s 0 )<br />

p(s`3 >s 0 )<br />

(5.11)<br />

w(s 0 )= 1 X<br />

w (s 0 ) (5.12)<br />

' =<br />

Z1<br />

s 0 =su<br />

=1<br />

s 0 w(s 0 ) ds 0<br />

(5.13)<br />

The meaning of su will be described later.<br />

To evaluate ', besides <strong>and</strong> , all components of the position vectors of all parents of<br />

the generation must be given, together with the values of for producing each descendant.<br />

If ' is to become independent of a particular initial con guration, it is necessary to<br />

de ne representative oraverage values of the relative positions of the parents, which are<br />

established during the optimization as a function of the topology. Todosowould require<br />

setting up <strong>and</strong> solving an integral equation. This has not yet been achieved.<br />

To be able to say something nevertheless about the rate of convergence some simplifying<br />

assumptions will be made. All parents will be represented by a single position vector<br />

xk, <strong>and</strong> the st<strong>and</strong>ard deviations `i will be assumed equal for all components i = 1(1)n<br />

<strong>and</strong> for the descendants ` = 1(1) . Equation (5.11) thereby simpli es to<br />

Since<br />

w (s 0 )=<br />

; 1<br />

; 1<br />

!<br />

w(s` = s 0 )[p(s` s 0 )] ;1<br />

p(s` >s 0 )+p(s`


A Multimembered <strong>Evolution</strong> Strategy 123<br />

<strong>and</strong> ; 1<br />

; 1<br />

we have<br />

w (s 0 )=<br />

!<br />

=<br />

( ; 1) !<br />

( ; 1) ! ( ; )!<br />

!<br />

( ; 1)!( ; )! w(s` = s 0 )[p(s`


124 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

askew distribution this is not the case. Perhaps, however, the skewness is only slight, so<br />

that one can determine at least approximately the expectation value from the position of<br />

the maximum.<br />

Before treating the sphere <strong>and</strong> corridor models in this way,wewillcheck the usefulness<br />

of the scheme with an even simpler objective function.<br />

5.2.2.1 The Linear Model (Inclined Plane)<br />

The simplest way the objective function can depend on the variables is linearly. Imagining<br />

the function to be a terrain in the (n + 1)-dimensional space, it appears as an inclined<br />

plane. In the two dimensional projection the contours are straight, parallel lines in this<br />

case. Without loss of generality one can orient the coordinate system so that the plane only<br />

slopes in the direction of one axis x1 <strong>and</strong> the starting point or parent under consideration<br />

lies at the origin (Fig. 5.5).<br />

The useful distance s` towards the objective thatiscovered by descendant N` of the<br />

parent E is just the part of the r<strong>and</strong>om vector z lying along the x1 axis. Since the<br />

components zi of z are independent, we have<br />

<strong>and</strong><br />

p(s`


A Multimembered <strong>Evolution</strong> Strategy 125<br />

s 0 =su<br />

x 2<br />

z<br />

N<br />

E x<br />

E : Parent<br />

1<br />

N : th offspring<br />

s = z ,1<br />

Contours<br />

F (x) = const.<br />

Figure 5.5: The inclined plane model function<br />

To the minimum<br />

sublinearly however, probably proportional to the logarithm of .Tocompare the above<br />

approximation ~' with the exact value ' the following integral must be evaluated:<br />

' =<br />

Z1<br />

s0 p<br />

2<br />

exp ; s02<br />

2 2<br />

! "<br />

0<br />

1<br />

s<br />

1 + erf p<br />

2<br />

2<br />

!#! ;1<br />

ds 0<br />

For small values of the integration can be performed by elementary methods, but<br />

not for general values of . The value of ' was therefore obtained by simulation on the<br />

computer rst for the case in which the parent survives if the best of the descendants is<br />

worse than the parent ('sur with su = 0) <strong>and</strong> secondly for the case in which the parent<br />

is no longer considered in the selection ('ext with su = ;1). The two results are shown<br />

in Figure 5.6 for comparison with the approximate solution ~'. Itisimmediately striking<br />

that for only ve o spring the extinction of the parent has hardly any e ect on the rate<br />

of progress, i.e., for 5 it is as good as certain that at least one of the descendants<br />

will be better than the parent. The greatest di erences between 'sur <strong>and</strong> 'ext naturally<br />

appear when = 1. Whereas 'ext goes to zero, 'sur keeps a nite value. This can be<br />

determined exactly. Omitting here the details of the derivation, which is straightforward,<br />

the result is simply<br />

'sur( =1)=p 2<br />

The relationship to the (1+1) evolution scheme is thereby established. The di erences<br />

between the approximate theory ( ~') <strong>and</strong> the simulation ('ext) indicate that the assumption<br />

of the symmetry of w(s 0 ) is not correct. The discrepancy with regard to '= seems<br />

to tend to a constant value as increases. While the approximate theory is shown by this


126 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

2<br />

1<br />

0<br />

ϕ<br />

1<br />

Rate of progress for σ = 1<br />

ϕ<br />

ext<br />

(λ)<br />

ϕ<br />

sur<br />

(λ)<br />

ϕ ( λ) λ<br />

(1+1)<br />

5 10 15 20 25<br />

Simulation with<br />

“extinction”<br />

Simulation with<br />

“survival”<br />

(1, ) approximate theory<br />

Theory<br />

Number of offspring<br />

Figure 5.6: Rate of progress for the inclined plane model<br />

comparison to be poor for making exact quantitative predictions, it nevertheless correctly<br />

reproduces the qualitative relation between the rate of progress <strong>and</strong> the number of descendants<br />

in a generation. The probability distributions w(s 0 ) are illustrated in Figure 5.7<br />

for ve di erent values of 2f1 3 10 30 100g, according to Equation (5.16).<br />

For the inclined plane model the question of an optimal step length does not arise. The<br />

rate of progress increases linearly with the step length. Another question that does arise,<br />

however, is how tochoose the optimal number of o spring per parent in a generation.<br />

The immediate answer is: the bigger is, the faster the evolution advances. But in<br />

nature, since resources are limited (territory, food, etc.) it is not possible to increase<br />

the number of descendants arbitrarily. Likewise in applications of the strategy to solving<br />

problems on the digital computer, the requirements for computation time impose limits.<br />

The computers in common use today can only work in a serial rather than parallel way.<br />

Thus all the mutations must be produced one after the other, <strong>and</strong> the more descendants<br />

the longer the computation time. We should therefore turn our attention instead to nding<br />

the optimum value of '= . In the case where the parent survives if it is not bettered by<br />

any descendant, we have the trivial solution<br />

opt =1<br />

The corresponding value for the (1 , ) strategy is, however, larger. With Equation (5.17)<br />

λ


A Multimembered <strong>Evolution</strong> Strategy 127<br />

1.0<br />

0.9<br />

0.8<br />

0.7<br />

0.6<br />

0.5<br />

0.4<br />

0.3<br />

0.2<br />

0.1<br />

0<br />

Probability density<br />

w(s’) for = 1<br />

Parameter = number<br />

of offspring<br />

one obtains from the requirement<br />

the relation<br />

= 1<br />

3<br />

−4 −2 0 2 4<br />

Useful distance s’<br />

10<br />

30<br />

100<br />

Figure 5.7: Probability distribution w(s 0 )<br />

@<br />

@<br />

~'<br />

= opt<br />

opt =~' @<br />

@ ~' = opt<br />

!<br />

=0<br />

<strong>and</strong>, by substituting it back in Equation (5.17), the result<br />

opt =1+<br />

s<br />

The value obtained iteratively is<br />

5.2.2.2 The Sphere Model<br />

2 opt<br />

exp<br />

1<br />

2 opt<br />

! 2<br />

=<br />

2<br />

~' 2<br />

0<br />

41+ erf@<br />

1<br />

q<br />

2 opt<br />

opt ' 2:5 (as an integer: opt =2or3)<br />

We willnowtry to calculate the rate of progress for the simple spherically symmetrical<br />

model, which is of importance for considering the convergence rate properties of the<br />

strategy. The contours of the objective function F (x) are concentric hypersphere surfaces,<br />

given for example by<br />

F (x) =<br />

nX<br />

i=1<br />

x 2<br />

i = const:<br />

13<br />

A5


128 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

x 2<br />

r<br />

2<br />

a<br />

a<br />

r 1<br />

2<br />

1<br />

r E<br />

N 2<br />

s >0<br />

2<br />

E<br />

N<br />

Contours<br />

2<br />

F(x) = x + x = const<br />

1 2<br />

1<br />

s


A Multimembered <strong>Evolution</strong> Strategy 129<br />

For the distance covered towards the objective, s`, the portion is now calculated that<br />

contributes to an improvement of the objective function, i.e., in this case the radial difference<br />

s` = rE ; r` (see Fig. 5.8). The locus of all points N` for which s` is the same is<br />

the surface of the n-dimensional hypersphere about the origin with radius r` = rE ; s`.<br />

Accordingly the total probability density that a mutation (index `) starting from point<br />

E will cover the distance s` is the n-fold line integral:<br />

w(s`) =<br />

Z Z<br />

rE ; r` = s`<br />

1<br />

p2<br />

! n<br />

exp ; 1<br />

r2<br />

2 2 ` + r 2<br />

E ; 2 rE x`1 dx`1 :::dx`n<br />

By transforming to spherical coordinates one obtains a simple integral<br />

w(s`) =<br />

1<br />

p2<br />

! n<br />

n;1<br />

2<br />

; n;1<br />

2<br />

exp<br />

; r2 E + r2 `<br />

2 2<br />

!<br />

r n;1<br />

`<br />

2Z<br />

=0<br />

exp rE r` cos<br />

2<br />

The remaining integral can be expressed as a modi ed Bessel function:<br />

w(s`) =<br />

r n<br />

2<br />

`<br />

r1; n<br />

2<br />

E<br />

2<br />

exp<br />

; r2<br />

E<br />

+ r2<br />

`<br />

2 2<br />

!<br />

I n 2 ;1<br />

rE r`<br />

To simplify the notation we nowintroduce the following de nitions:<br />

We thereby obtain<br />

w(s`) = a<br />

rE<br />

= n<br />

2 a= r2 E<br />

2<br />

v= r`<br />

a<br />

av2<br />

; ;<br />

e 2 v e 2 I ;1(av) with s` = rE (1 ; v)<br />

rE<br />

2<br />

sin n;2<br />

In order to use Equation (5.15) to calculate the total probability that the best of<br />

descendants will cover the distance<br />

the following quantities are still required:<br />

with<br />

<strong>and</strong><br />

s 0 =max<br />

` fs` j ` = 1(1) g = rE ; r 0<br />

w(s` = s 0 )= a<br />

r 0<br />

rE<br />

rE<br />

e ; a au2 ; 2 u e 2 I ;1(au)<br />

= u <strong>and</strong> s 0 = rE (1 ; u)<br />

p(s` s 0 )<br />

=1; s0<br />

=1; uR<br />

R<br />

s`=rE<br />

v=0<br />

w(s`) ds`<br />

a av2<br />

; ; ae 2 v e 2 I ;1(av) dv<br />

d


130 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

This nally gives the probability function for the useful distance s0 covered in one generation,<br />

expressed in units of u:<br />

w(s 0 )= a<br />

rE<br />

e ; a au2 ; 2 u e<br />

0<br />

2 I ;1(au) @1 ; ae ; a 2<br />

Zu<br />

v=0<br />

av2 ;<br />

v e 2 I ;1(av) dv<br />

Since the expectation value of this distribution is not readily obtainable, we shall determine<br />

its maximum to give an approximation ~'. From the necessary condition<br />

with the more concise notation<br />

we obtain the relation<br />

=1+ @D(u)<br />

@u u=1; ~'=rE<br />

@w(s 0 )<br />

@s 0<br />

s 0 =~'<br />

!<br />

=0<br />

D(y) =ae ; a ay2<br />

; 2 y e 2 I ;1(ay)<br />

0<br />

B<br />

[D(1 ; ~'=rE)] ;2<br />

@1 ;<br />

Z<br />

1; ~'=rE<br />

v=0<br />

1<br />

C<br />

1<br />

A<br />

;1<br />

D(v) dvA<br />

(5.18)<br />

Except for the upper limit of integration, this is the same integral that made it so di cult<br />

to obtain the exact solution for the rate of progress in the (1+1) evolution strategy (see<br />

Rechenberg, 1973). Under the condition 1<strong>and</strong> =a 1, which means for many<br />

variables <strong>and</strong> at a large enough distance from the optimum, Rechenberg obtained an<br />

estimate by exp<strong>and</strong>ing Debye's asymptotic series representation of the Bessel function<br />

(e.g., Jahnke-Emde-Losch, 1966) in powers of =a. Without giving here the individual<br />

steps in the derivation, the result is<br />

Z1<br />

D(v) dv ' 1<br />

"<br />

1 ; erf<br />

2<br />

!#<br />

p +<br />

2 a 2<br />

p<br />

a<br />

p 2<br />

v=0<br />

"<br />

exp<br />

; ( ; 1)2<br />

!<br />

; exp<br />

2 a<br />

2<br />

;<br />

2 a<br />

!#<br />

(5.19)<br />

It is clear from Equation (5.4) that the rate of progress of the (1+1) strategy for the<br />

two membered evolution varies inversely as the number of variables. Even if a higher<br />

convergence rate is expected from the multimembered scheme, with many descendants<br />

per parent, there will be no change in the relation to n, thenumber of parameters. In<br />

addition to the assumptions already made regarding <strong>and</strong> =a, without further risk to<br />

the validity of the approximate theory we can assume that 1 ; ~'=rE ' 1. Equation (5.19)<br />

can now also be applied here.<br />

For the partial di erential<br />

@D(u)<br />

@u u=1; ~'=rE<br />

we obtain with the use of the Debye series again:<br />

@D(u)<br />

@u u=1; ~'=rE<br />

= D(1 ; ~'=rE)<br />

"<br />

a exp<br />

a (1 ; ~'=rE)<br />

!<br />

1<br />

+<br />

1 ; ~'=rE<br />

#<br />

; a (1 ; ~'=rE)


A Multimembered <strong>Evolution</strong> Strategy 131<br />

Figure 5.9: Rate of progress for the sphere model


132 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

If the result is substituted into Equation (5.18) a longer expression is obtained, of the<br />

form:<br />

= (~' rEn)<br />

In the expectation of an end result similar to Equation (5.4) <strong>and</strong> since a particular starting<br />

point rE is of no interest, we will introduce new variables:<br />

' = ~'n<br />

rE<br />

<strong>and</strong> = n<br />

If ~' <strong>and</strong> are now replaced by ' <strong>and</strong> , taking the limit<br />

lim<br />

n!1<br />

(' rEn)<br />

we nd that the quantities n <strong>and</strong> rE disappear from the parameter list of . ' <strong>and</strong><br />

can therefore be regarded as \universal" variables. We obtain<br />

= (' )=1+ p ! 2<br />

! 3 2 "<br />

'<br />

p2 + p exp 4 '<br />

p2 + p 5 1 + erf<br />

8<br />

8<br />

!#<br />

p<br />

8<br />

(5.20)<br />

As in the case of the inclined plane considered previously, this equation cannot be<br />

simply solved for ' . Figure 5.9 shows the family of curves ' = ' ( ).<br />

For ! 0, as expected, ' ! 0. For = 1, the rate of progress is always negative.<br />

Since the parent in the (1 , ) strategy is not included in the selection after it has served<br />

to produce a descendant, = 1 means that every mutation is adopted, whether better<br />

or worse. For the sphere model, except for = 0, the region of success is always smaller<br />

than half of the variable space. With increasing , the ratio becomes even worse ' is<br />

thus always 0, <strong>and</strong> more strongly negative the greater is .<br />

For 2 the rate of progress increases at rst as a function of the variance, reaches a<br />

maximum, <strong>and</strong> then decreases continuously until it becomes negative. From this behavior<br />

one can see even more clearly than in the (1+1) strategy how important the correct choice<br />

of variance is for the optimization.<br />

In the (1 , ) strategy, the progress can turn retrograde if all the o spring are worse<br />

than the parent that produced them. Only with an immortal parent having an in nite<br />

capacity for reproduction would progress be guaranteed or, at least, would retrogression<br />

be ruled out. We shall see later why the model with \extinction" is nevertheless advantageous.<br />

Except for small values of , the maximum rate of progress is almost the same<br />

in the \survival" <strong>and</strong> \extinction" cases. So if the optimal variance can be maintained<br />

throughout, leaving the parents out of the selection is not a disadvantage.<br />

The position of the maxima of ' with respect to at a constant is obtained by<br />

simple di erentiation <strong>and</strong> equating the partial derivative to zero. De ning<br />

the equation is<br />

opt<br />

p =<br />

8 + <strong>and</strong> 'max p<br />

2 opt<br />

rE<br />

= ' +<br />

+ (' + + + ) exp(; +2 )+ p ( + ; ' + ) 1<br />

2 +('+ + + ) 2<br />

1 + erf( + )<br />

!<br />

=0 (5.21)


A Multimembered <strong>Evolution</strong> Strategy 133<br />

2.0<br />

1.0<br />

1<br />

Maximal universal rate<br />

of progress<br />

ϕ ( σ )<br />

max opt<br />

(1+1) - theory<br />

Numer of offspring<br />

5 10 15 20 25 30<br />

Figure 5.10: Maximal rate of progress for the sphere model<br />

Points on the curve ' max = ' ( = opt) can only be obtained iteratively. To<br />

express = (' max), the non-linear system of equations consisting of Equations (5.20)<br />

<strong>and</strong> (5.21) must be solved. The results as obtained with the multimembered evolution<br />

strategy are shown in Figure 5.10. A convenient formula can only be obtained by assuming<br />

' + ' + i.e., 2 ' max ' 2<br />

opt<br />

This estimate is actually not far wrong, since the second term in Equation (5.21) goes to<br />

zero. We thus nd<br />

' 1+ q 'max exp('max ) 1 + erf 1 q<br />

'max (5.22)<br />

2<br />

a relation with comparable structure to the result for the inclined plane.<br />

Finally we ask whether ' max= has a maximum, as in the inclined plane case. If the<br />

parent can survive the o spring, opt = 1 here too if not the condition<br />

p 1<br />

opt =2<br />

2 +('+ + + ) 2 exp[(' + + + ) 2 ][1 + erf( + )] ' +<br />

must be added to Equations (5.20) <strong>and</strong> (5.21). The solution, obtained iteratively, is:<br />

opt ' 4:7 (as an integer: opt =5)<br />

λ<br />

(5.23)


134 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

Both the (1 , ) <strong>and</strong> (1+ )schemes were run on the computer for the sphere model,<br />

with n = 100rE = 100, <strong>and</strong> variable . In each case' was evaluated over 10 000<br />

generations. The resulting data are shown in terms of ' <strong>and</strong> in Figure 5.9. In<br />

comparison with the approximate theory, deviations are apparent mainly for > opt .<br />

The skewness of the probability distribution w(s 0 ) <strong>and</strong> the error in the estimate of the<br />

integral R D(y) dy have only a weak e ect in the region of greatest interest, where the<br />

rate of progress is maximum. Furthermore, the results of the simulation fall closer to<br />

the approximate theory if n is taken to be greater than 100 however, the computation<br />

time then becomes excessive. For large values of the possible survival of the parent<br />

only becomes noticeable when the variance is too large to allow rapid convergence. The<br />

greatest di erences, as expected, appear for =1.<br />

On the whole we see that the theory worked out here gives at least a qualitative<br />

account ofthebehavior of the (1 , ) strategy. Amuch more elegant method yielding an<br />

even better approximation may be found in Back, Rudolph, <strong>and</strong> Schwefel (1993), or Beyer<br />

(1993, 1994a,b).<br />

5.2.2.3 The Corridor Model<br />

As a third <strong>and</strong> last model objective function, we will now consider the right-angled corridor.<br />

The contours of F (x) in the two dimensional picture (Fig. 5.11) are straight <strong>and</strong><br />

parallel, but not necessarily equidistant.<br />

F (x) =c0 +<br />

For the sake of simplifying the calculation we will again give the coordinate system a<br />

particular position <strong>and</strong> orientation with c1 = ;1 ci = 0 for all i = 2 3:::n: The<br />

right-angled corridor (Problem 2.37, see Appendix A, Sect. A.2){we are using here three<br />

dimensional concepts for the essentially n-dimensional case{is de ned by constraints of<br />

the form<br />

Gj(x) =jxjj b for j = 2(1)n<br />

It has the width 2 b for all coordinate directions xi i = 2(1)n hence the cross section<br />

(2 b) n;1 . As a starting point, the position xE of the parent E, wechoose the origin with<br />

respect to x1 = 0. The useful part of a r<strong>and</strong>om step is just its component z1 in the<br />

x1 direction, which is the negative gradient direction. The formulae for w(s` = s 0 ) <strong>and</strong><br />

p(s`


A Multimembered <strong>Evolution</strong> Strategy 135<br />

2b<br />

N 2<br />

N 3<br />

s 2 < 0<br />

0<br />

x 2<br />

E<br />

s 1 > 0<br />

N 1<br />

Line of equal probability density<br />

Figure 5.11: Corridor model function<br />

= 1<br />

"<br />

erf<br />

2<br />

!<br />

b ; xEi<br />

p + erf<br />

2<br />

!#<br />

b + xEi<br />

p<br />

2<br />

Contours F(x) = const.<br />

Downwards<br />

x 1<br />

Allowed region<br />

Forbidden region<br />

That is, the probability depends on the current position xEi of the starting point E. We<br />

can only construct an average value for all possible situations if we know the probability<br />

pa of certain situations occurring. It could well be that, during the minimum search,<br />

positions near the border are occupied less often than others. The same problem of<br />

nding the occupation probability pa has arisen already in the theoretical treatment of<br />

the (1+1) strategy. Rechenberg (1973) discovered that<br />

pa = 1<br />

2 b (with respect to one of the variables xi i= 2(1)n)<br />

which is a constant independent of the current values of the variables. We will assume<br />

that this also holds here. Thus the average probability that one of the n ; 1 constrained<br />

variables will remain within the corridor can be given as:<br />

= 1<br />

4 b<br />

~p(jx`ij b) =<br />

Zb<br />

xEi=;b<br />

"<br />

erf<br />

Zb<br />

xEi=;b<br />

!<br />

b ; xEi<br />

p + erf<br />

2<br />

pa p(jx`ij b) dxEi<br />

!#<br />

b + xEi<br />

p dxEi<br />

2


136 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

Making use of the relation (see Ryshik <strong>and</strong> Gradstein, 1963)<br />

one nally obtains<br />

Zp<br />

y=0<br />

erf( y) dy = p erf( p)+ exp(; 2 p2 ) ; 1<br />

p<br />

~p(jx`ij b) = erf<br />

p !<br />

2 b<br />

+ 1 p<br />

2 b<br />

In the following we refer to this expression as item v.<br />

v =~p(jx`ij b)<br />

"<br />

exp<br />

; 2 b2<br />

2<br />

!<br />

#<br />

; 1<br />

(5.24)<br />

With the above de nition of v, the total probability that a descendant N` is feasible, i.e.,<br />

that it satis es all the constraints, is<br />

pf eas =<br />

<strong>and</strong> the probability that N` is lethal is<br />

nY<br />

i=2<br />

= v n;1<br />

~p(jx`ij b)<br />

pleth =1; pf eas =1; v n;1<br />

Only non-lethal mutants come into consideration as parents of the next generation. Hence,<br />

instead of w(s` = s0 )wemust insert into Equation (5.15) the expression<br />

w(s` = s 0 ) pf eas = 1<br />

p 2<br />

<strong>and</strong> instead of p(s`


A Multimembered <strong>Evolution</strong> Strategy 137<br />

outcome would be extinction of the population <strong>and</strong> the rate of progress would no longer<br />

be de ned. The probability of extinction of the population is given by the product of the<br />

lethal probabilities:<br />

pstop =(1; v n;1 )<br />

To be able to optimize further in such situations let us adopt the following procedure: If<br />

all the mutations lead to forbidden points, the parent willsurvive <strong>and</strong> produce another<br />

generation of descendants. Thus for this generation the rate of progress takes the value<br />

zero. Equation (5.25) then only holds for s 0 6= 0 <strong>and</strong> we must reformulate the probability<br />

of advancing by s 0 in one generation as follows:<br />

where<br />

w(s 0 )= ~w(s 0 )+ pstop<br />

=<br />

( 0 if s 0 6=0<br />

1 if s 0 =0<br />

The distribution w(s 0 ) is no longer continuous, <strong>and</strong> even if w 0 (s 0 ) is symmetricwe cannot<br />

assume that the maximum of the distribution is a useful approximation to the average<br />

rate of progress (Fig. 5.12). The following condition must be satis ed:<br />

Z1<br />

s 0 =;1<br />

w(s 0 ) ds 0 =<br />

Z1<br />

s 0 =;1<br />

~w(s 0 ) ds 0 + wstop =1 (5.26)<br />

We can think of w(s 0 ) as a superposition of two density distributions, with conditional<br />

mathematical expectation values<br />

<strong>and</strong><br />

<strong>and</strong> with associated frequencies<br />

<strong>and</strong><br />

p1 =<br />

'1 =<br />

Z1<br />

s 0 =;1<br />

Z1<br />

s 0 =;1<br />

'2 =0<br />

s 0 ~w(s 0 ) ds 0<br />

~w(s 0 ) ds 0 =1; pstop<br />

p2 = pstop<br />

The events belonging to the two density distributions are mutually exclusive<strong>and</strong>by virtue<br />

of Equation (5.26) they make up together a complete set of events. The expectation value<br />

is then given by (e.g., Gnedenko, 1970 Sweschnikow ,1970).<br />

' =<br />

Z1<br />

s 0 =;1<br />

s 0 w(s 0 ) ds 0 = '1 p1 + '2 p2 = '1 (1 ; pstop)


138 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

w(s’)<br />

Probability density<br />

p<br />

stop<br />

s’ = 0<br />

w(s’= ~ 0)<br />

w(s’)<br />

s’<br />

Useful distance covered<br />

Figure 5.12: Estimation of the rate of progress from the probability density for<br />

the corridor model<br />

Since we are unable to calculate '1 directly, we make an approximation:<br />

taking for ^' the position of the maximum of ~w(s 0 ).<br />

We require<br />

~' =^' (1 ; pstop) = ^'[1 ; (1 ; v n;1 ) ] (5.27)<br />

@ ~w(s 0 )<br />

@s 0<br />

s 0 =^'<br />

!<br />

=0<br />

By di erentiating Equation (5.25) <strong>and</strong> setting the rst derivative to zero:<br />

=1+ p ^'<br />

p2<br />

exp<br />

^' 2<br />

2 2<br />

!"<br />

1+ erf<br />

^'<br />

p2<br />

!<br />

+2(v 1;n ; 1)<br />

#<br />

(5.28)<br />

Apart from an extra term, this formula is similar to the relation = (~' ) found for<br />

the inclined plane (Equation (5.17)). The main di erence here, however, is that in place<br />

of ~' ^' appears, as de ned by Equation (5.27).<br />

As in the case of the sphere model, we willintroduce here \universal parameters"<br />

' = ~'n<br />

b<br />

<strong>and</strong> = n<br />

b<br />

<strong>and</strong> take the limit n !1in order to arrive at a practically useful relation = (' ).


A Multimembered <strong>Evolution</strong> Strategy 139<br />

With the new quantities ' <strong>and</strong> , Equation (5.24) for v becomes<br />

p !<br />

2 n<br />

v = erf ; p<br />

2<br />

"<br />

1 ; exp<br />

n<br />

; 2 n2<br />

!#<br />

2<br />

Since the argument of the error function increases as n, thenumber of variables, the<br />

approximation<br />

erf(y) ' 1 ; 1<br />

p exp (;y<br />

y 2 )<br />

can be used to give<br />

<strong>and</strong> with<br />

nally<br />

v =1; n p 2<br />

lim<br />

n!1<br />

1+ 1<br />

n<br />

v 1;n =exp<br />

The desired relation = (' )isthus<br />

=1+<br />

p ~'<br />

p 2<br />

exp<br />

2<br />

4<br />

~'<br />

p2<br />

in which, from Equation (5.27),<br />

~' =<br />

! 2 3<br />

5<br />

"<br />

erf<br />

n<br />

p 2<br />

~'<br />

p2<br />

'<br />

for n 1<br />

= e<br />

!<br />

!<br />

1 ; h 1 ; exp ;p 2<br />

+ 2 exp<br />

i<br />

p 2<br />

!<br />

#<br />

; 1<br />

(5.29)<br />

Pairs of values obtained iteratively are shown in Figure 5.13 together with simulation<br />

results for the cases of \survival" <strong>and</strong> \extinction" of the parent (n = 100 b = 100,<br />

average over 10 000 successful generations).<br />

As in the case of the sphere model, the deviations can be attributed to the simplifying<br />

assumptions made in deriving the approximate theory. For = 1' is always zero if<br />

the parent is not included in the selection. The transition to the inclined plane model is<br />

correctly reproduced in this respect. Negative rates of progress cannot occur.<br />

The position of the maxima ' max = ' ( = opt) at constant are obtained in the<br />

same way as for the sphere model. The condition to be added to Equation (5.29) is<br />

c exp (; + )[1; exp (; + )] ;1 ; 1<br />

h erf(' + )+2exp( + ) ; 1 ih 1+2' +2 i + 2<br />

p ' + exp (;' +2 )<br />

in which the following new quantities are introduced again for compactness:<br />

+<br />

!<br />

!<br />

+2exp( + ) ! =0<br />

(5.30)


140 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

Figure 5.13: Rate of progress for the corridor model<br />

j<br />

j


A Multimembered <strong>Evolution</strong> Strategy 141<br />

7<br />

6<br />

5<br />

4<br />

3<br />

2<br />

1<br />

0<br />

1<br />

Maximal universal rate<br />

of progress<br />

ϕ * ( σ * )<br />

max opt<br />

(1+1) - theory<br />

Number of descendants<br />

10 20 30 40 50 λ 60<br />

Figure 5.14: Maximal rate of progress for the corridor model<br />

+<br />

=<br />

opt<br />

p<br />

2<br />

' + =<br />

'max p<br />

2 opt c<br />

"<br />

c = 1 ; 1 ; exp<br />

; opt<br />

p<br />

2<br />

Pairs of values found by iteration are shown in Figure 5.13. Figure 5.14 shows ' max<br />

versus .To determine opt for the (1 , ) strategy, i.e., the value of for which ' max= is<br />

a maximum, it is necessary to solve the system of three non-linear equations, comprising<br />

Equation (5.29), Equation (5.30), <strong>and</strong><br />

The result is<br />

opt = ' + np exp (' +2 )[ erf (' + ) + 2 exp ( + ) ; 1][1+2' +2 ]+2' + o<br />

!#<br />

( opt<br />

c [1 ; exp (; + )] ln[1 ; exp(; + )]+ 1<br />

opt ' 6:0 (as an integer: opt =6)<br />

)<br />

(5.31)


142 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

5.2.3 The Step Length Control<br />

How should one proceed in order to still achieve the maximum rate of progress, i.e.,<br />

2<br />

to maintain the optimum variances i i = 1(1)n, for the case of the multimembered<br />

evolution scheme? For the (1+1) strategy this aim was met by the1=5 success rule,<br />

which was based on the probability of success at maximum convergence rate of the sphere<br />

<strong>and</strong> corridor model functions. Such control from outside the actual mutation-selection<br />

game does not correspond to the biological paradigm. It should rather be assumed that<br />

the step lengths, or more precisely the variances, have adapted <strong>and</strong> are still adapting to<br />

circumstances arising in the course of natural evolution. Although the environmentally<br />

induced rate of mutation cannot be interfered with directly, the existence of mutator<br />

genes <strong>and</strong> repair enzymes strongly suggests that the consequences of such environmental<br />

in uences are always reduced to the appropriate level. In the multimembered evolution<br />

strategy the fact that the observed rates of mutation are also small, indeed that they<br />

must be small to be optimal, comes out of the universal rate of progress <strong>and</strong> st<strong>and</strong>ard<br />

deviation introduced above, which require to be inversely proportional to the number<br />

of variables, as in the (1+1) strategy.<br />

If we wish to imitate organic evolution, we can proceed as follows. Besides the variables<br />

xEi i= 1(1)n, a set of parameters Ei i = 1(1)n, is assigned to a parent E. These<br />

describe the variances of the r<strong>and</strong>om changes. Each descendant N` of the parent E should<br />

di er from it both in x`i <strong>and</strong> `i. The changes in the variances should also be r<strong>and</strong>om<br />

<strong>and</strong> small, <strong>and</strong> the most probable case should be that there is no change at all. Whether<br />

a descendant can become a parent of the next generation depends on its vitality, thus<br />

only on its x`i. Whichvalues of the variables it represents depends, however, not only<br />

on the xEi of the parent, but also on the st<strong>and</strong>ard deviations `i, whichaect the size of<br />

the changes zi = x`i ; xEi. In this way the \step lengths" also play an indirect r^ole in<br />

the selection mechanism.<br />

The highest possible probability that a descendant is better than the parent is normally<br />

wemax =0:5<br />

It is attained in the inclined plane case, for example, <strong>and</strong> for other model functions in the<br />

limit of in nitely small step lengths. In order to prevent that a reduction of the i always<br />

gives rise to a selection advantage, must be at least 2. But the optimal step lengths<br />

can only take e ect if<br />

> 1<br />

weopt<br />

This means that on average at least one descendant represents an improvement of the<br />

value of the objective function. The number of descendants per parent thus plays a<br />

decisive r^ole in the multimembered scheme, just as does the check on the success ratio in<br />

the two membered evolution scheme. For comparison let us tabulate here the opt of the<br />

(1 , ) strategy <strong>and</strong> weopt of the (1+1) strategy for the three model functions considered.<br />

The values of weopt are taken from the work of Rechenberg (1973).


A Multimembered <strong>Evolution</strong> Strategy 143<br />

Model function<br />

Inclined plane<br />

Sphere<br />

weopt<br />

1<br />

2<br />

0:27<br />

1<br />

weopt 2<br />

3.7<br />

opt<br />

2.5<br />

4.7<br />

Corridor<br />

1<br />

2e 5.4 6.0<br />

How should the step lengths now be altered? We shall rst consider only a single<br />

variance 2 for changes in all the variables. In the production of the r<strong>and</strong>om changes,<br />

the st<strong>and</strong>ard deviation is always a positive factor. It is therefore reasonable to generate<br />

new step lengths from the old by amultiplicative rather than additive process, according<br />

to the scheme<br />

(g)<br />

N = (g)<br />

E Z (g)<br />

(5.32)<br />

The median of the r<strong>and</strong>om distribution for the quantity Z must equal one to satisfy the<br />

condition that there is no deterministic drift without selection. Furthermore an increase<br />

of the step length should occur with the same frequency as a decrease more precisely,the<br />

probability of occurrence of a particular r<strong>and</strong>om value must be the same as that of its reciprocal.<br />

The third requirement is that small changes should occur more often than large<br />

ones. All three requirements are satis ed by the log-normal distribution. R<strong>and</strong>om quantities<br />

obeying this distribution are obtained from (0 2 ) normally distributed numbers Y<br />

by the process<br />

Z = e Y (5.33)<br />

The probability distribution for Z is then<br />

w(z) = 1<br />

p 2<br />

1<br />

z exp<br />

; (ln z)2<br />

2 2<br />

!<br />

The next question concerns the choice of ,<strong>and</strong>we shall answer it, in the same way as<br />

for the (1+1) strategy, with reference to the rate of change of step lengths that maintains<br />

the maximum rate of progress in the sphere model. Regarding ' as a di erential quotient<br />

;dr=dg leads to the relation (see Sect. 5.1.2)<br />

(g+1)<br />

opt<br />

(g)<br />

opt<br />

=exp ; ' max<br />

n<br />

(5.34)<br />

for the optimal step lengths of two consecutive generations, where ' max now has a different,<br />

larger value that depends on <strong>and</strong> . The actual size of the average changes in<br />

the variances, using the proposed mutation scheme based on Equations (5.32) <strong>and</strong> (5.33),<br />

depends on the topology of the objective function <strong>and</strong> the number of parents <strong>and</strong> descendants.<br />

If n, thenumber of variables, is large, the optimal variance will only change<br />

slightly from generation to generation. We will therefore assume that the selection in<br />

any generation is more or less indi erent to reductions <strong>and</strong> increases in the step length.<br />

We thereby obtain the multiplicative change in the r<strong>and</strong>om quantity X, averaged over n<br />

generations:<br />

X =<br />

0<br />

@ nY<br />

g=1<br />

Z (g)<br />

1<br />

A<br />

1<br />

n<br />

= exp<br />

0<br />

@ 1<br />

n<br />

nX<br />

g=1<br />

Y (g)<br />

1<br />

A


144 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

Since the Y (g) are all (0 2 ) normally distributed, it follows from the addition theorem<br />

of the normal distribution (Heinhold <strong>and</strong> Gaede, 1972) that<br />

1<br />

n<br />

nX<br />

g=1<br />

isa(0 2 =n) normally distributed r<strong>and</strong>om quantity. Accordingly, the two quantities<br />

exp( = p n)arecharacteristic of the average changes (minus sign for reduction) in the<br />

step lengths per generation. The median of w(z) isofcoursejuste 0 =1.Together with<br />

Y (g)<br />

Equation (5.34), our observation leads us to the requirement<br />

or<br />

exp ' max<br />

n<br />

' exp<br />

' 'max p<br />

n<br />

p n<br />

!<br />

(5.35)<br />

The variance 2 of the normally distributed r<strong>and</strong>om numbers Y , from which the lognormally<br />

distributed r<strong>and</strong>om multipliers for the st<strong>and</strong>ard deviations (\step sizes") of the<br />

changes in the object variables are produced, thus must vary inversely as the number of<br />

variables. Its actual value should depend on the expected rate of convergence ' <strong>and</strong><br />

hence on the choice of the number of descendants .<br />

Instead of only one common strategy parameter ,each individual can now have a<br />

complete set of n di erent i i = 1(1)n, for every alteration in the corresponding n<br />

object variables xi i=1(1)n. The two following schemes can be envisioned:<br />

or<br />

(g) (g) (g)<br />

Ni = Ei Z i<br />

(g)<br />

Ni = (g)<br />

Ei Z (g)<br />

i<br />

Z (g)<br />

0<br />

(5.36)<br />

(5.37)<br />

But only the second one should be taken into further consideration, because otherwise in<br />

the case of n 1theaverage overall step size of the o spring<br />

sN =<br />

vu<br />

u<br />

t nX<br />

i=1<br />

2<br />

Ni<br />

could not be substantially di erent from that of its parent<br />

sE =<br />

vu<br />

ut nX<br />

due to the levelling e ect of the many r<strong>and</strong>ommultiplication events (law of large number<br />

of events). In order to split the mutation e ects to the overall step size <strong>and</strong> the individual<br />

step sizes one could choose<br />

0 '<br />

'<br />

i=1<br />

2<br />

Ei<br />

p<br />

'<br />

for Z0 (5.38)<br />

2 n<br />

' p p for all Zi i= 1(1)n (5.39)<br />

2 n


A Multimembered <strong>Evolution</strong> Strategy 145<br />

We shall not go into further details since another kind of individual step length control<br />

will o er itself later, i.e., recombination.<br />

At this point afurtherword should be said about the alternative (1+ )or(1, )<br />

strategies. Let us assume that by virtue of a jump l<strong>and</strong>ing far from the expectation value,<br />

a descendant has made a very large <strong>and</strong> useful step towards the optimum, thus becoming<br />

a parent of the next generation. While the variance allocated to it was eminently suitable<br />

for the preceding situation, it is not suited to the new one, being in general much too<br />

big. The probability that one of the new descendants will be successful is thereby low.<br />

Because the (1+ ) strategy permits no worsening of the objective function value, the<br />

parent survives{<strong>and</strong> may do so for many generations. This increases the probability ofa<br />

successful mutation still having a poorly adapted step length. In the (1 , ) strategy such<br />

a stray member will indeed also turn up in a generation, but it will be in e ect revoked in<br />

the following generation. The descendant that regresses the least survives <strong>and</strong> is therefore<br />

probably the one that most reduces the variance. The scheme thus has better adaptation<br />

properties with regard to the step length. In fact this phenomenon can be observed in the<br />

simulation. Since we have seen that for 5 the maximum rate of progress is practically<br />

independent of whether or not the parent survives, we should favor a ( , ) strategy, at<br />

least when = is not chosen to be very small, e.g., less than 5 or 6.<br />

5.2.4 The Convergence Criterion for > 1 Parents<br />

In Section 5.2.2 wewere really looking for the rate of progress of a ( , )evolution method.<br />

Because of the analytical di culties, however, we had to fall back onthe = 1 case,<br />

with only one parent. We shall now proceed again on the assumption that > 1. In<br />

each generation state vectors xE <strong>and</strong> associated step lengths are stored, which should<br />

always be the best of the mutants of the previous generation. We naturally require<br />

more storage space for doing this on the computer, but on the other h<strong>and</strong> we havemore suitable values at our disposal for each variable. Supposing that the topology of the<br />

objective function is complicated or even \pathological," <strong>and</strong> an individual reaches a point<br />

that is unfavorable to further progress, we still have su cient alternative starting points,<br />

which mayeven be much more favorable. According to the usefulness of their parameter<br />

sets, some parents place more mutants in the prime group of descendants than others.<br />

In general the best individuals of a generation will di er with respect to their variable<br />

vectors <strong>and</strong> objective function values as long as the optimum has not been reached. This<br />

provides us with a simple convergence criterion.<br />

From the population of<br />

function value:<br />

parents Ek k = 1(1) ,welet Fb be the best objective<br />

Fb = min fF (x<br />

k (g)<br />

k )k= 1(1) g<br />

<strong>and</strong> Fw the worst<br />

Fw = max<br />

k<br />

Then for ending the search we require that either<br />

fF (x(g)<br />

k )k= 1(1) g<br />

Fw ; Fb "c


146 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

or<br />

"d<br />

(Fw ; Fb)<br />

where "c <strong>and</strong> "d are to be de ned such that<br />

"c > 0<br />

1+"d > 1<br />

)<br />

X<br />

k=1<br />

F (x (g)<br />

k )<br />

according to the computational accuracy<br />

Either absolutely or relatively, the objective function values of the parents in a generation<br />

must fall closely together before convergence is accepted. The reason for basing the<br />

criterion on function values, rather than variable values or step lengths, has already been<br />

discussed in connection with the (1+1) strategy (see Sect. 5.1.3).<br />

5.2.5 Scaling of the Variables by Recombination<br />

The ( , ) method opens up the possibility of imitating a further principle of organic<br />

evolution, which is of particular interest from the point ofviewofnumerical optimization<br />

problems, namely sexual propagation. By combining the genes of twoparents a new source<br />

of variation is added to point mutation. The fact that only a few primitive organisms do<br />

without this mechanism of recombination leads us to expect that it is very favorable for<br />

evolution. Instead of one vector x (g)<br />

E now there are distinct vectors x (g)<br />

k for k = 1(1)<br />

in a population. In biology, the totality of all genes in a generation is known as a gene<br />

pool. Among the concerns of population genetics (e.g., Wilson <strong>and</strong> Bossert, 1973) is the<br />

frequency distribution of certain alleles in a population, the so-called gene frequencies.<br />

Until now, we did not argue on that level of detail, nor did we godown to the oor of only<br />

four nucleic acids in order to model, for example, the mutation process within evolution<br />

strategies. This might beworthwhile for quaternary optimization, but not in our case of<br />

continuous parameters. It would be a tedious task to model all the intermediate processes<br />

from nucleic acids to proteins, cell, organs, etc., taking into account the genetic code <strong>and</strong><br />

the whole epigenetic apparatus. We shall now apply the principle of recombination to<br />

numerical optimization with continuous parameters, once again in a simpli ed fashion.<br />

In our population of μ parents we have stored μ different values of each component x_i, i = 1(1)n. From this gene pool we now draw one of the values of x_i for each i = 1(1)n. The draw should be random, so that the probability that an x_i comes from any particular parent k of the μ is just 1/μ for all k = 1(1)μ. The variable vector constructed in this way forms the starting point for the subsequent variation of the components. Figure 5.15 should help to clarify this kind of global recombination.
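A small Python sketch of this columnwise draw from the gene pool may make it concrete; the function name and the NumPy usage are illustrative assumptions, not the book's FORTRAN code.

```python
import numpy as np

rng = np.random.default_rng()

def global_discrete_recombination(parents):
    """Draw each component x_i columnwise at random from the gene pool:
    any of the mu parents donates with equal probability 1/mu."""
    mu, n = parents.shape                  # mu parents, n object variables
    donors = rng.integers(0, mu, size=n)   # one donor index per component
    return parents[donors, np.arange(n)]   # columnwise pick per variable
```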

By imitating recombination in this way we have, so to speak, replaced bisexuality by multisexuality. This was less for reasons of principle than a result of practical considerations of programming. A crude test yielded only a slight further increase in the rate of progress in changing from the bisexual to the multisexual scheme, whereas appreciable acceleration was achieved by introducing the bisexual in place of the asexual scheme, which allowed no recombination. A more detailed and exact comparison has yet to be carried out. Without some guidance from theory it is hard to choose the correct initial step lengths and rates of change of the step lengths for each of the different algorithms.


Figure 5.15: Scheme of global uniform (discrete global) recombination: components chosen columnwise and at random from the parents of generation g to form a descendant.

This is, however, the only way to arrive at quantitative statements, free from confusing side effects.

It is thus hard to explain the origin of the accelerating effect of recombination. It may, for example, lie in the fact that instead of μ different starting points, the bisexual scheme offers

    μ² + (μ − 1) Σ_{i=1}^{n−2} 2^i

possible combinations in the case of n variables. With multirecombination, as chosen here, there are as many as μ^n, which is far more than could be put into effect. A more detailed investigation may be found in Bäck (1994a).

So far we have only considered recombination of the object variables, but the strategy variables, the step lengths, can be recombined in just the same way. Even if all the parents start with equal σ_i = σ for all i = 1(1)n, and if all the step length components are varied by a common random factor in the production of descendants, the standard deviations σ_i of the individuals differ from each other for each i = 1(1)n in the subsequent generations. Thus by recombination it is possible for the step lengths to adapt individually in this way to the circumstances. A better combination affords a higher chance of survival to its bearer. It can therefore be expected that in the course of the optimum search, the currently best combination of the {σ_i, i = 1(1)n} prevails, the one that is associated with the fastest rate of progress. In attempting to verify this in a practical test, an unpleasant phenomenon occurs. It can happen that one of the standard deviations σ_i is suddenly


(e.g., by a random value very far from the expectation value) so much reduced in size that the associated variable x_i can now hardly be changed. The total change in the vector x is then, roughly speaking, confined to an (n − 1)-dimensional subspace of IR^n. Contrary to what one might hope, namely that such a descendant would have less chance of surviving than others, it turns out that the survival of such a descendant is actually favored. The reason is that the rate of progress with an optimal step length is proportional to 1/n. If the number of variables n decreases, the rate of convergence, together with the optimal step length, increases. The optimum search therefore only proceeds in a subspace of IR^n. Not until the only improvement in the objective function entails changing the variable that has hitherto been omitted from the variation will the mutation-selection mechanism operate to increase its associated variance and so restore it to the range for which noticeable changes are possible.

The minimum search proceeds by jumps in the value of the objective function and with rates of progress that vary alternately above and below what would otherwise be a smooth convergence. Such unstable behavior is most pronounced when μ, the number of parents, is small. With sufficiently large μ the reserve of step length combinations in the gene pool is always big enough to avoid overadaptation, or to compensate for it quickly. From an experimental study (Schwefel, 1987) the conclusion could be drawn that punctuated equilibrium evolution (Gould and Eldredge, 1977, 1993) can be avoided by using a sufficiently large population (μ > 1) and a sufficiently low selection pressure (λ/μ ≈ 7). A further improvement can be made by using as the starting point in the variation of the step lengths the current average of two parents' variances, rather than the value from only one or the other parent. This measure too has its biological justification: it represents an imitation of what is called intermediary recombination (instead of discrete recombination).
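The following minimal Python sketch contrasts the two variants for the step lengths; the function name and the 50/50 coin toss for the discrete case are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng()

def recombine_step_lengths(sigma_a, sigma_b, intermediary=True):
    """Recombine the strategy parameters (step lengths) of two parents:
    intermediary averaging, or a discrete per-component choice."""
    if intermediary:
        return 0.5 * (sigma_a + sigma_b)       # average of the two parents
    take_a = rng.random(sigma_a.size) < 0.5    # coin toss per component
    return np.where(take_a, sigma_a, sigma_b)
```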

In this context chromosome mutations should be very effective, those in which, for example, the positions of two individual step lengths are exchanged. As well as the haploid scheme of inheritance on which the present work is based, some forms of life also exhibit the diploid scheme. In this case each individual stores two sets of variable values. Whilst the formation of the phenotype only makes use of one allele, the production of offspring brings both alleles into the gene pool. If both alleles are the same one speaks of homozygosity, otherwise of heterozygosity. Heterozygote alleles enlarge the set of variants in the gene pool and thus the range of possible combinations. With regard to the stability of the evolutionary process this also appears to be advantageous. The true gain made by diploidy only becomes apparent, however, when the additional evolutionary factors of recessiveness and dominance are included. For multiple criteria optimization, the usefulness of this concept has been demonstrated by Kursawe (1991, 1992). Many possible extensions of the multimembered scheme have yet to be put into practice. To find their theoretical effect on the rate of progress, one would first have to construct a theory of the (μ , λ) strategy for μ > 1. If one goes beyond the μ = 1 scheme followed here, significant differences between approximate theory and simulation results arise for μ > 1 because of the greater asymmetry of the probability distribution w(s′).



5.2.6 Global Convergence

In our discussion of deterministic optimization methods (Chap. 3) we have established that only simultaneous strategies are capable of locating with certainty the global minima of arbitrary objective functions. The computational cost of their application increases with the volume of the space under consideration and thus with the power of n. The dynamic programming technique of Bellman allows the reliability of global convergence to be maintained at less cost, but only if the objective function has a rather special structure, such that only a part of the space IR^n needs to be investigated. Of the stochastic search procedures, the Monte-Carlo method has the best chance of global convergence; it offers a high probability rather than certainty of finding the global optimum. If one requires a 90% probability, its cost is greater than that of the equidistant grid search. However, the (1+1) evolution strategy can also be credited with a finite probability of global convergence if the step lengths (variances) of the random changes are held constant (see Rechenberg, 1973; Born, 1978; Beyer, 1989, 1990). How great the chance is of finding an absolute minimum among several local minima depends on the topology, in particular on the disposition and "width" of the minima.

If the user wishes to realize the possibility of a jump from a local to a global extremum, a trial of patience is required. The requirement of approaching an optimum as quickly and as accurately as possible is always diametrically opposed to maintaining the reliability of global convergence. In the formulation of the algorithms of the evolution strategies we have mainly strived to satisfy the first requirement of rapid convergence, by adaptation of the step lengths. Thus for both strategies no claims can be made for good global convergence properties.

With μ > 1 in the multimembered evolution scheme, several state vectors x_k^(g) ∈ IR^n, k = 1(1)μ, are stored in each generation g. If the x_k^(g) are very different, the probability is greater that at least one point is situated near the global optimum and that the others will approach it in the process of generation. The likelihood of this is less if the x_k^(g) fall close together, with the associated reduction in the step lengths. It always remains finite, however, and increases with μ, the number of parents. This advantage over the (1+1) strategy is best exploited if one starts the search with initial vectors x_k^(0) roughly evenly distributed over the whole region of interest, and chooses fairly large initial values of the standard deviations σ_k^(0) ∈ IR^n, k = 1(1)μ. Here too the (μ , λ) scheme is preferable to the (μ + λ) because concentration at a locally very favorable position is at least delayed.

5.2.7 Program Details of the (μ + λ) ES Subroutines

Appendix A, Section A.2 contains FORTRAN listings of the multimembered (μ + λ) evolution strategy developed here, with the alternatives

GRUP: without recombination
REKO: with recombination (intermediary recombination for the step lengths)
KORR: the so far most general form, with correlated mutations as well as five different recombination types (see Chap. 7)



In the choice of μ (number of parents) and λ (number of descendants) there is no need to ensure that λ is exactly divisible by μ. The association of descendants to parents is made by a random selection of uniformly distributed random integers from the range [1, μ]. It is only necessary that λ exceeds μ by a sufficient margin that on average at least one descendant can be better than its parent. From the results of Section 5.2.3 a suitable choice would be, for example, λ ≥ 6 μ.
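A one-line Python sketch of this random association (the function name is hypothetical; indices run from 0 rather than 1, as is idiomatic there):

```python
import numpy as np

rng = np.random.default_rng()

def assign_descendants(mu, lam):
    """Associate each of the lam descendants with a parent drawn uniformly
    from the mu parents; lam need not be an exact multiple of mu."""
    return rng.integers(0, mu, size=lam)

parent_of = assign_descendants(10, 100)   # e.g., a (10 , 100) strategy
```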

The transformation from [0, 1] evenly distributed random numbers to (0, σ²) normally distributed pseudorandom numbers is carried out in the same way as in subroutine EVOL of the (1+1) strategy (see Sect. 5.1.5). The log-normally distributed variance multipliers are produced by the exponential function. The step lengths (standard deviations of the individual random components) can initially be specified individually. During the subsequent process of generation they satisfy the constraints

    σ_i^(g) ≥ ε_a  and  σ_i^(g) ≥ ε_b |x_i^(g)|    for all i = 1(1)n

where

    ε_a > 0  and  1 + ε_b > 1

according to the computational accuracy, can be specified in advance.

The parameter that influences the average rate of change of the step lengths should be given a value roughly proportional to 1/√n; in the case of two factors (the case to be preferred), a global and an individual one, the values given in Section 5.2.3 are recommended. The constant of proportionality depends mainly on another adjustable feature, λ/μ, which may be called the selection pressure. For a (10 , 100) strategy it should be set at about unity to allow the fastest convergence on simple optimization problems like the hypersphere. With increasing λ/μ this value can be changed sublinearly (compare Equation (5.22)).
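A minimal Python sketch of such a log-normal step-length variation with a global and an individual factor follows; the default factors shown are common choices proportional to 1/√n from the later self-adaptation literature, offered here as assumptions rather than the exact constants of Section 5.2.3.

```python
import numpy as np

rng = np.random.default_rng()

def mutate_step_lengths(sigma, tau_global=None, tau_local=None):
    """Log-normal variation of the step lengths via the exponential
    function, with one common and one individual multiplier each."""
    n = sigma.size
    tau_global = tau_global or 1.0 / np.sqrt(2.0 * n)
    tau_local = tau_local or 1.0 / np.sqrt(2.0 * np.sqrt(n))
    z = rng.standard_normal()        # common factor for all sigma_i
    z_i = rng.standard_normal(n)     # individual factor per sigma_i
    return sigma * np.exp(tau_global * z + tau_local * z_i)
```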

If the initial step lengths σ_i^(0) are chosen to be too large, what may have been an especially well situated starting point x^(0) can be thrown away. Nevertheless, this step backwards in the first generation works in favor of reaching a global minimum among several local minima. In principle, for μ > 1 each of the different starting vectors x_k^(0) ∈ IR^n and σ_k^(0) ∈ IR^n, k = 1(1)μ, can be specified. In the present program this differentiation of the parent generation is carried out automatically: the x_k^(0) are produced from x^(0) by addition of (0, (σ^(0))²) normally distributed random vectors. The σ_k^(0) = σ^(0) are initially equal for all parents.

The convergence criterion is described in Section 5.2.4. It is based on the difference in objective function values between the current best and worst parents of a generation. As accuracy parameters, an absolute and a relative quantity (ε_c and ε_d) must be specified (compare Sect. 5.1.3). Furthermore, an upper bound on the computation time for the search can be given, so that whatever the outcome, results can be output by the main program (see also Sect. 5.1.5).

Inequality constraints are treated as described for subroutine EVOL (Sect. 5.1.4); so too is the case of the starting point x^(0) lying outside the feasible region.

Whereas the subroutine GRUP with option REKO has been taken into account in the test series of Chapter 6, this is not so for the third version KORR, which was created later (Schwefel, 1974). Still, more often than any multimembered version, the (1+1) strategy has been used in practice. Nonetheless it has proved its usefulness in several applications: for example, in conjunction with a linearization method for minimizing quadratic functions in surface fitting problems (Plaschko and Wagner, 1973). In this case the evolution process provides useful approximate values that enable the deterministic method to converge. It should also serve to locate the global minimum of the multimodal objective function. Another practically oriented multiparameter case was to find the optimum weight disposition of lightweight rigidly jointed frameworks (Höfler, Leyßner, and Wiedemann, 1973; Leyßner, 1974). Here again the evolution strategy is combined with another method, this time the simplex method of linear programming. Each strategy is applied in turn until the possible improvements remaining at a step are very small. The usefulness of this procedure is demonstrated by checking against known solutions. A third example is provided by Hartmann (1974), who seeks the optimal geometry of a statically loaded shell support. He parameterizes the functional optimization problem by assuming that the shape of the cross section of the cylindrical shell is described by a suitable polynomial. Its coefficients are to be determined such that the largest absolute value of the transverse moment is as small as possible. For various cases of loading, Hartmann finds optimal shell geometries differing considerably from the shape of circular cylinders, with sometimes almost vanishingly small transverse moments. More examples are mentioned in Chapter 7.

5.3 Genetic Algorithms

At almost the same time that evolution strategies (ESs) were developed and used at the Technical University of Berlin, two other lines of evolutionary algorithms (EAs) emerged in the U.S.A., all independently of each other. One of them, evolutionary programming (EP), was mentioned at the end of Chapter 4 and goes back to the work of L. J. Fogel (1962; see also Fogel, Owens, and Walsh, 1965, 1966a,b). For a long time, activity on this front seemed to have become quiet. However, in 1992 a series of yearly conferences was started by D. B. Fogel and others (Fogel and Atmar, 1992, 1993; Sebald and Fogel, 1994) to disseminate recent results on the theory and applications of EP. Since EP uses concepts that are rather similar to either ESs or genetic algorithms (GAs) (Fogel, 1991, 1992), it will not be described in detail here, nor will it be compared to ESs on the basis of test results. This was done in a paper presented at the second EP conference (Bäck, Rudolph, and Schwefel, 1993). Similarly, contributions to comparing ESs and GAs in detail may be found in Hoffmeister and Bäck (1990, 1991, 1992; see also Bäck, Hoffmeister, and Schwefel, 1991; Bäck and Schwefel, 1993).

The third line of EAs mentioned above, genetic algorithms, has become rather popular today and differs from the others in several aspects. This approach will be explained in the following according to its classical (also called canonical) form.

Even to attentive scientists, GAs did not become apparent before 1975, when the first book of Holland (1975) and the dissertation of De Jong (1975) were published. Thus this work was unknown in Europe at the time when Rechenberg's and the author's dissertations were completed and, later on, published as books. Only 10 years later, however, in 1985, a series of biennial conferences (ICGA, International Conferences on Genetic Algorithms) was started (Grefenstette, 1985, 1987; Schaffer, 1989; Belew and Booker, 1991; Forrest, 1993) to bring together those who are interested in the theory or application of GAs. On the Eastern side of the Atlantic, a similar revival of the field began in 1990 with the first conference on parallel problem solving from nature (PPSN) (Schwefel and Männer, 1991; Männer and Manderick, 1992; Davidor, Schwefel, and Männer, 1994). During the PPSN 90 and the ICGA 91 events, proponents of GAs and ESs agreed upon the common denominator evolutionary algorithms (EAs) for both approaches, as well as evolutionary computation (EC) for a new international journal (see De Jong, 1993). The latter term has been adopted, among others, by the Institute of Electrical and Electronics Engineers (IEEE) for an international conference during the 1994 World Congress on Computational Intelligence (WCCI). Surveys of the history have been attempted by De Jong and Spears (1993) and Spears et al. (1993). As forerunners of the genetic simulation, Fraser (1957), Friedberg (1958), and Hollstien (1971) should at least be mentioned here.

5.3.1 The Canonical Genetic Algorithm for Parameter Optimization

Even if the originators of the GA approach emphasized that GAs were designed for general adaptation processes, most applications reported up to now concern numerical optimization by means of digital computers, including discrete as well as combinatorial optimization. Books by Ackley (1987), Goldberg (1989), Davis (1987, 1991), Davidor (1990), Rawlins (1991), Michalewicz (1992, 1994), Stender (1993), and Whitley (1993) may serve as sources for more details in this field. As for so-called classifier systems (CS; see Holland et al., 1986) and genetic programming (GP; see Koza, 1992), two very interesting special areas of evolutionary computation, in which GAs play an important rôle in searching for production rules in so-called knowledge-based systems and for correct expressions in computer programs, respectively, the reader must be referred to the relevant and vast literature (Alander, 1994, compiled more than 3,000 references).

The GA for parameter optimization usually has been presented in the following general form:

Step 0: (Initialization)
A given population consists of μ individuals. Each is characterized by its genotype, consisting of n genes, which determine the vitality, or fitness for survival. Each individual's genotype is represented by a (binary) bit string, representing the object parameter values either directly or by means of an encoding scheme.

Step 1: (Selection)
Two parents are chosen with probabilities proportional to their relative position in the current population, either measured by their contribution to the mean objective function value of the generation (proportional selection) or by their rank (e.g., linear ranking selection).

Step 2: (Recombination)
Two different preliminary offspring are produced by recombination of the two parental genotypes by means of crossover, at a given recombination probability pc; only one of those offspring (chosen at random) is actually taken into further consideration. Steps 1 and 2 are repeated until λ individuals represent the (next) generation.

Step 3: (Mutation)
The offspring eventually (with a given fixed and small probability pm) undergo further modification by means of point mutations working on individual bits, either by reversing a one to a zero or vice versa, or by throwing a dice for choosing a zero or a one, independent of the original value.

At first glance, this scheme looks very similar to that of a multimembered ES with discrete recombination. To reveal the differences one has to take a closer look at the so-called operators, "selection (S)," "mutation (M)," and "recombination (R)." The GA sequence of events, i.e., S - R - M, as opposed to M - R - S within ESs, should not matter significantly, since the whole process is a circular one, and whether one likes to reverse the order of mutation and recombination is merely a matter of avoiding unnecessary operations. In applications, the evaluation of the individuals with respect to their corresponding objective function values normally dominates all other operations. Canonical values for the recombination probability are pc = 0.6, for the number of crossover points nc = 2, and for the mutation probability pm = 0.001.
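A minimal Python sketch of one such S - R - M cycle may make the scheme concrete; the function name is hypothetical, fitness values are assumed to be strictly positive (maximization), and NumPy is used for brevity.

```python
import numpy as np

rng = np.random.default_rng()

def ga_generation(pop, fitness, pc=0.6, pm=0.001):
    """One canonical GA generation: proportional selection, two-point
    crossover (nc = 2) with probability pc, bit-flip mutation with pm.
    pop is a (mu, L) array of 0/1 integers; fitness must be positive."""
    mu, L = pop.shape
    probs = fitness / fitness.sum()            # proportional selection
    offspring = np.empty_like(pop)
    for k in range(mu):
        p1, p2 = rng.choice(mu, size=2, p=probs)
        child = pop[p1].copy()
        if rng.random() < pc:                  # two crossover points
            a, b = np.sort(rng.choice(np.arange(1, L), size=2, replace=False))
            child[a:b] = pop[p2, a:b]          # keep only one offspring
        flips = rng.random(L) < pm             # point mutations on bits
        child[flips] ^= 1
        offspring[k] = child
    return offspring
```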

5.3.2 Representation of Individuals

One of the most apparent differences between GAs and ESs is the fact that completely different representations of the object variables are used. Organic evolution uses four different nucleotides to encode the genotype in pairs of triplets. By means of the genetic code these are translated to 20 different amino acids. Since there are 4³ = 64 different triplets, the genetic code is largely redundant. A closer look reveals its property of maintaining similarity on the amino acid level despite most of the small variations on the level of single nucleotides. Similar transmission laws between chains of amino acids and proteins, proteins and higher aggregates like cells and organs, up to the overall phenotype, are called the epigenetic apparatus (Riedl, 1976). As a matter of fact, biologists as well as behaviorists report that differences among several children of the same parents, as well as differences between two consecutive generations, can well be described by normal distributions with zero mean and characteristic, probably genetically coded, variances. That is why ESs, when used for seeking optimal values of continuous variables, use the more aggregate model of normal distributions for mutations and discrete or intermediary recombination, as described in Sections 5.1 and 5.2.

GAs, however, rely on binary representations of the object variables. One might call this genotypic modelling of the variation process, instead of the phenotypic modelling that is practiced in ESs and EP. An important link between both levels, i.e., the genetic code as well as the so-called epigenetic apparatus, is neglected, at least in the canonical GA. For dealing with integer or real values on the level of the object variables, GAs make use of a normal Boolean representation or of the so-called Gray code. Both, however, present the difficulty of so-called Hamming cliffs. Depending on its position, a single bit reversal thus can lead to small or very large changes on the phenotypic level. This important fact has advantages and disadvantages. The advantage lies in the broad range of different phenotypes available in a GA population at the same time, a matter affecting its global convergence reliability (for a thorough convergence analysis of the canonical GA see Rudolph, 1994a). The corresponding disadvantage stems from the other side of the same coin, i.e., the inability to focus the search effort in a close enough vicinity of the current positions of the individuals in one generation.
current positions of individuals in one generation.<br />

There is a second reason to cling to binary representations of object variables within<br />

GAs, i.e., Holl<strong>and</strong>'s schema theorem (Holl<strong>and</strong>, 1975, 1992). This theorem tries to assure<br />

exponential penetration of the population by individuals with above average tness under<br />

proportional selection, with su ciently higher reproduction rates for better individuals,<br />

one point crossover with xed crossover probability, <strong>and</strong> small, xed mutation rates.<br />

If, at some time, especially when starting the search, the population contains the<br />

globally optimal solution, this will persist in the case where there are zero probabilities<br />

for mutation <strong>and</strong> recombination. Mutation, according to the theorem, is an always destructive<br />

force <strong>and</strong> thus called a subordinate operator. It only serves to introduce missing<br />

or reintroduce lost correct bits into nite populations. Recombination (here, one point<br />

crossover) mayormay not be destructive, depending on whether the crossover point happens<br />

to lie within a so-called building block, i.e., a short substring of the bit string that<br />

contributes to above-average tness of one of the mating individuals, or not. Building<br />

blocks are especially important in case of decomposable objective functions (for a more<br />

detailed description see Goldberg, 1989).<br />

GAs in their original form do not permit the h<strong>and</strong>ling of implicit inequality or equality<br />

constraints. On the other h<strong>and</strong>, explicit upper <strong>and</strong> lower bounds have tobeprovided for<br />

the range of the object variables:<br />

ui xi vi for all i = 1(1)n<br />

in order to have a basis for the binary decoding <strong>and</strong> encoding process, e.g.,<br />

xi = ui + vi ; ui<br />

2 l ; 1<br />

lX<br />

j=1<br />

aij 2 j;1<br />

where aij for j = 1(1)l represents the bit string segment of length l for encoding the ith<br />

element of the object variable vector x.<br />

Instead of this Boolean mapping one may also choose the Gray code, which has the property that neighboring values of the x_i differ in one bit position only. Looking at the probability distribution p(Δx_i) of phenotypic changes Δx_i from one generation to the next, at a given position x_i^(0) and a given mutation probability pm, shows that changing the code from Boolean to Gray only shifts, but never avoids, the so-called Hamming cliffs.


Figure 5.16: Probability distributions p(Δx) for GA mutations (left: normal binary code; right: Gray code)

As Figure 5.16 clearly shows for a one-dimensional case with x^(0) = 5, l = 4, and pm = 0.001, the expectation values for the changes Δx are different from zero in both cases, and the distribution is in no case unimodal.
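For readers who want to experiment with the two codings, here is a minimal sketch of the standard conversion between Boolean and Gray representations of an integer (function names are illustrative):

```python
def binary_to_gray(b):
    """Gray encode an integer: neighboring integers differ in one bit."""
    return b ^ (b >> 1)

def gray_to_binary(g):
    """Invert the Gray code by cumulative XOR over shifted copies."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

assert all(gray_to_binary(binary_to_gray(i)) == i for i in range(16))
```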

5.3.3 Recombination and Mutation

Innovation during evolutionary processes occurs in two different ways, at least for so-called higher organisms. Only the earliest and most primitive species operate asexually. People have often said that GAs can do their work without mutations, which, according to the schema theorem, always hamper the adaptation or optimization process, and that, on the other hand, ESs can do their work without recombination. The latter is not true if self-adaptation of the individual mutation variances and covariances is to work properly (see Schwefel, 1987), whereas the former conjecture has been disproved by Bäck (1993, 1994a,b). For a GA the probability of containing the correct bits for the global solution, dispersed over its random start population, is 1 − L 2^(−μ), which may be close enough to 1 for μ = 50 as population size and L = 1000 as length of the bit string (actually it is 0.999999999999); however, it cannot be guaranteed that those bits will not get lost in the course of the generations. Whether this happens or not largely depends on the problem structure, the phenomenon being called deception (e.g., Whitley, 1991; Page and Richardson, 1992).

If one looks for recombination effects within GAs on the level of phenotypes, one stumbles over the fact that a recombined offspring of two parents that are close together in the phenotype space may deviate greatly from both parental positions there.


Table 5.1: Two point crossover within a GA and its effect on the phenotypes

               Bit strings    Phenotype
  Parent 1     0111 1100        7 12
  Parent 2     1000 1011        8 11
  Two point crossover:
  Offspring 1  0000 1000        0  8
  Offspring 2  1111 1111       15 15

This completely contradicts the proverbial saying that the apple never falls far from the tree. Table 5.1 shows a simple situation with two parents producing two offspring by means of two point crossover on a bit string of length 8, encoding two phenotypic variables in the range [0, 15] in the standard Boolean form. Neither discrete nor intermediary recombination within ESs can be that disruptive; intermediary recombination always delivers phenotypic values for the offspring between those of their parents. The assumption that mutations are not necessary for the GA process may even stem from that disruptive character of recombination, which permits crossover points not only at the boundaries of meaningful parental information but also within the genes themselves.
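The situation of Table 5.1 can be reproduced with a few lines of Python; the particular cut positions (after bits 1 and 6) are a hypothetical choice that happens to yield the offspring shown there.

```python
def two_point_crossover(p1, p2, a, b):
    """Exchange the segment between crossover points a and b."""
    return (p1[:a] + p2[a:b] + p1[b:],
            p2[:a] + p1[a:b] + p2[b:])

o1, o2 = two_point_crossover("01111100", "10001011", 1, 6)
print(o1, int(o1[:4], 2), int(o1[4:], 2))   # 00001000 -> phenotype 0, 8
print(o2, int(o2[:4], 2), int(o2[4:], 2))   # 11111111 -> phenotype 15, 15
```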

ESs obey the general rule that mutations are undirected by using normally distributed changes with zero mean, even in the case of correlated mutations. That this is not so for GAs can easily be seen from Figure 5.16. Without selection, the GA process thus produces a biased genetic drift, depending on the actual situation.

Table 5.2 presents the probability transition matrix for one phenotypic integer variable x_i in the range [0, 3], encoded by means of two bits only. Let

    p = pm        the single bit inversion probability, and
    q = 1 − pm    the probability of not inverting the bit.

Table 5.2: Transition probabilities for mutations within a GA

                            x_i new
  Genotype        00    01    10    11
  Phenotype        0     1     2     3
  x_i old  00  0   q²    pq    pq    p²
           01  1   pq    q²    p²    pq
           10  2   pq    p²    q²    pq
           11  3   p²    pq    pq    q²

From Table 5.2 it is obvious that among all possible transitions (except for those without any change) between the four different genetic states 00, 01, 10, 11 (i.e., phenotypes 0, 1, 2, 3), those from 01 to 10 and from 10 to 01 are the most improbable ones, despite their phenotypic vicinity. Let pm = 10⁻³; then q² = 0.998001, pq = 0.000999, and p² = 0.000001.
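The matrix and the quoted numbers are easy to verify in a few lines of Python (the variable names are illustrative):

```python
import numpy as np

pm = 1e-3
p, q = pm, 1.0 - pm                    # flip / keep probability per bit
states = ["00", "01", "10", "11"]      # phenotypes 0, 1, 2, 3
T = np.array([[p ** sum(x != y for x, y in zip(a, b))
               * q ** sum(x == y for x, y in zip(a, b))
               for b in states] for a in states])
print(T[1][2])   # 01 -> 10 needs two flips: p*p = 1e-6, most improbable
```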

5.3.4 Reproduction and Selection

Whether selection is the first or the last operator in the generation loop of EAs should not matter, except for the first iteration. The difference in this respect between ESs and GAs, however, is that both mingle several aspects of the generation transition. Let us look first, therefore, at the biological facts to be modelled by a selection operator.

An offspring may or may not be able to survive the time span between birth and reproduction. If it is vital up to its reproductive age, it may have varying numbers of offspring with one or more partners of its own generation. Thus, the term "selection" in EAs comprises at least three different aspects:

Survival to adult state (ontogeny)
Mating behavior (perhaps including promiscuity)
Reproductive activity

Both ESs and GAs select parents for each offspring anew, thus modelling maximal promiscuity. GAs assign higher mating and reproductive activities to individuals with better objective function values (both for proportional as well as linear or other ranking selection). But even the worst offspring of generation g may become parents for generation g + 1. The probability, however, may be very low. If this is the case, most offspring are descendants of a few best parents only. The corresponding loss of diversity in the population may lead to premature stagnation (not convergence!) of the evolutionary seeking process. Reducing the proportionality factor in the selection function, on the other hand, ultimately leads to random walk behavior. This enhances the reliability in multimodal situations, but reduces the convergence velocity and the precision of locating the optimum.

For proportional selection after Holland, derived from an analogy to the game-theoretic multiarmed bandit problem, the average number of offspring for an individual with genotype a_k, phenotype x_k, and vitality f(x_k) is

    η(a_k) = μ p_s(a_k) = μ Φ(f(x_k)) / Σ_{i=1}^{μ} Φ(f(x_i))

The transformation Φ(f) is necessary for introducing the proportionality factor mentioned above as well as for dealing with negative values of the objective function. p_s often is called the survival probability, which is misleading. No parent really survives its generation, except in an elitist GA version; then the best parent is put into the next generation without applying the selection operator. Otherwise it may happen simply by chance that one or the other descendant is not different from one of its parents.
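A minimal Python sketch of this selection rule; the function name, the default identity scaling, and the draw of two mating parents are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng()

def proportional_selection(f_values, phi=lambda f: f):
    """Draw two parents with probabilities p_s(a_k) proportional to
    phi(f(x_k)); phi must map objective values to positive numbers."""
    scaled = phi(np.asarray(f_values, dtype=float))
    probs = scaled / scaled.sum()
    return rng.choice(len(probs), size=2, p=probs)
```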

In contrast to ESs, the number of offspring is always equal to the number of parents (λ = μ). There is no surplus of descendants to cope with lethal mutations and recombinations. ESs need that kind of surplus for handling constraints, at least. In the non-preserving case of its comma-version, a multimembered ES also needs a surplus (λ > μ) for the selection process. The λ − μ worst offspring are handled as if they do not survive to the adult reproductive state; the μ best, however, have the same reproduction probability p_s = 1/μ, which does not depend on their individual phenotypes or corresponding objective function values. Thus, on average, every parent has λ/μ descendants. This is depicted on the left-hand side of Figure 5.17, where the average number of descendants of the two best of λ = 10 descendants (evenly distributed on the fitness scale just for simplification purposes) is just λ/μ = 5 for a (2,10) ES, and zero for all others.

Within a GA it largely depends on the scaling function Φ(f) how many offspring are produced on average by their ancestors. The right-hand part of Figure 5.17 presents two possible situations. Crosses (+) belong to a steep, triangles (△) to a flat reproduction probability curve (average number of offspring) over the fitness of the individuals. In the former case it typically happens that, just as in ESs, only the best individuals produce offspring (here the best parent has 6, the second best 3, the third best only 1, and all others zero offspring). One would call this strong selection. Weak selection, on the contrary, characterizes the other case (only the worst parent has no offspring, the best one just 2, and all others 1). It will strongly depend on the actual topology how one should choose the proportionality factor, and it may even be necessary to change it during one optimum seeking process.

Self-adaptation of internal strategy parameters is possible within the framework of GAs, too. Bäck (1992a,b, 1993, 1994a,b) has demonstrated this with respect to the mutation rate. For that purpose he adopts the selection mechanism of the multimembered ES.

Last but not least, the question remains whether a stochastic or a deterministic approach to modelling selection is more appropriate. The argument that a stochastic model is closer to reality is not sufficient for the purpose at hand: optimization and adaptation.

5.3.5 Further Remarks

Of course, one would like to incorporate at least one close-to-canonical GA version into the comparative test series with all the other optimization procedures. But there are problems with that kind of endeavor. First, GAs do not permit general inequality constraints. This does not matter too much, since there are other algorithms that are not directly applicable in such cases either. Next, GAs must be provided with lower and upper bounds for all parameters, which of course have to be chosen to contain the solution, probably in or near the middle of the hypercube defined by the explicit bounds. The GA would thus be provided with information that is not available to the other algorithms.


Figure 5.17: Comparison of selection consequences in EAs (left: ES; right: GA); average number of offspring plotted over the fitness of the individuals.

For all other methods the starting point is of great importance, not only because it defines the initial distance from the optimum, and thus largely determines the number of iterations needed to approximate the solution at the predefined accuracy, but also because it may provide more or less topological difficulties in its vicinity. GAs, however, should be started at random in the whole hypercube defined by the lower and upper bounds of the variables, in order to give them a chance of approaching the global or, at least, a very good local optimum. Reliability tests (see Appendix A, Sect. A.2), especially in cases of multimodal functions, would thus be biased against all other methods if one allows the GA to start from many points at the same time and if one gives the GA the needed extra information about the relevant search region that is not available to the other methods. One might provide special test conditions to compare different EAs with each other without giving one of them an advantage from the very beginning, but no large effort of this kind has been made so far.

Even in cases of special constraints or side conditions one may formulate appropriate instantiations of suitable GA versions. This has been done, for example, for the combinatorial optimization task of solving the travelling salesperson problem (TSP) by Gorges-Schleuter (1991a,b); repair mechanisms were used in cases where unfeasible tours were caused by recombination. Beyer (1992) has investigated ESs for solving TSP-like optimization problems. It is much better to look for data structures fitted to the special task and to redefine the genetic operators to keep to the feasible solution set (see Michalewicz, 1992, 1994). The time for developing such special EAs must be added to the run time on the computer, and one argument in favor of EAs is lost, i.e., their simplicity of use or generality of application.

As the short analysis of GA mutation and recombination operators above has clearly shown, GAs, unlike ESs, favor in-breadth search and thus are especially prepared to solve global and discrete optimization problems, where a volume-oriented approach is more appropriate than a path-oriented one. They have so far done their best in all kinds of combinatorial optimization (e.g., Lawler et al., 1985), a field that has not been pursued in depth throughout this book. One example in the domain of computational intelligence has been the combined topology and parameter optimization of artificial neural networks (e.g., Mandischer, 1993); another is the optimization of membership function parameters within fuzzy controllers (e.g., Meredith, Karr, and Kumar, 1992).

5.4 Simulated Annealing

The simulated annealing approach to solving optimization problems does not really belong to the biologically motivated evolutionary algorithms. However, it belongs to the realm of problem solving methods that make use of other natural paradigms. This is the reason why this section has not been placed elsewhere, among the traditional hill climbing strategies.

In order to harden steel one first heats it up to a high temperature, not far away from the transition to its liquid phase. Subsequently one cools down the steel more or less rapidly. This process is known as annealing. According to the cooling schedule, the atoms or molecules have more or less time to find positions in an ordered pattern (e.g., a crystal structure). The highest order, which corresponds to a global minimum of the free energy, can be achieved only when the cooling proceeds slowly enough. Otherwise the frozen status will be characterized by one or another local energy minimum only. Similar phenomena arise in all kinds of phase transitions from gaseous to liquid and from liquid to solid states.

A descriptive mathematical model abstracts from local particle-to-particle interactions. It describes statistically the correspondences between macro variables like density, temperature, and entropy. It was Boltzmann who first formulated a probability law to link the temperature with the relative frequencies of the very many possible micro states. Metropolis et al. (1953) simulated on that basis the evolution of a solid in a heat bath towards thermal equilibrium. By means of a Monte-Carlo method new particle configurations were generated. Their free energy E_new was compared with that of the former state (E_old). If E_new ≤ E_old, then the new configuration "survives" and forms the basis for the next perturbation. The new state may survive also if E_new > E_old, but only with a certain probability w:

    w = (1/c) exp((E_old − E_new) / (K T))

where K denotes the famous Boltzmann constant and T the current temperature. The constant c serves to normalize the probability distribution. This Metropolis algorithm thus is in line with the probability law of Boltzmann.

Kirkpatrick, Gelatt, and Vecchi (1983) and Cerny (1985) published optimization methods based on Metropolis' simulation algorithm. These methods are used quite frequently nowadays as simulated annealing (SA) procedures. Due to the fact that good intermediate positions may be "forgotten" during the search for a minimum or maximum, the algorithm is able to escape from local extrema and finally might reach the global optimum.
is able to escape from local extrema <strong>and</strong> nally might reach the global optimum.


There are two loops within the SA process:

Lowering the temperature (outer loop):

    T_new = f(T_old)

Step 4: (Termination criterion)
If T^(k) ≤ ε, end the search with result x*.

Step 5: (Cooling, outer loop)
Set x^(k+1,0) = x*, x̃ = x*, and T^(k+1) = γ T^(k) with 0 < γ < 1.


5.5 Tabu Search and Other Hybrid Concepts

Aggressive exploration using a short-term memory forms the core of the TS. From a candidate list of (non-exhaustive) moves the best admissible one is chosen. The decision is based on tabu restrictions on the one hand and on aspiration criteria on the other. Whereas aspiration criteria aim at perpetuating former successful operations, tabu restrictions help to avoid stepping back to inferior solutions and repeating already investigated trial moves. Although the best admissible step does not necessarily lead to an improvement, only better solutions are stored as real moves. Successes and failures are used to update the tabu list and the aspiration memory. If no further improvements can be found, or after a specified number of iterations, one transfers the results to the longer-term memories and switches to either an intensification or a diversification mode. Intensification, combined with the medium-term memory, refers to procedures for reinforcing move combinations historically found good, whereas diversification, combined with the long-term memory, refers to exploring new regions of the search space. The first articles of Glover (1986, 1989) present many ideas to decide upon switching back and forth between the three modes. Many more have been conceived and published together with application results. In some cases complete procedures from other optimization paradigms have been used within the different phases of the TS, e.g., line search or gradient-like techniques during intensification, and GAs during diversification.

Instead of going into further details here, it seems appropriate to give some hints that point to rather similar hybrid methods, more or less centered around either GAs, ESs, or SA as the main strategy.
SA as the main strategy.<br />

One could start again with Powell's rule to look for further restart points in the<br />

vicinity of the nal solutions of his conjugate direction method (Chap. 3, Sect. 3.2.2.1)<br />

or with the restart rule of the simplex method according to Nelder <strong>and</strong> Mead (Chap. 3,<br />

Sect. 3.2.1.5), in order to interpret them in terms of some kind of diversi cation phase. But<br />

in general, both approaches cannot be classi ed as better ideas than starting a speci c<br />

optimum seeking method from di erent initial solutions <strong>and</strong> simply comparing all the<br />

(maybe di erent) outcomes, <strong>and</strong> choosing the best one as the nal solution. It might<br />

even be more promising to use di erent strategies from the same starting point <strong>and</strong> to<br />

select the overall best outcome again as a new start condition. On MIMD (multiple<br />

instructions, multiple data) parallel computers or nets of workstations the competition of<br />

di erent search methods could even be used to set up a knowledge base that adapts to<br />

a speci c situation (e.g., Peters, 1989, 1991). Only individual conclusions for one or the<br />

other special application can be drawn from this kind of metastrategic approach, however.<br />

At the close of this general survey, only a few further hints will be given regarding the vast number of recent proposals.

Ablay (1987), for example, uses a basic search routine similar to Rechenberg's (1+1) ES and interrupts it more or less frequently by a pure random search in order to avoid premature stagnation as well as convergence to a non-global local optimum.

The replicator algorithm of Voigt (1989) also refers to organic evolution as a metaphor (see also Voigt, Mühlenbein, and Schwefel, 1990). Its modelling technique may be called descriptive, according to earlier work of Feistel and Ebeling (1989). Ebeling (1992) even proposes to incorporate ontogenetic learning features (a so-called Haeckel strategy).

Mühlenbein and Schlierkamp-Voosen (1993a,b) proposed a so-called breeder GA, which combines a greedy algorithm, to locate the nearest local optima very quickly, with a genetic algorithm, to allocate recombined start positions for further local optimum seeking cycles. This has proven to be very successful in special situations where the local optima are situated in a regular pattern in the search space.

Dueck and Scheuer (1990) have devised a so-called threshold accepting strategy, which is rather similar to the simulated annealing approach but pretends to deliver superior results. Later on, Dueck (1993) elaborated his great deluge algorithm, which adds to the threshold accepting method some kind of diversification mode, like the tabu search, in order to avoid premature stagnation at a non-global local optimum.
to avoid premature stagnation at a non-global local optimum.<br />

Lohmann (1992) <strong>and</strong> Herdy (1992) propose a hierarchical ES according to Rechenberg's<br />

extended notation (Rechenberg, 1978, 1989, 1994) of the multimembered scheme<br />

to solve so-called structural optimization problems. Whereas this term normally points<br />

to situations in which a solid structure subject to stresses <strong>and</strong> deformations has to be<br />

designed in order to have least weight or production cost, Lohmann <strong>and</strong> Herdy do not<br />

mean anything else than a mixed-integer optimization problem. The solution is sought<br />

for in an outer ES-loop that varies the integer object variables only <strong>and</strong> an inner ESloop<br />

that varies the real-valued variables. Thus the outer loop compares relative optima<br />

found in the inner loops. This kind of cyclical subspace search, somehow similar to the<br />

Gauss-Seidel approach, must not represent the ultimate solution to mixed-integer problems,<br />

however. It is more or less prone to nding non-global local optima only. A more<br />

general evolutionary algorithm should be able to change{at the same time, by appropriate<br />

mutation <strong>and</strong> recombination operators{both the discrete <strong>and</strong> the real-valued object<br />

variables. But this speculation must be proved in forthcoming further steps towards a<br />

more general evolutionary algorithm, perhaps a hybrid of ES <strong>and</strong> GA ingredients.


Chapter 6

Comparison of Direct Search Strategies for Parameter Optimization

6.1 Difficulties

The vast and steadily increasing number of optimization methods necessarily raises the question of which is the best strategy. There seems to be no unique answer. If indeed there were an optimal optimization method, all the others would be superfluous and would have long ago been forgotten.

Because of the strong competition between already existing strategies, it is necessary nowadays that, whenever any proposal for a new method or variant is made, its advantages and improvements compared to older strategies be displayed. The usual way is to refer to a minimum problem for which the known methods fail to find a solution whereas the new proposal is successful. Or it is shown, with reference to chosen examples, that computation time or iterations can be saved by using the new version. The series of publications along these lines can in principle be continued indefinitely. With sufficient insight into the working of any strategy, a special optimization problem can always be constructed for which the strategy fails. Likewise, for any problem a special method of solution can be devised that is superior to the other procedures. One simply needs to exploit to the full what one knows of the problem structure as contained in its mathematical formulation.

Progress in the field of optimization methods does not, however, consist in developing an individual method of solution for each problem or type of problem. A practitioner would much rather manage with just one strategy, which can solve all the practically occurring problems for as small a total cost as possible. But as yet there is no such universal optimization method, and some authors doubt whether there ever will be (Arrow and Hurwicz, 1957). All the methods presently known can only be used without restriction in particular areas of application. According to the nature of the particular problem, one or another strategy offers a more successful solution. The question of which is the best strategy is itself a kind of optimization problem. To be able to answer it objectively, an objective function would have to be formulated for deciding which of two methods was best from the point of view of its results. So long as no generally recognized quality function of this kind exists, the question of which optimization method is optimal remains unanswered.

6.2 Theoretical Results

Classical optimization theory is concerned with establishing necessary and sufficient existence criteria for maxima and minima. It provides systems of equations but no iterative methods of finding their solutions. Not even Dantzig's simplex method (1966) for solving linear programming problems can be regarded as a direct result of theory; theoretical considerations of the linear problem only show that the extremum sought, except in special cases, must always lie in a corner of the polyhedron defined by the constraints. With $n$ variables and $m$ constraints (together with $n$ non-negativity conditions), the number of corners, or points of intersection of the hypersurfaces formed by the constraints, is limited to a maximum of $\binom{m+n}{n}$. Even the systematic inspection of all the points of intersection would be a finite optimization method. But not all the points of intersection lie within the allowed region (Saaty, 1955, 1963). Müller-Merbach (1971) gives $mn - m + 2$ as an upper bound on the number of feasible corner points. The simplex method, which is a method of steepest ascent along the edges of the polyhedron, only traverses a tiny fraction of all the corners. Dantzig (1966) refers to empirical evidence that the number of necessary iterations increases as $n$, the number of variables, if the number of constraints $m$ is constant, or as $m$ if $(n - m)$ is not too small. Since, in the least favorable case, between $m$ and $2m$ exchange operations must be performed on the tableau of $(m+1)(n+1)$ coefficients, the average computation time increases as $O(m^2 n)$. In so-called degenerate cases, however, the simplex method can also become infinite. The repeated cycling through the same corners must then be broken by a rule for randomly choosing the iteration step (Dantzig). From a theoretical point of view, the ellipsoid method of Khachiyan (1979) and the interior point method of Karmarkar (1984) do have the advantage of polynomial time consumption even in the worst case.

The question of finiteness of iterative methods is also a central theme of non-linear programming. In this case the solution can lie at any point on the boundary or in the interior of the allowed region. For the special case that the objective function and all the constraint functions are convex and multiply differentiable, Kuhn and Tucker (1951) and John (1948) have derived necessary and sufficient conditions for extremal solutions. Most of the iteration methods that have been developed on this basis are designed for problems with a quadratic objective function and linear constraints. Representative of quadratic programming are, for example, the methods of Beale (1956) and Wolfe (1959a). They make extensive use of the algorithm of the simplex method and thus belong, according to Hadley (1969), to the class of neighboring extremal point methods. Other strategies can move into the allowed region in the course of the iterations. As far as the constraints permit, they take the direction of the gradient of the objective function. They are therefore known as gradient methods of non-linear programming (Kappler, 1967). As their name may suggest, however, they are not suitable for all non-linear problems. Their convergence can be proved at best for differentiable quasi-convex programs (Künzi, Krelle, and Oettli, 1962). Even under these conditions, the number of required iterations and the rate of convergence cannot be stated in general. The same is true for the methods of Khachiyan (1979) and Karmarkar (1984). In the following sections a short summary is attempted of the convergence properties of non-linear optimization methods in the unconstrained case (hill climbing methods).

6.2.1 Proofs of Convergence

A proof of convergence of an iterative method will aim to show that a sequence of iteration points $x^{(k)}$ tends monotonically with the index $k$ towards the sought point $x'$:

$$\lim_{k \to \infty} \left\| x^{(k)} - x' \right\| = 0$$

or

$$\left\| x^{(k)} - x' \right\| \le \varepsilon, \quad \varepsilon \ge 0, \quad \text{for all } k \ge K(\varepsilon)$$


In the transition from such an idealized scheme to a practical method, one must usually introduce adaptive rules for the termination of subroutines that would in principle run forever (Nickel, 1967; Nickel and Ritter, 1972).

A further limitation to the predictive power of proofs of convergence arises from the properties of the point $x'$ referred to above. Even if confusion of maxima and minima is eliminated, the approximate solution $x'$ can still be a saddle point. To exclude this possibility, the second and sometimes even higher partial derivatives must be constructed and tested. It always remains uncertain whether the solution that is finally found represents the global minimum or only a local minimum of the objective function. The only way of proving the global convergence of a sequential optimization method seems to be to require unimodality of the objective function, so that only one local optimum exists, which is also the global optimum. Some global convergence properties are possessed only by a few simultaneous methods, such as the systematic grid method or the Monte-Carlo method. They place no continuity requirements on the objective function, but the separation of the trial points must be significantly smaller than the distance between neighboring minima and the required accuracy. The fact that its cost rises exponentially with the number of variables usually precludes the practical application of such a method.

How does the convergence of the evolution strategy compare? For fixed step lengths, or more precisely for fixed variances $\sigma_i^2 > 0$ of the normally distributed mutation steps, there is always a positive probability of going from any starting point (e.g., a local minimum) to any other point with a better objective function value, provided that the separation of the points is finite. For the two membered method, Rechenberg (1973) gives necessary and sufficient conditions for the probability of success to exceed a specified value. Estimates of the computation cost can only be made for special objective functions. In this respect there are problems in determining the rules for controlling the mutation step lengths and deciding when the search is to be terminated. It is hard to reconcile the requirement of rapid convergence in one case with that of a certain minimum probability of global convergence in another.
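The positive-probability claim can be made precise in one line, since the normal density has unbounded support. With fixed variances $\sigma_i^2 > 0$, the mutation density

$$p(x, y) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\!\left( -\frac{(y_i - x_i)^2}{2 \sigma_i^2} \right) > 0 \quad \text{for all finite } x, y \in \mathbb{R}^n,$$

so for any measurable set $B$ of better points with positive volume, the probability $\int_B p(x, y)\, dy$ of reaching $B$ in a single mutation is strictly positive. (This is merely a restatement of the claim above, not a result quoted from Rechenberg.)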

6.2.2 Rates of Convergence

While it may be of importance from a mathematical point of view to show that under certain assumptions a particular method leads with certainty to the objective, it is even more important to know how much computational effort is required, that is, what the rate of convergence is. The question of how fast an optimal solution is approached, or how many iterations are needed to come within a prescribed small distance of the objective, can only be answered for a few abstract methods and under even more restrictive assumptions. One distinguishes between first and second order convergence. Although some authors reserve the term quadratic convergence for the case when the solution of a quadratic problem is found within a finite number of iterations, it will be used here as a synonym for second order convergence. A sequence of iteration points $x^{(k)}$ converges linearly to $x^*$ if it satisfies the condition

$$\left\| x^{(k)} - x^* \right\| \le c\, \beta^k, \qquad c > 0, \quad 0 \le \beta < 1$$

Second order, or quadratic, convergence holds if the error bound decreases quadratically from one step to the next, i.e., if $\| x^{(k+1)} - x^* \| \le c\, \| x^{(k)} - x^* \|^2$.

6.2.3 Q-Properties

Second order methods are commonly classified by their so-called Q-properties. Thus if a strategy takes $p$ iteration steps to locate exactly the optimum of a quadratic objective function, it is said to have the property $Q\,p$.

The Newton-Raphson method, for example, takes only a single step, because for a quadratic function the second partial derivatives are constant over the whole of $\mathbb{R}^n$ and all higher order derivatives vanish. If the iteration rule is followed exactly, it gives the position of the minimum right at the first step, without the necessity of a line search. As no objective function values need to be evaluated explicitly, one also refers to it as an indirect optimization method. It has the property $Q\,1$.
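The $Q\,1$ property can be seen in a worked line or two. For a quadratic objective with a symmetric positive definite matrix $A$ (the notation is chosen here for illustration), the Newton step from an arbitrary $x^{(0)}$ lands exactly on the minimum:

$$F(x) = \tfrac{1}{2}\, x^T A\, x - b^T x, \qquad \nabla F(x) = A x - b, \qquad \nabla^2 F(x) = A$$

$$x^{(1)} = x^{(0)} - \left( \nabla^2 F \right)^{-1} \nabla F\!\left( x^{(0)} \right) = x^{(0)} - A^{-1}\left( A x^{(0)} - b \right) = A^{-1} b = x^*$$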

A conjugate gradient method, e.g., that of Fletcher and Reeves (1964), requires up to $n$ cycles before a complete set of conjugate directions is assembled and a line search leads to the minimum. It therefore has the property $Q\,n$.

Powell's (1964) derivative-free search method of conjugate directions requires $n + 1$ line searches for determining each of the $n$ direction vectors and thus has the property $Q\,n(n+1)$, or $Q\,O(n^2)$, in terms of the number of one dimensional minimizations.

The variable metric strategy of Davidon (1959) in the formulation of Fletcher and Powell (1963) can be interpreted both as a quasi-Newton method and as a method of conjugate directions. If the objective function is quadratic, the iteratively improved approximating matrix agrees with the exact inverse of the Hessian matrix after $n$ iterations. This method has the property $Q\,n$.

Apart from the fact that any practical algorithm can require more than the theoretically predicted number of iterations due to the effect of rounding errors, for peculiar types of coefficient matrix in the quadratic problem an algorithm can fail completely. For example, Zangwill (1967) demonstrates such a source of error in the Powell method when no improvement is achieved in one direction.

6.2.4 Computing Demands

The specification of the Q-properties of individual strategies is only the first step towards estimating the computing demands. In different procedures an iteration or a cycle comprises various different operations. It is useful to distinguish ordinary calculation operations like additions and multiplications from the evaluation of functions such as the objective function and its derivatives. The number of variables is the basic quantity that determines the computation cost. A crude but adequate measure is therefore given by the power $p$ of $n$, the number of parameters, with which the expected computation times increase. For the case of many variables, since the highest powers are dominant, lower order terms can be neglected. In the Newton-Raphson method, at each iteration the gradient vector $\nabla F$ and the Hessian matrix $\nabla^2 F$ must be evaluated, which means $n$ first and $\frac{n}{2}(n+1)$ second partial derivatives. Objective function values are not required. In fact the most costly step is the matrix inversion, which requires on the order of $O(n^3)$ operations.

A cycle of the conjugate gradient method consists of a line search and a gradient determination. The one dimensional minimization requires several calls of the objective function. Their number depends on the choice of method, but it can be regarded as constant, or at least as independent of the number of variables. The remaining steps in the calculation, including vector multiplications, are composed of $O(n)$ elementary arithmetical operations. Similar results apply in the case of the variable metric strategy, except that there are an additional $O(n^2)$ basic operations for matrix additions and multiplications. The direct search method due to Powell evaluates neither first nor second partial derivatives. After every $n + 1$ line searches the direction vectors are redefined, which requires $O(n^2)$ values to be assigned. But since each one dimensional optimization counts as an iteration step, only $O(n)$ direct operations are attributed to each iteration. A convenient summary of the relationships is given in Table 6.1. For simplicity, only the terms of highest order in the number of parameters $n$ are accounted for, without their coefficients of proportionality.

So far we have no scale for comparing the different function evaluations with each other. Fletcher (1972a) and others consider an evaluation of the Hessian matrix to be equivalent to $O(n)$ gradient determinations or $O(n^2)$ objective function calls. This type of scaling is valid whenever the partial derivatives cannot be obtained in analytic form and provided as functions, but are calculated approximately as quotients of differences obtained by trial steps in the coordinate directions. In any case it ought to be about right if the objective function is of higher than second order. Accordingly, the following weighting of the function evaluations can be introduced in the table:

$$F : \nabla F : \nabla^2 F \;\widehat{=}\; n^0 : n^1 : n^2$$
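This scaling can be verified by counting calls in a finite-difference scheme. The sketch below (the helper names are ours, chosen for illustration; this is not one of the original test programs) approximates the gradient and the Hessian purely from objective function calls and counts them:

```python
def fd_gradient(f, x, h=1e-6):
    # Forward differences: n extra objective function calls per gradient.
    f0 = f(x)
    grad = []
    for i in range(len(x)):
        xi = list(x)
        xi[i] += h
        grad.append((f(xi) - f0) / h)
    return grad

def fd_hessian(f, x, h=1e-4):
    # Forward second differences: on the order of n^2 objective function calls.
    n = len(x)
    f0 = f(x)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            xij = list(x); xij[i] += h; xij[j] += h
            xi = list(x);  xi[i] += h
            xj = list(x);  xj[j] += h
            hess[i][j] = (f(xij) - f(xi) - f(xj) + f0) / (h * h)
    return hess

calls = {"count": 0}
def sphere(x):
    calls["count"] += 1
    return sum(v * v for v in x)

x = [1.0] * 10
fd_gradient(sphere, x)
print("calls per gradient:", calls["count"])   # n + 1
calls["count"] = 0
fd_hessian(sphere, x)
print("calls per Hessian:", calls["count"])    # about 3 n^2
```

The counts $n + 1$ and roughly $3 n^2$ reproduce the $n^0 : n^1 : n^2$ weighting up to constant factors.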

Before anything can be said about the overall computation cost, or time, one must know how many operations are required for calculating one value of the objective function. In general, a function of $n$ variables will entail a cost that rises at least linearly with $n$.

Table 6.1: Number of operations required by the most important basic strategies to minimize a quadratic objective function, in terms of the number of variables n (only orders of magnitude)

                                         Number of    Function evaluations      Elementary
Strategy                                 iterations   F      grad F   Hess F    operations
Newton (e.g., Newton-Raphson)            n^0          --     n^0      n^0       n^3
Variable metric (e.g., Davidon)          n^1          n^0    n^0      --        n^2
Conjugate gradients
  (e.g., Fletcher-Reeves)                n^1          n^0    n^0      --        n^1
Conjugate directions (e.g., Powell)      n^2          n^0    --       --        n^1

Weighting factors                                     n^0    n^1      n^2


For a quadratic function with a full matrix of coefficients, just to evaluate the expression $x^T A\, x$ requires $O(n^2)$ basic arithmetical operations. If the order of magnitude of one function evaluation is denoted by $O(n^f)$ then, assuming $f \ge 1$, for all the optimization methods considered so far the computation time is given by

$$T \sim n^{2+f} \ge n^3$$

The advantage of having fewer function-independent operations in the Fletcher-Reeves method therefore only makes itself felt if the number of variables is small and the time for one function evaluation is short.

All the variants of the basic second order strategies mentioned here can be fitted, under similar assumptions, into the above scheme. Among these are (Broyden, 1972):

- Modified and quasi-Newton methods
- Methods of conjugate gradients and conjugate directions
- Variable metric strategies, with their variations using correction matrices of rank one

There is no optimization method whose cost rises with less than the third power of the number of variables. Even the indirect procedure, in which the equations for the necessary conditions for an extremum are set up and solved by conventional methods, does not afford any basic reduction in the computational effort. If the objective function is quadratic, a system of $n$ simultaneous linear equations is obtained. To solve for the $n$ unknowns, the Gaussian elimination method requires $\frac{1}{3} n^3$ basic operations (multiplications and divisions). According to Zurmühl (1965) all the other direct methods, meaning here non-iterative methods, are more costly, except in special cases. Methods involving a stepwise approach to the solution of systems of linear equations (relaxation methods) require an infinite number of iterations to reach an absolutely exact result. They converge linearly and correspond to first order optimization strategies (single step or Gauss-Seidel methods, and total step or gradient methods; see Schwarz, Rutishauser, and Stiefel, 1968). Only the method of Hestenes and Stiefel (1952) converges after a finite number of calculation steps, assuming that the calculations are exact. It is a conjugate gradient method for solving systems of linear equations with a symmetric, positive definite matrix of coefficients.

The main concern here is with direct, i.e., derivative-free, search strategies for optimization. Finiteness of the search in the quadratic case, and greater than linear convergence, can only be proved for the Powell method of conjugate directions and for the Davidon-Fletcher-Powell variable metric method, which Stewart reformulated as a derivative-free quasi-Newton method. Of the coordinate strategy, at best it can be said that it converges linearly. The same holds for the simple gradient methods. There are also versions of them in which the partial derivatives are obtained numerically. Since various comparison tests have shown them to be rather ineffective in highly non-linear situations, none is considered here. No theoretically founded statements about convergence rates and Q-properties are available for the other direct strategies. The rate of progress defined by Rechenberg (1973) for the evolution strategy with adaptive step length control represents an average measure of convergence. It could, however, only be determined theoretically for two selected model objective functions. The one with concentric contour lines, or contour hypersurfaces, can be regarded as a special case of a quadratic objective function. The formula for the local rate of progress in both the two membered and the multimembered strategies has the form

$$\varphi(r) = c\, \frac{r}{n}, \qquad c = \text{const.}$$

where $r = \| x^{(k)} - x^* \|$ is the current distance from the objective and $\varphi$ is the change in $r$ at one iteration or mutation. Rearrangement of the above formulae gives

$$\varphi(r) = \Delta r = \left\| x^{(k)} - x^* \right\| - \left\| x^{(k+1)} - x^* \right\|$$

$$\left\| x^{(k+1)} - x^* \right\| = \left\| x^{(k)} - x^* \right\| \left( 1 - \frac{c}{n} \right)$$

or

$$\left\| x^{(k)} - x^* \right\| = \left\| x^{(0)} - x^* \right\| \left( 1 - \frac{c}{n} \right)^k$$

which, because $0 < 1 - \frac{c}{n} < 1$ whenever $0 < c < n$, corresponds to a geometric decrease of the distance from the objective, i.e., to linear convergence.
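It may help to turn this into a concrete count of mutations. Requiring a reduction of the distance by a factor of 10, and taking a sphere-model value of $c \approx 0.2$ for the optimally adjusted two membered strategy (the numerical constant is an assumption of this illustration), one gets

$$\left( 1 - \frac{c}{n} \right)^k = \frac{1}{10} \quad \Rightarrow \quad k = \frac{\ln 10}{-\ln\!\left( 1 - \frac{c}{n} \right)} \approx \frac{n}{c} \ln 10 \approx 5\, n \ln 10 \approx 11.5\, n$$

which agrees with the theoretical figure cited for the (1+1) strategy in the first comparison test below.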


6.3 Numerical Comparison of Strategies

All theoretical statements about convergence and rates of convergence hold for the idealized concept of an algorithm, not for a particular computer program. The susceptibility of a strategy to rounding errors depends on how it is coded. For this reason too there is a need to check the convergence properties of numerical methods experimentally.

Because of the finite word length of a digital computer, the number range is also limited. If it is exceeded, the program that is running normally terminates. Such fatal execution errors (floating overflow, floating divide check) are usually the consequence of rounding errors in previous steps. If instead a result falls below the smallest representable absolute value (floating underflow), the error is not regarded as fatal. Only a few algorithms, e.g., Brent (1973), take special account of finite machine accuracy.

In spite of the frequent mention of the importance of numerical comparisons of strategies, few publications to date have reported results on several different test problems using a large number of minimization methods. By virtue of its scope, the work of Colville (1968, 1970) stands out among the older studies by Brooks (1959), Spang (1962), Dickinson (1964), Leon (1966a), Box (1966), and Kowalik and Osborne (1968). It included 30 strategies and 8 different problems, but not many direct search methods compared to gradient methods. In some other tests, by Jacoby, Kowalik, and Pizzo (1972), Himmelblau (1972a), Smith (1973), and others in the collection of Lootsma (1972a), derivative-free strategies receive much more attention. The comparisons of Gorvits and Larichev (1971) and Larichev and Gorvits (1974) treat only gradient methods, and that of Tapley and Lewallen (1967) deals with some schemes for the numerical treatment of functional optimization problems. The huge collection of test problems of Hock and Schittkowski (1981) is biased towards standard methods of mathematical programming and their capabilities (Schittkowski, 1980).

6.3.1 Computer Used

The machine on which the numerical experiments were carried out was a PDP 10 from the firm Digital Equipment Corporation, Maynard, Massachusetts. It had the following specifications:

Core storage area: 64K (1K = 1024 words)
Word length: 36 bits
Cycle time: 1.65 or 1.8 microseconds

The time-sharing operating system accounted for about 34K of core, so that only 30K remained available to the user. To be able to tackle some problems with as many variables as possible, the computations were generally carried out only in single precision. The main program, which was the same for all strategies, occupied about $\left( 5 + \frac{2n}{1024} \right)$ Kwords, and the FORTRAN library a further 5K. The consequent maximum number $n_{max}$ of parameters is given for each search method under test in Table 6.2. The finite word length of a digital computer means that its number range is limited. The absolute bounds for floating point arithmetic were given by:

Largest absolute number: $2^{127} \simeq 1.7 \cdot 10^{38}$
Smallest absolute number: $2^{-128} \simeq 2.9 \cdot 10^{-39}$
Smallest absolute number: 2 ;128 ' 2:9 10 ;39


Only a part of the word is available for the mantissa of a number. This imposed the differential accuracy limit, which is much lower and usually more important:

Smallest difference relative to unity: $2^{-27} \simeq 7.5 \cdot 10^{-9}$

Accordingly, the following equalities hold on this computer:

$$\varepsilon = 0 \quad \text{for } |\varepsilon| < 2^{-128}$$
$$1 + \varepsilon = 1 \quad \text{for } |\varepsilon| < 2^{-27}$$

These computer-specific data play a rôle when testing for zero or for the equality of two quantities. The same programs can therefore lead to different results on different computers.
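The relative accuracy limit of any binary floating point arithmetic can be probed with a few lines of code. A sketch in Python (run on a modern machine it reports the IEEE 754 double precision value rather than the PDP 10 figure):

```python
# Find the smallest eps of the form 2^-t for which 1 + eps is still
# distinguishable from 1 in the machine's floating point arithmetic.
eps = 1.0
while 1.0 + eps / 2.0 != 1.0:
    eps /= 2.0

print(eps)  # 2^-52 for IEEE 754 doubles; PDP 10 single precision gave 2^-27
```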

Strategies are often judged by the computation time they require to achieve a result, for example, with a specified accuracy. The basic quantity for this purpose is the occupation time of the central processor unit (CPU), which also depends on the machine. Word lengths and cycle times are not enough to allow comparison between runs that were made on different computers. So-called MIX times, which are average values of the duration of certain operations, also prove to be unsuitable, since the speed of calculation depends so strongly on the frequency of its individual steps. A method proposed by Colville (1968) has received wide recognition; its design was particularly suited to optimization methods. According to this scheme, measured computation times are expressed relative to the time required for 10 consecutive inversions of a $40 \times 40$ matrix, using the FORTRAN program written by Colville. In our case this unit was around 110 seconds. Because of the time-sharing operation, with its rather variable load on the PDP 10, there were deviations of 10% and more in the reported CPU times. This was especially marked for short programs.
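Colville's normalization is easy to reproduce in spirit. The sketch below uses numpy rather than Colville's original FORTRAN routine, so the absolute numbers are not comparable with his unit; the strategy time shown is a hypothetical value, only to illustrate the rescaling:

```python
import time
import numpy as np

def colville_unit(repetitions=10, size=40):
    """Time for 10 consecutive inversions of a 40 x 40 matrix."""
    rng = np.random.default_rng(0)
    # Diagonally dominant matrix: guaranteed invertible, well conditioned.
    a = rng.standard_normal((size, size)) + size * np.eye(size)
    t0 = time.perf_counter()
    for _ in range(repetitions):
        np.linalg.inv(a)
    return time.perf_counter() - t0

unit = colville_unit()
raw_cpu_seconds = 3.5  # hypothetical measured CPU time of some strategy
print("standardized time:", raw_cpu_seconds / unit, "Colville units")
```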

6.3.2 Optimization Methods Tested

One goal of this work is to compare evolution strategies with other derivative-free methods of continuous parameter optimization. To this end we consider not only direct search methods in the narrower sense, but also those methods that glean their required knowledge of partial derivatives by means of trial steps and finite difference methods. Altogether 14 strategies or versions of basic strategies are considered. Their names, and the abbreviations used for them, are listed in Table 6.2. All tests were run on the PDP 10 described in the previous section.

Finite computer accuracy implies that in the case of quadratic objective functions the iteration process cannot, and should not, be continued until the exact solution has been obtained. The decision when to terminate the optimum search is a necessary and often crucial component of any iterative method. Just as the procedures of the individual strategies differ, so too do their termination or convergence criteria. As a rule, the user is given the chance to influence the termination criterion by means of an input parameter defined as the required accuracy. It refers either to the values of the variables (change in $x_i$ within one iteration, or size of the step lengths $s_i$) or to values of the objective function. Both criteria harbor the danger that the search will be terminated prematurely, that is, before coming as close to the objective as required. This is made clear by Figure 6.1.

Neither $\Delta x < \varepsilon_x$ nor $\Delta F < \varepsilon_F$ alone is a sufficient condition for being close to the solution $x^*$. The condition $\| \nabla F \| < \varepsilon$ is no more reliable by itself, quite apart from the fact that it presupposes the availability of derivatives.


Table 6.2: Strategies applied: their abbreviations, maximum number of variables, and accuracy parameters

Strategy                                            Abbreviation   Max. number     Accuracy
                                                                   of variables    parameter
Coordinate strategy with Fibonacci search           FIBO           2900            ε = 7.5·10^-9
Coordinate strategy with golden section             GOLD           2910            ε = 7.5·10^-9
Coordinate strategy with Lagrangian interpolation   LAGR           2540            ε = 7.5·10^-9
Direct search of Hooke and Jeeves                   HOJE           4090            ε = 7.5·10^-9
Davies-Swann-Campey method with
  Gram-Schmidt orthogonalization                    DSCG           75              ε = 7.5·10^-9
Davies-Swann-Campey method with
  Palmer orthogonalization                          DSCP           95              ε = 7.5·10^-9
Powell's method of conjugate directions             POWE           135             ε = 7.5·10^-9
Stewart's modification of the
  Davidon-Fletcher-Powell method                    DFPS           180             ε_a = ε_b = ε_c = 7.5·10^-9 (†)
Simplex method of Nelder and Mead                   SIMP           135             ε = 10^-8 (‡)
Method of Rosenbrock with
  Gram-Schmidt orthogonalization                    ROSE           75              ε = 10^-4 (‡)
Complex method of Box                               COMP           95              ε = 10^-6 (‡)
(1+1) evolution strategy                            EVOL           4000            ε_a = ε_c = 3.0·10^-39 and
(10,100) evolution strategy                         GRUP           435               ε_b = ε_d = 7.5·10^-9
(10,100) evolution strategy with recombination      REKO           435               (jointly for all three)

(‡) Values fixed by the author.
(†) In place of the values set in Lill's program: ε_a = 10^-6, ε_b = 10^-10, ε_c = 5·10^-13.

The maximum number of variables refers to an available core storage area of 30K words, which includes the main program and the FORTRAN library.


Besides their considerable cost in programming and computation time, numerical strategy comparisons entail further difficulties. The effectiveness of a method can be strongly influenced by small programming details. A number of methods were not fully worked out by their originators and require heuristic rules to be introduced before they can be applied. The way in which this degree of freedom is exercised to define the procedure depends on the skill and experience of the programmer, which leads to large discrepancies between the results of investigations and the judgements of different authors on one and the same strategy.

We have therefore, as far as possible, used already published programs (FORTRAN or ALGOL) for the algorithms, or parts of them, under study:

One dimensional search with the Fibonacci method of Kiefer:
  M. C. Pike, J. Pixner (1965): Algorithm 2, Fibonacci search
  J. Boothroyd (1965): Certification of Algorithm 2
  M. C. Pike, I. D. Hill, F. D. James (1967): Note on Algorithm 2

One dimensional search with the golden section method of Kiefer:
  K. J. Overholt (1967): Algorithm 16, Gold

Direct search (pattern search) of Hooke and Jeeves:
  A. F. Kaupe, Jr. (1963): Algorithm 178, Direct search
  M. Bell, M. C. Pike (1966): Remark on Algorithm 178
  R. DeVogelaere (1968): Remark on Algorithm 178
  F. K. Tomlin, L. B. Smith (1969): Remark on Algorithm 178
  L. B. Smith (1969): Remark on Algorithm 178

Orthogonalization method for the strategies of Rosenbrock and of Davies, Swann, and Campey:
  J. R. Palmer (1969): An improved procedure for orthogonalizing the search vectors in Rosenbrock's and Swann's direct search optimization methods

Derivative-free method of conjugate directions of M. J. D. Powell:
  M. J. Hopper (1971): Harwell subroutine library, a catalogue of subroutines; from this, subroutine VA04A, updated May 20, 1970 (received as a card deck)

Variable metric method of Davidon, Fletcher, and Powell as formulated by Stewart:
  S. A. Lill (1970): Algorithm 46, a modified Davidon method for finding the minimum of a function, using difference approximation for the derivatives
  S. A. Lill (1971): Note on Algorithm 46
  Z. Kovacs (1971): Note on Algorithm 46
  Some of the parameters affecting the accuracy were altered, either because the small values defined by the author could not be realized on the available computer, or because the closest possible approach to the objective could not have been achieved with them.

Simplex method of Nelder and Mead:
  R. O'Neill (1971): Algorithm AS 47, function minimization using a simplex procedure

A complete program for the Rosenbrock strategy:
  M. Machura, A. Mulawa (1973): Algorithm 450, Rosenbrock function minimization
  This was not applied because it could treat only the unconstrained case.

The same applies to the code for the complex method of M. J. Box:
  J. A. Richardson, J. L. Kuester (1973): Algorithm 454, the complex method for constrained optimization
  The part of the strategy that seeks a basis in the feasible region when the starting point is not feasible is not considered here.

Whenever the procedures named were published in ALGOL, they have been translated into FORTRAN. All the other optimization strategies not mentioned here have also been programmed in FORTRAN, with close reference to the original publications. If one wanted to repeat the test series today, a much larger number of codes could be made use of, e.g., from the book of Moré and Wright (1993).

6.3.3 Results of the Tests

6.3.3.1 First Test: Convergence Rates for a Quadratic Objective Function

In the first part of the numerical strategy comparison, the theoretical predictions of convergence rates and Q-properties will be tested, or, where these are not available, experimental data will be supplied instead. For this purpose two quadratic objective functions are used (Appendix A, Sect. A.1). In the first (Problem 1.1) the matrix of coefficients is diagonal with unit diagonal elements, i.e., a scalar matrix. This simplest of all quadratic problems is characterized by concentric contour lines or surfaces, which can be represented or imagined as circles in the two parameter case, spheres in the three parameter case, and hypersphere surfaces in the general case. The same pattern of contours, but with arbitrary monotonic variation in the objective function, occurs in the sphere model for which the average rates of progress of the evolution strategies could be determined theoretically (Rechenberg, 1973, and Chap. 5 of this book).

The second objective function (Problem 1.2) has a matrix of coefficients with all elements non-zero. It represents a full quadratic problem (except for the missing linear term) with concentric, oblique ellipses, or ellipsoids, as the contour lines or surfaces. The condition number of the matrix of coefficients increases quadratically with the number of parameters (see Appendix A, Sect. A.1). In general, the time required to calculate one value of the objective function increases as $O(n^2)$ for a quadratic problem because, for a full matrix, $\frac{n}{2}(n+1)$ distinct second order terms $a_{ij} x_i x_j$ must be evaluated. The objective function of Problem 1.2 has been formulated with the intention of reducing the computation time per function call to $O(n)$, without it being such a special quadratic problem that one of the strategies could find it especially advantageous. The strategy comparison for this problem could thereby be made for much larger numbers of variables within the prescribed maximum computation time ($T_{max} = 8$ hours). The storage requirement for the full matrix $A$ would also have been an obstacle to numerical tests with many parameters.

To enable comparison of the experimental and theoretical results, the required number of iterations, line searches, orthogonalizations, objective function calls, and the computation time were measured in going from the initial values

$$x_i^{(0)} = x_i^* + \frac{(-1)^i}{\sqrt{n}} \quad \text{for } i = 1(1)n$$

to an approximation

$$\left| x_i^{(k)} - x_i^* \right| \le \frac{1}{10} \left| x_i^{(0)} - x_i^* \right| \quad \text{for } i = 1(1)n$$

The interval of uncertainty of the variables thus had to be reduced by at least 90%. The distance covered is effectively independent of the number of variables. The above conditions were tested after each iteration, and as soon as they were satisfied the search was terminated. The convergence criteria of the strategies themselves were not suppressed, but they could not generally take effect, being much stricter. If one of them did actually operate, this could be regarded as a failure of the method being applied.
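For concreteness, the following sketch restates Problem 1.1 and the 90% termination condition in executable form (the function and variable names are ours, not those of the original FORTRAN main program):

```python
import math

def problem_1_1(x):
    # Problem 1.1: unit scalar coefficient matrix, i.e. a sum of squares.
    return sum(v * v for v in x)

def start_point(x_star, n):
    # Initial values x_i(0) = x_i* + (-1)^i / sqrt(n), i = 1..n.
    return [x_star[i - 1] + (-1) ** i / math.sqrt(n) for i in range(1, n + 1)]

def target_reached(x, x0, x_star):
    # Interval of uncertainty reduced by at least 90% in every variable.
    return all(abs(xk - xs) <= 0.1 * abs(x0k - xs)
               for xk, x0k, xs in zip(x, x0, x_star))

n = 10
x_star = [0.0] * n
x0 = start_point(x_star, n)
print(problem_1_1(x0))                 # 1.0, independent of n
print(target_reached(x0, x0, x_star))  # False at the start
```

Note that the initial objective function value is always 1.0, which makes the starting distance from the optimum effectively independent of $n$, as stated above.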

The results of the first test are given in Tables 6.3 and 6.4. The number of function calls and the number of iterations or other characteristic processes involved are displayed in Figures 6.2 to 6.13 as functions of the number of parameters $n$ on a log-log scale. As the data show, the computation time and effort of a strategy increase sharply with $n$. The large range in the number of variables, compared to other investigations, allows the trends to be seen clearly. To facilitate an overall view, the computation times of all the strategies are plotted as functions of the number of variables in Figures 6.14 and 6.15.


Table 6.3: Results of all strategies for test Problem 1.1

FIBO -- Coordinate strategy with Fibonacci search (1 cycle = n line searches)

Number of     Number of   Number of objective   Computation time
variables     cycles      function calls        in seconds
3             1           158                   0.13
6             1           278                   0.28
10            1           456                   0.53
20            1           866                   1.66
30            1           1242                  3.07
60            1           2426                  10.7
100           1           3870                  26.5
200           1           7800                  106
300           1           10562                 210
600           1           21921                 826
1000          1           38701                 2500
2000          1           67451                 8270
(max) 2900    1           103846                19300

GOLD -- Coordinate strategy with golden section (1 cycle = n line searches)

Number of     Number of   Number of objective   Computation time
variables     cycles      function calls        in seconds
3             1           158                   0.10
6             1           279                   0.22
10            1           458                   0.51
20            1           866                   1.48
30            1           1242                  3.14
60            1           2426                  11.3
100           1           3870                  27.6
200           1           7802                  114
300           1           10562                 221
600           1           21921                 808
1000          1           38703                 2670
2000          1           67431                 8410
2900          1           103834                18300


Table 6.3 (continued)

LAGR -- Coordinate strategy with Lagrangian interpolation (1 cycle = n line searches)

Number of     Number of   Number of objective   Computation time
variables     cycles      function calls        in seconds
3             1           85                    0.04
6             1           163                   0.12
10            1           271                   0.30
20            1           521                   0.88
30            1           781                   1.80
60            1           1561                  6.68
100           1           2501                  17.3
200           1           5001                  68.6
300           1           7201                  153
600           1           14401                 546
1000          1           24001                 1620
2000          1           46803                 6020
(max) 2540    1           64545                 10300

HOJE -- Direct search of Hooke and Jeeves (1 cycle = n to 2n individual steps)

Number of     Number of   Number of objective   Computation time
variables     cycles      function calls        in seconds
3             4           20                    0.02
6             4           43                    0.04
10            3           48                    0.06
20            7           274                   0.50
30            3           168                   0.43
60            8           874                   3.70
100           2           352                   2.37
200           8           3104                  40.1
300           9           4954                  100
600           7           7503                  286
1000          12          23505                 1460
2000          9           35003                 4270
3000          10          58504                 11200
(max) 4090    13          104300                25600


Table 6.3 (continued)

DSCG -- Davies-Swann-Campey method with Gram-Schmidt orthogonalization

Number of   Number of   Number of line   Number of objective   Computation time
variables   orthog.     searches         function calls        in seconds
3           0           3                20                    0.04
6           0           6                34                    0.10
10          0           10               56                    0.20
20          0           20               111                   0.68
30          0           30               136                   1.18
50          0           50               226                   2.80
(max) 75    0           75               338                   6.10

DSCP -- Davies-Swann-Campey method with Palmer orthogonalization
(results for n <= 75 identical to those of DSCG; in addition:)

Number of   Number of   Number of line   Number of objective   Computation time
variables   orthog.     searches         function calls        in seconds
(max) 95    0           95               428                   9.49

POWE -- Powell's method of conjugate directions
(1 complete iteration = n + 1 line searches; all iterations begun are counted)

Number of   Number of    Number of line   Number of objective   Computation time
variables   iterations   searches         function calls        in seconds
3           1            3                11                    0.02
6           1            6                20                    0.06
10          1            10               32                    0.12
20          1            20               62                    0.32
30          1            30               92                    0.60
60          1            60               182                   1.96
100         1            100              202                   3.72
(max) 135   1            135              407                   8.60


Table 6.3 (continued)

DFPS -- Stewart's modification of the Davidon-Fletcher-Powell method
(1 iteration = 1 gradient evaluation and 1 line search)

Number of     Number of    Number of objective   Computation time
variables     iterations   function calls        in seconds
3             1            10                    0.02
6             1            16                    0.04
10            1            24                    0.06
20            1            44                    0.16
30            1            64                    0.32
60            1            124                   1.14
100           1            204                   3.19
135           1            274                   5.42
(max) 180     1            364                   9.56

SIMP -- Simplex method of Nelder and Mead (with restart)

Number of     Number of   Number of objective   Computation time
variables     restarts    function calls        in seconds
3             0           28                    0.09
6             0           104                   0.64
10            0           138                   1.49
20            0           301                   8.24
30            0           664                   37.4
60            0           1482                  277
100           0           1789                  862
(max) 135     1           5142                  5270

ROSE -- Rosenbrock's method with Gram-Schmidt orthogonalization

Number of     Number of   Number of objective   Computation time
variables     orthog.     function calls        in seconds
3             1           27                    0.08
6             2           60                    0.32
10            2           120                   0.91
20            1           181                   2.56
30            0           121                   1.18
40            1           281                   13.7
50            2           550                   48.4
60            2           600                   78.3
(max) 75      2           899                   145


Table 6.3 (continued)

COMP -- Complex method of Box (2n vertices; all numbers are averages over several attempts)

Number of     Number of objective   Computation time
variables     function calls        in seconds
3             69                    0.22
6             259                   1.62
10            535                   6.72
20            1447                  72.0
30            2621                  211
60            7263                  2240
(max) 95      14902                 11000

EVOL -- (1+1) evolution strategy (average values;
number of objective function calls = 1 + number of mutations)

Number of     Number of   Computation time
variables     mutations   in seconds
3             49          0.17
6             154         0.79
10            224         1.74
20            411         6.47
30            630         14.0
60            1335        60.0
100           2192        149
150           3322        340
200           4232        565
300           6666        1310
600           13819       5440
1000          23607       15600

The maximum number of variables (4,000) was not reached because too much computation time would have been required.


Table 6.3 (continued)

GRUP -- (10,100) evolution strategy (average values;
number of objective function calls = 10 + 100 times the number of generations)

Number of     Number of     Computation time
variables     generations   in seconds
3             4             1.81
6             10            6.75
10            17            16.8
20            37            64.5
30            55            145
60            115           519
100           194           1600
200           377           5720
300           551           13600
(max) 435     854           28300

REKO -- (10,100) evolution strategy with recombination (average values;
number of objective function calls = 10 + 100 times the number of generations)

Number of     Number of     Computation time
variables     generations   in seconds
3             4             2.67
6             6             7.42
10            13            23.3
20            23            82.5
30            34            177
60            53            514
100           84            1420
200           136           4380
300           180           9340
(max) 435     289           21100


Table 6.4: Results of all strategies for test Problem 1.2

FIBO -- Coordinate strategy with Fibonacci search

Number of     Number of   Number of objective   Computation time
variables     cycles      function calls        in seconds
3             8           928                   0.68
6             22          4478                  4.44
10            40          12644                 15.6
20            87          50265                 102
30            132         110423                298
50            227         297609                1290
60            282         422911                2120
100           Search terminates prematurely

GOLD -- Coordinate strategy with golden section

Number of     Number of   Number of objective   Computation time
variables     cycles      function calls        in seconds
3             8           946                   0.61
6             22          4418                  3.96
10            40          12622                 14.5
20            86          50131                 102
30            133         111219                287
50            226         296570                1330
60            279         423471                2040
100           Search terminates prematurely

LAGR -- Coordinate strategy with Lagrangian interpolation

Number of     Number of   Number of objective   Computation time
variables     cycles      function calls        in seconds
3             8           586                   0.39
6             22          2826                  2.48
10            40          8023                  9.55
20            87          32452                 62.8
30            134         70889                 192
60            272         263067                1320
100           519         703130                5770
150           Search terminates prematurely


Table 6.4 (continued)

HOJE -- Direct search of Hooke and Jeeves

Number of     Number of   Number of objective   Computation time
variables     cycles      function calls        in seconds
3             11          65                    0.04
6             30          353                   0.34
10            26          502                   0.62
20            78          3035                  5.70
30            111         6443                  16.3
60            212         24801                 119
100           367         71345                 547
200           727         284060                4270
300           1117        656113                14800

DSCG -- Davies-Swann-Campey method with Gram-Schmidt orthogonalization

Number of   Number of   Number of line   Number of objective   Computation time
variables   orthog.     searches         function calls        in seconds
3           3           16               87                    0.22
6           7           55               195                   0.87
10          8           101              323                   2.70
20          16          361              1209                  29.2
30          21          691              2181                  110
40          42          1802             5883                  484
50          27          1451             4453                  582
60          44          2822             9308                  1540
(max) 75    87          6676             20365                 5790

DSCP -- Davies-Swann-Campey method with Palmer orthogonalization

Number of   Number of   Number of line   Number of objective   Computation time
variables   orthog.     searches         function calls        in seconds
3           3           16               84                    0.22
6           7           55               194                   0.78
10          8           101              324                   1.54
20          16          361              1208                  10.3
30          28          901              2809                  33.8
50          28          1501             4610                  89.7
75          79          6076             18591                 547
(max) 95    100         9691             29415                 1090


Table 6.4 (continued)

POWE -- Powell's method of conjugate directions

Number of   Number of    Number of line   Number of objective   Computation time
variables   iterations   searches         function calls        in seconds
3           3            11               27                    0.08
6           5            35               77                    0.30
10          9            99               215                   0.97
20          17           354              744                   4.82
30          53           1621             3401                  24.1
40          Search becomes infinite -- no convergence
50          175          8864             21532                 235
60          138          8367             19677                 222
70, 80, 90, 100, (max) 135:  search becomes infinite -- no convergence

DFPS -- Stewart's modification of the Davidon-Fletcher-Powell method

Number of   Number of    Number of objective   Computation time   Fatal errors
variables   iterations   function calls        in seconds
3           3            20                    0.04
6           4            41                    0.14
10          5            74                    0.34
20          7            178                   1.36
30          9            333                   3.63
60          13           926                   19.7
100         17           2003                  67.9
135         20           3190                  145
(max) 180   22           4757                  270                2 floating divide checks


Table 6.4 (continued)

SIMP -- Simplex method of Nelder and Mead (with restart)

Number of     Number of   Number of objective   Computation time
variables     restarts    function calls        in seconds
3             0           29                    0.09
6             1           173                   1.06
10            0           304                   3.17
20            0           2415                  77.6
30            0           8972                  579
40            2           28202                 3030
50            1           53577                 8870
60            1           62871                 13700
70            1           86043                 25800

ROSE -- Rosenbrock's method with Gram-Schmidt orthogonalization

Number of     Number of   Number of objective   Computation time
variables     orthog.     function calls        in seconds
3             3           38                    0.12
6             4           182                   0.82
10            8           678                   4.51
20            12          2763                  35.6
30            14          5499                  114
40            19          10891                 329
50            21          15396                 645
60            23          20911                 1130
(max) 75      34          43670                 3020

COMP -- Complex method of Box (2n vertices; all numbers are averages over several attempts)

Number of     Number of objective   Computation time
variables     function calls        in seconds
3             60                    0.21
6             302                   2.06
10            827                   12.0
20            5503                  235
30            24492                 2330   (search sometimes terminates prematurely)
40            Search always terminates prematurely


Table 6.4 (continued)

EVOL -- (1+1) evolution strategy (average values)

Number of     Number of   Computation time
variables     mutations   in seconds
3             85          0.33
6             213         1.18
10            728         6.15
20            2874        44.4
30            5866        136
60            24089       963
100           69852       4690
150           152348      15200

GRUP -- (10,100) evolution strategy (average values)

Number of     Number of     Computation time
variables     generations   in seconds
3             5             2.02
6             14            9.36
10            53            49.4
20            183           326
30            381           955
50            1083          4400
80            2977          18600
100           4464          35100

REKO -- (10,100) evolution strategy with recombination (average values)

Number of     Number of     Computation time
variables     generations   in seconds
3             6             2.44
6             15            18.9
10            42            76.2
20            162           546
30            1322          6920
40            9206          61900

Figures 6.2 to 6.13 translate the numerical data into vivid graphics. The abbreviations used there are:

OFC stands for objective function calls
ORT stands for orthogonalizations

The parameters 1.1 and 1.2 refer to Problems 1.1 and 1.2 as described above.


Figure 6.2: Coordinate strategy with Fibonacci search
Figure 6.3: Coordinate strategy with golden section
Figure 6.4: Coordinate strategy with Lagrangian interpolation
Figure 6.5: Strategy of Hooke and Jeeves
Figure 6.6: Strategy of Davies, Swann, and Campey with Gram-Schmidt orthogonalization
Figure 6.7: Strategy of Davies, Swann, and Campey with Palmer orthogonalization
Figure 6.8: Strategy of Powell with conjugate directions (annotation in the plot: no convergence)
Figure 6.9: Strategy of Davidon, Fletcher, Powell, and Stewart as formulated by Lill (variable metric; annotation in the plot: no convergence)
Figure 6.10: Strategy of Rosenbrock with Gram-Schmidt orthogonalization
Figure 6.11: Left: Simplex strategy of Nelder and Mead; Right: Complex strategy of Box
Figure 6.12: (1+1) evolution strategy (curves labeled Mutations 1.1 and Mutations 1.2)
Figure 6.13: Left: (10,100) evolution strategy without recombination; Right: (10,100) evolution strategy with recombination


Figure 6.14: Result of the first comparison test: computation times for Problem 1.1. Plotted on a log-log scale is the computation time (sec) divided by (number of variables)^2 against the number of variables. Curves shown: (10,100) evolution strategy without recombination (GRUP), (1+1) evolution strategy (EVOL), complex strategy of Box (COMP), strategy of Rosenbrock (ROSE), simplex strategy of Nelder and Mead (SIMP), strategy of Davidon-Fletcher-Powell-Stewart (DFPS), (10,100) evolution strategy in parallel, strategy of Powell (POWE), DSC strategy with Palmer orthogonalization (DSCP), DSC strategy with Gram-Schmidt orthogonalization (DSCG), strategy of Hooke and Jeeves (HOJE), coordinate strategy with Lagrangian interpolation (LAGR), and coordinate strategy with Fibonacci search (FIBO).


Figure 6.15: Result of the first comparison test: computation times for Problem 1.2. Plotted on a log-log scale is the computation time (sec) divided by (number of variables)^3 against the number of variables. Meanings of the symbols as in Figure 6.14; curves shown: COMP, SIMP, DSCG, ROSE, GRUP, FIBO/GOLD, LAGR, EVOL, DSCP, POWE, HOJE, and DFPS.


Points that deviated greatly from the trends have been omitted. To emphasize the differences between the methods, instead of the computation time $T$, the quantities $T/n^2$ for Problem 1.1 and $T/n^3$ for Problem 1.2 have been plotted on a logarithmic scale.

For solving Problem 1.1, nearly all strategies require computation times of the order of $O(n^2)$. This corresponds to $O(n)$ objective function calls, each requiring $O(n)$ computation time. As expected, the most successful methods are the two that theoretically show quadratic convergence, namely the method of conjugate directions (Powell) and the variable metric method (DFPS). They obtain the solution within one iteration and $n$ line searches respectively. For this simple problem, however, the same can be said of the strategies with cyclic variation of the variables, since the search directions are the same. Of the three coordinate methods, the one with quadratic interpolation is a bit faster than the two that use sequential interval division; the latter two are of equal merit. The strategy of Davies, Swann, and Campey (DSC) also performs very well. Since the objective is reached within the first $n$ line searches, no orthogonalizations need to be carried out. For this reason too, both versions yield identical results for $n \le 75$.

The evolution strategies live up to expectations in so far as the number of mutations or generations increases linearly with n. The number of objective function calls and the computation times are, however, considerably higher than those of the previously mentioned methods. For r^(0)/r^(M) = 10, the approximate theory of the two membered evolution strategy with optimal step length control predicts the number of mutations to be

M ~ (5 ln 10) n ~ 11.5 n

In fact nearly twice as many objective function calls (about 22 n) are required. This is partly because of the discrete way in which the variances are adjusted and partly because the chosen reduction factor of 0.85 corresponds to a success rate below the optimal value of 0.27. The ASSRS (adaptive step size random search) method of Schumer and Steiglitz (1968), which resembles the simple evolution strategy, is presently the most effective random method as far as we know. According to the experimental results of Schumer (1967) for Problem 1.2, taking into account the different initial and final conditions, it requires about the same number of steps as the (1+1) evolution strategy.
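To make the mechanics of the scheme just described concrete, the following is a minimal Python sketch of a two membered (1+1) strategy with the 1/5 success rule; the reduction factor 0.85 and the sphere-type objective are taken from the text above, while the function and parameter names are our own illustrative choices, not code from the original tests.

import random

def evolution_1plus1(f, x, sigma=1.0, factor=0.85, max_mutations=100000, f_target=1e-10):
    """Two-membered (1+1) evolution strategy (minimal sketch).

    Every n mutations the observed success rate is compared with 1/5:
    above it the step size is enlarged, below it the step size is
    multiplied by the reduction factor 0.85 used in these tests.
    """
    n = len(x)
    fx = f(x)
    successes = 0
    for m in range(1, max_mutations + 1):
        # mutation: add a normally distributed step to every variable
        child = [xi + random.gauss(0.0, sigma) for xi in x]
        fc = f(child)
        if fc <= fx:                 # selection: keep the better of parent and child
            x, fx = child, fc
            successes += 1
        if m % n == 0:               # 1/5 success rule, checked every n mutations
            if successes > n / 5.0:
                sigma /= factor
            elif successes < n / 5.0:
                sigma *= factor
            successes = 0
        if fx < f_target:
            break
    return x, fx, m

# sphere model, the objective type underlying Problem 1.1
sphere = lambda x: sum(xi * xi for xi in x)
x_best, f_best, mutations = evolution_1plus1(sphere, [1.0] * 10)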

It is noteworthy that the (10, 100) strategy without recombination only takes about 10 times as much time as the (1+1) method, in spite of having to execute 100 mutations per generation. This factor of acceleration is significantly higher than the theory for a (1, 10) version would indicate and is closer to the calculated value for a (1, 30) strategy. In the case of many variables, recombination further reduces the required number of generations by two thirds. This is less apparent in the computation time, which is increased by the extra arithmetic operations, compared to the relatively inexpensive calculation of one objective function value. Thus, in the figures showing computation times only the (10, 100) evolution without recombination has been included.

The strategy of Hooke and Jeeves appears to require computation times rather more than O(n^2) on average for many variables, nearer O(n^2.2). This arises from the slight increase with n of the number of exploratory moves. The likely cause is the fixed initial step length, which for problems with many variables is significantly too big and must first be reduced to the appropriate size. Three search strategies exhibit strikingly different behavior.


The method of Rosenbrock requires computation times of the order of O(n^3). This can be readily understood. Apart from the single exception of n = 30, in each case one or two orthogonalizations are performed. The Gram-Schmidt method employed performs O(n^3) operations. If the number of variables is large, the orthogonalization time is of major significance whenever the time for one function call increases less than quadratically with the number of variables. One can see here that the number of objective function calls is not always sufficient to characterize the cost of a strategy. In this case the DSC method succeeds with no orthogonalizations. The introduction of quadratic interpolation proves to give better results than the single step method of Rosenbrock.

Computation times for the simplex and complex strategies also increase as n^3, or even somewhat more steeply with n for many variables. The determining factor for the cost in this case is calculating the centroid of the simplex (or complex), about which the worst of the (n+1) or 2n vertices is reflected. This process takes O(n^2) additions. Since the number of reflections and objective function calls increases as n, the cost increases, simply on this basis, as O(n^3). Even in this simplest of all quadratic problems the simplex of the Nelder-Mead method collapses if the number of variables is large. To avoid premature termination of the optimum search, in the presently used algorithm for this strategy the simplex is initialized again. The search can thereby be prevented from stagnating in a subspace of IR^n, but the required computation time increases even more rapidly than O(n^3). The situation is even worse for the complex method of Box. The author suggests using 2n vertices for problems with few variables and considers that this number could be reduced for many variables. However, the attempt to solve Problem 1.1 for n = 30 with a complex of 40 vertices fails in one of three cases with differing sequences of random numbers: i.e., the search process ends before achieving the required approximation to the objective. For n = 40 and 50 vertices the complex collapsed prematurely in all three attempts. With 2n vertices the complex strategy is successful up to the maximum possible number of variables, n = 95. Here again, however, for n > 30 the computation time increases faster than O(n^3) with the number of parameters. It is therefore dubious whether the search would have been pursued to the point of reaching the maximum internally specified accuracy.
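The centroid argument is easy to make concrete. The following hypothetical sketch of a single reflection step (not the exact Nelder-Mead or Box routine used in the tests) shows where the O(n^2) additions arise; with O(n) reflections this term alone already accounts for O(n^3) basic operations.

def reflect_worst(simplex, values):
    """One reflection step of a simplex search (illustrative sketch).

    simplex: list of n+1 vertices, each a list of n coordinates;
    values:  objective function value of each vertex.
    """
    n = len(simplex) - 1
    worst = max(range(len(simplex)), key=lambda i: values[i])
    # centroid of all vertices except the worst one: O(n^2) additions
    centroid = [sum(simplex[i][j] for i in range(len(simplex)) if i != worst) / n
                for j in range(n)]
    # reflect the worst vertex through the centroid
    reflected = [2.0 * centroid[j] - simplex[worst][j] for j in range(n)]
    return reflected, worst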

The second order methods only distinguish themselves from the other strategies in solving Problem 1.1 in that their required computation time

T = c n^2 ,  c = const.

is characterized by a small constant of proportionality c. Their full capabilities should become apparent in solving the true quadratic problem (Problem 1.2). The variable metric method lives up to this expectation. According to theory it has the property Qn, which means that after n iterations, i.e., n line searches, and O(n^3) computation time the problem should be solved. It comes as something of a surprise to find that the numerical tests indicate a requirement for only about O(n^0.5) iterations and O(n^2.5) computation time. This apparent discrepancy between theory and experiment is explained if we note that the property Qn signifies absolute accuracy within at most n iterations, while in this example only a finite reduction of the uncertainty interval is required.

More surprising than the good results of the DFPS method is the behavior of the strategy of Powell, which in theory is also quadratically convergent. Not only does it require significantly more computation time, it even fails completely when the number of parameters is large. In the case of n = 40 variables the step length goes to zero along a chosen direction. The convergence criterion is subsequently not satisfied and the search process becomes infinite; it must be interrupted externally. For n = 50 and n = 60 the Powell method does converge, but for n = 70, 80, 90, 100, and 130 it fails again. The origin of this behavior was not investigated further, but it may well have to do with the objection raised by Zangwill (1967) against the completeness of Powell's (1964) proof of convergence. It appears that rounding errors combined with small step lengths in the one dimensional search can cause linearly dependent directions to be generated. However, independence of the n directions is the precondition for them to be conjugate to each other.

The coordinate strategies also fail to converge when the number of variables in Problem 1.2 becomes very large. With the Fibonacci search and golden section as interval division methods they fail for n >= 100, and with quadratic interpolation for n >= 150. For successful line searching the step lengths would have to be smaller than allowed by the finite word length of the computer used. This phenomenon only occurs for many variables because the condition of the matrix of coefficients in Problem 1.2 varies as O(n^2). In this proportion the elliptic contour surfaces F(x) = const. become gradually more extended, and the relative minimizations along the coordinate directions become less and less effective. This failure is typical of methods with variation of individual parameters and demonstrates how important it can be to choose other search directions. This is where random directions can prove advantageous (see Chap. 4).

Computation times for the method of Hooke and Jeeves and the method of Davies-Swann-Campey (DSC) clearly increase as O(n^3) if Palmer orthogonalization is employed for the latter. For the method of Hooke and Jeeves this corresponds to O(n) exploratory moves and O(n^2) function calls; for the DSC method it corresponds to O(n) orthogonalizations and O(n^2) line searches and objective function evaluations. The original Gram-Schmidt procedure for constructing mutually orthogonal directions requires O(n^3) rather than O(n^2) arithmetic operations. Since the type of orthogonalization seems to hardly alter the sequence of iterations, with the Gram-Schmidt subroutine the DSC strategy takes O(n^4) instead of O(n^3) basic operations to solve Problem 1.2. For the same reason the Rosenbrock method requires computation times that increase as O(n^4). It is, however, striking that the single step method (Rosenbrock), in conjunction with the suppression of orthogonalization until at least one successful step has been made in each direction, requires less time than line searching, even if only one quadratic interpolation is performed. In both these methods the number of objective function calls, which is of order O(n^2), plays only a secondary role.

Once again the simplex and complex strategies are the most expensive. From n = 30 on, the method of Nelder and Mead does not come within the required distance of the objective without restarts. Even for just six variables the search simplex has to be re-initialized once. The number of objective function calls increases approximately as O(n^3); hence the computation time increases as O(n^5). The strategy of Box with 2n vertices shows a correspondingly steep increase in the time with the number of variables. For n = 30, Problem 1.2 was actually only solved in one out of three attempts, and for n = 40 not at all. If the number of vertices of the complex is reduced to n + 10, the method fails from n = 20.

As in Problem 1.1, the cost of the evolution strategies increases rather smoothly with the number of parameters, more so than for several of the deterministic search methods. To solve Problem 1.2, O(n^2) objective function calls are required, corresponding to O(n^3) computation time. Since the distance to be covered is no greater than it was in Problem 1.1, the greater cost must have been caused by the locally smaller curvatures. These are related to the lengths of the semi-axes of the contour ellipsoids. Because of the regular structure of the matrix of coefficients A of the quadratic objective function in Problem 1.2, the condition number K, the ratio of the greatest to the least semi-axis (cf. test Problem 1.2 in Appendix A, Sect. A.1)

K = a_max / a_min

can be considered as the only quantity of significance in determining the geometry of the contour pattern. The remaining semi-axes will distribute themselves uniformly between a_min and a_max. The fact that K increases as O(n^2) suggests that the rate of progress phi, the average change in the distance from the objective per mutation or generation, only decreases as the square root of the condition number. There is so far no theory for the general quadratic case. Such a theory will also look more complicated, since apart from the ratio of greatest to smallest semi-axis a further n - 2 parameters that determine the shape of the hyperellipsoid will play a role. The position of the starting point will also have an effect, although in the case of many variables only at the beginning of the search. After a transition phase the starting point of mutations will always lie in the vicinity of a point where the objective function contour surfaces are most curved. In the sphere model theory of Rechenberg, if r is regarded as the average local radius of curvature, the rate of progress at worst should become inversely proportional to the square root of the condition number. The convergence rate of the evolution strategy would then be comparable to that of the strategy of steepest descents, for which the function values of two consecutive iterations in the quadratic case are in the ratio (Akaike, 1960)

( (a_max - a_min) / (a_max + a_min) )^2

Compared to other methods having costs in computation time that increase as O(n^3), the evolution strategies fare better than they did in Problem 1.1. Besides the fact that the coordinate strategies do not converge at all when the number of variables becomes large, they are surpassed in speed by the two membered evolution strategy. The relative performance of the two membered and multimembered evolution strategies without recombination remains about the same.

The behavior of the (10, 100) evolution strategy with recombination deviates from that of the other versions. It requires considerably more computation time to solve Problem 1.2. This can be attributed to the fact that, although the probability distribution for mutation steps alters, it cannot adapt continuously to the local conditions. Whilst the mutation ellipsoid, the locus of all equiprobable mutation steps, can extend and contract
along the coordinate directions, it cannot rotate in the space. To do so, not only the variances but also the orientation or covariances would need to be variable (for such an extension see Chap. 7 and subroutine KORR). As the results show, starting from a spherical shape the mutation ellipsoid adopts a configuration that initially accelerates the search process. As it progresses towards the objective, the ellipsoid must become smaller, but it should also gradually rotate to follow the orientation of the contour lines. That is not possible, because the mechanism adopted here allows no mutual dependence of the components of the random vector. The ellipsoid first has to form itself into a sphere again, or to become generally small, before it extends again with the longer axes in new directions. This awkward process actually occurs, but it causes an appreciable delay in the search.
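The geometric restriction can be illustrated with a small sketch of our own (not code from the book): each mutation here is built from one variance per coordinate, so the ellipsoid of equally probable steps can only stretch or shrink along the axes; making it rotate would require correlated components, i.e., a full covariance matrix, as in the extension of Chap. 7 (subroutine KORR).

import math
import random

def mutate_axis_parallel(x, sigmas):
    """Mutation with one variance per variable, as in the tests here:
    the mutation ellipsoid is locked to the coordinate directions."""
    return [xi + random.gauss(0.0, s) for xi, s in zip(x, sigmas)]

def mutate_rotated_2d(x, s1, s2, angle):
    """Sketch of a rotatable mutation ellipsoid in two dimensions:
    drawing the step in the ellipsoid's own axes and then rotating it
    is equivalent to mutating with a full covariance matrix."""
    z1 = random.gauss(0.0, s1)
    z2 = random.gauss(0.0, s2)
    c, s = math.cos(angle), math.sin(angle)
    return [x[0] + c * z1 - s * z2,
            x[1] + s * z1 + c * z2]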

There is a further undesirable phenomenon. Suppose that a single variance suddenly becomes very much smaller. The associated variation in the variables then takes place in an (n - 1)-dimensional subspace of IR^n (for a more detailed analysis see Schwefel, 1987). Other things being equal, the probability of a success is thereby greater than if all the parameters had varied. Step length alterations of this kind are therefore favored and, together with the resistance to rotation of the mutation ellipsoid, they enhance the unstable behavior of the strategy with recombination. This can be prevented by having a large population, in which there is always a sufficient supply of different kinds of parameter combinations for the variances as well. Another possibility is to allow one individual to execute several consecutive mutations with one setting of the step length parameters. Then the overall success depends rather less on the instantaneous probability of success and more on the size of the partial successes. The quality of the strategy parameters is thereby assessed more objectively. It should be noticed that Problem 1.2 is actually the only one in which recombination appears troublesome. In many other cases it led to a reduction in the computation cost, even in the simple form applied here (see the second and third tests).

6.3.3.2 Second Test: Reliability

Convergence in the quadratic case is a minimum requirement of non-linear optimization methods. The unsatisfactory results of the coordinate strategies and of Powell's method for a large number of variables confirm the necessity of numerical tests even when convergence is assured by theory. Even more important, in fact unavoidable, are experimental tests of the reliability of convergence of optimization methods on non-quadratic, non-linear problems. Some methods with an internal quadratic model of the objective function have to be modified in order to deal with more general problems. Such, for example, is the method of conjugate gradients. The method of Fletcher and Reeves (1964) actually terminates after the relative minimum has been obtained in each of n conjugate directions. However, for higher order objective functions the optimum will not have been reached after n iterations. Even in quadratic problems, if they are ill-conditioned, more iterations may be required. There are two possible ways to proceed: either the iteration process can be formally continued beyond n line searches, or it can be repeated in a cyclic way. Fletcher and Reeves recommend destroying all the accumulated information after each set of n + 1 iterations and beginning again, i.e., with uncorrected gradient directions. This procedure is said to be more effective for non-quadratic objective functions. On the other hand, Fox (1971) suggests that a periodic restart of the search can prevent convergence in the quadratic case, whereas a simple continuation of the sequence of iterations is successful. Further suggestions for the way to restart are made by Fletcher (1972a).

The situation is similar for the quasi-Newton methods, in which the Hessian matrix or its inverse is approximated in discrete steps. Some of the proposed formulae for improving the approximation matrix can lead to division by zero, sometimes due to rounding errors (Broyden, 1972), but in other cases even on theoretical grounds. If the Hessian matrix has singular points, the optimization process stagnates before reaching the optimum. Bard (1968) and others recommend as a remedy replacing the approximation matrix from time to time by the unit matrix. The information gathered over the course of the iterations is destroyed again in this process. Pearson (1969) proposes a restart period of 2n cycles, while Powell (1970b) suggests regularly adding steps different from the predicted ones. It is thus still true to say of the property of quadratic termination that its "relevance for general functions has always been questionable" (Fletcher, 1970b). No guarantee is given that Newtonian directions are better than the (anti-)gradient.

As there is no single objective function that can be taken as representative for determining experimentally the properties of a strategy in the non-quadratic case, as large and as varied a range of problem types as possible must be included in the numerical tests. To a certain extent, it is true to say that the greater their number and the more skillfully they are chosen, the greater the value of strategy comparisons. Some problems have become established as standard examples; others are added to each experimenter's own taste. Thus in the catalogue of problems for the second series of tests in the present strategy comparison, both familiar and new problems can be found; the latter were mainly constructed in order to demonstrate the limits of usefulness of the evolution strategies.

It appears that all the previously published tests use as a basis for judging performance the number of function calls (with objective function, gradient, and Hessian matrix weighted in the ratio 1 : n : n(n+1)/2) and the computation time for achieving a prescribed accuracy. Usually the objective functions considered are several times continuously differentiable and depend on relatively few variables, and the results lack compatibility from problem to problem and from strategy to strategy. With one method, a first minimum may be found very quickly and a second much more slowly; another method may work just the opposite way round. The abundance of individual results actually makes a comprehensive judgement more difficult. Hence average values are frequently calculated for the required computation time and the number of function calls. Such tests then result in establishing that second order methods are faster than first order ones, and these in turn are faster than direct search methods. These conclusions, which are compatible with the test results for quadratic problems, lead one to suspect that the selected objective functions behave quadratically, at least in the neighborhood of the objective. Thus it is also frequently noted that, at the beginning of a search, gradient methods converge faster, whereas towards the end Newton methods are faster. The average values that are measured therefore depend on the chosen starting point and the required closeness of approach to the objective.


The assessment is tricky if a method does not converge for a particular problem but terminates the search following its own criteria without getting anywhere near the solution. Any strategy that fails frequently in this way cannot be recommended for use in practice, even if it is especially fast in other cases. In a practical problem, unlike a test problem, the correct solution is not, of course, known in advance. One therefore has to be able to rely on the results given by a strategy if they cannot be checked by another method. Hence, reliability is just as important a criterion for assessing optimization methods as speed.

The second part of the strategy comparison is therefore designed to test the robustness of the optimization methods. The scale for assessing this is the number of problems that are solved by a given method. Since in this respect it is the complexity rather than the size of the problem that is significant, the number of variables ranges only from one to six.

All numerical iteration methods in practice can only approximate a solution with a finite accuracy. In order to be able either to accept the end result of an optimum search as adequate, or to reject it as inadequate, a border must be defined explicitly, on one side of which the solution is exact enough and on the other side of which it is unsatisfactory. It is the structure of the objective function that is the decisive factor determining the accuracy that can be achieved (Hyslop, 1972). With this in mind, the border values for the purpose of ranking the test results were obtained by the following scheme. Starting from the known exact or best solution

x* = (x_1*, x_2*, ..., x_n*)^T

the variables were individually altered by the amounts

Delta x_i = eps        for x_i* = 0
Delta x_i = eps x_i*   for x_i* != 0

in all combinations. For example, for n = 2 one obtains eight different test values of the objective function (see Fig. 6.16). In the general case there are 3^n - 1 different values. The greatest deviation Delta F(eps) from the optimal value F(x*) defines the border between results that approach the objective sufficiently closely and results that do not. To obtain a number of grades of merit, four different test increments eps_j, j = 1(1)4, were selected:

eps_1 = 10^-38 ,  eps_2 = 10^-8 ,  eps_3 = 10^-4 ,  eps_4 = 10^-2

A problem is deemed to have been solved "exactly" at x~ if

F(x~) <= F(x*) + Delta F(eps_1)

is attained. On the other hand, if at the end of the search

F(x~) > F(x*) + Delta F(eps_4)

the strategy employed has failed. Three intermediate classes of approximation are defined in the obvious way.

Figure 6.16: Eight different test values of the objective function in the case of n = 2. (Axes x_1, x_2; the optimum at the center is surrounded by the eight test positions.)
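In executable form, the grading scheme reads as below; this is our reconstruction of the described procedure, with illustrative names, enumerating all 3^n - 1 perturbed points around x*.

from itertools import product

def accuracy_border(f, x_star, eps):
    """Border value Delta F(eps) for grading an end result.

    Each variable is perturbed by 0 or +/- eps (for x_i* = 0) or by
    0 or +/- eps*x_i* (otherwise), in all combinations except the
    unperturbed point itself: 3**n - 1 test values in total.
    """
    f_star = f(x_star)
    border = 0.0
    for signs in product((-1.0, 0.0, 1.0), repeat=len(x_star)):
        if all(s == 0.0 for s in signs):
            continue                          # skip x* itself
        x = [xi + s * (eps if xi == 0.0 else eps * xi)
             for xi, s in zip(x_star, signs)]
        border = max(border, abs(f(x) - f_star))
    return border

# example: quadratic bowl with minimum x* = (1, 0); a result x~ counts
# as "exact" if f(x~) <= f(x*) + accuracy_border(f, x_star, 1e-38)
f = lambda x: (x[0] - 1.0) ** 2 + x[1] ** 2
borders = {eps: accuracy_border(f, [1.0, 0.0], eps)
           for eps in (1e-38, 1e-8, 1e-4, 1e-2)}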

The maximum possible accuracy was required of all strategies. The corresponding free parameters of the strategies that enter the termination criteria have already been defined in Table 6.2. In contrast to the first test, no additional common termination rule was employed.

A total of 50 problems were to be solved. The mathematical formulations of the problems are given in Appendix A, Section A.2. Some of them are only distinguished by the chosen initial conditions, others by the applied constraints. Nine out of 14 strategies or versions of basic strategies are not suited to solving constrained problems, at least not directly. Methods involving transformations of the variables and penalty function methods were not employed. An exception is the method of Rosenbrock, which only alters the objective function near the boundaries and can be applied in one pass; otherwise penalty functions require a sequence of partial optimizations to be executed. The second series of tests therefore comprises one set of 28 unconstrained problems for all 14 strategies and a second set of 22 constrained problems for 5 of the strategies. The results are displayed together in Tables 6.5 to 6.8. The approximation to the objective that has been achieved in each case is indicated by a corresponding symbol, using the classes of accuracy defined above.

Any interesting features in the solution of individual problems are documented in Appendix A, Section A.2, in some cases together with a brief analysis. Thus at this point it is only necessary to make some general observations about the reliability of the search methods for the totality of problems.

Unconstrained Problems

The results of the three versions of the coordinate strategies are very similar and generally unsatisfactory. A third of all the problems cannot be solved with them at all, or only very inaccurately. Exact solutions (eps = 10^-38) are the exception, and only in less than a third of all cases are the end results good (eps <= 10^-8).


Table 6.5: Results of all strategies in the second comparison test, unconstrained problems

Problem  FIBO GOLD LAGR HOJE DSCG DSCP POWE DFPS SIMP ROSE COMP EVOL GRUP REKO
2.1       3    3    3    1    1    1    2    1e   2n   1    2    1    1    1
2.2       3    3    3    1    1    1    1    2    2a   1a   2    1    1    1
2.3       2    2    3    1    1    1    1    1    1n   1a   4    1    1    1
2.4       3    3    3    1    1    1    2    2e   3    1    3    1    1    1
2.5       2    2    2    2    1    1    2    1e   2a   1    2    1    1    1
2.6       5    5    5    5    2    2    5    3    2a   1    3    1    1    1
2.7       5    5    4    2    5ea  5ea  5e   5    3    5    3    2    1    1
2.8       5    5    4    3    5ea  5ea  5e   5    3    5    2    1    1    1
2.9       3    3    3    2    2    2    2e   2    1a   3    1    1    2    1
2.10      5    5    4    3    2    2    5a   4e   4n   2    2    4    3    1
2.11      5    5    5    3    2    2    2    4    4n   2    2    4    3    1a
2.12      5    5    5    3    2    3    2    4e   2    2    2    4    3    1a
2.13      3    3    3    2    2    1    3    2    3    3    3    2    1    1
2.14      3    3    3    2    2    2    2a   5e   2n   2    2    3r   3r   3r
2.15      3    3    3    2    2    2    2ea  3    2    2    2    3r   3r   3r
2.16      2    1    2    2    1    1    2    2    2n   2    3    3    2    2
2.17      2    2    1    2    2    2    2e   2    1a   2    1    1    1    1
2.18      5    5    5    2    2    2    2e   2    1an  1    1    1    1    1
2.19      5    5    5    5    2    2    5    2e   2    2    3    3    2    3
2.20      2    2    2    2    3    2    2    1e   3n   2    2    1    1    1
2.21      5    5    5    2    4    2    5    2e   5a   2    5    1    1    1
2.22      2    2    2    2    2    2    2    5    5a   2    5    1    1    1
2.23      1    1    1    1    1    1    5a   5e   1a   1a   1    1    1    1
2.24      3    3    3    2    1    1    2    2    2    2    2    2    2    1
2.25      3    3    3    2    1    1    2e   2e   3    1    3    1    1    1
2.26      1    1    2    1    1    1    1e   1    1n   1    4    1    1    1
2.27      1    1    5    2    1    1    1    5    1    1    2    1    1    1
2.28      4    4    4    3    4    3    2    4e   2    3    1    4    4    3
Sum      91   90   93   61   56   52   74   79   65   54   68   51   51   37

Meaning of the number and letter symbols used above:
1  Accuracy achieved better than 10^-38
2  Accuracy achieved better than 10^-8
3  Accuracy achieved better than 10^-4
4  Accuracy achieved better than 10^-2
5  Accuracy achieved worse than 10^-2
e  Fatal execution error (floating overflow, floating divide check)
a  Termination rule ineffective; search infinite with no further convergence
r  Computation time too long or convergence too slow; search terminated
n  Concerns the simplex method of Nelder and Mead: restart(s) required


As already shown by the quadratic objective function models, it appears again that progress along the directions of the unit vectors becomes possible only in very small step lengths. The limit of smallest possible changes in the variables, as defined by the finite word and mantissa lengths of the digital computer, is often reached before the search has come sufficiently close to the objective.

The three methods with rotating axes, namely the strategy of Rosenbrock and the two versions of the strategy of Davies, Swann, and Campey, also behave similarly to one another. Although the choice of orthogonalization method (Gram-Schmidt or Palmer) has a considerable effect on the computation times, it makes little difference to the accuracies achieved. If "exact" solutions are required, all three methods prove useful in about 4 out of 10 cases. This proportion is doubled if the accuracy requirement is lowered by a grade. Two problems (Problems 2.7 and 2.8) are not solved by any of the three variants. In the Rosenbrock method, the search is ended a very long way from the objective, while in the DSC method a line search becomes infinite. To prepare for the single quadratic interpolation, the latter uses a subroutine for bounding the relative minimum in the chosen direction. In this case, however, the relative minimum is situated at infinity; thus, after some time, the range of numbers that can be handled by the computer is exceeded. It eventually makes a fatal execution error with the message "floating overflow." In most computers, a program would terminate at this point, but the PDP 10 continues the calculation using its largest number, 2^127, in place of the value that exceeded the number range. Nevertheless the bounding procedure does not end, because in the DSC method any steps that do not change the value of the objective function are also regarded as successful. The convergence criterion is not tested within this subroutine, so the whole procedure becomes infinite without any further change in value of the objective function. It must be terminated externally. The convergence criterion of the Rosenbrock method fails in three cases, in spite of the fact that the exact solutions have already been found. It is noted in the tables wherever fatal execution errors occur or the optimization does not terminate normally. With 11 or 12 exact results, and altogether 23 good results, these three rotating axes methods rank highly.

Fatal errors occur especially frequently in applying the more "thoroughbred" methods, the method of Powell and the DFPS strategy. They are not always accompanied by termination difficulties or bad final results. The accuracies achieved have therefore been evaluated independently of the execution errors. Good approximations, of which there are 20 (Powell) and 16 (DFPS) out of 28, are also less frequent than with the orthogonalization strategies. In many cases both of these methods, so advantageous in theory, completely fail to approach the desired solution, usually in the same problems that present difficulties for the much simpler coordinate methods.

Apart from failure of a line search because of a relative minimum at infinity, the causes are:

- The confusion of minima and saddle points because of ambiguity in quadratic interpolation (Problem 2.19 for the Powell strategy, Problem 2.27 for the variable metric method)
- Discontinuities in the objective function or its derivatives (Problems 2.6, 2.21, 2.22)
- A singular Hessian matrix (Problem 2.14 in the DFPS method)
A singular Hessian matrix (Problem 2.14 in the DFPS method)


However, even a completely regular, several times differentiable objective function of 10th order (Problem 2.23) is not managed by either of the quadratically convergent strategies. Their concept of using all the data that can be accumulated during the iterations to adjust their internal quadratic model apparently leads to completely wrong predictions of favorable directions and step lengths if the function is of appreciably higher than second order. Not one of the other direct search methods fails on this problem; in fact they all find the exact solution.

With Powell's method one can choose between two different convergence criteria. The difference between the stricter one and the simple one is that the former displaces slightly the best position obtained after the sequence of iterations has ended normally and searches again for the minimum. The search is only finally terminated if both results are the same within the specified accuracy. Otherwise the search is continued after a line search in the direction of the difference vector between the two solutions. Because of the extreme accuracy requirements in the present cases, the search usually ends with the message that rounding errors in the objective function prevent any closer approach to the objective. In such cases no additional variation of the final result is made. Even in other cases, the stricter convergence criterion only makes a very slight improvement to the results; the grades of merit of the results are not changed at all. In four problems the search becomes infinite because the step lengths vanish and the termination criterion is no longer tested; the search has to be terminated externally. Fatal execution errors occur very frequently: in three cases there is a "floating overflow" and in seven cases a "floating divide check." This concerns a total of eight problems. The DFPS strategy is even more susceptible. There are five occurrences of "floating overflow" and eleven of "floating divide check." Twelve problems are involved.

In contrast, the direct search of Hooke and Jeeves works without errors, but even this method fails on two problems: one because of sharp corners in the pattern of contour lines (Problem 2.6), the other in the neighborhood of a stationary point with a very narrow valley leading to the objective (Problem 2.19). Nevertheless it yields 6 exact solutions and 21 good approximations.

The overall behavior of the simplex and complex strategies is similar, but there are differences in detail. There are 17 good solutions, together with 6 exact ones, to set against two failures (Problems 2.21 and 2.22). These are provoked by edges on the contour surfaces in the multidimensional space. The restart rule in the Nelder-Mead method is invoked during 9 of the solutions. The termination criterion, based only on function values at the simplex corners, does not operate in 9 cases; the optimum search becomes infinite with no apparent improvement in the objective function values. The results of the complex strategy depend strongly on the initial configuration, which is determined by random numbers. In this case the evaluation was made for the best of three attempts, each with different sequences of pseudorandom numbers. It is especially worth noting the performance of the complex method in solving Problem 2.28, for which it is better than all the other methods.

All three versions of the evolution strategy are distinguished by the fact that in no case do they completely fail, and they are able to solve far more than half of all the problems exactly (in the sense defined above). Since their behavior, like that of the complex method, is influenced by random numbers, the same rule was followed: namely, out of three tests the one with the best end result was accepted. In contrast to the strategy of Box, however, the evolution methods prove to be less dependent on the actual sequence of random numbers. This is especially true of the multimembered versions. Recombination almost always improves the chance of getting very close to the desired solutions. Fatal errors due to exceeding the maximum number range or dividing by zero do not occur, by virtue of the simple computational operations in these strategies. Discontinuities in the partial derivatives, saddle points, and the like have no obvious adverse effects. The search does, however, become rather time consuming when the minimum is reached via a long, narrow valley. The step lengths or variances that are set in this case are very small and impose slow convergence in comparison to methods that can perform a line search along the valley. The average rate of progress of an evolution strategy is not, however, affected by bends in the valley, which would retard a one dimensional minimization procedure. Line searches only afford a significant advantage to the rate of progress if there are directions in the space along which successful steps can be made of a size that is large compared to the local radius of curvature of the objective function contour surface. Examples are provided by Problems 2.14, 2.15, and 2.28. In these cases, long before reaching the minimum, the optimal variances of the evolution methods have reached the lower limit as determined by the machine accuracy. The desired solution cannot therefore be approximated to the required accuracy. In Problems 2.14 and 2.15 the computation time limit did not allow the convergence criterion to be satisfied; although the search was actually progressing, slowly but surely, it was terminated.

Difficulties with the termination rule based on function values only occurred in the solution of one type of problem (Problems 2.11, 2.12) using the (10, 100) evolution strategy with recombination. The multimembered method selects the 10 best individuals of a generation only from the current 100 descendants. Their 10 parents are not included in the selection process, for reasons associated with the step length adaptation. In general, the objective function value of the best descendant is closer to the solution than that of the best parent. In the case of the two problems referred to above, this is initially the case. As the solution is approached, however, it happens more and more frequently that the best value occurring in a generation is lost again. This is related to the fact that, because of rounding errors in evaluating values near the minimum, the objective function behaves practically stochastically. Thus the population wanders around in the neighborhood of the (quasi-singular) optimal solution without being able to satisfy the convergence criterion. These difficulties do not beset the other search methods, including the multimembered evolution without recombination, because they do not come nearly so close to the optimum. The fact that the third problem of the same type (Problem 2.10) is solved without difficulties in a finite time, even with recombination, can be considered a fluke. Here too the minimum was reached long before the termination criterion was satisfied. On the whole, the multimembered evolution strategy with recombination is the surest and safest of all the search methods tested. In only 5 out of 28 cases is the solution not located exactly, and the greatest deviations of the variables were in the accuracy class eps = 10^-4.


Table 6.6: Summary of the results from Table 6.5

          Total number of problems solved      No solution  Fatal        No normal
Strategy  in the accuracy class                or > 10^-2   computation  termination
          10^-38  10^-8  10^-4  10^-2                       errors
FIBO         3      9     18     19                9            0            0
GOLD         4      9     18     19                9            0            0
LAGR         2      7     17     21                7            0            0
HOJE         6     21     26     26                2            0            0
DSCG        11     23     24     26                2            2            2
DSCP        12     24     26     26                2            2            2
POWE         4     20     21     21                7            8            4
DFPS         5     16     18     22                6           12            0
SIMP         7     18     24     26                2            0            9
ROSE        11     23     26     26                2            0            3
COMP         5     17     24     26                2            0            0
EVOL*       17     20     24     28                0            0            0
GRUP*       18     22     27     28                0            0            0
REKO*       23     24     28     28                0            0            2

* Search terminated twice in each case due to too slow convergence

Table 6.6 presents a summary of the number of unconstrained problems that were solved with given accuracy by the search methods under test, together with the number of unsolved problems, the number of cases of fatal execution errors, and the number of cases in which the termination criteria failed.

Constrained Problems

Tables 6.7 and 6.8 show the results of 5 strategies in the 22 constrained problems. Execution errors such as exceeding the number range or dividing by zero did not occur in any case. Neither were there any difficulties in the termination of the searches.

The method of Rosenbrock can only be applied if the starting point of the search lies within the allowed or feasible region. For this reason the initial values of the variables in seven problems had to be altered. All other methods very quickly found a feasible solution to start with. As in the unconstrained problems, the strategies that depend on random numbers were each run three times with different sequences of random numbers. The best of the three results was accepted for evaluation. The results of the complex method and the two membered evolution turned out to be very variable in quality, whereas the multimembered versions of the strategy, especially with recombination, proved to be less influenced by the particular random numbers. Two problems (Problems 2.40 and 2.41) caused great difficulty to all the search methods. These are simple linear programs that can be solved rapidly and exactly by, for example, the simplex method of Dantzig.

Table 6.7: Results of all strategies in the second comparison test, constrained problems

Problem No.  ROSE  COMP  EVOL  GRUP  REKO
2.29          3     1     4     3     3
2.30          1     5     1     1     1
2.31          3v    3     1     1     1
2.32          3v    3     1     1     1
2.33          3     2     5     4     1
2.34          1     2     3     3     2
2.35          3v    1     4     4     4
2.36          1     1     1     1     1
2.37          3     1     1     1     1
2.38          3v    3     1     1     1
2.39          3     3     4     4     3
2.40          5     5     5     5     5
2.41          5     5     5     5     5
2.42          3     3     2     2     1
2.43          3v    3     2     2     1
2.44          1     5     1     1     1
2.45          3     2     4     2     1
2.46          3     2     3     3     1
2.47          3v    1     1     1     1
2.48          3v    3     1     1     1
2.49          3     2     4     3     1
2.50          3     1     1     1     1
Sum          62    57    55    50    38

The meaning of the symbols is as in Table 6.5; "v" is used in connection with the Rosenbrock method for constrained cases: the starting point had to be displaced, since it was not feasible for this method.

In each case the closest to the objective was again the (10, 100) evolution strategy with recombination, but even that result had to be classified as "no solution."

On the whole the evolution methods cope with constrained problems no worse than the Rosenbrock or complex strategies, but they do reveal inadequacies that are not apparent in unconstrained problems. In particular, the 1/5 success rule for adapting the variances of the mutation step lengths in the (1+1) evolution strategy appears to be unsuitable for attaining an optimal rate of convergence when several constraints become active.

Table 6.8: Summary of the results from Table 6.7

          Total number of problems solved      No solution
Strategy  with accuracy class                  or > 10^-2
          10^-38  10^-8  10^-4  10^-2
ROSE         4      4     20     20                2
COMP         6     11     18     18                4
EVOL        10     12     14     19                3
GRUP        10     13     17     20                2
REKO        16     17     19     20                2

In problems with active constraints, the tendency of the evolution methods to follow the average gradient trajectory causes the search to come quickly up against one or more boundaries of the feasible region. The subsequent migration towards the objective along such edges takes considerable effort and time. In Figure 6.17 the situation is illustrated for the case of two variables and one constraint.

The contours of the objective function run at a narrow angle to the boundary of the region. For a mutation to count as successful it must fall within the feasible region as well as improve the objective function value. For simplicity let us assume that all the mutations fall on the circumference of a circle about the current starting point. In the case of many variables this point of view is very reasonable (see Chap. 5, Sect. 5.1). To start with, the center of the circle (P1) will still lie some way from the boundary. If the angle between the contours of the objective function and the edge of the feasible region is small and the step size, or variance of the mutation step size, is large, then only a small fraction of the mutations will be successful (thickly drawn part of the circle with radius sigma1). The 1/5 success rule ensures that this fraction is raised to 20%, which if the angle is small enough can only be achieved by reducing the variance to sigma2. The search point P is driven closer and closer to the boundary and eventually lies on it (P2). Since there is no longer any finite step size that can provide a sufficiently large success rate, the variance is permanently reduced to the minimum value specified in the program. Depending on the particular problem structure and the chosen values of the parameters in the convergence criteria, the search is either slowly continued or it is terminated before reaching the optimum. The more constraints become active during the search, the smaller is the probability that the objective will be closely approached. In fact, even in problems with only two variables and one constraint (Problem 2.46), the angle between the contours and the edge of the feasible region can become vanishingly small in the neighborhood of the minimum.
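The shrinking success region can also be demonstrated numerically. The following Monte Carlo sketch uses an illustrative model of Figure 6.17 of our own making, not one from the book: feasible region x2 >= 0 and a linear objective whose contours meet the boundary at the narrow angle alpha.

import math
import random

def success_rate(dist, sigma, alpha, trials=200000):
    """Estimate the probability that one mutation is successful, i.e.,
    feasible (x2 >= 0) and improving F(x) = x2*cos(alpha) - x1*sin(alpha),
    whose contours meet the boundary x2 = 0 at the angle alpha.
    The parent sits at height `dist` above the boundary."""
    f0 = dist * math.cos(alpha)          # F at the parent (0, dist)
    hits = 0
    for _ in range(trials):
        y1 = random.gauss(0.0, sigma)
        y2 = dist + random.gauss(0.0, sigma)
        if y2 >= 0.0 and y2 * math.cos(alpha) - y1 * math.sin(alpha) < f0:
            hits += 1
    return hits / trials

# with the parent close to the boundary, large steps succeed only in a
# wedge of angle alpha (rate roughly alpha/(2*pi)); the 1/5 rule can then
# only respond by shrinking sigma, driving the point onto the boundary
for sigma in (1.0, 0.1, 0.01):
    print(sigma, success_rate(dist=0.01, sigma=sigma, alpha=0.1))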

Similar situations to the one depicted in Figure 6.17 can even arise in unconstrained problems if the objective function displays discontinuities in its first partial derivatives. Examples of this kind of behavior are provided by Problems 2.6 and 2.21. If only a few variables are involved there is still a good chance of reaching the objective. Other search methods, especially those which execute line searches, are generally defeated by such points of discontinuity.

Figure 6.17: The situation at active constraints. (Circles of radii sigma1 and sigma2 around the search points P1 and P2 mark lines of equal probability of a step; the lines of constant F(x), leading to the minimum, meet the boundary of the forbidden region at the narrow angle alpha, and the negative gradient direction is also indicated.)

The multimembered evolution strategy, although it works without a rigid step length adaptation, also loses its otherwise reliable convergence characteristics when the region of

success is very much narrowed down by constraints. While the individuals are not yet at the edge of the feasible region, those descendants whose step lengths have become smaller have a higher probability of survival. Thus here too the entire population eventually concentrates itself in a smaller and smaller area at the edge of the feasible region.

The theory of the rate of progress in the corridor model did not foresee this kind of difficulty; indeed it gives an optimal success rate almost the same as in the sphere model, simply because the gradient vector of the objective function always runs parallel to the boundaries. In this case the search weaves backwards and forwards between the center and side of the corridor. The reduced probability of success at multidimensional edges is compensated by the fact that, with a uniform probability of occupation over the cross section of the corridor, the space that counts as near to the edges represents a very small fraction of the total. Provided that the success rate is obtained over long enough periods, the 1/5 success rule does not lead to a permanent reduction of the variances but to a constant, near optimal step size (it really fluctuates) that depends only on the width of the corridor and the number of variables.

The situation is happier than in Figure 6.17 if the constraints are given explicitly as

x_i >= a_i or x_i <= b_i

For any one variable, the region of success at a boundary is reduced by one half. If at some position m variables are each bounded on one side, then on average it costs 2^m mutations before one lands within the feasible region. Here again, the 1/5 success rule for m > 2 will continuously reduce the variances until they reach their minimum value. Depending on the route chosen by the search process, the limiting values of the variances, which are individually adjustable for each variable, will be reached at different times. Their relative values thereby alter, and with the new combination of step lengths the convergence can be faster.

The extra flexibility of the multimembered evolution strategy with recombination, in which the variances of the changes in the variables are individually adaptable during the whole of the optimization process, is a very clear advantage in solving constrained problems. Suitable combinations of variances are set up in this case before the smallest possible step lengths are reached. Thus the total computation time is reduced and the final accuracy is better. The recombination option also appears to have a beneficial effect at boundaries that are not explicit: it clearly improves the chance that descendants, even with a larger step size, will be successful near the boundary. In any case the population clusters together more slowly than when there is no recombination.

Global Convergence Properties

Among the 50 test problems there are 8 having at least a second local minimum besides the global one. In the reliability test, the accuracy achieved was only assessed with respect to the particular optimum that was being approximated. What, then, is the capability of each strategy for locating global minima? Several problems were specifically designed to investigate this question by having very many local optima, namely Problems 2.3, 2.26, 2.30, and 2.44. In Table 6.9 this aspect of the test results is evaluated.
2.30, <strong>and</strong> 2.44. In Table 6.9 this aspect of the test results is evaluated.<br />

Except for one problem (Problem 2.32), whose global minimum was found by all the<br />

strategies under test, the method of Rosenbrock onlyconverged to local optima. The<br />

complex method <strong>and</strong> the (1+1) evolution strategy were only better in one case: namely,<br />

in Problem 2.45 they both approached the global minimum.<br />

Table 6.9: Results of all strategies in the second comparison test: global convergence properties

Problem  FIBO GOLD LAGR HOJE DSCG DSCP POWE DFPS SIMP ROSE COMP EVOL GRUP REKO
2.3       L1   L1   L3   L1   L7   L7   L1   L3   L1   L6   L1   Lm   G    G
2.36      L1   L1   L1   L1   L1   L1   L1   L1   L1   L1   L1   L1   G    G
2.30                                                  L4   L1   Lm   G    G
2.32                                                  G    G    G    G    G
2.44                                                  L1   L1   L1   G    G
2.45                                                  L    G    G    G    G
2.47                                                  L3   L1   L2   G    G
2.48                                                  L2   Lm   Lm   GL   GL

(Blank entries: constrained problems, to which only the last five strategies were applied.)

Meaning of symbols:
L    Search converges to a local minimum.
L3   Search converges to the 3rd local minimum (in order of decreasing objective function values).
Lm   Search converges to various local minima depending on the random numbers.
G    Search converges to the global minimum.
GL   Search converges to the local or global minimum depending on the random numbers.
GL Search converges to local or global minimum depending on the r<strong>and</strong>om numbers.


The multimembered evolution strategy displays much better global convergence properties, with or without recombination. Although its actual path through the space was determined by chance, it always found its way to the absolute minimum. Only in Problem 2.48 was the global optimum not always approached. In this case the feasible region is not simply connected: between the starting point and the global minimum there is no connecting line that does not pass through the region excluded by constraints. The path of the simple evolution strategy and the initial condition of the complex method are also both dependent on the particular sequence of pseudorandom numbers; however, the main difference between the results of three trials in each case was simply that different local minima were approached. In one case the (1+1) evolution rejected 33 local optima, only to converge at the 34th (Problem 2.3).
to converge at the 34th (Problem 2.3).<br />

In spite of the good convergence properties of the multimembered evolution manifested<br />

in the tests, a certain measure of scepticism is called for. If the search is started with only<br />

small step lengths in the neighborhood of a local minimum, while the global minimum<br />

is far removed <strong>and</strong> is surrounded by only a relatively small region with small objective<br />

function values, then the probability of getting there can be very small.<br />

If in addition there are very many variables, so that the step sizes of the mutations are small compared to the Euclidean distance between two points in IR^n, the search for a global optimum among many local optima is like the proverbial search for a needle in a haystack. Locating singular minima, even with only a few variables, is a practically hopeless task. Although the multimembered evolution increases the probability of finding global minima compared to other methods, it cannot guarantee to do so because of its basically sequential character.
basically sequential character.<br />

6.3.3.3 Third Test: Non-Quadratic Problems with Many Variables

In the first series of tests we investigated the rates of convergence for a quadratic objective function, and in the second the reliability of convergence for the general non-linear case. The aim of the third test is now to study the computational effort required for non-quadratic problems. Because of their small number of variables, the problems of the second test series appear unsuitable for this purpose, as rates of convergence and computation times are only of interest in relation to the number of variables. The construction of non-quadratic objective functions of a type that can also be extended to an arbitrary number of variables is not a trivial problem. Another reason, however, for this third strategy comparison being restricted to only 10 different problems is that it required a large amount of computation time. In some cases CPU times of several hours were needed to test just one strategy on one problem with a particular number of variables. Seven of the problems are unconstrained and three have constraints. Appendix A, Section A.3 contains the mathematical formulation of the problems together with their solutions.

The procedure followed was the same as in the first test. Besides the termination criterion specific to each strategy, which demanded maximum accuracy, a further convergence criterion was applied in common to all strategies. According to the latter, the search was to be ended when a specified distance had been covered from the starting point towards the minimum. The number of variables was varied up to the maximum allowed



by the storage capacity, taking the values 3, 10, 30, 100, 300, and 1000. Of course, if a problem with, for example, 30 variables could not be solved by a strategy, or if no result was forthcoming at the end of the maximum computation time of 8 hours, the number of variables was not increased any further.

As in the first test, the initial conditions were specified by

$$x_i^{(0)} = x_i^* + \frac{(-1)^i}{\sqrt{n}}\,, \qquad i = 1(1)n$$

Two exceptions are Problem 3.3 with

$$x_i^{(0)} = x_i^* + \frac{(-1)^i}{10\sqrt{n}}$$

to ensure that the search always converged to the desired minimum and not to one of the many others of equal value, and Problem 3.10 with

$$x_i^{(0)} = x_i^* + \frac{1}{\sqrt{n}}$$

to start the search within the feasible region. Here $x_i^*$ denotes the position of the desired minimum. Problems 3.8 and 3.9, whose minima are at infinity, required special treatment of the starting point and termination conditions (see Appendix A, Sect. A.3).
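Written out as code, the three initialization rules look as follows. This is an illustrative sketch only, not a transcription of the original test programs; `x_star` stands for the known solution point of the respective problem.

```python
import numpy as np

def start_default(x_star):
    """General rule: offset the optimum by (-1)^i / sqrt(n) in coordinate i."""
    n = len(x_star)
    i = np.arange(1, n + 1)
    return x_star + (-1.0) ** i / np.sqrt(n)

def start_problem_3_3(x_star):
    """Problem 3.3: a ten times smaller offset, so the search heads for the
    desired minimum rather than one of the many others of equal value."""
    n = len(x_star)
    i = np.arange(1, n + 1)
    return x_star + (-1.0) ** i / (10.0 * np.sqrt(n))

def start_problem_3_10(x_star):
    """Problem 3.10: a uniform positive offset, to start inside the feasible region."""
    n = len(x_star)
    return x_star + 1.0 / np.sqrt(n)
```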

The results are presented in Table 6.10. For comparison, some of the results of the first test (Problem 1.1) are also displayed. The numbers enable one to assess critically, on the one hand, the reliability of a strategy and, on the other, the computation times it requires.



Table 6.10: Results of all strategies in the third comparison test

The following notation is used in the tables:

n:     Number of variables
Case:  A label for the convergence behavior, taking the values:
       1    Normal end of search; the required approximation to the objective
            was achieved.
       2    The search was ended before reaching the desired accuracy.
       3    The search became unending without converging; it had to be
            terminated externally.
       4    The maximum computation time of 8 hours was insufficient to end
            the search successfully (occasionally more computation time was
            invested in trials with the multimembered evolution strategy that
            promised to be successful).
       -    No trial was undertaken.
       1(2) Depending on the sequence of random numbers, various cases
            occurred; the entries in the table refer to the first case defined.
OFC:   Number of objective function calls.
CFC:   Number of constraint function calls.
Time:  Computation time in seconds (CPU time).

Iterations, cycles, exploratory cycles, line searches, orthogonalizations, restarts, etc., were counted as in the first comparison test.

Fatal execution errors were registered only in the Powell and DFPS methods; it is not further specified here in which problems they occurred. As a rule the same types of problem were involved as in the second test.

In unconstrained problems no numbers are tabulated for the number of objective function calls made by the evolution strategies. This can be calculated from the number of mutations or generations as follows:

EVOL:        1 + number of mutations
GRUP, REKO:  10 + 100 × number of generations
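In code form (a trivial helper encoding exactly these two rules):

```python
def ofc_evol(mutations):
    # (1+1) strategy: one call for the start point plus one per mutation
    return 1 + mutations

def ofc_grup_reko(generations):
    # (10,100) strategies: 10 calls for the start population plus 100 per generation
    return 10 + 100 * generations
```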

(continued)


Table 6.10 continued: Coordinate strategies FIBO, GOLD, LAGR (from top to bottom)

[Recorded for each strategy and each problem (1.1, 3.1 to 3.7) at n = 3, 10, 30, 100, 300, and 1000: case, cycles, OFC, and time. The individual entries are too badly scrambled in this copy to be reproduced.]


Table 6.10 continued: HOJE - Direct search of Hooke and Jeeves

[Recorded for each problem (1.1, 3.1 to 3.7) at n = 3, 10, 30, 100, 300, and 1000: case, cycles, OFC, and time. Entries not reproducible in this copy.]

ROSE - Rosenbrock method with Gram-Schmidt orthogonalization

[Recorded for each problem (1.1, 3.1 to 3.10) at n = 3, 10, 30, and 75 (max): case, orthogonalizations, OFC, CFC, and time. Entries not reproducible in this copy.]


Table 6.10 continued: DSCG - Davies-Swann-Campey method with Gram-Schmidt orthonormalization

[Recorded for each problem (1.1, 3.1 to 3.7) at n = 3, 10, 30, and 75 (max): case, orthogonalizations, line searches, OFC, and time. Entries not reproducible in this copy.]

DSCP - Davies-Swann-Campey method with Palmer orthonormalization

[Recorded for each problem (1.1, 3.1 to 3.7) at n = 3, 10, 30, 75 (max), and 95 (max): case, orthogonalizations, line searches, OFC, and time. Entries not reproducible in this copy.]


Table 6.10 continued: POWE - Powell's method of conjugate directions

[Recorded for each problem (1.1, 3.1 to 3.7) at n = 3, 10, 30, 100, and 135 (max): case, iterations, line searches, OFC, and time. Entries not reproducible in this copy.]

DFPS - Stewart's modification of the Davidon-Fletcher-Powell method

[Recorded for each problem (1.1, 3.1 to 3.7) at n = 3, 10, 30, 100, and 180 (max): case, iterations, OFC, and time. Entries not reproducible in this copy.]


Table 6.10 continued: SIMP - Simplex method of Nelder and Mead (with restart rule)

[Recorded for each problem (1.1, 3.1 to 3.7) at n = 3, 10, 30, 100, and 135 (max): case, restarts, OFC, and time. Entries not reproducible in this copy.]

COMP - Complex method of Box (no. of vertices = 2n)

[Recorded for each problem (1.1, 3.1 to 3.10) at n = 3, 10, 30, and 95 (max): case, OFC, CFC, and time. Entries not reproducible in this copy.]


Table 6.10 continued: EVOL - (1+1) evolution strategy

[Recorded for each problem (1.1, 3.1 to 3.10) at n = 3, 10, 30, 100, 300, and 1000: case, mutations, OFC, CFC, and time. Entries not reproducible in this copy.]


Table 6.10 continued: GRUP - (10,100) evolution strategy

[Recorded for each problem (1.1, 3.1 to 3.10) at n = 3, 10, 30, 100, 300, and 435 (max): case, generations, OFC, CFC, and time. Entries not reproducible in this copy.]


Table 6.10 continued: REKO - (10,100) evolution strategy with recombination

[Recorded for each problem (1.1, 3.1 to 3.10) at n = 3, 10, 30, 100, 300, and 435 (max): case, generations, OFC, CFC, and time. Entries not reproducible in this copy.]



With only three variables, nearly all the problems were solved perfectly by all strategies, i.e., the required approximation to the objective was achieved. The only exception is Problem 3.5, which ended in failure for the coordinate strategies, the method of Hooke and Jeeves, and the methods of Powell and of Davidon-Fletcher-Powell-Stewart. In apparent contradiction to this, the corresponding Problem 2.21 for n = 5 was satisfactorily solved by the Hooke-Jeeves strategy and the DFPS method. The causes are to be found in the different initial values of the variables. With the variable metric method, fatal execution errors occurred in both cases.

If there are 10 or more variables, even the two membered evolution strategy does not find the minimum in Problem 3.5, due to the extremely unfavorable starting point. The probability of making from there a first step with a lower objective function value is 2⁻ⁿ. Thus with many variables, the termination condition is usually met before a single success has been scored. The simplex method of Nelder and Mead with n = 10 took 185 restarts to reach the desired approximation to the objective. For more than 10 parameters the solution can no longer be sufficiently well approximated in spite of an increasing number of restarts. With stricter accuracy requirements the simplex method fails much sooner (Problem 2.21 with n = 5).
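The 2⁻ⁿ figure is easy to verify by simulation. The sketch below uses a model situation (assumed for illustration, not the actual Problem 3.5): for f(x) = max_i x_i at a starting point with all components equal, a mutation lowers f only if every component of the step is negative.

```python
import numpy as np

rng = np.random.default_rng(1)

def first_step_success_rate(n, trials=200_000, sigma=0.1):
    # Start at x = (1, ..., 1) for f(x) = max_i x_i: a mutation improves f only
    # if all n components of the step are negative, which for a componentwise
    # symmetric mutation distribution happens with probability 2^-n.
    steps = rng.normal(0.0, sigma, size=(trials, n))
    return np.all(steps < 0.0, axis=1).mean()

for n in (2, 5, 10):
    print(n, first_step_success_rate(n), 2.0 ** -n)
```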

The complex strategy likewise was no longer able to solve the same problem for n ≥ 30. Depending on the sequence of random numbers it either ended the search before achieving the required accuracy, or it was still far from the minimum when the allowed computation time (8 hours) expired. The multimembered evolution strategy also proved to be dependent, although less strongly, on the particular sequence of random numbers. The version without recombination failed on Problem 3.5 for n ≥ 30; with recombination it failed for n ≥ 100. Without recombination and for n ≥ 100 it ended the minimum search prematurely also in Problems 3.4 and 3.6. The simplex and complex methods had convergence difficulties with both types of objective function, usually even for only a few variables. Several times they had to be interrupted because of exceeding the time limit. Further details can be found in the tables and Appendix A, Section A.3.

The search for the minima in Problems 3.4 and 3.6 presents no difficulties to the coordinate strategies, and the methods of Hooke and Jeeves, Rosenbrock, Davies-Swann-Campey, Powell, and Davidon-Fletcher-Powell-Stewart. The three rotating coordinate strategies are the only ones that manage to solve Problem 3.5 satisfactorily for any number of variables. Nevertheless it would be hasty to conclude that these methods are therefore clearly better than the others; an attempt to analyze the reasons for their success reveals that only slight changes in the objective functions are enough to undermine their apparently advantageous way of working.

The significant difference in this respect between the above group of strategies and the others (complex, simplex, and evolution strategies) is that the former operate with a much more limited set of search directions than the latter. There are usually only n directions, e.g., the n coordinate directions of the axes-parallel search methods, compared to an infinite number (in principle) in the evolution methods. In the case of Problems 3.4 to 3.6 the most favorable search directions are the n directions of the unit vectors. All methods with one dimensional minimizations use precisely these directions in their first iteration cycle, so they do not usually require any further iterations to achieve the required
iteration cycle, so they do not usually require any further iterations to achieve the required



accuracy. By keeping the starting conditions the same but rotating the coordinates with respect to the contours of the objective function (Problem 3.6), or slightly tilting the contours with respect to the coordinate axes (Problem 3.5), or both together (Problem 3.4), one could easily cause all the line searches to fail. On the other hand, the strategies without line searches would not be impaired by these changes. Thus the advantage of selected directions can turn into a disadvantage. These coordinate strategies can never solve the problem referred to, whereas, as we have seen, the strategies that have a large set of search directions at their disposal only fail when a particular number of variables is exceeded. Problems 3.4 and 3.6 are therefore suitable for assessing the reliability of simplex, complex, and evolution strategies, but not for the other methods. Together they belong to the type of problems which Himmelblau designates as "pathological."
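A two-dimensional toy example (an assumed objective, not one of the test problems) makes the failure mode concrete: with contours tilted 45 degrees to the axes and a sharp edge, exact line searches along the coordinate directions stall at a non-optimal point, while random directions still find improvements.

```python
import numpy as np

# Toy objective: contours tilted 45 degrees to the axes, with a sharp edge;
# the minimum is at the origin with f(0, 0) = 0.
def f(x):
    x = np.asarray(x)
    return np.abs(x[..., 0] + x[..., 1]) + 0.1 * np.abs(x[..., 0] - x[..., 1])

x = np.array([1.0, -1.0])          # on the edge x1 + x2 = 0, far from the minimum

# "Exact" line searches along the coordinate axes, done by dense sampling:
ts = np.linspace(-2.0, 2.0, 4001)  # the grid contains t = 0 exactly
for _ in range(5):
    for axis in (0, 1):
        trial = np.tile(x, (ts.size, 1))
        trial[:, axis] += ts
        x = trial[np.argmin(f(trial))]
print("after axis-parallel line searches:", x, f(x))   # stays at (1, -1), f = 0.2

# Random directions (as the evolution strategies use) typically still improve:
rng = np.random.default_rng(0)
neighbors = x + 0.5 * rng.normal(size=(1000, 2))
print("best of 1000 random neighbors:", f(neighbors).min(), "vs f(x) =", f(x))
```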

Leaning more to the conservative side are the several times continuously differentiable objective functions of Problems 3.1, 3.2, 3.3, and 3.7. The first two problems were tackled successfully by all the strategies for any number of variables. The simplex method did, however, need at least one restart for Problem 3.1 with n ≥ 100. For 135 variables it exceeded the time limit before Problems 3.1 and 3.2 were solved to sufficient accuracy.

Problem 3.3 gave trouble to several search procedures when there were 10 or more variables. The coordinate strategies were the first to fail. For only n = 10, the step lengths of the line searches would have had to be smaller than allowed by the numerical precision of the computer used. At n = 30, the DSC strategy with Gram-Schmidt orthogonalization also ends without having located the minimum accurately enough. The simplex method with one restart still found the solution for n = 30, but the complex strategy failed here, either by premature termination of the search or by reaching the maximum permitted computation time. Problem 3.3, because the cost per objective function evaluation increases as O(n²), requires the longest computation times for its solution. Since the objective function also took O(n²) units of storage, this problem could not be used for more than 30 variables.

Problem 3.7, like the analogous Problem 2.31, gave trouble to the two quadratically convergent strategies. The method of Powell was only successful for n = 3. For more variables it became stuck in the search process without the termination rule taking effect. The variable metric strategy behaved in just the same way. For n ≥ 30, it no longer came as near as required to the optimum. Under the stricter conditions of the second set of tests it failed already at n = 5. With both methods fatal execution errors occurred during the search. No other direct search strategies had any difficulty with Problem 3.7, which is a simple 10th order polynomial. Only the simplex method would not have found the solution sufficiently accurately without the restart rule. For n = 100, it reached the time limit before the search simplex had collapsed for the first time.

The advantage shown by the complex strategy was due to the complex's having 2n vertices, which is almost twice as many as the n + 1 of the simplex. An attempt to solve Problems 3.1 to 3.10 for n = 30 with a complex constructed of 40 points failed completely. The search ended, in every case, without having reached the required accuracy.

How do the computation times compare when the problems are no longer only quadratically non-linear? For solving the "pathological" Problems 3.4 to 3.6 all the methods with a line search take about the same times, with the same number of variables, as they do



for solving the simple quadratic Problem 1.1, if indeed they actually can find a solution. With any of the remaining methods the computation times increase somewhat more steeply with the number of variables, up to the limiting number beyond which convergence cannot be guaranteed in every case.

The solution times for Problems 3.1 and 3.2 usually turn out to be several times greater than those for Problem 1.1. The cost of the coordinate strategies is up to 1000% more for a few variables, which reduces to 100% as the number of variables increases. As in the case of Problem 1.1, the solution times for Problems 3.1 and 3.2 using the method of Hooke and Jeeves increase somewhat faster than the square of the number of variables. For very many variables 250% more computation time is required.

For n ≤ 30, the Rosenbrock method requires 70% to 250% more time (depending on the number of orthogonalizations) for the first two problems of the third set of tests than for the simple quadratic problem. The computation time still increases as O(n³) in all cases because of the costly procedure of rotating the coordinates. For example, for n = 75, up to 90% of the total time is taken up by the orthogonalizations. The DSC strategies reached the desired accuracy in Problem 1.1 without orthogonalizations. Since solving Problems 3.1 and 3.2 requires more than n line searches in each case, the computation times differ significantly, depending on the chosen method of orthogonalization. Palmer's program holds down the increase in the computation times to O(n²), whereas the Gram-Schmidt method leads to an O(n³) increase. It therefore is not meaningful to quote the extra cost as a percentage with respect to Problem 1.1. In the extreme case, instead of 6 seconds at n = 75 the procedure took nearly 80 seconds.

The method of Powell requires two to four times as much time, depending on whether one or two extra iterations are needed. However, even for the same number of iterations, i.e., also with the same number of line searches (n = 135), the number of function calls in Problems 3.1 and 3.2 is greater than in Problem 1.1. The reason for this is that in the quadratic reference problem a simplified form of the parabolic interpolation can be used. The variable metric strategy, in order to solve the two non-quadratic problems (Problems 3.1 and 3.2) with n = 180, requires about nine times as much computation time as for Problem 1.1. This factor increases with n, since the number of gradient determinations increases gradually with n.

The pattern of behavior of the simplex method of Nelder and Mead is very irregular. If the number of variables is small, the computation times for all three problems are about equal. However, for n = 100, Problem 3.2 requires about seven times as much time to be solved as Problem 1.1 and, because of a restart, Problem 3.1 requires even thirty times as much. With n = 135, neither of the two non-quadratic problems can be solved within 8 hours, whereas 1.5 hours are sufficient for Problem 1.1. On the other hand the complex strategy requires only slightly more time, about 20%, than in the simple quadratic case, provided 2n vertices are taken. The time taken by this method on the whole for all problems, however, exhibits the strongest rate and range of variation with the number of parameters.

The evolution strategies prove to be completely unaffected by the altered topology of the objective function as compared with the case of spherically symmetrical contour surfaces. Within the expected deviations, due to different sequences of random numbers, the measured computation times for all three problems are equal. The results show that Rechenberg's (1973) theory of the rate of progress, which does not assume a quadratic objective function but merely concentric hypersphere contour surfaces, is valid over a wide range of conditions.
wide range of conditions. Even more surprising, however, is the behavior of the (10<br />

, 100) evolution method with recombination in the solution of Problems 3.4 <strong>and</strong> 3.6,<br />

whose objective functions have discontinuous rst derivatives, i.e., their contour surfaces<br />

display sharp edges <strong>and</strong> corners. The mixing of the components of variables representing<br />

individuals on di erent sides of a discontinuity appears sometimes to have a kind of<br />

smoothing e ect. In any case it can be seen that the strategy with recombination needs<br />

no more computation time or objective function calls for Problems 3.4 <strong>and</strong> 3.6 than for<br />

Problems 1.1, 3.1, <strong>and</strong> 3.2.<br />

With all the methods under test, the computation times for solving Problem 3.7 are about twice as high as those measured in the simple quadratic case. Only the simplex method is significantly more demanding of time. Since the search simplex frequently collapses in on itself, it must repeatedly be reinitialized.

Since Problem 3.3 could only be tackled with 3, 10, and 30 variables, it is not easy to analyze the resulting data. In addition, the dependence of the increase in difficulty on the number of parameters is not so clear-cut in this problem. Nevertheless the results seem to indicate that at least the number of objective function calls, in many strategies, increases with n in a way similar to that in the pure quadratic Problem 1.2. Because an objective function evaluation takes about O(n²) operations in Problem 3.3, the total cost generally increases as one higher power of n than in Problem 1.2. The cost of the variable metric strategy and both versions of the (10, 100) evolution strategy seems to increase even more rapidly. In the latter case there is a suspicion that the chosen initial step lengths are too large for this problem when there are very many variables. Their reduction to a suitable size then takes a few additional generations. The two membered evolution strategy, which is able to adjust unsuitable initial step lengths relatively quickly, needed about the same number of mutations for both Problems 1.2 and 3.3. Since only one experiment per strategy and number of variables was performed, the effect of the particular sequence of random numbers on the recorded computation times is not known. The particularly advantageous behavior of the DFPS method on exactly quadratic objective functions is clearly wasted once the problem deviates from this model structure; in fact it seems that the search process is appreciably held back by an interpretation of the measured data in terms of an inappropriate internal model.

So far we have only discussed the results for the seven unconstrained problems, since they were amenable to solution by all the search strategies. Problem 3.8, with constraints, corresponds to the second model function (corridor model) for which Rechenberg (1973) has obtained theoretically the rate of progress of the two membered evolution strategy with optimal adaptation of variances. According to his analysis, one expects a linear rate of convergence, increasing with the width of the corridor and inversely proportional to the number of variables. The results of the third set of tests confirm that the number of mutations or generations increases linearly with n if the width of the corridor and the reference distance to be covered are held constant.
the reference distance to be covered are held constant. The picture for the Rosenbrock<br />

strategy is as usual: the time consumption increases as O(n 3 ) again. The point atn =75


232 Comparison of Direct Search Strategies for Parameter Optimization<br />

departs from the general trend of the others simply because no orthogonalizations were<br />

performed in this case. But the di erence is not dramatic, because the cost of testing the<br />

constraints is of the same order of magnitude as that of rotating the coordinates. The<br />

complex method takes computation times that initially increase somewhat more rapidly<br />

than O(n 3 ). This corresponds to a greater than linearly increasing number of objective<br />

function evaluations. As we have already seen in other problems, the increase becomes<br />

even steeper as the number of parameters increases. With n =95variables, the required<br />

distance was only partially covered within the maximum computation time.<br />

Problem 3.9 represents a modification of Problem 3.8 with respect to the constraints. In place of the (2n − 2) linear constraints, the corridor is bounded by a single non-linear boundary condition. The cost of testing the feasibility of an iteration point is thereby greatly reduced. The number of mutations or generations of the evolution strategies is higher than in Problem 3.8 but still increases as O(n); the computation times, in contrast to Problem 3.8, only increase as O(n²). The Rosenbrock method also has no difficulty with this problem, although the necessary rotations of the coordinate system make the times of order O(n³). The complex method could only solve Problem 3.9 for n = 3; upwards of n = 10 it no longer converged.

The last problem, Problem 3.10, which also has inequality constraints, turned out to be extremely difficult for all the search methods in the test. The main problem is one of scaling. Convergence in the neighborhood of the minimum can be achieved if, and practically only if, the step lengths in the coordinate directions are individually adjustable. They have to differ from each other by several powers of 10. For n = 30, no strategy managed to solve the problem within the maximum allowed computation time. The complex method sometimes failed to end the search within this time for n = 10. The intermediate results achieved after 8 hours are presented in Appendix A, Section A.3. All of the evolution strategies do better than the methods of Rosenbrock and Box.

The result that the two membered evolution strategy came closer to the objective than the multimembered evolution without recombination was not completely unexpected, because considerably fewer generations than mutations can occur within the allowed time. What is more surprising is that the (10, 100) strategy with recombination does almost as well as the two membered version. Here once again, the degree of freedom gained by the possibilities of recombination shows itself to advantage. The variances of the mutation step lengths do adjust themselves individually, quite differently according to the situation, and thus permit much faster convergence than with equal variances for all variables. The other evolution strategies only come as close as they do to the solution because the variances reach their relative lower bounds at different times, whereby differences in their sizes are introduced. This scaling process is, however, very much slower than the continuous process of adaptation brought about by the recombination mechanism.
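As an illustration of this adaptation mechanism, here is a minimal sketch of a (10, 100) strategy with discrete recombination of both object variables and step sizes. The lognormal step-size mutation and the single learning rate tau are common textbook choices, assumed here; the sketch is not a reproduction of the exact rules used in the tests.

```python
import numpy as np

rng = np.random.default_rng(42)

def es_comma(f, x0, sigma0, mu=10, lam=100, generations=200):
    """Minimal (mu, lam) evolution strategy with discrete recombination and
    individually adapted per-coordinate step sizes. Illustrative sketch only."""
    n = len(x0)
    tau = 1.0 / np.sqrt(2.0 * np.sqrt(n))      # common choice of learning rate
    pop_x = x0 + rng.normal(0.0, sigma0, size=(mu, n))
    pop_s = np.full((mu, n), sigma0)
    for _ in range(generations):
        # discrete recombination: each offspring component comes from one of
        # two randomly chosen parents (object variable and step size together)
        p1, p2 = rng.integers(mu, size=(2, lam))
        mask = rng.random((lam, n)) < 0.5
        x = np.where(mask, pop_x[p1], pop_x[p2])
        s = np.where(mask, pop_s[p1], pop_s[p2])
        # mutate the step sizes first (lognormal), then the object variables
        s = s * np.exp(tau * rng.normal(size=(lam, n)))
        x = x + s * rng.normal(size=(lam, n))
        best = np.argsort([f(xi) for xi in x])[:mu]   # comma selection
        pop_x, pop_s = x[best], s[best]
    return pop_x[0], pop_s[0]

# Badly scaled quadratic standing in for the scaling difficulty of Problem 3.10:
f = lambda x: float(np.sum((np.logspace(0, 3, x.size) * x) ** 2))
x, s = es_comma(f, x0=np.ones(5), sigma0=0.1)
print("best point:", x)
print("adapted step sizes:", s)   # typically differ by orders of magnitude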

6.4 Core storage required

Up to now, only the time has been considered as a measure of the computational cost. There is, however, another important characteristic that affects the applicability of optimization strategies, namely the core storage required. (Today nobody would use this



term "core" here, but at the time these tests were performed, it was so called.) All indirect methods of quadratic optimization, which solve the linear equations for the extremal, require storage of order O(n²) for the matrix of coefficients. The same holds for quasi-Newton methods, except that here the significant rôle is played by the approximation to the inverse Hessian matrices. Most strategies that perform line searches in other than coordinate directions also require O(n²) words for the storage of n vectors, each with n coefficients. An exception to this rule is the conjugate gradient method of Fletcher and Reeves, which at each stage only needs to retain the latest generated direction vector for the subsequent iteration. Of the direct search methods included in the tests, the coordinate methods, the method of Hooke and Jeeves, and the evolution strategies work with only O(n) words of core storage. How important the formal storage requirement of an optimization method can be is shown by the maximum number of variables for the tested strategies in Table 6.2. The limiting values range from 75 to 4,000 under the given conditions. There exist, of course, tricks such as segmentation for enabling larger programs to be run on smaller machines; the cost of the strategy should then take into account, however, the extra cost in preparation time for an optimization. (Here again, modern virtual storage techniques and the relative cheapness of memory chips make the considerations above look rather old-fashioned.)
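A quick back-of-the-envelope comparison of the two storage classes (word counts only, with all constant factors set to 1 for illustration):

```python
def storage_words(n):
    # O(n^2): quasi-Newton and other matrix-based methods (one n-by-n array)
    # O(n):   coordinate methods, Hooke-Jeeves, and the evolution strategies
    return {"matrix-based": n * n, "vector-based": n}

for n in (75, 1000, 4000):
    print(n, storage_words(n))
# At n = 4000 a single n-by-n matrix already needs 16 million words,
# while the vector-based methods get by with a few thousand.
```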

In the following Table 6.11, all the strategies compared are listed again, together with the order of magnitude of their required computation time as obtained from the first set of tests (columns 1 and 2). The third column shows how the computation time would vary if each function call performed $O(n^2)$ rather than $O(n)$ operations, as would occur for the worst case of a general quadratic objective function. The fourth column gives the storage requirement, again only as an order of magnitude, and the fifth displays the product of the time and storage requirements from the two previous columns. Judging by the computation time alone, the variable metric strategy seems the best suited for true quadratic problems. In the least favorable case, however, it is more expensive than an indirect method and only faster in special cases. Problems having a very simple structure (e.g., Problem 1.1) can be solved just as well by direct search methods; the time they take is at worst only a constant factor more than that of a second order method. If the total cost is measured by the product of time and storage requirements, all those strategies that store a two dimensional array of data show up badly, at least for problems with many variables. Since the coordinate methods have shown unreliable convergence, the method of Hooke and Jeeves and the evolution strategies remain as the least costly optimization methods. Their cost does not exceed that of indirect methods. The product of time and storage is not such a bad measure of the total cost; in many computing centers jobs have been, in fact, charged with the product of storage requested in K words and the time in seconds of occupation of the central processing unit (K-core-sec).

A comparison of the two membered and multimembered evolution strategies seems clearly to favor the simpler method. This is not surprising, as several individuals in the multimembered procedure have to find their way towards the optimum. In nature, this process runs in parallel. Already in the early 1970s, first efforts towards constructing multi-processor computers were undertaken (see Barnes et al., 1968; Miranker, 1971).


Table 6.11: The dependence of the total costs of the search methods on the number of variables (n)

Strategy           | Comp. time    | Comp. time    | Comp. time         | Core    | K-core-sec
                   | Problem 1.1   | Problem 1.2   | gen. quadr. probl. | storage |
-------------------|---------------|---------------|--------------------|---------|-----------
FIBO, GOLD, LAGR   | n^2           | n^3 †         | n^4                | n       | n^5
HOJE               | > n^2         | n^3           | n^4                | n       | n^5
DSCG               | n^2           | n^4           | n^4                | n^2     | n^6
DSCP               | n^2           | n^3           | n^4                | n^2     | n^6
POWE               | n^2           | n^3 †         | n^4                | n^2     | n^6
DFPS               | n^2           | n^2.5         | n^3.5              | n^2     | n^5.5
SIMP               | > n^3         | n^5           | n^5                | n^2     | n^7
ROSE               | n^3           | n^4           | n^4                | n^2     | n^6
COMP               | > n^3         | n^5 †         | n^5                | n^2     | n^7
EVOL, GRUP, REKO   | n^2           | n^3           | n^4                | n       | n^5

† Not sure to converge

On such a parallel computer, supposing it had 100 sub-units, one could simultaneously perform all the mutations and objective function evaluations of one generation in the (10 , 100) evolution strategy. The time required for the optimization would be about two orders of magnitude less than it is with a serially operating machine. In Figures 6.14 and 6.15 the dotted lines show the results that would be obtained by the (10 , 100) strategy without recombination in the hypothetical case of parallel operation. No other methods can make use of parallel operations to such an extent. On SIMD (single instructions, multiple data) architectures, the possible speedup is sharply limited by the percentages of a program's scalar and vector operations. Using array arithmetic for all matrix and vector operations, the execution time of a program may be accelerated at most by a factor of five, given that these operations would serially take 80% of the computation time. On MIMD (multiple instructions, multiple data) machines, the speedup is limited by the number of processing units a program can make use of and by the amount of communication needed between the processors and the data store(s). Most classical optimization algorithms cannot economically employ large MIMD computers; the more sophisticated the procedure, the less use it can make of them. Multimembered evolution strategies, however, are easily scalable to any number of processors and communication links between them. For a taxonomy of parallel versions of evolution strategies, see Hoffmeister and Schwefel (1990).
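The SIMD bound just quoted is an instance of Amdahl's law; as a quick check (my arithmetic, not a figure from the original test series), with a fraction $p$ of the serial run time vectorizable at speedup $s$, the overall speedup is

$$ S(s) = \frac{1}{(1-p) + p/s}, \qquad p = 0.8 \;\Rightarrow\; S(s) \le \lim_{s \to \infty} S(s) = \frac{1}{1 - 0.8} = 5 $$

which reproduces the factor of five mentioned above.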



Chapter 7

Summary and Outlook

So, is the evolution strategy the long-sought-after universal method of optimization? Unfortunately, things are not so simple and this question cannot be answered with a clear "yes." In two situations, in particular, the evolution strategies proved to be inferior to other methods: for linear and quadratic programming problems. These cases demonstrate the full effectiveness of methods that are specially designed for them, and that cannot be surpassed by strategies that operate without an adequate internal model. Thus if one knows the topology of the problem to be solved and it falls into one of these categories, one should always make use of such special methods. For this reason there will always rightly exist a number of different optimization methods.

In other cases one would naturally not search for a minimum or maximum iteratively if an analytic approach presented itself, i.e., if the necessary existence conditions lead to an easily and uniquely soluble system of equations. Nearest to this kind of indirect optimization come the hill climbing strategies, which operate with a global internal model. They approximate the relation between independent and dependent variables by a function (e.g., a polynomial of high order) and then follow the analytic route, but within the model rather than reality. Since the approximation will inevitably not be exact, the process of analysis and synthesis must be repeated iteratively in order to locate an extremum exactly. The first part, identification of the parameters or construction of the model, costs a lot in trial steps. The cost increases with n, the number of variables, and p, the order of the fitting polynomial, as $O(n^p)$. For this reason hill climbing methods usually keep to a linear model (first order strategies, gradient methods) or a quadratic model (second order strategies, Newton methods). All the more highly developed methods also try as infrequently as possible to adjust the model to the local topology (e.g., the method of steepest descents) or to advance towards the optimum during the model construction stage (e.g., the quasi-Newton and conjugate gradient strategies). Whether this succeeds, and the information gathered is sufficient, depends entirely on the optimization problem in question. A quadratic model seems obviously more suited to a non-linear problem than a linear model, but both have only a limited, local character. Thus in order to prove that the sequence of iterations converges and to make general statements about the speed of convergence and the Q-properties, very strict conditions must be satisfied by the objective function and, if they exist, also by the constraints, such as unimodality, convexity, continuity, and differentiability. Linear or quadratic convergence properties require not only conditions on the structure of the problem, which frequently cannot be satisfied, but also presuppose that the mathematical operations are in principle carried out with infinite accuracy. Many an attractive strategy thus fails not only because a problem is "pathological," having non-optimal stationary points, an indefinite Hessian matrix, or discontinuous partial derivatives, but simply because of inevitable rounding errors in the calculation, which works with a finite number of significant figures. Theoretical predictions are often irrelevant to practical problems, and the strength of a strategy certainly lies in its capability of dealing with situations that it recognizes as precarious: for example, by cyclically erasing the information that has been gathered or by introducing random steps.

As the test results confirm, the second order methods are particularly susceptible. A questionable feature of their algorithms is, for example, the line search for relative optima in prescribed directions. Contributions to all conferences in the late 1970s clearly showed a leaning towards strategies that do not employ line searches, thereby requiring more iterations but offering greater stability. The simpler the internal model and the less complete the required information, the more robust an optimization strategy can be. The more rigid the representation of the model, the more effect perturbations of the objective function have, even those that merely result from the implementation on digital, analogue, or hybrid computers. Strategies that accept no worsening of the objective function are very easily led astray.

Every attempt to accelerate the convergence is paid for by loss in reliability. The ideal of guaranteed absolute reliability, from which springs the stochastic approximation (in which the measured objective function values are assumed to be samples of a stochastic, e.g., Gaussian, distribution), leads directly to a large reduction in the rates of convergence. The starkest contradiction, however, between the requirements for speed and reliability can be seen in the problem of discovering a global optimum among several local optima. Imagine the situation of a blind person who arrives at New York and wishes, without assistance or local knowledge, to reach the summit of Mt. Whitney. For how long might he seek? The task becomes far more formidable if there are more than two variables (here longitude and latitude) to determine. The most reliable global search method is the volume-oriented grid method, which at the same time is the costliest. In the multidimensional case its information requirement is too huge to be satisfied. There is, therefore, often no alternative but to strike a compromise between reliability and speed. Here we might adopt the sequential random search with normally distributed steps and fixed variances. It has the property of always maintaining a chance of global convergence, and is just as reliable (although slower) in the presence of stochastic perturbations. It also has a path-oriented character: According to the sizes of the selected standard deviations of the random components, it follows more or less exactly the gradient path and thus avoids testing systematically the whole parameter space. A further advantage is that its storage requirement increases only linearly with the number of variables. This can sometimes be a decisive factor in favor of its implementation. Most of the deterministic hill climbing methods require storage space of order $O(n^2)$. The simple operations of the algorithm guarantee the least effect of rounding errors and are safe from forbidden numerical operations (division by zero, square root of a negative number, etc.). No conditions of continuity or differentiability are imposed on the objective function. These advantages accrue from doing without an internal model, not insisting on an improvement at each step, and having an almost unlimited set of search directions and step lengths. It is surely not by chance that this method of zero order corresponds to the simplest rules of organic evolution, which can also cope, and has coped, with difficult situations. Two objections are nevertheless sometimes raised against the analogy of mutations to random steps.

The first is directed against randomness as such. A common point of view, which need not be explicitly countered, is to equate randomness with arbitrariness, even going so far as to suppose that "random" events are the result of a superhuman hand sharing out luck and misfortune; but it is then further asserted that mutations do after all have causes, and it is concluded that they should not be regarded as random. Against this it can be pointed out that randomness and causality are not contradictory concepts. The statistical point of view that is expressed here simply represents an avoidance of statements about individual events and their causes. This is especially useful if the causal relation is very complicated and one is really only interested in the global behavioral laws of a stochastic set of events, as they are expressed by probability density distributions. The treatment of mutations as stochastic events rather than otherwise is purely and simply a reflection of the fact that they represent undirected and on average small deviations from the initial condition. Since one has had to accept that non-linear dynamic systems rather frequently produce behaviors called deterministic chaos (which in turn is used to create pseudorandom numbers on computers), arguments against speaking of random events in nature have diminished considerably.

The second objection concerns the unbelievably small probability, as proved by calculation, that a living thing, or even a mere wristwatch, could arise from a chance step of nature. In this case, biological evolution is implicitly being equated to the simultaneous, pure random methods that resemble the grid search. In fact the achievements of nature are not explicable with this model concept. If mutations were random events evenly distributed in the whole parameter space, it would follow that later events would be completely independent of the previous results; that is to say, descendants of a particular parent would bear no resemblance to it. This overlooks the sequential character of evolution, which is inherent in the consecutive generations. Only the sequential random search can be regarded as an analogue of organic evolution. The changes from one generation to the next, expressed as rates of mutation, are furthermore extremely small. The fact that this must be so for a problem with so many variables is shown by Rechenberg's theory of the two membered evolution strategy: optimal (i.e., fastest possible) progress to the optimum is achieved if, and only if, the standard deviations of the random components of the vector of changes are inversely proportional to the number of variables. The 1/5 success rule for adaptation of the step length parameters does not, incidentally, have a biological basis; rather it is suited to the requirements of numerical optimization. It allows rates of convergence to be achieved that are comparable to those of most other direct search strategies. As the comparison tests show, because of its low computational cost per iteration the evolution strategy is actually far superior to some methods for many variables, for example those that employ costly orthogonalization processes. The external control of step lengths sometimes, however, worsens the reliability of the strategy. In "pathological" cases it leads to premature termination of the search and, besides, reduces the chance of global convergence.
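As a minimal sketch of the scheme just described, the following (1+1) loop applies the 1/5 success rule; the adaptation factor 0.85 and the function names are my assumptions for illustration, not values quoted from this chapter:

```python
import random

def one_plus_one_es(f, x, sigma=1.0, max_evals=10_000):
    """Minimal (1+1) evolution strategy with the 1/5 success rule:
    after every n mutations, enlarge sigma if more than 1/5 of them
    succeeded and shrink it otherwise (0.85 is an assumed factor)."""
    n = len(x)
    fx = f(x)
    successes = 0
    for trial in range(1, max_evals + 1):
        y = [xi + random.gauss(0.0, sigma) for xi in x]   # mutate all variables
        fy = f(y)
        if fy < fx:                    # selection: accept improvements only
            x, fx = y, fy
            successes += 1
        if trial % n == 0:             # adapt the step length once per n trials
            sigma = sigma / 0.85 if successes > n / 5 else sigma * 0.85
            successes = 0
    return x, fx

# usage: 10-dimensional sphere model
best, value = one_plus_one_es(lambda x: sum(xi * xi for xi in x), [5.0] * 10)
```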

Now instead of concluding like Bremermann that organic evolution has only reached stagnation points and not optima, for example, in ecological niches, one should rather ask whether the imitation of the natural process is sufficiently perfect. One can scarcely doubt the capability of evolution to create optimal adaptations and ever higher levels of development; the already familiar examples of the achievements of biological systems are too numerous. Failures with simulated evolution should not be imputed to nature but to the simulation model. The two membered scheme incorporates only the principles of mutation and selection and can only be regarded as a very simple basis for a true evolution strategy. On the other hand one must proceed with care in copying nature, as demonstrated by Lilienthal's abortive attempt, which is ridiculed nowadays, to build a flying machine by imitating the birds. The objective, to produce high lift with low drag, is certainly the same in both cases, but the boundary conditions (the flow regime, as expressed by the Reynolds number) are not. Bionics, the science of evaluating nature's patents, teaches us nowadays to beware of imitating in the sense of slavishly copying all the details, but rather to pay attention to the principle. Thus Bremermann's concept of varying the variables individually instead of all together must also be regarded as an inappropriate way to go about an optimization with continuously variable quantities. In spite of the many, often very detailed investigations made into the phenomenon of evolution, biology has offered no clues as to how an improved imitation should look, perhaps because it has hitherto been a largely descriptive rather than analytic science. The difficulties of the two membered evolution with step length adaptation teach us to look here to the biological example. It also alters the standard deviations through the generations, as proved by the existence of mutator genes and repair enzymes. Whilst nature cannot influence the mutation-provoking conditions of the environment, it can reduce their effects to whatever level is suitable. The step lengths are genetically determined; they can be thought of as strategy parameters of nature that are subject to the mutation-selection process just like the object parameters.

To carry through this principle as the algorithm of an improved evolution strategy one has to go over from the two membered to a multimembered scheme. The (μ , λ) strategy does so by employing the population principle and allowing the μ parents in each generation to produce λ descendants, of which the μ best are selected as parents of the following generation. In this way the sequential as well as the simultaneous character of organic evolution is imitated; the two membered concept only achieves this insofar as a single parent produces descendants until it is surpassed by one of them in vitality, the biological criterion of goodness. According to Rechenberg's hypothesis that the forms and intricacies of the evolutionary process that we observe today are themselves the result of development towards an optimal optimization strategy, our measures should lead to improved results. The test results show that the reliability of the (10 , 100) strategy, taken as an example, is indeed better than that of the (1+1) evolution strategy. In particular, the chances of locating global optima in multimodal problems have become considerably greater. Global convergence can even be achieved in the case of a non-convex and disconnected feasible region. In the rate of convergence test the (10 , 100) strategy does a lot worse, but not by the factor 100 that might be expected. In terms of the number of required generations, rather than the computation time, the multimembered strategy is actually considerably faster. The increase in speed compared to the two membered method comes about because not only the sign of $\Delta F$, the change in the function value, but also its magnitude plays a role in the selection process. Nature possesses a way of exploiting this advantage that is denied to conventional, serially operating computers: It operates in parallel. All descendants of a generation are produced at the same time, and their vitality is tested simultaneously. If nature could be imitated in this way, the (μ , λ) strategy would make both a very reliable and a fast optimization method.
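The generation loop of such a scheme can be sketched as follows; the learning rate tau, the bounds of the start interval, and all names are illustrative assumptions, and one self-adapted step size per individual stands in for the richer parameter sets discussed later:

```python
import math
import random

def comma_es(f, n, mu=10, lam=100, generations=200):
    """Sketch of a (mu, lambda) ES: each of the lam descendants mutates
    its parent's step size first and its object variables afterwards;
    only the mu best descendants survive, never the parents."""
    tau = 1.0 / math.sqrt(2.0 * n)   # learning rate: an assumption, not from the text
    pop = [([random.uniform(-5.0, 5.0) for _ in range(n)], 1.0) for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            x, sigma = random.choice(pop)                    # pick a parent at random
            s = sigma * math.exp(tau * random.gauss(0, 1))   # mutate the step size
            y = [xi + random.gauss(0, s) for xi in x]        # mutate the variables
            offspring.append((y, s))
        pop = sorted(offspring, key=lambda ind: f(ind[0]))[:mu]  # comma selection
    return min(pop, key=lambda ind: f(ind[0]))[0]

# usage: best = comma_es(lambda x: sum(xi * xi for xi in x), n=10)
```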

The following two paragraphs, though completely out of date, have been left in place mainly to demonstrate the considerable shift in the development of computers during the last 20 years (compare with Schwefel, 1975a). Meanwhile parallel computers are beginning to conquer desktops.

Long and complicated iterative processes, such as occur in many other branches of numerical mathematics, led engineers and scientists of the University of Illinois, U.S.A., to consider new ways of reducing the computation times of programs. They built their own computer, Illiac IV, which has especially short data retrieval and transfer times (Barnes et al., 1968). They were unable to approach the $10^{20}$ bits/sec given by Bledsoe (1961) as an upper limit for serial computers, but there will inevitably always be technological barriers to achieving this physical maximum.

A novel organizational principle of Illiac IV is much more significant in this connection. A bank of satellite computers are attached to a central unit, each with its own processor and access to a common memory. The idea is for the sub-units to execute simultaneously various parts of the same program and by this true parallel operation to yield higher effective computation speeds. In fact not every algorithm can take advantage of this capability, for it is impossible to execute two iterations simultaneously if the result of one influences the next. It may sometimes be necessary to reconsider and make appropriate modifications to conventional methods, e.g., of linear algebra, before the advantages of the next generation of computers can be exploited. The potential and the problems of implementing parallel computers are already receiving close attention: Shedler (1967), Karp and Miranker (1968), Miranker (1969, 1971), Chazan and Miranker (1970), Abe and Kimura (1970), Sameh (1971), Patrick (1972), Gilbert and Chandler (1972), Hansen (1972), Eisenberg and McGuire (1972), Casti, Richardson, and Larson (1973), Larson and Tse (1973), Miller (1973), Stone (1973a,b). A version of FORTRAN for parallel computers has already been devised (Millstein, 1973).

Another significant advantage of the multimembered as against the two membered scheme, one that also holds for serial calculations, is that the self-adjustment of step lengths can be made individually for each component. An automatic scaling of the variables results from this, which in certain cases yields a considerable improvement in the rate of progress. It can be achieved either by separate variation of the standard deviations $\sigma_i$ for $i = 1(1)n$, by recombination alone, or, even better, by both measures together. Whereas in the two membered scheme (unless the $\sigma_i^{(0)}$ are initially given different values) the contour lines of equiprobable steps are circles, or hyperspherical surfaces, they are now ellipses or hyperellipsoids that can extend or contract along the coordinate directions, following the n-dimensional normal distribution of the set of n random components $z_i$ for $i = 1(1)n$:

$$ w(z) = \frac{1}{(2\pi)^{n/2} \prod_{i=1}^{n} \sigma_i} \, \exp\left( -\frac{1}{2} \sum_{i=1}^{n} \left( \frac{z_i}{\sigma_i} \right)^2 \right) $$

This is not yet, however, the most general form of a normal distribution, which is rather:

$$ w(z) = \frac{\sqrt{\det A}}{(2\pi)^{n/2}} \, \exp\left( -\frac{1}{2} \, (z - \xi)^T A \, (z - \xi) \right) $$

The expectation value vector $\xi$ can be regarded as a deterministic part of the random step $z$. However, the comparison made by Schrack and Borowski (1972) between the random strategies of Schumer-Steiglitz and Matyas shows that even an ingenious learning scheme for adapting to the local conditions only improves the convergence in special cases. A much more important feature seems to be the step length adaptation. It is now possible for the elements of the matrix $A$ to be chosen so as to give the ellipsoid of variation any desired orientation in the space. Its axes, the regression directions of the random vector, only coincide with the coordinate axes if $A$ is a diagonal matrix. In that case the old scheme is recovered, whereby the variances $\sigma_{ii}$ or the $\sigma_i^2$ reappear as diagonal elements of the inverse matrix $A^{-1}$. If, however, the other elements, the covariances $\sigma_{ij} = \sigma_{ji}$, are non-zero, the ellipsoids are rotated in the space. The random components $z_i$ become mutually dependent, or correlated. The simplest kind of correlation is linear, which is the only case to yield hyperellipsoids as surfaces of constant step probability. Instead of just $n$ strategy parameters $\sigma_i$ one would now have to vary $n(n+1)/2$ different quantities $\sigma_{ij}$. Although in principle the multimembered evolution strategy allows an arbitrary number of strategy variables to be included in the mutation-selection process, in practice the adaptation of so many parameters could take too long and cancel out the advantage of more degrees of freedom. Furthermore, the $\sigma_{ij}$ must satisfy certain compatibility conditions (Sylvester's criteria, see Faddejew and Faddejewa, 1973) to ensure an orthogonal coordinate system or a positive definite matrix $A$. In the simplest case, $n = 2$, with

$$ A^{-1} = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix} $$

there is only one condition:

$$ \sigma_{12}^2 = \sigma_{21}^2 < \sigma_{11}\,\sigma_{22} = \sigma_1^2\,\sigma_2^2 $$

and the quantity defined by

$$ \rho_{12} = \frac{\sigma_{12}}{\sigma_1 \sigma_2}, \qquad -1 < \rho_{12} < 1 $$


is called the correlation coefficient. If the covariances were generated independently by a mutation process in the multimembered evolution scheme, with subsequent application of the rules of Scheuer and Stoller (1962) or Barr and Slezak (1972), there would be no guarantee that the surfaces of equal probability density would actually be hyperellipsoids. It follows that such a linear correlation of the random changes can be constructed more easily by first generating, as before, $(0, \sigma_i^2)$ normally distributed, independent random components and then making a coordinate rotation through prescribed angles. These angles, rather than the covariances $\sigma_{ij}$, represent the additional strategy variables. In the most general case there are a total of $n_p = n(n-1)/2$ such angles, which can take all values between $0^\circ$ and $360^\circ$ (or $-\pi$ and $\pi$). Including the $n_s = n$ "step lengths" $\sigma_i$, the total number of strategy parameters to be specified in the population by mutation and selection is $n(n+1)/2$. It is convenient to generate the angles $\alpha_j$ by an additive mutation process (cf. Equations (5.36) and (5.37))

$$ \alpha_{Nj}^{(g)} = \alpha_{Ej}^{(g)} + \hat{Z}_j^{(g)} \qquad \text{for } j = 1(1)n_p $$

where the $\hat{Z}_j^{(g)}$ can again be normally distributed, for example with a standard deviation that is the same for all angles. Let $\Delta x'_i$ represent the mutations as produced by the old scheme and $\Delta x_i$ the correlated changes in the object variables produced by the rotation; for the two dimensional case ($n = n_s = 2$, $n_p = 1$) the coordinate transformation for the rotation can simply be read off from Figure 7.1:

$$ \Delta x_1 = \Delta x'_1 \cos\alpha - \Delta x'_2 \sin\alpha $$
$$ \Delta x_2 = \Delta x'_1 \sin\alpha + \Delta x'_2 \cos\alpha $$

For $n = n_s = 3$ three consecutive rotations would need to be made:

In the $(\Delta x_1, \Delta x_2)$ plane through an angle $\alpha_1$
In the $(\Delta x'_1, \Delta x'_3)$ plane through an angle $\alpha_2$
In the $(\Delta x''_2, \Delta x''_3)$ plane through an angle $\alpha_3$

Starting from the uncorrelated random changes $\Delta x'''_1, \Delta x'''_2, \Delta x'''_3$ these rotations would have to be made in the reverse order. Thus also in the general case, with $n(n-1)/2$ rotations, each one only involves two coordinates, so that the computational cost increases as $O(n_p)$. The validity of this algorithm has been proved by Rudolph (1992a).
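A minimal sketch of this construction, assuming minimization-agnostic sampling only; the function name and the dictionary representation of the angle set are mine, and the iteration order of the dictionary stands in for the "reverse order" prescription above:

```python
import math
import random

def correlated_mutation(sigmas, angles):
    """Draw independent N(0, sigma_i^2) components, then apply the
    plane rotations one after another; each rotation touches exactly
    two coordinates, so the cost grows only as O(n_p)."""
    z = [random.gauss(0.0, s) for s in sigmas]
    for (i, j), alpha in angles.items():
        c, s = math.cos(alpha), math.sin(alpha)
        z[i], z[j] = c * z[i] - s * z[j], s * z[i] + c * z[j]
    return z

# usage: two variables, one rotation angle of 30 degrees
dz = correlated_mutation([1.0, 0.1], {(0, 1): math.radians(30.0)})
```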

An immediate simplification can be made if not all the $n_s$ step lengths are different, i.e., if the hyperellipsoid of equal probability of a mutation has rotational symmetry about one or more axes. In the extreme case $n_s = 2$ there are $n - n_s$ such axes and only $n_p = n - 1$ relevant angles of rotation. Except for one distinct principal axis, the ellipsoid resembles a sphere. If in the course of the optimization the minimum search leads through a narrow valley (e.g., in Problem 2.37 or 3.8 of the catalogue of test problems), it will often be quite adequate to work with such a greatly reduced variability of the mutation ellipsoid.


[Figure 7.1: Generation of correlated mutations. Sketch of the rotation through the angle α that turns the uncorrelated mutation steps (Δx′₁, Δx′₂), drawn along the rotated axes x′₁, x′₂, into the correlated steps (Δx₁, Δx₂) in the (x₁, x₂) plane.]

Between the two extreme cases $n_s = n$ and $n_s = 2$ ($n_s = 1$ would be the uncorrelated case with hyperspheres as mutation ellipsoids) any choice of variability is possible. In general we have

$$ 2 \le n_s \le n, \qquad n_p = \left( n - \frac{n_s}{2} \right) (n_s - 1) $$

For a given problem the most suitable choice of $n_s$, the number of different step lengths, would have to be obtained by numerical experiment.
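As a quick consistency check (my arithmetic), both extreme cases follow directly from this formula:

$$ n_s = n:\; n_p = \left( n - \frac{n}{2} \right)(n - 1) = \frac{n(n-1)}{2}, \qquad n_s = 2:\; n_p = (n - 1) \cdot 1 = n - 1 $$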

For this purpose the subroutine KORR and its associated subroutines, listed in Appendix B, Section B.3, are flexibly coded to give the user considerable freedom in the choice of quantities that determine the strategy parameters. This variant of the evolution strategy (Schwefel, 1974) could not be fully included in the strategy test (performed in 1973); however, initial results confirmed that, as expected, it is able to construct a kind of variable metric for the changes in the object variables by adapting the angles to the local topology of the objective function.

The slow convergence of the two membered evolution strategy can often be traced to the fact that the problem has long elliptical (or nearly elliptical) contours of constant objective function value. If the function is quadratic, their extension (or eccentricity) can be expressed by the condition number of the matrix of second order coefficients. In the worst case, in which the search is started at the point of greatest curvature of the contour surface $F(x) = \mathrm{const.}$, the rate of progress seems to be inversely proportional to the product of the number of variables and the square root of the condition number. This dependence on the metric would be eliminated if the directions of the axes of the variance ellipsoid corresponded to those of the contour ellipsoid, which is exactly what the introduction of correlated random numbers should achieve. Extended valleys in other than coordinate directions then no longer hinder the search because, after a transition phase, an initially elliptical problem is reduced to a spherical one. In this way the evolution strategy acquires properties similar to those of the variable metric method of Davidon-Fletcher-Powell (DFP). In the test, for just the reason discussed above, the latter proved to be superior to all other methods for quadratic objective functions. For such problems one should not expect it to be surpassed by the evolution strategy, since compared to the $Q_n$ property of the DFP method the evolution strategy has only a $Q_{O(n)}$ property; i.e., it does not find the optimum after exactly n iterations, but rather it reaches a given approximation to the objective after $O(n)$ generations. This disadvantage, only slight in practice, is outweighed by the following advantages:

Greater flexibility, hence reliability, in other than quadratic cases

Simpler computational operations

Storage required increases only as $O(n)$ (unless one chooses $n_s = n$)

While one has great hopes for this extension of the multimembered evolution strategy, one should not be blinded by enthusiasm to limitations in its capability. It would yield computation times no better than $O(n^3)$ if it turns out that a population of $O(n)$ parents is needed for adjusting the strategy parameters and if pure serial rather than parallel computation is necessary.

Does the new scheme still correspond to the biological paradigm? It has been discovered that one gene often influences several phenotypic characteristics of an individual (pleiotropy) and conversely that many characteristics depend on the cooperative effect of several genes (polygeny). These interactions just mean that the characteristics are correlated. A linear correlation as in Figure 7.1 represents only one of the many conceivable types, in which $(x'_1, x'_2)$ is the plane of the primary, independent genetic changes and $(x_1, x_2)$ that of the secondary, mutually correlated changes in the characteristics. Particular kinds of such dependence, for example, allometric growth, have been intensively studied (e.g., Grassé, 1973). There is little doubt that the relationships have also adapted, during the history of development, to the topological requirements of the objective function. The observable differences between life forms are at least suggestive of this. Even non-linear correlations may occur. Evolution has indeed to cope with far greater difficulties, for it has no ordered number system at its disposal. In the first place it had to create a scale of measure, with the genetic code, for example, which has been learned during the early stages of life on earth.

Whether it is ultimately worth proceeding so far or further to mimic evolution is still an open question, but it is surely a path worth exploring, perhaps not for continuous, but for discrete or mixed parameter optimization. Here, in place of the normal distribution of random changes, a discrete distribution must be applied, e.g., a binomial or, better still, a distribution with maximum entropy (see Rudolph, 1994b), so that for small "total step lengths" the probability really is small that two or more variables are altered simultaneously. Occasional stagnation of the search will only be avoided, in this case, if the population allows worsening within a generation. Worsening is not allowed by the two membered strategy, but it is by the multimembered (μ , λ) strategy, in which the parents, after producing descendants, no longer enter the selection process. Perhaps this shows that the limited life span of individuals is no imperfection of nature, no consequence of an inevitable weakness of the system, but rather an intelligent, indeed essential means of survival of the species. This conjecture is again supported by the genetically determined, in effect preprogrammed, ending of the process of cell division during the life of an individual. Sequential improvement and consequent rapid optimization is only made possible by the following of one generation after another. However, one should be extremely wary of applying such concepts directly to mankind. Human evolution long ago left the purely biological domain and is more active nowadays in the social one. One properly refers now to a cultural evolution. There is far too little genetic information to specify human behavior completely.

Little is known of which factors are genetically inherited and which socially acquired, as shown by the continuing discussions over the results of behavioral research and the diametrically opposite points of view of individual scientists in the field. The two most important evolutionary principles, mutation and selection, also belong to social development (Alland, 1970). Actually, even more complicated mechanisms are at work here. Oversimplifications can have quite terrifying consequences, as shown by the example of social Darwinism, to which Koch (1973) attributes responsibility for racist and imperialist thinking and hence for the two World Wars. No such further speculation with the evolution strategy will therefore be embarked upon here. The fact remains that the recognition of evolution as representing a sequential optimization process is too valuable to be dismissed to oblivion as evolutionism (Goll, 1972). Rather, one should consider what further factors are known in organic evolution that might be worth imitating, in order to make of the evolution strategy an even more general optimization method; for up to now several developments have confirmed Rechenberg's hypothesis that the strategy can be improved by taking into account further factors, at least when this is done adequately and the biological and mathematical boundary conditions are compatible with each other. Furthermore, by no means all evolutionary principles have yet been adopted for optimizing technical systems.

The search for global optima remains a particularly difficult problem. In such cases nature seems to hunt for all, or at least a large number of, maxima or minima at the same time by the splitting of a population (the isolation principle). After a transition phase the individuals of both or all the subpopulations can no longer intermix. Thereafter each group only seeks its own specific local optimum, which might perhaps be the global one. This principle could easily be incorporated into the multimembered scheme if a criterion could be defined for performing the splitting process.

Many evolution principles that appeared later on the scene can be explained as affording the greater chance of survival to a population having the better mechanism of inheritance (for these are also variable) compared to another forming a worse "strategy of life." In this way the evolution method could itself be optimized by organizing a competition between several populations that alter the concept of the optimum seeking strategy itself. The simplest possibility, for example, would be to vary the numbers of parents and of descendants: two or more groups would be set up, each with its own values of these parameters; each group would be given a fixed time to seek the optimum; then the group that has advanced the most would be allowed to "survive." In this way these strategy variables would be determined to best suit the particular problem and computer, with the objective of minimizing the required computation time. One might call such an approach a meta- or hierarchical evolution strategy (see Bäck, 1994a,b).

The solution of problems with multiple objectives could also be approached with the multimembered evolution strategy. This is really the most common type of problem in nature. The selection step, the reduction to the μ best of the λ descendants, could be subdivided into several partial steps, in each of which only one of the criteria for selection is applied. In this way no weighting of the partial objectives would be required. First attempts with only two variables and two partial objectives showed that a point on the Pareto line is always approached as the optimum. By unequal distribution of the partial selections the solution point could be displaced towards one of the partial objectives. At this stage subjective information would have to be applied, because all the Pareto-optimal solutions are initially equally good (see Kursawe, 1991, 1992).
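One conceivable reading of such partial selection steps is sketched below; this is my illustration under assumed names and minimization, not a reconstruction of the original experiments. The criteria take turns discarding the currently worst individual, so the partial objectives never have to be weighted against each other:

```python
import random

def partial_selection(pop, objectives, mu):
    """Reduce a population to mu survivors by letting each selection
    criterion, in turn, remove the worst remaining individual."""
    pool = list(pop)
    while len(pool) > mu:
        for f in objectives:
            if len(pool) <= mu:
                break
            pool.remove(max(pool, key=f))  # drop the worst under this criterion
    return pool

# usage: keep 10 of 100 random points under two competing objectives
points = [(random.random(), random.random()) for _ in range(100)]
survivors = partial_selection(
    points, [lambda p: p[0], lambda p: (1 - p[0]) ** 2 + p[1]], 10)
```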

Contrary to many statements or conjectures that organic evolution is a particularly wasteful optimization process, it proves again and again to be precisely suited to advancing with maximum speed, without losing reliability of convergence, even to better and better local optima. This is just what is required in numerical optimization. In both cases the available resources limit what can be achieved. In one case these are the limitations of food and the finite area of the earth for accommodating life; in the other they are the finite number of satellite processors of a parallel-organized mainframe computer and its limited (core) storage space. If the evolution strategy can be considered as the sought-after universal optimization method, then this is not in the sense that it solves a particular problem (e.g., a linear or quadratic function) exactly, with the least iterations or generations, but rather refers to its being the most readily extended concept, able to solve very difficult problems, problems with particularly many variables, under unfavorable conditions such as stochastic perturbations, discrete variables, time-varying optima, and multimodal objective functions (see Hammel and Bäck, 1994). Accordingly, the results and assessments introduced in the present work can at best be considered as a first step in the development towards a universal evolution strategy.

Finally, some early applications of the evolution strategy will be cited. Experimental tasks were the starting point for the realization of the first ideas for an optimization strategy based on the example of biological evolution. It was also first applied here to the solution of practical problems (see Schwefel, 1968; Klockgether and Schwefel, 1970; Rechenberg, 1973). Meanwhile it is being applied just as widely to optimization problems that can be expressed in computational or algorithmic form, e.g., in the form of simulation models. The following is a list of some of the successful applications, with references to the relevant publications.

1. Optimal dimensioning of the core of a fast sodium-type breeder reactor (Heusener, 1970)

2. Optimal allocation of investments to various health-service programs in Colombia (Schwefel, 1972)

3. Solving curve-fitting problems by combining a least-squares method with the evolution strategy (Plaschko and Wagner, 1973)

4. Minimum-weight design of truss constructions, partly in combination with linear programming (Leyßner, 1974; Höfler, 1976)

5. Optimal shaping of vaulted reinforced concrete shells (Hartmann, 1974)

6. Optimal dimensioning of quadruple-joint drives (Anders, 1977)

7. Approximating the solution of a set of non-linear differential equations (Rodloff, 1976)

8. Optimal design of arm prostheses (Brudermann, 1977)

9. Optimization of urban and regional water supply systems (Cembrowicz and Krauter, 1977)

10. Combining the evolution strategy with factorial design techniques (Kobelt and Schneider, 1977)

11. Optimization within a dynamic simulation model of a socioeconomic system (Krallmann, 1978)

12. Optimization of a thermal water jet propulsion system (Markwich, 1978)

13. Optimization of a regional system for the removal of refuse (von Falkenhausen, 1980)

14. Estimation of parameters within a model of floods (North, 1980)

15. Interactive superimposing of different direct search techniques onto dynamic simulation models, especially models of the energy system of the Federal Republic of Germany (Heckler, 1979; Drepper, Heckler, and Schwefel, 1979)

Much longer lists of references concerning applications as well as theoretical work in the field of evolutionary computation have meanwhile been compiled by Alander (1992, 1994) and Bäck, Hoffmeister, and Schwefel (1993).

Among the many different fields of application, only one will be addressed here, i.e., non-linear regression and correlation analysis. In general this leads to a multimodal optimization problem when the parameters searched for enter the hypotheses non-linearly, e.g., as exponents. Very helpful under such circumstances is a tool with which one can switch from one minimization method to another. Beginning with a multimembered evolution strategy and refining the intermediate results by means of a variable metric method has often led to practically useful results (e.g., Frankhauser and Schwefel, 1992).

In some cases of practical applications of evolution strategies it turns out that the number of variables describing the objective function has to vary itself. An example was the experimental optimization of the shape of a supersonic one-component two-phase flow nozzle (Schwefel, 1968). Conically bored rings with fixed lengths could be put side by side, thus forming potentially millions of different inner nozzle contours. But the total length of the nozzle had to be varied itself. So the number of rings, and thus the number of variables (inner diameters of the rings), had to be mutated during the search for an optimum shape as well. By imitating gene duplication and gene deletion at randomly chosen positions, a rather simple technique was found to solve the variable number of variables problem. Such a procedure might be helpful for many structural optimization problems (e.g., Rozvany, 1994) as well.
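A minimal sketch of such a length-changing mutation, in the spirit of the nozzle example; the function name and the probabilities are illustrative assumptions:

```python
import random

def mutate_length(rings, p_dup=0.05, p_del=0.05):
    """Gene duplication and deletion on a variable-length parameter
    vector: occasionally duplicate or delete one entry (here: a ring's
    inner diameter) at a randomly chosen position."""
    rings = list(rings)
    if rings and random.random() < p_dup:             # gene duplication
        k = random.randrange(len(rings))
        rings.insert(k, rings[k])
    if len(rings) > 1 and random.random() < p_del:    # gene deletion
        del rings[random.randrange(len(rings))]
    return rings
```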

If the decision variables are to be taken from a discrete set only (the distinct values may be equidistant or not; integer and binary values just form special subclasses), ESs may sometimes be used without any change. Within the objective function the real values must simply undergo a suitable rounding-off process, as shown at the end of Appendix B, Section B.3. Since all ESs handle unchanged objective function values as improvements, the self-adaptation of the standard deviations on a plateau will always lead to their enlargement, until the plateaus $F(x) = \mathrm{const.}$ built by rounding off can be left. On a plateau, the ES performs a random walk with ever increasing step sizes.
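A sketch of the rounding-off idea (the grid width and names are assumptions; the actual routine is the one in Appendix B, Section B.3): the ES keeps mutating real values, while the model only ever sees grid points, which is what produces the plateaus described above.

```python
def discretized(f, grid=0.1):
    """Wrap a continuous objective so the search effectively runs on a
    grid: round every variable to the nearest grid point before the
    evaluation, creating plateaus of constant objective value."""
    def f_on_grid(x):
        return f([grid * round(xi / grid) for xi in x])
    return f_on_grid

# usage: a discretized sphere model with grid width 0.5
f_discrete = discretized(lambda x: sum(xi * xi for xi in x), grid=0.5)
```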

Towards the end of the search, however, more and more of the individual step sizes have to become very small, whereas others, singly or in some combination, should be increased to allow hopping from one n-cube in the decision-variable space to the next. The chances for that kind of adaptation are good enough as long as sequential improvements are possible; the last few of them will not happen that way, however. A method of escaping from that awkward situation has been shown (Schwefel, 1975b), imitating multicellular individuals and introducing so-called somatic mutations. Even in the case of binary variables an ES can thus reach the optimum. Since no real application has been done this way until now, no further details will be given here.

An interesting question is whether there are intermediate cases between a plus and a comma version of the multimembered ES. The answer must be, "Yes, there are." Instead of neglecting the parents during the selection step (within comma-ESs), or allowing them to live forever in principle (within plus-ESs, only until offspring surpass them, of course), one might implant a generation counter into each individual. As soon as a prefixed limit is reached, they leave the scene automatically. Such a more general ES version could be termed a (μ , κ , λ) strategy, where κ denotes the maximal number of generations (iterations) an individual is allowed to "survive" in the population. For κ = 1 we then get the old comma version, whereas the old plus version is reached if κ goes to infinity. There are some preliminary results now, but as yet they are too unsystematic to be presented here.
presented here.<br />

Is the strict synchronization of the evolutionary process within ESs as well as GAs<br />

the best way to do the job? The answer to this even more interesting question is, \No,"<br />

especially if one makes use of MIMD parallel machines or clusters of workstations. Then<br />

one should switch to imitating life more closely: Birth <strong>and</strong> death events mayhappenatthe<br />

same time. Instead of modelling a central decision maker for the selection process (whichis<br />

an oversimpli cation) one could use a predator-prey model like that of Lotka <strong>and</strong>Volterra.<br />

Adding a neighborhood model (see Gorges-Schleuter, 1991a,b Sprave, 1993, 1994) for the


248 Summary <strong>and</strong> Outlook<br />

recombination process would free the whole strategy from all kinds of synchronization<br />

needs. Initial tests have shown that this is possible. Niching <strong>and</strong> migration as used by<br />

Rudolph (1991) will be the next features to be added to the APES (asynchronous parallel<br />

evolution strategy).<br />

A couple of earlier attempts towards parallelizing ESs will be mentioned at the end<br />

of this chapter. Since all of them are somehow intermediate solutions, however, none of<br />

them will be explained in detail. The reader is referred to the literature.<br />

A taxonomy, more or less complete with respect to possible ways of parallelizing EAs,<br />

may be found in Ho meister <strong>and</strong> Schwefel (1990) or Ho meister (1991). Rudolph (1991)<br />

has realized a coarse-grained parallel ES with subpopulations on each processor <strong>and</strong> more<br />

or less frequent migration events, whereas Sprave (1994) gave preference to a ne-grained<br />

di usion model. Both of these more volume-oriented approaches delivered great advances<br />

in solving multimodal optimization problems as compared with the more greedy <strong>and</strong> pathoriented<br />

\canonical" ( , ) ES. The comma version, by theway, is necessary to follow a<br />

nonstationary optimum (see Schwefel <strong>and</strong> Kursawe, 1992), <strong>and</strong> only such anESisableto<br />

solve on-line optimization problems.<br />

Nevertheless, one should never forget that there are many other specialized optimum<br />

seeking methods. For a practitioner, a tool box withmany di erent algorithms might<br />

always be the \optimum optimorum." Whether he or she chooses a special tool by<br />

h<strong>and</strong>, so to speak (see Heckler <strong>and</strong> Schwefel, 1978 Heckler, 1979 Schwefel, 1980, 1981<br />

Hammel, 1991 Bendin, 1992 Back <strong>and</strong> Hammel, 1993), or relies upon some knowledgebased<br />

selection scheme (see Campos, 1989 Campos <strong>and</strong> Schwefel, 1989 Campos, Peters,<br />

<strong>and</strong> Schwefel, 1989 Peters, 1989, 1991 Lehner, 1991) will largely depend on his or her<br />

experience.


Chapter 8

References

Glossary of abbreviations at the end of this list

Aarts, E., J. Korst (1989), Simulated annealing and Boltzmann machines, Wiley, Chichester
Abadie, J. (Ed.) (1967), Nonlinear programming, North-Holland, Amsterdam
Abadie, J. (Ed.) (1970), Integer and nonlinear programming, North-Holland, Amsterdam
Abadie, J. (1972), Simplex-like methods for non-linear programming, in: Szegö (1972), pp. 41-60
Abe, K., M. Kimura (1970), Parallel algorithm for solving discrete optimization problems, IFAC Kyoto Symposium on Systems Engineering Approach to Computer Control, Kyoto, Japan, Aug. 1970, paper 35.1
Ablay, P. (1987), Optimieren mit Evolutionsstrategien, Spektrum der Wissenschaft (1987, July), 104-115 (see also discussion in (1988, March), 3-4 and (1988, June), 3-4)
Ackley, D.H. (1987), A connectionist machine for genetic hill-climbing, Kluwer Academic, Boston
Adachi, N. (1971), On variable-metric algorithms, JOTA 7, 391-410
Adachi, N. (1973a), On the convergence of variable-metric methods, Computing 11, 111-123
Adachi, N. (1973b), On the uniqueness of search directions in variable-metric algorithms, JOTA 11, 590-604
Adams, R.J., A.Y. Lew (1966), Modified sequential random search using a hybrid computer, University of Southern California, Electrical Engineering Department, report, May 1966
Ahrens, J.H., U. Dieter (1972), Computer methods for sampling from the exponential and normal distributions, CACM 15, 873-882, 1047
Aizerman, M.A., E.M. Braverman, L.I. Rozonoer (1965), The Robbins-Monro process and the method of potential functions, ARC 26, 1882-1885
Akaike, H. (1960), On a successive transformation of probability distribution and its application to the analysis of the optimum gradient method, Ann. Inst. Stat. Math. Tokyo 11, 1-16
Alander, J.T. (Ed.) (1992), Proceedings of the 1st Finnish Workshop on Genetic Algorithms and their Applications, Helsinki, Nov. 4-5, 1992, Bibliography pp. 203-281, Helsinki University of Technology, Department of Computer Science, Helsinki, Finland
Alander, J.T. (1994), An indexed bibliography of genetic algorithms, preliminary edition, Jarmo T. Alander, Espoo, Finland
Albrecht, R.F., C.R. Reeves, N.C. Steele (Eds.) (1993), Artificial neural nets and genetic algorithms, Proceedings of an International Conference, Innsbruck, Austria, Springer, Vienna
Aleksandrov, V.M., V.I. Sysoyev, V.V. Shemeneva (1968), Stochastic optimization, Engng. Cybern. 6(5), 11-16
Alland, A., Jr. (1970), Evolution und menschliches Verhalten, S. Fischer, Frankfort/Main
Allen, P., J.M. McGlade (1986), Dynamics of discovery and exploitations - the case of the Scotian shelf groundfish fisheries, Can. J. Fish. Aquat. Sci. 43, 1187-1200
Altman, M. (1966), Generalized gradient methods of minimizing a functional, Bull. Acad. Polon. Sci. 14, 313-318
Amann, H. (1968a), Monte-Carlo Methoden und lineare Randwertprobleme, ZAMM 48, 109-116
Amann, H. (1968b), Der Rechenaufwand bei der Monte-Carlo Methode mit Informationsspeicherung, ZAMM 48, 128-131
Anders, U. (1977), Lösung getriebesynthetischer Probleme mit der Evolutionsstrategie, Feinwerktechnik und Meßtechnik 85(2), 53-57
Anderson, N., A. Björck (1973), A new high order method of Regula Falsi type for computing a root of an equation, BIT 13, 253-264
Anderson, R.L. (1953), Recent advances in finding best operating conditions, J. Amer. Stat. Assoc. 48, 789-798
Andrews, H.C. (1972), Introduction to mathematical techniques in pattern recognition, Wiley-Interscience, New York

Anscombe, F.J. (1959), Quick analysis methods for random balance screening experiments, Technometrics 1, 195-209
Antonov, G.E., V.Ya. Katkovnik (1972), Method of synthesis of a class of random search algorithms, ARC 32, 990-993
Aoki, M. (1971), Introduction to optimization techniques - fundamentals and applications of nonlinear programming, Macmillan, New York
Apostol, T.M. (1957), Mathematical analysis - a modern approach to advanced calculus, Addison-Wesley, Reading MA
Arrow, K.J., L. Hurwicz (1956), Reduction of constrained maxima to saddle-point problems, in: Neyman (1956), vol. 5, pp. 1-20
Arrow, K.J., L. Hurwicz (1957), Gradient methods for constrained maxima, Oper. Res. 5, 258-265
Arrow, K.J., L. Hurwicz, H. Uzawa (Eds.) (1958), Studies in linear and non-linear programming, Stanford University Press, Stanford CA
Asai, K., S. Kitajima (1972), Optimizing control using fuzzy automata, Automatica 8, 101-104
Ashby, W.R. (1960), Design for a brain, 2nd ed., Wiley, New York
Ashby, W.R. (1965), Constraint analysis of many-dimensional relations, in: Wiener and Schade (1965), pp. 10-18
Ashby, W.R. (1968), Some consequences of Bremermann's limit for information-processing systems, in: Oestreicher and Moore (1968), pp. 69-76
Avriel, M., D.J. Wilde (1966a), Optimality proof for the symmetric Fibonacci search technique, Fibonacci Quart. 4, 265-269
Avriel, M., D.J. Wilde (1966b), Optimal search for a maximum with sequences of simultaneous function evaluations, Mgmt. Sci. 12, 722-731
Avriel, M., D.J. Wilde (1968), Golden block search for the maximum of unimodal functions, Mgmt. Sci. 14, 307-319
Axelrod, R. (1984), The evolution of cooperation, Basic Books, New York
Azencott, R. (Ed.) (1992), Simulated annealing - parallelization techniques, Wiley, New York
Bach, H. (1969), On the downhill method, CACM 12, 675-677
Bäck, T. (1992a), Self-adaptation in genetic algorithms, in: Varela and Bourgine (1992), pp. 263-271
Bäck, T. (1992b), The interaction of mutation rate, selection, and self-adaptation within a genetic algorithm, in: Männer and Manderick (1992), pp. 85-94
Bäck, T. (1993), Optimal mutation rates in genetic search, in: Forrest (1993), pp. 2-9
Bäck, T. (1994a), Evolutionary algorithms in theory and practice, Dr. rer. nat. Diss., University of Dortmund, Department of Computer Science, Feb. 1994
Bäck, T. (1994b), Parallel optimization of evolutionary algorithms, in: Davidor, Schwefel, and Männer (1994), pp. 418-427
Bäck, T., U. Hammel (1993), Einsatz evolutionärer Algorithmen zur Optimierung von Simulationsmodellen, in: Szczerbicka and Ziegler (1993), pp. 1-22
Bäck, T., U. Hammel, H.-P. Schwefel (1993), Modelloptimierung mit evolutionären Algorithmen, in: Sydow (1993), pp. 49-57
Bäck, T., F. Hoffmeister, H.-P. Schwefel (1991), A survey of evolution strategies, in: Belew and Booker (1991), pp. 2-9
Bäck, T., F. Hoffmeister, H.-P. Schwefel (1993), Applications of evolutionary algorithms, technical report SYS-2/92, 4th ext. ed., Systems Analysis Research Group, University of Dortmund, Department of Computer Science, July 1993
Bäck, T., G. Rudolph, H.-P. Schwefel (1993), Evolutionary programming and evolution strategies - similarities and differences, in: Fogel and Atmar (1993), pp. 11-22
Bäck, T., H.-P. Schwefel (1993), An overview of evolutionary algorithms for parameter optimization, Evolutionary Computation 1, 1-23
Baer, R.M. (1962), Note on an extremum locating algorithm, Comp. J. 5, 193
Balakrishnan, A.V. (Ed.) (1972), Techniques of optimization, Academic Press, New York
Balakrishnan, A.V., M. Contensou, B.F. DeVeubeke, P. Kree, J.L. Lions, N.N. Moiseev (Eds.) (1970), Symposium on optimization, Springer, Berlin
Balakrishnan, A.V., L.W. Neustadt (Eds.) (1964), Computing methods in optimization problems, Academic Press, New York
Balakrishnan, A.V., L.W. Neustadt (Eds.) (1967), Mathematical theory of control, Academic Press, New York
Balinski, M.L., P. Wolfe (Eds.) (1975), Nondifferentiable optimization, vol. 3 of Mathematical Programming Studies, North-Holland, Amsterdam
Bandler, J.W. (1969a), Optimization methods for computer-aided design, IEEE Trans. MTT-17, 533-552

Bandler, J.W. (1969b), Computer optimization of inhomogeneous waveguide transformers, IEEE Trans. MTT-17, 563-571
Bandler, J.W., C. Charalambous (1974), Nonlinear programming using minimax techniques, JOTA 13, 607-619
Bandler, J.W., P.A. MacDonald (1969), Optimization of microwave networks by razor search, IEEE Trans. MTT-17, 552-562
Banzhaf, W., M. Schmutz (1992), Some notes on competition among cell assemblies, Int'l J. Neural Syst. 2, 303-313
Bard, Y. (1968), On a numerical instability of Davidon-like methods, Math. Comp. 22, 665-666
Bard, Y. (1970), Comparison of gradient methods for the solution of nonlinear parameter estimation problems, SIAM J. Numer. Anal. 7, 157-186
Barnes, G.H., R.M. Brown, M. Kato, D.J. Kuck, D.L. Slotnick, R.A. Stokes (1968), The Illiac IV computer, IEEE Trans. C-17, 746-770
Barnes, J.G.P. (1965), An algorithm for solving non-linear equations based on the secant method, Comp. J. 8, 66-72
Barnes, J.L. (1965), Adaptive control as the basis of life and learning systems, Proceedings of the IFAC Tokyo Symposium on Systems Engineering Control and Systems Design, Tokyo, Japan, Aug. 1965, pp. 187-191
Barr, D.R., N.L. Slezak (1972), A comparison of multivariate normal generators, CACM 15, 1048-1049
Bass, R. (1972), A rank two algorithm for unconstrained minimization, Math. Comp. 26, 129-143
Bauer, F.L. (1965), Elimination with weighted row combinations for solving linear equations and least squares problems, Numer. Math. 7, 338-352
Bauer, W.F. (1958), The Monte Carlo method, SIAM J. 6, 438-451
Beale, E.M.L. (1956), On quadratic programming, Nav. Res. Log. Quart. 6, 227-243
Beale, E.M.L. (1958), On an iterative method for finding a local minimum of a function of more than one variable, Princeton University, Statistical Techniques Research Group, technical report 25, Princeton NJ, Nov. 1958
Beale, E.M.L. (1967), Numerical methods, in: Abadie (1967), pp. 133-205
Beale, E.M.L. (1970), Computational methods for least squares, in: Abadie (1970), pp. 213-228
Beale, E.M.L. (1972), A derivation of conjugate gradients, in: Lootsma (1972a), pp. 39-43
Beamer, J.H., D.J. Wilde (1969), An upper bound on the number of measurements required by the contour tangent optimization technique, IEEE Trans. SSC-5, 27-30
Beamer, J.H., D.J. Wilde (1970), Minimax optimization of unimodal functions by variable block search, Mgmt. Sci. 16, 529-541
Beamer, J.H., D.J. Wilde (1973), A minimax search plan for constrained optimization problems, JOTA 12, 439-446
Beckman, F.S. (1967), Die Lösung linearer Gleichungssysteme nach der Methode der konjugierten Gradienten, in: Ralston and Wilf (1967), pp. 106-126
Beckmann, M. (Ed.) (1971), Unternehmensforschung heute, Springer, Berlin
Beier, W., K. Glaß (1968), Bionik - eine Wissenschaft der Zukunft, Urania, Leipzig, Germany
Bekey, G.A., M.H. Gran, A.E. Sabroff, A. Wong (1966), Parameter optimization by random search using hybrid computer techniques, AFIPS Conf. Proc. 29, 191-200
Bekey, G.A., W.J. Karplus (1971), Hybrid-Systeme, Berliner Union und Kohlhammer, Stuttgart
Bekey, G.A., R.B. McGhee (1964), Gradient methods for the optimization of dynamic system parameters by hybrid computation, in: Balakrishnan and Neustadt (1964), pp. 305-327
Belew, R.K., L.B. Booker (Eds.) (1991), Proceedings of the 4th International Conference on Genetic Algorithms, University of California, San Diego CA, July 13-16, 1991, Morgan Kaufmann, San Mateo CA
Bell, D.E., R.E. Keeney, H. Raiffa (Eds.) (1977), Conflicting objectives in decisions, vol. 1 of Wiley IIASA International Series on Applied Systems Analysis, Wiley, Chichester
Bell, M., M.C. Pike (1966), Remark on algorithm 178 (E4) - direct search, CACM 9, 684-685
Bellman, R.E. (1967), Dynamische Programmierung und selbstanpassende Regelprozesse, Oldenbourg, Munich
Beltrami, E.J., J.P. Indusi (1972), An adaptive random search algorithm for constrained minimization, IEEE Trans. C-21, 1004-1008

Bendin, F. (1992), Ein Praktikum zu Verfahren zur Lösung zentraler und dezentraler Optimierungsprobleme und Untersuchungen hierarchisch zerlegter Optimierungsaufgaben mit Hilfe von Parallelrechnern, Dr.-Ing. Diss., Technical University of Ilmenau, Germany, Faculty of Technical Sciences, Sept. 1992
Berg, R.L., N.W. Timofejew-Ressowski (1964), Über Wege der Evolution des Genotyps, in: Ljapunov, Kämmerer, and Thiele (1964b), pp. 201-221
Bergmann, H.W. (1989), Optimization - methods and applications, possibilities and limitations, vol. 47 of Lecture Notes in Engineering, Springer, Berlin
Berlin, V.G. (1969), Acceleration of stochastic approximations by a mixed search method, ARC 30, 125-129
Berlin, V.G. (1972), Parallel randomized search strategies, ARC 33, 398-403
Berman, G. (1966), Minimization by successive approximation, SIAM J. Numer. Anal. 3, 123-133
Berman, G. (1969), Lattice approximations to the minima of functions of several variables, JACM 16, 286-294
Bernard, J.W., F.J. Sonderquist (1959), Progress report on OPCON - Dow evaluates optimizing control, Contr. Engng. 6(11), 124-128
Bertram, J.E. (1960), Control by stochastic adjustment, AIEE Trans. II Appl. Ind. 78, 485-491
Beveridge, G.S.G., R.S. Schechter (1970), Optimization - theory and practice, McGraw-Hill, New York
Beyer, H.-G. (1989), Ein Evolutionsverfahren zur mathematischen Modellierung stationärer Zustände in dynamischen Systemen, Dr. rer. nat. Diss., University of Architecture and Civil Engineering, Weimar, Germany, June 1989
Beyer, H.-G. (1990), Simulation of steady states in dissipative systems by Darwin's paradigm of evolution, J. of Non-Equilibrium Thermodynamics 15, 45-58
Beyer, H.-G. (1992), Some aspects of the 'evolution strategy' for solving TSP-like optimization problems, in: Männer and Manderick (1992), pp. 361-370

Beyer, H.-G. (1993), Toward a theory of evolution strategies - some asymptotical results from the (1,+λ)-theory, Evolutionary Computation 1, 165-188
Beyer, H.-G. (1994a), Towards a theory of 'evolution strategies' - results for (1,+λ)-strategies on (nearly) arbitrary fitness functions, in: Davidor, Schwefel, and Männer (1994), pp. 58-67
Beyer, H.-G. (1994b), Towards a theory of 'evolution strategies' - results from the N-dependent (μ, λ) and the multi-recombinant (μ/μ, λ) theory, technical report SYS-5/94, Systems Analysis Research Group, University of Dortmund, Department of Computer Science, Oct. 1994

Biggs, M.C. (1971), Minimization algorithms making use of non-quadratic properties of the objective function, JIMA 8, 315-327 (errata in 9 (1972))
Biggs, M.C. (1973), A note on minimization algorithms which make use of non-quadratic properties of the objective function, JIMA 12, 337-338
Birkhoff, G., S. MacLane (1965), A survey of modern algebra, 3rd ed., Macmillan, New York
Blakemore, J.W., S.H. Davis, Jr. (Eds.) (1964), Optimization techniques, AIChE Chemical Engineering Progress Symposium Series 60, no. 50
Bledsoe, W.W. (1961), A basic limitation on the speed of digital computers, IRE Trans. EC-10, 530
Blum, J.R. (1954a), Approximation methods which converge with probability one, Ann. Math. Stat. 25, 382-386
Blum, J.R. (1954b), Multidimensional stochastic approximation methods, Ann. Math. Stat. 25, 737-744
Boas, A.H. (1962), What optimization is all about, Chem. Engng. 69(25), 147-152
Boas, A.H. (1963a), How to use Lagrange multipliers, Chem. Engng. 70(1), 95-98
Boas, A.H. (1963b), How search methods locate optimum in univariable problems, Chem. Engng. 70(3), 105-108
Boas, A.H. (1963c), Optimizing multivariable functions, Chem. Engng. 70(5), 97-104
Boas, A.H. (1963d), Optimization via linear and dynamic programming, Chem. Engng. 70(7), 85-88
Bocharov, I.N., A.A. Feldbaum (1962), An automatic optimizer for the search for the smallest of several minima - a global optimizer, ARC 23, 260-270
Böhling, K.H., P.P. Spies (Eds.) (1979), 9th GI-Jahrestagung, Bonn, Oct. 1979, Springer, Berlin
Boltjanski, W.G. (1972), Mathematische Methoden der optimalen Steuerung, Hanser, Munich
Booth, A.D. (1949), An application of the method of steepest descents to the solution of systems of non-linear simultaneous equations, Quart. J. Mech. Appl. Math. 2, 460-468

Booth, A.D. (1955), Numerical methods, Butterworths, London
Booth, R.S. (1967), Location of zeros of derivatives, SIAM J. Appl. Math. 15, 1496-1501
Boothroyd, J. (1965), Certification of algorithm 2 - Fibonacci search, Comp. Bull. 9, 105, 108
Born, J. (1978), Evolutionsstrategien zur numerischen Lösung von Adaptationsaufgaben, Dr. rer. nat. Diss., Humboldt University at Berlin
Box, G.E.P. (1957), Evolutionary operation - a method for increasing industrial productivity, Appl. Stat. 6, 81-101
Box, G.E.P., D.W. Behnken (1960), Simplex-sum designs - a class of second order rotatable designs derivable from those of first order, Ann. Math. Stat. 31, 838-864
Box, G.E.P., N.R. Draper (1969), Evolutionary operation - a statistical method for process improvement, Wiley, New York
Box, G.E.P., N.R. Draper (1987), Empirical model-building and response surfaces, Wiley, New York
Box, G.E.P., J.S. Hunter (1957), Multi-factor experimental designs for exploring response surfaces, Ann. Math. Stat. 28, 195-241
Box, G.E.P., M.E. Muller (1958), A note on the generation of random normal deviates, Ann. Math. Stat. 29, 610-611
Box, G.E.P., K.B. Wilson (1951), On the experimental attainment of optimum conditions, J. of the Royal Statistical Society B, Methodological 8, 1-45
Box, M.J. (1965), A new method of constrained optimization and a comparison with other methods, Comp. J. 8, 42-52
Box, M.J. (1966), A comparison of several current optimization methods and the use of transformations in constrained problems, Comp. J. 9, 67-77
Box, M.J., D. Davies, W.H. Swann (1969), Nonlinear optimization techniques, ICI Monograph 5, Oliver Boyd, Edinburgh
Bracken, J., G.P. McCormick (1970), Ausgewählte Anwendungen nicht-linearer Programmierung, Berliner Union und Kohlhammer, Stuttgart
Brajnes, S.N., V.B. Svecinskij (1971), Probleme der Neurokybernetik und Neurobionik, 2nd ed., G. Fischer, Stuttgart
Brandl, V. (1969), Ein wirksames Monte-Carlo-Schätzverfahren zur simultanen Behandlung hinreichend eng verwandter Probleme angewandt auf Fragen der Neutronenphysik, Tagungsbericht der Reaktortagung des Deutschen Atomforums, Frankfort/Main, April 1969, Sektion 1, pp. 6-7
Branin, F.H., Jr., S.K. Hoo (1972), A method for finding multiple extrema of a function of n variables, in: Lootsma (1972a), pp. 231-237
Brazdil, P.B. (Ed.) (1993), Machine learning - ECML '93, vol. 667 of Lecture Notes in Artificial Intelligence, Springer, Berlin
Brebbia, C.A., S. Hernandez (Eds.) (1989), Computer aided optimum design of structures - applications, Proceedings of the 1st International Conference, Southampton UK, June 1989, Springer, Berlin
Bremermann, H.J. (1962), Optimization through evolution and recombination, in: Yovits, Jacobi, and Goldstein (1962), pp. 93-106
Bremermann, H.J. (1963), Limits of genetic control, IEEE Trans. MIL-7, 200-205
Bremermann, H.J. (1967), Quantitative aspects of goal-seeking self-organizing systems, in: Snell (1967), pp. 59-77
Bremermann, H.J. (1968a), Numerical optimization procedures derived from biological evolution processes, in: Oestreicher and Moore (1968), pp. 597-616
Bremermann, H.J. (1968b), Principles of natural and artificial intelligence, AGARD report AD-684-952, Sept. 1968, pp. 6c1-6c2
Bremermann, H.J. (1968c), Pattern recognition, functionals, and entropy, IEEE Trans. BME-15, 201-207
Bremermann, H.J. (1970), A method of unconstrained global optimization, Math. Biosci. 9, 1-15
Bremermann, H.J. (1971), What mathematics can and cannot do for pattern recognition, in: Grüsser and Klinke (1971), pp. 31-45
Bremermann, H.J. (1973a), On the dynamics and trajectories of evolution processes, in: Locker (1973), pp. 29-37
Bremermann, H.J. (1973b), Algorithms and complexity of evolution and self-organization, Kybernetik-Kongreß der Deutschen Gesellschaft für Kybernetik und der Nachrichtentechnischen Gesellschaft im VDE, Nuremberg, Germany, March 1973
Bremermann, H.J., L.S.-B. Lam (1970), Analysis of spectra with non-linear superposition, Math. Biosci. 8, 449-460
Bremermann, H.J., M. Rogson, S. Sala (1965), Search by evolution, in: Maxfield, Callahan, and Fogel (1965), pp. 157-167
Bremermann, H.J., M. Rogson, S. Sala (1966), Global properties of evolution processes, in: Pattee et al. (1966), pp. 3-41

Brent, R.P. (1971), An algorithm with guaranteed convergence for finding a zero of a function, Comp. J. 14, 422-425
Brent, R.P. (1973), Algorithms for minimization without derivatives, Prentice-Hall, Englewood Cliffs NJ
Bromberg, N.S. (1962), Maximization and minimization of complicated multivariable functions, AIEE Trans. I Comm. Electron. 80, 725-730
Brooks, S.H. (1958), A discussion of random methods for seeking maxima, Oper. Res. 6, 244-251
Brooks, S.H. (1959), A comparison of maximum-seeking methods, Oper. Res. 7, 430-457
Brooks, S.H., M.R. Mickey (1961), Optimum estimation of gradient direction in steepest ascent experiments, Biometrics 17, 48-56
Brown, K.M. (1969), A quadratically convergent Newton-like method based upon Gaussian elimination, SIAM J. Numer. Anal. 6, 560-569
Brown, K.M., J.E. Dennis, Jr. (1968), On Newton-like iteration functions - general convergence theorems and a specific algorithm, Numer. Math. 12, 186-191
Brown, K.M., J.E. Dennis, Jr. (1972), Derivative free analogues of the Levenberg-Marquardt and Gauss algorithms for non-linear least squares approximation, Numer. Math. 18, 289-297
Brown, R.R. (1959), A generalized computer procedure for the design of optimum systems, AIEE Trans. I Comm. Electron. 78, 285-293
Broyden, C.G. (1965), A class of methods for solving nonlinear simultaneous equations, Math. Comp. 19, 577-593
Broyden, C.G. (1967), Quasi-Newton methods and their application to function minimisation, Math. Comp. 21, 368-381
Broyden, C.G. (1969), A new method of solving nonlinear simultaneous equations, Comp. J. 12, 94-99
Broyden, C.G. (1970a), The convergence of single-rank quasi-Newton methods, Math. Comp. 24, 365-382
Broyden, C.G. (1970b), The convergence of a class of double-rank minimization algorithms, part 1 - general considerations, JIMA 6, 76-90
Broyden, C.G. (1970c), The convergence of a class of double-rank minimization algorithms, part 2 - the new algorithm, JIMA 6, 222-231
Broyden, C.G. (1971), The convergence of an algorithm for solving sparse non-linear systems, Math. Comp. 25, 285-294
Broyden, C.G. (1972), Quasi-Newton methods, in: Murray (1972a), pp. 87-106
Broyden, C.G. (1973), Some condition-number bounds for the Gaussian elimination process, JIMA 12, 273-286
Broyden, C.G., J.E. Dennis, Jr., J.J. Moré (1973), On the local and superlinear convergence of quasi-Newton methods, JIMA 12, 223-245
Broyden, C.G., M.P. Johnson (1972), A class of rank-1 optimization algorithms, in: Lootsma (1972a), pp. 35-38
Brudermann, U. (1977), Entwicklung und Anpassung eines vollständigen Ansteuersystems für fremdenergetisch angetriebene Ganzarmprothesen, Fortschrittberichte der VDI-Zeitschriften, vol. 17 (Biotechnik), no. 6, Dec. 1977
Bryson, A.E., Jr., Y.C. Ho (1969), Applied optimal control, Blaisdell, Waltham MA
Budne, T.A. (1959), The application of random balance designs, Technometrics 1, 139-155
Buehler, R.J., B.V. Shah, O. Kempthorne (1961), Some properties of steepest ascent and related procedures for finding optimum conditions, Iowa State University, Statistical Laboratory, technical report 1, Ames IA, April 1961
Buehler, R.J., B.V. Shah, O. Kempthorne (1964), Methods of parallel tangents, in: Blakemore and Davis (1964), pp. 1-7
Burkard, R.E. (1972), Methoden der Ganzzahligen Optimierung, Springer, Vienna
Campbell, D.T. (1960), Blind variation and selective survival as a general strategy in knowledge-processes, in: Yovits and Cameron (1960), pp. 205-231
Campos Pinto, I. (1989), Wissensbasierte Unterstützung bei der Lösung von Optimierungsaufgaben, Dr. rer. nat. Diss., University of Dortmund, Department of Computer Science, June 1989
Campos, I., E. Peters, H.-P. Schwefel (1989), Zwei Beiträge zum wissensbasierten Einsatz von Optimumsuchverfahren, technical report 311 (green series), University of Dortmund, Department of Computer Science
Campos, I., H.-P. Schwefel (1989), KBOPT - a knowledge based optimisation system, in: Brebbia and Hernandez (1989), pp. 211-221
Canon, M.D., C.D. Cullum, Jr., E. Polak (1970), Theory of optimal control and mathematical programming, McGraw-Hill, New York
Cantrell, J.W. (1969), Relation between the memory gradient method and the Fletcher-Reeves method, JOTA 4, 67-71

Carroll, C.W. (1961), The created response surface technique for optimizing nonlinear, restrained systems, Oper. Res. 9, 169-185
Casey, J.K., R.C. Rustay (1966), AID - a general purpose computer program for optimization, in: Lavi and Vogl (1966), pp. 81-100
Casti, J., M. Richardson, R. Larson (1973), Dynamic programming and parallel computers, JOTA 12, 423-438
Cauchy, A. (1847), Méthode générale pour la résolution des systèmes d'équations simultanées, Compt. Rend. Acad. Sci. URSS (USSR), New Ser. 25, 536-538
Cea, J. (1971), Optimisation - théorie et algorithmes, Dunod, Paris
Cerny, V. (1985), Thermodynamical approach to the traveling salesman problem - an efficient simulation algorithm, JOTA 45, 41-51
Cembrowicz, R.G., G.E. Krauter (1977), Optimization of urban and regional water supply systems, Proceedings of the Conference on Systems Approach for Development, Cairo, Egypt, Nov. 1977
Chang, S.S.L. (1961), Synthesis of optimum control systems, McGraw-Hill, New York
Chang, S.S.L. (1968), Stochastic peak tracking and the Kalman filter, IEEE Trans. AC-13, 750
Chatterji, B.N., B. Chatterjee (1971), Performance optimization of a self-organizing feedback control system in presence of inherent coupling signals, Automatica 7, 599-605
Chazan, D., W.L. Miranker (1970), A nongradient and parallel algorithm for unconstrained minimization, SIAM J. Contr. 8, 207-217
Checkland, P., I. Kiss (Eds.) (1987), Problems of constancy and change - the complementarity of systems approaches to complexity, papers presented at the 31st Annual Meeting of the International Society for General System Research, Budapest, Hungary, June 1-5, International Society for General System Research
Cheng, W.-M. (Ed.) (1988), Proceedings of the International Conference on Systems Science and Engineering (ICSSE '88), Beijing, July 25-28, 1988, International Academic Publishers/Pergamon Press, Oxford UK
Chichinadze, V.K. (1960), Logical design problems of self-optimizing and learning-optimizing control systems based on random searching, Proceedings of the 1st IFAC Congress, Moscow, June-July 1960, vol. 2, pp. 653-657
Chichinadze, V.K. (1967), Random search to determine the extremum of the functions of several variables, Engng. Cybern. 5(1), 115-123
Chichinadze, V.K. (1969), The Psi-transform for solving linear and non-linear programming problems, Automatica 5, 347-356
Cizek, F., D. Hodanova (1971), Evolution als Selbstregulation, G. Fischer, Jena, Germany
Clayton, D.G. (1971), Algorithm AS-46 - Gram-Schmidt orthogonalization, Appl. Stat. 20, 335-338
Clegg, J.C. (1970), Variationsrechnung, Teubner, Stuttgart
Cochran, W.G., G.M. Cox (1950), Experimental designs, Wiley, New York
Cockrell, L.D. (1969), A comparison of several random search techniques for multimodal surfaces, Proceedings of the National Electronics Conference, Chicago IL, Dec. 1969, pp. 18-23
Cockrell, L.D. (1970), On search techniques in adaptive systems, Ph.D. thesis, Purdue University, Lafayette IN, June 1970
Cohen, A.I. (1972), Rate of convergence of several conjugate gradient algorithms, SIAM J. Numer. Anal. 9, 248-259
Cohn, D.L. (1954), Optimal systems I - the vascular system, Bull. Math. Biophys. 16, 59-74
Collatz, L., W. Wetterling (1971), Optimierungsaufgaben, 2nd ed., Springer, Berlin
Colville, A.R. (1968), A comparative study on nonlinear programming codes, IBM New York Science Center, report 320-2949, June 1968
Colville, A.R. (1970), A comparative study of nonlinear programming codes, in: Kuhn (1970), pp. 487-501
Conrad, M. (1988), Prolegomena to evolutionary programming, in: Kochen and Hastings (1988), pp. 150-168
Converse, A.O. (1970), Optimization, Holt, Rinehart, Winston, New York
Cooper, L. (Ed.) (1962), Applied mathematics in chemical engineering, AIChE Engineering Progress Symposium Series 58, no. 37
Cooper, L., D. Steinberg (1970), Introduction to methods of optimization, W.B. Saunders, Philadelphia
Cornick, D.E., A.N. Michel (1972), Numerical optimization of distributed parameter systems by the conjugate gradient method, IEEE Trans. AC-17, 358-362
Courant, R. (1943), Variational methods for the solution of problems of equilibrium and vibrations, Bull. Amer. Math. Soc. 49, 1-23

Courant, R., D. Hilbert (1968a), Methoden der mathematischen Physik, 3rd ed., vol. 1, Springer, Berlin
Courant, R., D. Hilbert (1968b), Methoden der mathematischen Physik, 2nd ed., vol. 2, Springer, Berlin
Cowdrey, D.R., C.M. Reeves (1963), An application of the Monte Carlo method to the evaluation of some molecular integrals, Comp. J. 6, 277-286
Cox, D.R. (1958), Planning of experiments, Wiley, New York
Cragg, E.E., A.V. Levy (1969), Study on a supermemory gradient method for the minimization of functions, JOTA 4, 191-205
Crippen, G.M., H.A. Scheraga (1971), Minimization of polypeptide energy, X - a global search algorithm, Arch. Biochem. Biophys. 144, 453-461
Crockett, J.B., H. Chernoff (1955), Gradient methods of maximization, Pacif. J. Math. 5, 33-50
Crowder, H., P. Wolfe (1972), Linear convergence of the conjugate gradient method, IBM T.J. Watson Research Center, report RC-3330, Yorktown Heights NY, May 1972
Cryer, C.W. (1971), The solution of a quadratic programming problem using systematic overrelaxation, SIAM J. Contr. 9, 385-392
Cullum, J. (1972), An algorithm for minimizing a differentiable function that uses only function values, in: Balakrishnan (1972), pp. 117-127
Curry, H.B. (1944), The method of steepest descent for non-linear minimization problems, Quart. Appl. Math. 2, 258-261
Curtis, A.R., J.K. Reid (1974), The choice of step lengths when using differences to approximate Jacobian matrices, JIMA 13, 121-126
Curtiss, J.H. (1956), A theoretical comparison of the efficiencies of two classical methods and a Monte Carlo method for computing one component of the solution of a set of linear algebraic equations, in: Meyer (1956), pp. 191-233
Dambrauskas, A.P. (1970), The simplex optimization method with variable step, Engng. Cybern. 8, 28-36
Dambrauskas, A.P. (1972), Investigation of the efficiency of the simplex method of optimization with variable step in a noise situation, Engng. Cybern. 10, 590-599
Daniel, J.W. (1967a), The conjugate gradient method for linear and nonlinear operator equations, SIAM J. Numer. Anal. 4, 10-26
Daniel, J.W. (1967b), Convergence of the conjugate gradient method with computationally convenient modifications, Numer. Math. 10, 125-131
Daniel, J.W. (1969), On the approximate minimization of functionals, Math. Comp. 23, 573-581
Daniel, J.W. (1970), A correction concerning the convergence rate for the conjugate gradient method, SIAM J. Numer. Anal. 7, 277-280
Daniel, J.W. (1971), The approximate minimization of functionals, Prentice-Hall, Englewood Cliffs NJ
Daniel, J.W. (1973), Global convergence for Newton methods in mathematical programming, JOTA 12, 233-241
Dantzig, G.B. (1966), Lineare Programmierung und Erweiterungen, Springer, Berlin
Darwin, C. (1859), Die Entstehung der Arten durch natürliche Zuchtwahl, translation from "The origin of species by means of natural selection", Reclam, Stuttgart, 1974
Darwin, C. (1874), Die Abstammung des Menschen, translation of the 2nd rev. ed. of "The descent of man", Kröner, Stuttgart, 1966
Davidon, W.C. (1959), Variable metric method for minimization, Argonne National Laboratory, report ANL-5990 rev., Lemont IL, Nov. 1959
Davidon, W.C. (1968), Variance algorithm for minimization, Comp. J. 10, 406-410
Davidon, W.C. (1969), Variance algorithm for minimization, in: Fletcher (1969a), pp. 13-20
Davidor, Y. (1990), Genetic algorithms and robotics, a heuristic strategy for optimization, World Scientific, Singapore
Davidor, Y., H.-P. Schwefel (1992), An introduction to adaptive optimization algorithms based on principles of natural evolution, in: Soucek (1992), pp. 183-202
Davidor, Y., H.-P. Schwefel, R. Männer (Eds.) (1994), Parallel problem solving from nature 3, Proceedings of the 3rd PPSN Conference, Jerusalem, Oct. 9-14, 1994, vol. 866 of Lecture Notes in Computer Science, Springer, Berlin
Davies, D. (1968), The use of Davidon's method in nonlinear programming, ICI Management Service report MSDH-68-110, Middlesborough, Yorks, Aug. 1968
Davies, D. (1970), Some practical methods of optimization, in: Abadie (1970), pp. 87-118
Davies, D., W.H. Swann (1969), Review of constrained optimization, in: Fletcher (1969a), pp. 187-202

Davies, M., I.J. Whitting (1972), A modified form of Levenberg's correction, in: Lootsma (1972a), pp. 191-201
Davies, O.L. (Ed.) (1954), The design and analysis of industrial experiments, Oliver Boyd, London
Davis, L. (Ed.) (1987), Genetic algorithms and simulated annealing, Pitman, London
Davis, L. (Ed.) (1991), Handbook of genetic algorithms, Van Nostrand Reinhold, New York
Davis, R.H., P.D. Roberts (1968), Method of conjugate gradients applied to self-adaptive digital control systems, IEE Proceedings 115, 562-571
DeGraag, D.P. (1970), Parameter optimization techniques for hybrid computers, Proceedings of the VIth International Analogue Computation Meeting, Munich, Aug.-Sept. 1970, pp. 136-139
Dejon, B., P. Henrici (Eds.) (1969), Constructive aspects of the fundamental theorem of algebra, Wiley-Interscience, London
De Jong, K. (1975), An analysis of the behavior of a class of genetic adaptive systems, Ph.D. thesis, University of Michigan, Ann Arbor MI
De Jong, K. (Ed.) (1993), Evolutionary computation (journal), MIT Press, Cambridge MA
De Jong, K., W. Spears (1993), On the state of evolutionary computation, in: Forrest (1993), pp. 618-623
Dekker, L., G. Savastano, G.C. Vansteenkiste (Eds.) (1980), Simulation of systems '79, Proceedings of the 9th IMACS Congress, Sorrento, Italy, North-Holland, Amsterdam
Dekker, T.J. (1969), Finding a zero by means of successive linear interpolation, in: Dejon and Henrici (1969), pp. 37-48
Demyanov, V.F., A.M. Rubinov (1970), Approximate methods in optimization problems, Elsevier, New York
Denn, M.M. (1969), Optimization by variational methods, McGraw-Hill, New York
Dennis, J.E., Jr. (1970), On the convergence of Newton-like methods, in: Rabinowitz (1970), pp. 163-181
Dennis, J.E., Jr. (1971), On the convergence of Broyden's method for nonlinear systems of equations, Math. Comp. 25, 559-567
Dennis, J.E., Jr. (1972), On some methods based on Broyden's secant approximation to the Hessian, in: Lootsma (1972a), pp. 19-34
D'Esopo, D.A. (1956), A convex programming procedure, Nav. Res. Log. Quart. 6, 33-42
DeVogelaere, R. (1968), Remark on algorithm 178 (E4) - direct search, CACM 11, 498
Dickinson, A.W. (1964), Nonlinear optimization - some procedures and examples, Proceedings of the XIXth ACM National Conference, Philadelphia, Aug. 1964, paper E1.2
Dijkhuis, B. (1971), An adaptive algorithm for minimizing a unimodal function of one variable, ZAMM 51 (Sonderheft), T45-T46
Dinkelbach, W. (1969), Sensitivitätsanalysen und parametrische Programmierung, Springer, Berlin
Dixon, L.C.W. (1972a), Nonlinear optimization, English University Press, London
Dixon, L.C.W. (1972b), The choice of step length, a crucial factor in the performance of variable metric algorithms, in: Lootsma (1972a), pp. 149-170
Dixon, L.C.W. (1972c), Variable metric algorithms - necessary and sufficient conditions for identical behavior of nonquadratic functions, JOTA 10, 34-40
Dixon, L.C.W. (1973), Conjugate directions without linear searches, JOTA 11, 317-328
Dixon, L.C.W., M.C. Biggs (1972), The advantages of adjoint-control transformations when determining optimal trajectories by Pontryagin's Maximum Principle, Aeronautical J. 76, 169-174
Dobzhansky, T. (1965), Dynamik der menschlichen Evolution - Gene und Umwelt, S. Fischer, Frankfort/Main
Dowell, M., P. Jarratt (1972), The Pegasus method for computing the root of an equation, BIT 12, 503-508
Drenick, R.F. (1967), Die Optimierung linearer Regelsysteme, Oldenbourg, Munich
Drepper, F.R., R. Heckler, H.-P. Schwefel (1979), Ein integriertes System von Schätzverfahren, Simulations- und Optimierungstechnik zur rechnergestützten Langfristplanung, in: Böhling and Spies (1979), pp. 115-129
Dueck, G. (1993), New optimization heuristics, the great deluge algorithm and the record-to-record-travel, J. Computational Physics 104, 86-92
Dueck, G., T. Scheuer (1990), Threshold accepting - a general purpose optimization algorithm appearing superior to simulated annealing, J. Computational Physics 90, 161-175
Duffin, R.J., E.L. Peterson, C. Zener (1967), Geometric programming - theory and application, Wiley, New York

Dvoretzky, A. (1956), On stochastic approximation, in: Neyman (1956), pp. 39-56
Ebeling, W. (1992), The optimization of a class of functionals based on developmental strategies, in: Männer and Manderick (1992), pp. 463-468
Edelbaum, T.N. (1962), Theory of maxima and minima, in: Leitmann (1962), pp. 1-32
Edelman, G.B. (1987), Neural Darwinism - the theory of group selection, Basic Books, New York
Eigen, M. (1971), Self-organization of matter and the evolution of biological macromolecules, Naturwissenschaften 58, 465-523
Eisenberg, M.A., M.R. McGuire (1972), Further comments on Dijkstra's concurrent programming control problem, CACM 15, 999
Eisenhart, C., M.W. Hastay, W.A. Wallis (Eds.) (1947), Selected techniques of statistical analysis for scientific and industrial research and production and management engineering, McGraw-Hill, New York
Elkin, R.M. (1968), Convergence theorems for Gauss-Seidel and other minimization algorithms, University of Maryland, Computer Science Center, technical report 68-59, College Park MD, Jan. 1968
Elliott, D.F., D.D. Sworder (1969a), A variable metric technique for parameter optimization, Automatica 5, 811-816
Elliott, D.F., D.D. Sworder (1969b), Design of suboptimal adaptive regulator systems via stochastic approximation, Proceedings of the National Electronics Conference, Chicago IL, Dec. 1969, pp. 29-33
Elliott, D.F., D.D. Sworder (1970), Applications of a simplified multidimensional stochastic approximation algorithm, IEEE Trans. AC-15, 101-104
Elliott, D.G. (Ed.) (1970), Proceedings of the 11th Symposium on Engineering Aspects of Magnetohydrodynamics, Caltech, March 24-26, 1970, California Institute of Technology, Pasadena CA
Emery, F.E., M. O'Hagan (1966), Optimal design of matching networks for microwave transistor amplifiers, IEEE Trans. MTT-14, 696-698
Engelhardt, M. (1973), On upper bounds for variances in stochastic approximation, SIAM J. Appl. Math. 24, 145-151
Engeli, M., T. Ginsburg, H. Rutishauser, E. Stiefel (1959), Refined iterative methods for computation of the solution and the eigenvalues of self-adjoint boundary value problems, Mitteilungen des Instituts für Angewandte Mathematik, Technical University (ETH) of Zurich, Switzerland, Birkhäuser, Basle, Switzerland
Erlicki, M.S., J. Appelbaum (1970), Solution of practical optimization problems, IEEE Trans. SSC-6, 49-52
Ermakov, S. (Ed.) (1992), Int'l J. on Stochastic Optimization and Design, Nova Science, New York
Ermoliev, Yu. (1970), Random optimization and stochastic programming, in: Moiseev (1970), pp. 104-115
Ermoliev, Yu., R.J.-B. Wets (1988), Numerical techniques for stochastic optimization, Springer, Berlin
Faber, M.M. (1970), Stochastisches Programmieren, Physica-Verlag, Würzburg, Germany
Fabian, V. (1967), Stochastic approximation of minima with improved asymptotic speed, Ann. Math. Stat. 38, 191-200
Fabian, V. (1968), On the choice of design in stochastic approximation methods, Ann. Math. Stat. 39, 457-465
Faddejew, D.K., W.N. Faddejewa (1973), Numerische Methoden der linearen Algebra, 3rd ed., Oldenbourg, Munich
Falkenhausen, K. von (1980), Optimierung regionaler Entsorgungssysteme mit der Evolutionsstrategie, Proceedings in Operations Research 9, Physica-Verlag, Würzburg, Germany, pp. 46-51
Favreau, R.F., R. Franks (1958), Random optimization by analogue techniques, Proceedings of the IInd Analogue Computation Meeting, Strasbourg, Sept. 1958, pp. 437-443
Feigenbaum, E.A., J. Feldman (Eds.) (1963), Computers and thought, McGraw-Hill, New York
Feistel, R., W. Ebeling (1989), Evolution of complex systems, Kluwer, Dordrecht, The Netherlands
Feldbaum, A.A. (1958), Automatic optimalizer, ARC 19, 718-728
Feldbaum, A.A. (1960), Statistical theory of gradient systems of automatic optimization for objects with quadratic characteristics, ARC 21, 111-118
Feldbaum, A.A. (1962), Rechengeräte in automatischen Systemen, Oldenbourg, Munich
Fend, F.A., C.B. Chandler (1961), Numerical optimization for multi-dimensional problems, General Electric, General Engineering Laboratory, report 61-GL-78, March 1961

Fiacco, A.V. (1974), Convergence properties of local solutions of sequences of mathematical programming problems in general spaces, JOTA 13, 1-12
Fiacco, A.V., G.P. McCormick (1964), The sequential unconstrained minimization technique for nonlinear programming - a primal-dual method, Mgmt. Sci. 10, 360-366
Fiacco, A.V., G.P. McCormick (1968), Nonlinear programming - sequential unconstrained minimization techniques, Wiley, New York
Fiacco, A.V., G.P. McCormick (1990), Nonlinear programming - sequential unconstrained minimization techniques, vol. 63 of CBMS-NSF Regional Conference Series on Applied Mathematics and vol. 4 of Classics in Applied Mathematics, SIAM, Philadelphia
Fielding, K. (1970), Algorithm 387 (E4) - function minimization and linear search, CACM 13, 509-510
Fisher, R.A. (1966), The design of experiments, 8th ed., Oliver Boyd, Edinburgh
Fletcher, R. (1965), Function minimization without evaluating derivatives - a review, Comp. J. 8, 33-41
Fletcher, R. (1966), Certification of algorithm 251 (E4) - function minimization, CACM 9, 686-687
Fletcher, R. (1968), Generalized inverse methods for the best least squares solution of systems of non-linear equations, Comp. J. 10, 392-399
Fletcher, R. (Ed.) (1969a), Optimization, Academic Press, London
Fletcher, R. (1969b), A review of methods for unconstrained optimization, in: Fletcher (1969a), pp. 1-12
Fletcher, R. (1970a), A class of methods for nonlinear programming with termination and convergence properties, in: Abadie (1970), pp. 157-176
Fletcher, R. (1970b), A new approach to variable metric algorithms, Comp. J. 13, 317-322
Fletcher, R. (1971), A modified Marquardt subroutine for non-linear least squares, UKAEA Research Group, report AERE-R-6799, Harwell, Oxon
Fletcher, R. (1972a), Conjugate direction methods, in: Murray (1972a), pp. 73-86
Fletcher, R. (1972b), A survey of algorithms for unconstrained optimization, in: Murray (1972a), pp. 123-129
Fletcher, R. (1972c), A Fortran subroutine for minimization by the method of conjugate gradients, UKAEA Research Group, report AERE-R-7073, Harwell, Oxon
Fletcher, R. (1972d), Fortran subroutines for minimization by quasi-Newton methods, UKAEA Research Group, report AERE-R-7125, Harwell, Oxon
Fletcher, R., M.J.D. Powell (1963), A rapidly convergent descent method for minimization, Comp. J. 6, 163-168
Fletcher, R., C.M. Reeves (1964), Function minimization by conjugate gradients, Comp. J. 7, 149-154
Flood, M.M., A. Leon (1964), A generalized direct search code for optimization, University of Michigan, Mental Health Research Institute, preprint 129, Ann Arbor MI, June 1964
Flood, M.M., A. Leon (1966), A universal adaptive code for optimization - GROPE, in: Lavi and Vogl (1966), pp. 101-130
Floudas, C.A., P.M. Pardalos (1990), A collection of test problems for constrained global optimization algorithms, vol. 455 of Lecture Notes in Computer Science, Springer, Berlin
Fogarty, L.E., R.M. Howe (1968), Trajectory optimization by a direct descent process, Simulation 11, 127-135
Fogarty, L.E., R.M. Howe (1970), Hybrid computer solution of some optimization problems, Proceedings of the VIth International Analogue Computation Meeting, Munich, Aug.-Sept. 1970, pp. 127-135
Fogel, D.B. (1991), System identification through simulated evolution, Ginn Press, Needham Heights MA
Fogel, D.B. (1992), Evolving artificial intelligence, Ph.D. thesis, University of California at San Diego
Fogel, D.B., J.W. Atmar (Eds.) (1992), Proceedings of the 1st Annual Conference on Evolutionary Programming, San Diego, Feb. 21-22, 1992, Evolutionary Programming Society, La Jolla CA
Fogel, D.B., J.W. Atmar (Eds.) (1993), Proceedings of the 2nd Annual Conference on Evolutionary Programming, San Diego, Feb. 25-26, 1993, Evolutionary Programming Society, La Jolla CA
Fogel, L.J. (1962), Autonomous automata, Ind. Research 4, 14-19
Fogel, L.J., A.J. Owens, M.J. Walsh (1965), Artificial intelligence through a simulation of evolution, in: Maxfield, Callahan, and Fogel (1965), pp. 131-155
Fogel, L.J., A.J. Owens, M.J. Walsh (1966a), Adaption of evolutionary programming to the prediction of solar flares, General Dynamics-Convair, report NASA-CR-417, San Diego CA

Fogel, L.J., A.J. Owens, M.J. Walsh (1966b), Artificial intelligence through simulated evolution, Wiley, New York
Forrest, S. (Ed.) (1993), Proceedings of the 5th International Conference on Genetic Algorithms, University of Illinois, Urbana-Champaign IL, July 17-21, 1993, Morgan Kaufmann, San Mateo CA
Forsythe, G.E. (1968), On the asymptotic directions of the s-dimensional optimum gradient method, Numer. Math. 11, 57-76
Forsythe, G.E. (1969), Remarks on the paper by Dekker, in: Dejon and Henrici (1969), pp. 49-51
Forsythe, G.E., T.S. Motzkin (1951), Acceleration of the optimum gradient method, Bull. Amer. Math. Soc. 57, 304-305
Fox, R.L. (1971), Optimization methods for engineering design, Addison-Wesley, Reading MA
Frankhauser, P., H.-P. Schwefel (1992), Making use of the Weidlich-Haag model in the case of reduced data sets, in: Gritzmann et al. (1992), pp. 320-323
Frankovic, B., S. Petras, J. Skakala, B. Vykouk (1970), Automatisierung und selbsttätige Steuerung, Verlag Technik, Berlin
Fraser, A.S. (1957), Simulation of genetic systems by automatic digital computers, Australian J. Biol. Sci. 10, 484-499
Friedberg, R.M. (1958), A learning machine I, IBM J. Res. Dev. 2, 2-13
Friedberg, R.M., B. Dunham, J.H. North (1959), A learning machine II, IBM J. Res. Dev. 3, 282-287
Friedmann, M., L.J. Savage (1947), Planning experiments seeking maxima, in: Eisenhart, Hastay, and Wallis (1947), pp. 365-372
Friedrichs, K.O., O.E. Neugebauer, J.J. Stoker (Eds.) (1948), Studies and essays, Courant anniversary volume, Interscience, New York
Fu, K.S., L.D. Cockrell (1970), On search techniques for multimodal surfaces, IFAC Kyoto Symposium on Systems Engineering Approach to Computer Control, Kyoto, Japan, Aug. 1970, paper 17.3
Fu, K.S., Z.J. Nikolic (1966), On some reinforcement techniques and their relation to the stochastic approximation, IEEE Trans. AC-11, 756-758
Fürst, H., P.H. Müller, V. Nollau (1968), Eine stochastische Methode zur Ermittlung der Maximalstelle einer Funktion von mehreren Veränderlichen mit experimentell ermittelbaren Funktionswerten und ihre Anwendung bei chemischen Prozessen, Chemie-Technik 20, 400-405
Gaidukov, A.I. (1966), Primeneniye sluchainovo poiska pri optimalnom projektirovanii, Prikladnye zadachi tekhnicheskoi kibernetiki (1966), 420-436
Gal, S. (1971), Sequential minimax search for a maximum when prior information is available, SIAM J. Appl. Math. 21, 590-595
Gal, S. (1972), Multidimensional minimax search for a maximum, SIAM J. Appl. Math. 23, 513-526
Galar, R. (1994), Evolutionary simulations and insights into progress, in: Sebald and Fogel (1994), pp. 344-352
Galar, R., H. Kwasnicka, W. Kwasnicki (1980), Simulation of some processes of development, in: Dekker, Savastano, and Vansteenkiste (1980), pp. 133-142
Garfinkel, R.S., G.L. Nemhauser (1972), Integer programming, Wiley, New York
Garfinkel, R.S., G.L. Nemhauser (1973), A survey of integer programming emphasizing computation and relations among models, in: Hu and Robinson (1973), pp. 77-155
Gauss, C.F. (1809), Determinatio orbitae observationibus quotcumque quam proxime satisfacientis, Werke, Band 7 (Theoria motus corporum coelestium in sectionibus conicis solem ambientium), Liber secundus, Sectio III, pp. 236-257, Hamburgi sumtibus Frid. Perthes et I.H. Besser, 1809; reprint: Teubner, Leipzig, Germany, 1906
Gaviano, M., E. Fagiuoli (1972), Remarks on the comparison between random search methods and the gradient method, in: Szegö (1972), pp. 337-349
Gelfand, I.M., M.L. Tsetlin (1961), The principle of nonlocal search in automatic optimization systems, Soviet Physics Doklady 6(3), 192-194
Geoffrion, A.M. (Ed.) (1972), Perspectives on optimization, Addison-Wesley, Reading MA
Gerardin, L. (1968), Natur als Vorbild - die Entdeckung der Bionik, Kindler, Munich
Gersht, A.M., A.I. Kaplinskii (1971), Convergence of the continuous variant of the Robbins-Monro procedure, ARC 32, 71-75
Gessner, P., K. Spremann (1972), Optimierung in Funktionenräumen, Springer, Berlin
Gessner, P., H. Wacker (1972), Dynamische Optimierung - Einführung, Modelle, Computerprogramme, Hanser, Munich
Gilbert, E.G. (1967), A selected bibliography on parameter optimization methods suitable for hybrid computation, Simulation 8, 350-352
Gilbert, P., W.J. Chandler (1972), Interference between communicating parallel processes, CACM 15, 427-437

Gill, P.E., W. Murray (1972), Quasi-Newton methods for unconstrained optimization, JIMA 9, 91-108
Ginsburg, T. (1963), The conjugate gradient method, Numer. Math. 5, 191-200
Girsanov, I.V. (1972), Lectures on mathematical theory of extremum problems, Springer, Berlin
Glass, H., L. Cooper (1965), Sequential search - a method for solving constrained optimization problems, JACM 12, 71-82
Glover, F. (1986), Future paths for integer programming and links to artificial intelligence, Comp. Oper. Res. 13, 533-549
Glover, F. (1989), Tabu search - part I, ORSA-J. on Computing 1, 190-206
Glover, F., H.-J. Greenberg (1989), New approaches for heuristic search - a bilateral linkage with artificial intelligence, Europ. J. Oper. Res. 39, 119-130
Gnedenko, B.W. (1970), Lehrbuch der Wahrscheinlichkeitsrechnung, 6th ed., Akademie-Verlag, Berlin
Goldberg, D.E. (1989), Genetic algorithms in search, optimization, and machine learning, Addison-Wesley, Reading MA
Goldfarb, D. (1969), Sufficient conditions for the convergence of a variable metric algorithm, in: Fletcher (1969a), pp. 273-282
Goldfarb, D. (1970), A family of variable-metric methods derived by variational means, Math. Comp. 24, 23-26
Goldfeld, S.M., R.E. Quandt, H.F. Trotter (1966), Maximization by quadratic hill-climbing, Econometrica 34, 541-551
Goldfeld, S.M., R.E. Quandt, H.F. Trotter (1968), Maximization by improved quadratic hill-climbing and other methods, Princeton University, Econometric Research Program, research memo. RM-95, Princeton NJ, April 1968
Goldstein, A.A. (1962), Cauchy's method of minimization, Numer. Math. 4, 146-150
Goldstein, A.A. (1965), On Newton's method, Numer. Math. 7, 391-393
Goldstein, A.A., J.F. Price (1967), An effective algorithm for minimization, Numer. Math. 10, 184-189
Goldstein, A.A., J.F. Price (1971), On descent from local minima, Math. Comp. 25, 569-574
Golinski, J., Z.K. Lesniak (1966), Optimales Entwerfen von Konstruktionen mit Hilfe der Monte-Carlo-Methode, Bautechnik 43, 307-311


Goll, R. (1972), Der Evolutionismus - Analyse eines Grundbegriffs neuzeitlichen Denkens, Beck, Munich
Golub, G.H. (1965), Numerical methods for solving linear least squares problems, Numer. Math. 7, 206-216
Golub, G.H., M.A. Saunders (1970), Linear least squares and quadratic programming, in: Abadie (1970), pp. 229-256
Gonzalez, R.S. (1970), An optimization study on a hybrid computer, Ann. Assoc. Int'l Calcul Analog. 12, 138-148
Gorges-Schleuter, M. (1991a), Explicit parallelism of genetic algorithms through population structures, in: Schwefel and Manner (1991), pp. 150-159
Gorges-Schleuter, M. (1991b), Genetic algorithms and population structures - a massively parallel algorithm, Dr. rer. nat. Diss., University of Dortmund, Department of Computer Science, Jan. 1991
Gorvits, G.G., O.I. Larichev (1971), Comparison of search methods for the solution of nonlinear identification problems, ARC 32, 272-280
Gottfried, B.S., J. Weisman (1973), Introduction to optimization theory, Prentice-Hall, Englewood Cliffs NJ
Gould, S.J., N. Eldredge (1977), Punctuated equilibria - the tempo and mode of evolution reconsidered, Paleobiology 3, 115-151
Gould, S.J., N. Eldredge (1993), Punctuated equilibrium comes of age, Nature 366, 223-227
Gran, R. (1973), On the convergence of random search algorithms in continuous time with applications to adaptive control, IEEE Trans. SMC-3, 62-66
Grasse, P.P. (1973), Allgemeine Biologie, vol. 5 - Evolution, G. Fischer, Stuttgart
Grassmann, P. (1967), Verfahrenstechnik und Biologie, Chemie Ingenieur Technik 39, 1217-1226
Grassmann, P. (1968), Verfahrenstechnik und Medizin, Chemie Ingenieur Technik 40, 1094-1100
Grauer, M., A. Lewandowski, A.P. Wierzbicki (Eds.) (1982), Multiobjective and stochastic optimization, Proceedings of the IIASA Task Force Meeting, Nov. 30 - Dec. 4, 1981, IIASA Proceedings Series CP-82-S12, Laxenburg, Austria
Grauer, M., D.B. Pressmar (Eds.) (1991), Parallel computing and mathematical optimization, vol. 367 of Lecture Notes in Economics and Mathematical Systems, Springer, Berlin


Graves, R.L., P. Wolfe (Eds.) (1963), Recent advances in mathematical programming, McGraw-Hill, New York
Greenberg, H. (1971), Integer programming, Academic Press, New York
Greenstadt, J. (1967a), On the relative efficiencies of gradient methods, Math. Comp. 21, 360-367
Greenstadt, J. (1967b), Bestimmung der Eigenwerte einer Matrix nach der Jacobi-Methode, in: Ralston and Wilf (1967), pp. 152-168
Greenstadt, J. (1970), Variations on variable-metric methods, Math. Comp. 24, 1-22
Greenstadt, J. (1972), A quasi-Newton method with no derivatives, Math. Comp. 26, 145-166
Grefenstette, J.J. (Ed.) (1985), Proceedings of the 1st International Conference on Genetic Algorithms, Carnegie-Mellon University, Pittsburgh PA, July 24-26, 1985, Lawrence Erlbaum, Hillsdale NJ
Grefenstette, J.J. (Ed.) (1987), Proceedings of the 2nd International Conference on Genetic Algorithms, MIT, Cambridge MA, July 28-31, 1987, Lawrence Erlbaum, Hillsdale NJ
Gritzmann, P., R. Hettich, R. Horst, E. Sachs (Eds.) (1992), Operations Research '91, Extended Abstracts of the 16th Symposium on Operations Research, Trier, Sept. 9-11, 1991, Physica-Verlag, Heidelberg
Grusser, O.J., R. Klinke (Eds.) (1971), Zeichenerkennung durch biologische und technische Systeme, Springer, Berlin
Guilfoyle, G., I. Johnson, P. Wheatley (1967), One-dimensional search combining golden section and cubic fit techniques, Analytical Mechanics Associates Inc., quarterly report 67-1, Westbury, Long Island NY, Jan. 1967
Guin, J.A. (1968), Modification of the complex method of constrained optimization, Comp. J. 10, 416-417
Gurin, L.S. (1966), Random search in the presence of noise, Engng. Cybern. 4(3), 252-260
Gurin, L.S., V.P. Lobac (1963), Combination of the Monte Carlo method with the method of steepest descents for the solution of certain extremal problems, AIAA J. 1, 2708-2710
Gurin, L.S., L.A. Rastrigin (1965), Convergence of the random search method in the presence of noise, ARC 26, 1505-1511


Hadamard, J. (1908), Memoire sur le probleme d'analyse relatif a l'equilibre des plaques elastiques encastrees, Memoires presentes par divers savants a l'Academie des sciences de l'Institut national de France, 2nd Ser., vol. 33 (savants etrangers), no. 4, pp. 1-128
Hadley, G. (1962), Linear programming, Addison-Wesley, Reading MA
Hadley, G. (1969), Nichtlineare und dynamische Programmierung, Physica-Verlag, Wurzburg, Germany
Haefner, K. (Ed.) (1992), Evolution of information processing systems - an interdisciplinary approach for a new understanding of nature and society, Springer, Berlin
Hague, D.S., C.R. Glatt (1968), An introduction to multivariable search techniques for parameter optimization and program AESOP, Boeing Space Division, report NASA-CR-73200, Seattle WA, March 1968
Hamilton, P.A., J. Boothroyd (1969), Remark on algorithm 251 (E4) - function minimization, CACM 12, 512-513
Hammel, U. (1991), Cartoon - combining modular simulation, regression, and optimization in an object-oriented environment, in: Kohler (1991), pp. 854-855
Hammel, U., T. Back (1994), Evolution strategies on noisy functions - how to improve convergence properties, in: Davidor, Schwefel, and Manner (1994), pp. 159-168
Hammer, P.L. (Ed.) (1984), Stochastics and optimization, Annals of Operations Research, vol. 1, Baltzer, Basle, Switzerland
Hammersley, J.M., D.C. Handscomb (1964), Monte Carlo methods, Methuen, London
Hancock, H. (1960), Theory of maxima and minima, Dover, New York
Hansen, P.B. (1972), Structured multiprogramming, CACM 15, 574-578
Harkins, A. (1964), The use of parallel tangents in optimization, in: Blakemore and Davis (1964), pp. 35-40
Hartmann, D. (1974), Optimierung balkenartiger Zylinderschalen aus Stahlbeton mit elastischem und plastischem Werkstoffverhalten, Dr.-Ing. Diss., University of Dortmund, July 1974
Haubrich, J.G.A. (1963), Algorithm 205 (E4) - ative, CACM 6, 519
Heckler, R. (1979), OASIS - optimization and simulation integrating system - status report, technical report KFA-STE-IB-2/79, Nuclear Research Center (KFA) Julich, Germany, Dec. 1979


Heckler, R., H.-P. Schwefel (1978), Superimposing direct search methods for parameter optimization onto dynamic simulation models, in: Highland, Nielsen, and Hull (1978), pp. 173-181
Heinhold, J., K.W. Gaede (1972), Ingenieur-Statistik, 3rd ed., Oldenbourg, Munich
Henn, R., H.P. Kunzi (1968), Einfuhrung in die Unternehmensforschung I und II, Springer, Berlin
Herdy, M. (1992), Reproductive isolation as strategy parameter in hierarchically organized evolution strategies, in: Manner and Manderick (1992), pp. 207-217
Herschel, R. (1961), Automatische Optimisatoren, Elektronische Rechenanlagen 3, 30-36
Hertel, H. (1963), Biologie und Technik, Band 1: Struktur - Form - Bewegung, Krausskopf Verlag, Mainz
Hesse, R. (1973), A heuristic search procedure for estimating a global solution of nonconvex programming problems, Oper. Res. 21, 1267-1280
Hestenes, M.R. (1956), The conjugate-gradient method for solving linear systems, Proc. Symp. Appl. Math. 6, 83-102
Hestenes, M.R. (1966), Calculus of variations and optimal control theory, Wiley, New York
Hestenes, M.R. (1969), Multiplier and gradient methods, in: Zadeh, Neustadt, and Balakrishnan (1969a), pp. 143-163
Hestenes, M.R. (1973), Iterative methods for solving linear equations, JOTA 11, 323-334 (reprint of the original from 1951)
Hestenes, M.R., M.L. Stein (1973), The solution of linear equations by minimization, JOTA 11, 335-359 (reprint of the original from 1951)
Hestenes, M.R., E. Stiefel (1952), Methods of conjugate gradients for solving linear systems, NBS J. Research 49, 409-436
Heusener, G. (1970), Optimierung natriumgekuhlter schneller Brutreaktoren mit Methoden der nichtlinearen Programmierung, report KFK-1238, Nuclear Research Center (KfK) Karlsruhe, Germany, July 1970
Heydt, G.T. (1970), Directed random search, Ph.D. thesis, Purdue University, Lafayette IN, Aug. 1970
Heynert, H. (1972), Einfuhrung in die allgemeine Bionik, Deutscher Verlag der Wissenschaften, Berlin
Highland, H.J., N.R. Nielsen, L.G. Hull (Eds.) (1978), Proceedings of the Winter Simulation Conference, Miami Beach FL, Dec. 4-6, 1978


Hildebrand, F.B. (1956), Introduction to numerical analysis, McGraw-Hill, New York
Hill, J.C. (1964), A hill-climbing technique using piecewise cubic approximation, Ph.D. thesis, Purdue University, Lafayette IN, June 1964
Hill, J.C., J.E. Gibson (1965), Hill-climbing on hills with many minima, Proceedings of the IInd IFAC Symposium on the Theory of Self Adaptive Control Systems, Teddington UK, Sept. 1965, pp. 322-334
Hill, J.D. (1969), A search technique for multimodal surfaces, IEEE Trans. SSC-5, 2-8
Hill, J.D., K.S. Fu (1965), A learning control system using stochastic approximation for hill-climbing, VIth Joint Automatic Control Conference, Troy NY, June 1965, session 14, paper 2
Hill, J.D., G.J. McMurtry, K.S. Fu (1964), A computer-simulated on-line experiment in learning control systems, AFIPS Conf. Proc. 25, 315-325
Himmelblau, D.M. (1972a), A uniform evaluation of unconstrained optimization techniques, in: Lootsma (1972a), pp. 69-97
Himmelblau, D.M. (1972b), Applied nonlinear programming, McGraw-Hill, New York
Himsworth, F.R. (1962), Empirical methods of optimisation, Trans. Inst. Chem. Engrs. 40, 345-349
Hock, W., K. Schittkowski (1981), Test examples for nonlinear programming codes, vol. 187 of Lecture Notes in Economics and Mathematical Systems, Springer, Berlin
Hofestadt, R., F. Kruckeberg, T. Lengauer (Eds.) (1993), Informatik in der Biowissenschaft, Springer, Berlin
Hoffmann, U., H. Hofmann (1970), Einfuhrung in die Optimierung mit Anwendungsbeispielen aus dem Chemie-Ingenieur-Wesen, Verlag Chemie, Weinheim
Hoffmeister, F. (1991), Scalable parallelism by evolutionary algorithms, in: Grauer and Pressmar (1991), pp. 177-198
Hoffmeister, F., T. Back (1990), Genetic algorithms and evolution strategies - similarities and differences, technical report 365 (green series), University of Dortmund, Department of Computer Science, Nov. 1990
Hoffmeister, F., T. Back (1991), Genetic algorithms and evolution strategies - similarities and differences, in: Schwefel and Manner (1991), pp. 445-469
Hoffmeister, F., T. Back (1992), Genetic algorithms and evolution strategies - similarities and differences, technical report SYS-1/92, Systems Analysis Research Group, University of Dortmund, Department of Computer Science, Feb. 1992


Hoffmeister, F., H.-P. Schwefel (1990), A taxonomy of parallel evolutionary algorithms, in: Wolf, Legendi, and Schendel (1990), pp. 97-107
Hofler, A. (1976), Formoptimierung von Leichtbaufachwerken durch Einsatz einer Evolutionsstrategie, Dr.-Ing. Diss., Technical University of Berlin, Department of Transportation Technologies, June 1976
Hofler, A., U. Leyßner, J. Wiedemann (1973), Optimization of the layout of trusses combining strategies based on Michell's theorem and on the biological principles of evolution, IInd Symposium on Structural Optimization, Milan, April 1973, AGARD Conf. Proc. 123, appendix A
Holland, J.H. (1975), Adaptation in natural and artificial systems, University of Michigan Press, Ann Arbor MI
Holland, J.H. (1992), Adaptation in natural and artificial systems, 2nd ed., MIT Press, Cambridge MA
Holland, J.H., K.J. Holyoak, R.E. Nisbett, P.R. Thagard (1986), Induction - processes of inference, learning, and discovery, MIT Press, Cambridge MA
Hollstien, R.B. (1971), Artificial genetic adaptation in computer control systems, Ph.D. thesis, University of Michigan, Ann Arbor MI
Hooke, R. (1957), Control by automatic experimentation, Chem. Engng. 64(6), 284-286
Hooke, R., T.A. Jeeves (1958), Comments on Brooks' discussion of random methods, Oper. Res. 6, 881-882
Hooke, R., T.A. Jeeves (1961), Direct search solution of numerical and statistical problems, JACM 8, 212-229
Hooke, R., R.I. VanNice (1959), Optimizing control by automatic experimentation, ISA J. 6(7), 74-79
Hopper, M.J. (Ed.) (1971), Harwell subroutine library - a catalogue of subroutines, UKAEA Research Group, report AERE-R-6912, Harwell, Oxon
Horst, R. (Ed.) (1991), J. of Global Optimization, Kluwer, Dordrecht, The Netherlands
Hoshino, S. (1971), On Davies, Swann, and Campey minimisation process, Comp. J. 14, 426-427
Hoshino, S. (1972), A formulation of variable metric methods, JIMA 10, 394-403
Hotelling, H. (1941), Experimental determination of the maximum of a function, Ann. Math. Stat. 12, 20-45
House, F.R. (1971), Remark on algorithm 251 (E4) - function minimisation, CACM 14, 358


Householder, A.S. (1953), Principles of numerical analysis, McGraw-Hill, New York
Householder, A.S. (1970), The numerical treatment of a single nonlinear equation, McGraw-Hill, New York
Houston, B.F., R.A. Huffman (1971), A technique which combines modified pattern search methods with composite designs and polynomial constraints to solve constrained optimization problems, Nav. Res. Log. Quart. 18, 91-98
Hu, T.C. (1972), Ganzzahlige Programmierung und Netzwerkflüsse, Oldenbourg, Munich
Hu, T.C., S.M. Robinson (Eds.) (1973), Mathematical programming, Academic Press, New York
Huang, H.Y. (1970), Unified approach to quadratically convergent algorithms for function minimization, JOTA 5, 405-423
Huang, H.Y. (1974), Method of dual matrices for function minimization, JOTA 13, 519-537
Huang, H.Y., J.P. Chambliss (1973), Quadratically convergent algorithms and one-dimensional search schemes, JOTA 11, 175-188
Huang, H.Y., J.P. Chambliss (1974), Numerical experiments on dual matrix algorithms for function minimization, JOTA 13, 620-634
Huang, H.Y., A.V. Levy (1970), Numerical experiments on quadratically convergent algorithms for function minimization, JOTA 6, 269-282
Huberman, B.A. (Ed.) (1988), The ecology of computation, North Holland, Amsterdam
Huelsman, L.P. (1968), GOSPEL - a general optimization software package for electrical network design, University of Arizona, Department of Electrical Engineering, report, Tucson AZ, Sept. 1968
Hull, T.E. (1967), Random-number generation and Monte-Carlo methods, in: Klerer and Korn (1967), pp. 63-78
Humphrey, W.E., B.J. Cottrell (1962/66), A general minimizing routine, University of California, Lawrence Radiation Laboratory, internal memo. P-6, Livermore CA, July 1962, rev. March 1966
Hupfer, P. (1970), Optimierung von Baukonstruktionen, Teubner, Stuttgart
Hwang, C.L., A.S.M. Masud (1979), Multiple objective decision making - methods and applications, vol. 164 of Lecture Notes in Economics and Mathematical Systems, Springer, Berlin
Hyslop, J. (1972), A note on the accuracy of optimisation techniques, Comp. J. 15, 140


Idelsohn, J.M. (1964), Ten ways to find the optimum, Contr. Engng. 11(6), 97-102
Imamura, H., K. Uosaki, M. Tasaka, T. Suzuki (1970), Optimization methods in the multimodal case and their application to automatic lens design, IFAC Kyoto Symposium on Systems Engineering Approach to Computer Control, Kyoto, Japan, Aug. 1970, paper 7.4
Ivakhnenko, A.G. (1970), Heuristic self-organization in problems of engineering cybernetics, Automatica 6, 207-219
Jacobson, D.H., D.Q. Mayne (1970), Differential dynamic programming, Elsevier, New York
Jacoby, S.L.S., J.S. Kowalik, J.T. Pizzo (1972), Iterative methods for nonlinear optimization problems, Prentice-Hall, Englewood Cliffs NJ
Jahnke-Emde-Losch (1966), Tafeln hoherer Funktionen, 7th ed., Teubner, Stuttgart
Janac, K. (1971), Adaptive stochastic approximations, Simulation 16, 51-58
Jarratt, P. (1967), An iterative method for locating turning points, Comp. J. 10, 82-84
Jarratt, P. (1968), A numerical method for determining points of inflection, BIT 8, 31-35
Jarratt, P. (1970), A review of methods for solving nonlinear algebraic equations in one variable, in: Rabinowitz (1970), pp. 1-26
Jarvis, R.A. (1968), Hybrid computer simulation of adaptive strategies, Ph.D. thesis, University of Western Australia, Nedlands WA, March 1968
Jarvis, R.A. (1970), Adaptive global search in a time-variant environment using a probabilistic automaton with pattern recognition supervision, IEEE Trans. SSC-6, 209-217
Jeeves, T.A. (1958), Secant modification of Newton's method, CACM 1, 9-10
Johnk, M.D. (1969), Erzeugen und Testen von Zufallszahlen, Physica-Verlag, Wurzburg, Germany
Johannsen, G. (1970), Entwicklung und Optimierung eines vielparametrigen nichtlinearen Modells fur den Menschen als Regler in der Fahrzeugfuhrung, Dr.-Ing. Diss., Technical University of Berlin, Department of Transportation Technologies, Oct. 1970
Johannsen, G. (1973), Optimierung vielparametriger Bezugsmodelle mit Hilfe von Zufallssuchverfahren, Regelungstechnische Prozeß-Datenverarbeitung 21, 234-239
John, F. (1948), Extremum problems with inequalities as subsidiary conditions, in: Friedrichs, Neugebauer, and Stoker (1948), pp. 187-204


John, P.W.M. (1971), Statistical design and analysis of experiments, Macmillan, New York
Johnson, S.M. (1956), Best exploration for maximum is Fibonaccian, RAND Corporation, report P-856, Santa Monica CA
Jones, A. (1970), Spiral - a new algorithm for non-linear parameter estimation using least squares, Comp. J. 13, 301-308
Jones, D.S. (1973), The variable metric algorithm for non-definite quadratic functions, JIMA 12, 63-71
Joosen, W., E. Milgrom (Eds.) (1992), Parallel computing - from theory to sound practice, Proceedings of the European Workshop on Parallel Computing (EWPC '92), Barcelona, Spain, March 1992, IOS Press, Amsterdam
Jordan, P. (1970), Schopfung und Geheimnis, Stalling, Oldenburg, Germany
Kamiya, A., T. Togawa (1972), Optimal branching structure of the vascular tree, Bull. Math. Biophys. 34, 431-438
Kammerer, W.J., M.Z. Nashed (1972), On the convergence of the conjugate gradient method for singular linear operator equations, SIAM J. Numer. Anal. 9, 165-181
Kantorovich, L.V. (1940), A new method of solving of some classes of extremal problems, Compt. Rend. Acad. Sci. URSS (USSR), New Ser. 28, 211-214
Kantorovich, L.V. (1945), On an effective method of solving extremal problems for quadratic functionals, Compt. Rend. Acad. Sci. URSS (USSR), New Ser. 48, 455-460
Kantorovich, L.V. (1952), Functional analysis and applied mathematics, NBS report 1509, March 1952
Kaplinskii, A.I., A.I. Propoi (1970), Stochastic approach to non-linear programming problems, ARC 31, 448-459
Kappler, H. (1967), Gradientenverfahren der nichtlinearen Programmierung, O. Schwartz, Gottingen, Germany
Karmarkar, N. (1984), A new polynomial-time algorithm for linear programming, Combinatorica 4, 373-395
Karnopp, D.C. (1961), Search theory applied to parameter scan optimization problems, Ph.D. thesis, MIT, Cambridge MA, June 1961
Karnopp, D.C. (1963), Random search techniques for optimization problems, Automatica 1, 111-121


Karnopp, D.C. (1966), Ein direktes Rechenverfahren fur implizite Variationsprobleme bei optimalen Prozessen, Regelungstechnik 14, 366-368
Karp, R.M., W.L. Miranker (1968), Parallel minimax search for a maximum, J. Comb. Theory 4, 19-35
Karreman, H.F. (Ed.) (1968), Stochastic optimization and control, Wiley, New York
Karumidze, G.V. (1969), A method of random search for the solution of global extremum problems, Engng. Cybern. 7(6), 27-31
Katkovnik, V.Ya., O.Yu. Kulchitskii (1972), Convergence of a class of random search algorithms, ARC 33, 1321-1326
Katkovnik, V.Ya., L.I. Shimelevich (1972), A class of heuristic methods for solution of partially-integer programming problems, Engng. Cybern. 10, 390-394
Kaupe, A.F. (1963), Algorithm 178 (E4) - direct search, CACM 6, 313-314
Kaupe, A.F. (1964), On optimal search techniques, CACM 7, 38
Kavanaugh, W.P., E.C. Stewart, D.H. Brocker (1968), Optimal control of satellite attitude acquisition by a random search algorithm on a hybrid computer, AFIPS Conf. Proc. 32, 443-452
Kawamura, K., R.A. Volz (1973), On the rate of convergence of the conjugate gradient reset method with inaccurate linear minimizations, IEEE Trans. AC-18, 360-366
Kelley, H.J. (1962), Methods of gradients, in: Leitmann (1962), pp. 205-254
Kelley, H.J., G.E. Myers (1971), Conjugate direction methods for parameter optimization, Astron. Acta 16, 45-51
Kelley, H.J., J.L. Speyer (1970), Accelerated gradient projection, in: Balakrishnan et al. (1970), pp. 151-158
Kempthorne, O. (1952), The design and analysis of experiments, Wiley, New York
Kenworthy, I.C. (1967), Some examples of simplex evolutionary operation in the paper industry, Appl. Stat. 16, 211-224
Kesten, H. (1958), Accelerated stochastic approximation, Ann. Math. Stat. 29, 41-59
Khachiyan, L.G. (1979), (abstract on the ellipsoid method), Doklady Akademii Nauk SSSR (USSR) 244, 1093-1096
Khovanov, N.V. (1967), Stochastic optimization of parameters by the method of variation of the search region, Engng. Cybern. 5(4), 34-39


Kiefer, J. (1953), Sequential minimax search for a maximum, Proc. Amer. Math. Soc. 4, 502-506
Kiefer, J. (1957), Optimum sequential search and approximation methods under minimum regularity assumptions, SIAM J. 5, 105-136
Kiefer, J., J. Wolfowitz (1952), Stochastic estimation of the maximum of a regression function, Ann. Math. Stat. 23, 462-466
King, R.F. (1973), An improved Pegasus method for root finding, BIT 13, 423-427
Kirkpatrick, S., C.D. Gelatt, M.P. Vecchi (1983), Optimization by simulated annealing, Science 220, 671-680
Kivelidi, V.Kh., Ya.I. Khurgin (1970), Construction of probabilistic search, ARC 31, 1892-1894
Kiwiel, K.C. (1985), Methods of descent for nondifferentiable optimization, vol. 1133 of Lecture Notes in Mathematics, Springer, Berlin
Kjellstrom, G. (1965), Network optimization by random variation of component values, Ericsson Technics 25, 133-151
Klerer, M., G.A. Korn (Eds.) (1967), Digital computer user's handbook, McGraw-Hill, New York
Klessig, R., E. Polak (1972), Efficient implementations of the Polak-Ribiere conjugate gradient algorithm, SIAM J. Contr. 10, 524-549
Klessig, R., E. Polak (1973), An adaptive precision gradient method for optimal control, SIAM J. Contr. 11, 80-93
Klingman, W.R., D.M. Himmelblau (1964), Nonlinear programming with the aid of a multiple-gradient summation technique, JACM 11, 400-415
Klir, G.J. (Ed.) (1978), Applied general systems research, Plenum Press, New York
Klockgether, J., H.-P. Schwefel (1970), Two-phase nozzle and hollow core jet experiments, in: Elliott (1970), pp. 141-148
Klotzler, R. (1970), Mehrdimensionale Variationsrechnung, Birkhauser, Basle, Switzerland
Kobelt, D., G. Schneider (1977), Optimierung im Dialog unter Verwendung von Evolutionsstrategie und Einflußgrößenrechnung, Chemie-Technik 6, 369-372
Koch, H.W. (1973), Der Sozialdarwinismus - seine Genese und sein Einfluß auf das imperialistische Denken, Beck, Munich


Kochen, M., H.M. Hastings (Eds.) (1988), Advances in cognitive science - steps toward convergence, AAAS Selected Symposium 104
Kohler, E. (Ed.) (1991), 36th International Scientific Colloquium, Ilmenau, Oct. 21-24, 1991, Technical University of Ilmenau, Germany
Kopp, R.E. (1967), Computational algorithms in optimal control, IEEE Int'l Conv. Record 15, part 3 (Automatic Control), 5-14
Korbut, A.A., J.J. Finkelstein (1971), Diskrete Optimierung, Akademie-Verlag, Berlin
Korn, G.A. (1966), Random process simulation and measurement, McGraw-Hill, New York
Korn, G.A. (1968), Hybrid computer Monte Carlo techniques, in: McLeod (1968), pp. 223-234
Korn, G.A., T.M. Korn (1961), Mathematical handbook for scientists and engineers, McGraw-Hill, New York
Korn, G.A., T.M. Korn (1964), Electronic analog and hybrid computers, McGraw-Hill, New York
Korn, G.A., H. Kosako (1970), A proposed hybrid-computer method for functional optimization, IEEE Trans. C-19, 149-153
Kovacs, Z., S.A. Lill (1971), Note on algorithm 46 - a modified Davidon method for finding the minimum of a function, using difference approximation for derivatives, Comp. J. 14, 214-215
Kowalik, J. (1967), A note on nonlinear regression analysis, Austral. Comp. J. 1, 51-53
Kowalik, J., J.F. Morrison (1968), Analysis of kinetic data for allosteric enzyme reactions as a nonlinear regression problem, Math. Biosci. 2, 57-66
Kowalik, J., M.R. Osborne (1968), Methods for unconstrained optimization problems, Elsevier, New York
Koza, J. (1992), Genetic programming, MIT Press, Cambridge MA
Krallmann, H. (1978), Evolution strategy and social sciences, in: Klir (1978), pp. 891-903
Krasnushkin, E.V. (1970), Multichannel automatic optimizer having a variable sign for the feedback, ARC 31, 2057-2061
Krasovskii, A.A. (1962), Optimal methods of search in continuous and pulsed extremum control systems, Proceedings of the 1st IFAC Symposium on Optimization and Adaptive Control, Rome, April 1962, pp. 19-33


Krasovskii, A.A. (1963), Problems of continuous systems theory of extremal control of industrial processes, Proceedings of the IInd IFAC Congress, Basle, Switzerland, Aug.-Sept. 1963, vol. 1, pp. 519-526
Krasulina, T.P. (1972), Robbins-Monro process in the case of several roots, ARC 33, 580-585
Kregting, J., R.C. White, Jr. (1971), Adaptive random search, Eindhoven University of Technology, Department of Electrical Engineering, Group Measurement and Control, report TH-71-E-24, Eindhoven, The Netherlands, Oct. 1971
Krelle, W., H.P. Kunzi (1958), Lineare Programmierung, Verlag Industrielle Organisation, Zurich, Switzerland
Krolak, P.D. (1968), Further extensions of Fibonaccian search to linear programming problems, SIAM J. Contr. 6, 258-265
Krolak, P.D., L. Cooper (1963), An extension of Fibonaccian search to several variables, CACM 6, 639-641
Kuester, J.L., J.H. Mize (1973), Optimization techniques with Fortran, McGraw-Hill, New York
Kuhn, H.W. (Ed.) (1970), Proceedings of the Princeton Symposium on Mathematical Programming, Aug. 1967, Princeton University Press, Princeton NJ
Kuhn, H.W., A.W. Tucker (1951), Nonlinear programming, in: Neyman (1951), pp. 481-492
Kulchitskii, O.Yu. (1972), A non-gradient random search method for an extremum in a Hilbert space, Engng. Cybern. 10, 773-780
Kunzi, H.P. (1967), Mathematische Optimierung großer Systeme, Ablauf- und Planungsforschung 8, 395-407
Kunzi, H.P., W. Krelle (1969), Einfuhrung in die mathematische Optimierung, Verlag Industrielle Organisation, Zurich, Switzerland
Kunzi, H.P., W. Krelle, W. Oettli (1962), Nichtlineare Programmierung, Springer, Berlin
Kunzi, H.P., W. Oettli (1969), Nichtlineare Optimierung - neuere Verfahren - Bibliographie, Springer, Berlin
Kunzi, H.P., S.T. Tan (1966), Lineare Optimierung großer Systeme, Springer, Berlin
Kunzi, H.P., H.G. Tzschach, C.A. Zehnder (1966), Numerische Methoden der mathematischen Optimierung, Teubner, Stuttgart


Kunzi, H.P., H.G. Tzschach, C.A. Zehnder (1970), Numerische Methoden der mathematischen Optimierung mit Algol- und Fortran-Programmen - Gebrauchsversion der Computerprogramme, Teubner, Stuttgart
Kuo, F.F., J.F. Kaiser (Eds.) (1966), System analysis by digital computer, Wiley, New York
Kursawe, F. (1991), A variant of evolution strategies for vector optimization, in: Schwefel and Manner (1991), pp. 193-197
Kursawe, F. (1992), Evolution strategies for vector optimization, in: Tzeng and Yu (1992), vol. 3, pp. 187-193
Kushner, H.J. (1963), Hill climbing methods for the optimization of multiparameter noise disturbed systems, Trans. ASME D, J. Basic Engng. (1963), 157-164
Kushner, H.J. (1972), Stochastic approximation algorithms for the local optimization of functions with nonunique stationary points, IEEE Trans. AC-17, 646-654
Kussul, E., A. Luk (1971), Evolution als Optimierungsprozeß, Ideen des exakten Wissens (1971), 821-826
Kwakernaak, H. (1965), On-line iterative optimization of stochastic control systems, Automatica 2, 195-208
Kwakernaak, H. (1966), On-line dynamic optimization of stochastic control systems, Proceedings of the IIIrd IFAC Congress, London, June 1966, paper 29-D
Kwatny, H.G. (1972), A note on stochastic approximation algorithms in system identification, IEEE Trans. AC-17, 571-572
Laarhoven, P.J.M. van, E.H.L. Aarts (1987), Simulated annealing, theory and applications, Reidel, Dordrecht, The Netherlands
Land, A.H., S. Powell (1973), Fortran codes for mathematical programming - linear, quadratic and discrete, Wiley, London
Lange-Nielsen, T., G.M. Lance (1972), A pattern search algorithm for feedback-control system parameter optimization, IEEE Trans. C-21, 1222-1227
Langguth, V. (1972), Ein Identifikationsverfahren fur lineare Systeme mit Hilfe von stochastischen Suchverfahren und unter Anwendung der Sequentialanalyse fur stochastische Fehlersignale, messen-steuern-regeln 15, 293-296
Langton, C.G. (Ed.) (1989), Artificial life, Proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems, Los Alamos NM, Sept. 1987, Proceedings vol. VI of Santa Fe Institute Studies in the Science of Complexity, Addison-Wesley, Redwood City CA


Langton, C.G. (Ed.) (1994a), Artificial life III, Proceedings of the Workshop on Artificial Life, Santa Fe NM, June 1992, Proceedings vol. XVII of Santa Fe Institute Studies in the Science of Complexity, Addison-Wesley, Reading MA
Langton, C.G. (Ed.) (1994b), Artificial life (journal), MIT Press, Cambridge MA
Langton, C.G., C. Taylor, J.D. Farmer, S. Rasmussen (Eds.) (1992), Artificial life II, Proceedings of the Second Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems, Santa Fe NM, Feb. 1990, Proceedings vol. X of Santa Fe Institute Studies in the Science of Complexity, Addison-Wesley, Reading MA
Lapidus, L., E. Shapiro, S. Shapiro, R.E. Stillman (1961), Optimization of process performance, AIChE J. 7(2), 288-294
Larichev, O.I., G.G. Gorvits (1974), New approach to comparison of search methods used in nonlinear programming problems, JOTA 13, 635-659
Larson, R.E., E. Tse (1973), Parallel processing algorithms for the optimal control of nonlinear dynamic systems, IEEE Trans. C-22, 777-786
Lasdon, L.S. (1970), Conjugate direction methods for optimal control, IEEE Trans. AC-15, 267-268
Laußermair, T. (1992a), Hyperflächen-Annealing - ein paralleles Optimierungsverfahren basierend auf selbstorganisierter Musterbildung durch Relaxation auf gekrümmten Hyperflächen, Dr. rer. nat. Diss., Technical University of Munich, Department of Mathematics and Computer Science, April 1992
Laußermair, T. (1992b), Hyperplane annealing and activator-inhibitor-systems, in: Manner and Manderick (1992), pp. 521-530
Lavi, A., T.P. Vogl (Eds.) (1966), Recent advances in optimization techniques, Wiley, New York
Lawler, E.L., J.K. Lenstra, A.H.G. Rinnooy Kan, D.B. Shmoys (Eds.) (1985), The travelling salesman problem, a guided tour of combinatorial optimization, Wiley-Interscience, New York
Lawrence, J.P., III, F.P. Emad (1973), An analytic comparison of random searching for the extremum and gradient searching of a known objective function, IEEE Trans. AC-18, 669-671
Lawrence, J.P., III, K. Steiglitz (1972), Randomized pattern search, IEEE Trans. C-21, 382-385
LeCam, L.M., J. Neyman (Eds.) (1967), Proceedings of the Vth Berkeley Symposium on Mathematical Statistics and Probability, 1965/66, vol. 4: Biology and Problems of Health, University of California Press, Berkeley CA


LeCam, L.M., J. Neyman, E.L. Scott (Eds.) (1972), Proceedings of the VIth Berkeley Symposium on Mathematical Statistics and Probability, 1970/71, vol. 5: Darwinian, Neo-Darwinian and Non-Darwinian Evolution, University of California Press, Berkeley CA
Lee, R.C.K. (1964), Optimal estimation, identification, and control, MIT Press, Cambridge MA
Lehner, K. (1991), Einsatz wissensbasierter Systeme in der Strukturoptimierung dargestellt am Beispiel Fachwerkoptimierung, Dr.-Ing. Diss., University of Bochum, Faculty of Civil Engineering, May 1991
Leibniz, G.W. (1710), Theodicee, 4th rev. ed., Forster, Hannover, 1744
Leitmann, G. (Ed.) (1962), Optimization techniques with applications to aerospace systems, Academic Press, New York
Leitmann, G. (1964), Einfuhrung in die Theorie optimaler Steuerung und der Differentialspiele - eine geometrische Darstellung, Oldenbourg, Munich
Leitmann, G. (Ed.) (1967), Topics in optimization, Academic Press, New York
Lemarechal, C., R. Mifflin (Eds.) (1978), Nonsmooth optimization, vol. 3 of IIASA Proceedings Series, Pergamon Press, Oxford UK
Leon, A. (1966a), A comparison among eight known optimizing procedures, in: Lavi and Vogl (1966), pp. 23-46
Leon, A. (1966b), A classified bibliography on optimization, in: Lavi and Vogl (1966), pp. 599-649
Lerner, A.Ja., E.A. Rosenman (1973), Optimale Steuerungen, Verlag Technik, Berlin
Lesniak, Z.K. (1970), Methoden der Optimierung von Konstruktionen unter Benutzung von Rechenautomaten, W. Ernst, Berlin
Levenberg, K. (1944), A method for the solution of certain non-linear problems in least squares, Quart. Appl. Math. 2, 164-168
Levine, L. (1964), Methods for solving engineering problems using analog computers, McGraw-Hill, New York
Levine, M.D., T. Vilis (1973), On-line learning optimal control using successive approximation techniques, IEEE Trans. AC-18, 279-284
Lew, H.S. (1972), An arithmetical approach to the mechanics of blood flow in small caliber blood vessels, J. Biomech. 5, 49-69


Leyßner, U. (1974), Uber den Einsatz Linearer Programmierung beim Entwurf optimaler Leichtbaustabwerke, Dr.-Ing. Diss., Technical University of Berlin, Department of Transportation Technologies, June 1974
Lill, S.A. (1970), Algorithm 46 - a modified Davidon method for finding the minimum of a function, using difference approximation for derivatives, Comp. J. 13, 111-113
Lill, S.A. (1971), Note on algorithm 46 - a modified Davidon method, Comp. J. 14, 106
Little, W.D. (1966), Hybrid computer solutions of partial differential equations by Monte Carlo methods, AFIPS Conf. Proc. 29, 181-190
Ljapunov, A.A. (Ed.), W. Kammerer, H. Thiele (Eds.) (1964a), Probleme der Kybernetik, vol. 4, Akademie-Verlag, Berlin
Ljapunov, A.A. (Ed.), W. Kammerer, H. Thiele (Eds.) (1964b), Probleme der Kybernetik, vol. 5, Akademie-Verlag, Berlin
Locker, A. (Ed.) (1973), Biogenesis - evolution - homeostasis, Springer, Berlin
Loginov, N.V. (1966), Methods of stochastic approximation, ARC 27, 706-728
Lohmann, R. (1992), Structure evolution and incomplete induction, in: Manner and Manderick (1992), pp. 175-185
Lootsma, F.A. (Ed.) (1972a), Numerical methods for non-linear optimization, Academic Press, London
Lootsma, F.A. (1972b), A survey of methods for solving constrained minimization problems via unconstrained minimization, in: Lootsma (1972a), pp. 313-347
Lowe, C.W. (1964), Some techniques of evolutionary operation, Trans. Inst. Chem. Engrs. 42, T334-T344
Lucas, E. (1876), Note sur l'application des series recurrentes a la recherche de la loi de distribution de nombres premiers, Compt. Rend. Hebdomad. Seances Acad. Sci. Paris 82, 165-167
Luce, R.D., H. Raiffa (1957), Games and decisions, Wiley, New York
Luenberger, D.G. (1972), Mathematical programming and control theory - trends of interplay, in: Geoffrion (1972), pp. 102-133
Luenberger, D.G. (1973), Introduction to linear and nonlinear programming, Addison-Wesley, Reading MA
Machura, M., A. Mulawa (1973), Algorithm 450 (E4) - Rosenbrock function minimization, CACM 16, 482-483
Madsen, K. (1973), A root-finding algorithm based on Newton's method, BIT 13, 71-75


Mamen, R., D.Q. Mayne (1972), A pseudo Newton-Raphson method for function minimization, JOTA 10, 263-277
Mandischer, M. (1993), Representation and evolution of neural networks, in: Albrecht, Reeves, and Steele (1993), pp. 643-649
Mangasarian, O.L. (1969), Nonlinear programming, McGraw-Hill, New York
Manner, R., B. Manderick (Eds.) (1992), Parallel problem solving from nature 2, Proceedings of the 2nd PPSN Conference, Brussels, Sept. 28-30, 1992, North-Holland, Amsterdam
Marfeld, A.F. (1970), Kybernetik des Gehirns - ein Kompendium der Grundlagenforschung, Safari Verlag, Berlin
Markwich, P. (1978), Der thermische Wasserstrahlantrieb auf der Grundlage des offenen Clausius-Rankine-Prozesses - Konzeption und hydrothermodynamische Analyse, Dr.-Ing. Diss., Technical University of Berlin, Department of Transportation Technologies
Marquardt, D.W. (1963), An algorithm for least-squares estimation of nonlinear parameters, SIAM J. 11, 431-441
Marti, K. (1980), On accelerations of the convergence in random search methods, Methods of Oper. Res. 37, 391-406
Masters, C.O., H. Drucker (1971), Observations on direct search procedures, IEEE Trans. SMC-1, 182-184
Matthews, A., D. Davies (1971), A comparison of modified Newton methods for unconstrained optimisation, Comp. J. 14, 293-294
Matyas, J. (1965), Random optimization, ARC 26, 244-251
Matyas, J. (1967), Das zufallige Optimierungsverfahren und seine Konvergenz, Proceedings of the Vth International Analogue Computation Meeting, Lausanne, Aug.-Sept. 1967, vol. 1, pp. 540-544
Maxfield, M., A. Callahan, L.J. Fogel (Eds.) (1965), Biophysics and cybernetic systems, Spartan, Washington, DC
Maybach, R.L. (1966), Solution of optimal control problems on a high-speed hybrid computer, Simulation 9, 238-245
McArthur, D.S. (1961), Strategy in research - alternative methods for design of experiments, IRE Trans. EM-8, 34-40
McCormick, G.P. (1969), Anti-zig-zagging by bending, Mgmt. Sci. 15, 315-320


Meissinger, H.F., G.A. Bekey (1966), An analysis of continuous parameter identification methods, Simulation 6, 94-102
Meredith, D.L., C.L. Karr, K.K. Kumar (1992), The use of genetic algorithms in the design of fuzzy-logic controllers, 3rd Workshop on Neural Networks - Academic/Industrial/Defence (WNN '92), vol. SPIE-1721, pp. 545-555, International Society of Optical Engineering
Merzenich, W. (1972), Ein einfaches mathematisches Evolutionsmodell, GMD Mitteilungen 21, Bonn
Metropolis, N., A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, E. Teller (1953), Equation of state calculations by fast computing machines, J. Chem. Phys. 21, 1087-1092
Meyer, H.A. (Ed.) (1956), Symposium on Monte Carlo methods, Wiley, New York
Meyer, J.-A. (Ed.) (1992), Adaptive behavior (journal), MIT Press, Cambridge MA
Meyer, J.-A., H.L. Roitblat, S.W. Wilson (Eds.) (1993), From animals to animats 2, Proceedings of the 2nd International Conference on Simulation of Adaptive Behavior (SAB '92), Honolulu HI, Dec. 7-11, 1992, MIT Press, Cambridge MA
Meyer, J.-A., S.W. Wilson (Eds.) (1991), From animals to animats, Proceedings of the 1st International Conference on Simulation of Adaptive Behavior (SAB), Paris, Sept. 24-28, 1990, MIT Press, Cambridge MA
Michalewicz, Z. (1992), Genetic algorithms + data structures = evolution programs, Springer, Berlin
Michalewicz, Z. (1994), Genetic algorithms + data structures = evolution programs, 2nd ext. ed., Springer, Berlin
Michie, D. (1971), Heuristic search, Comp. J. 14, 96-102
Miele, A. (1969), Variational approach to the gradient method - theory and numerical experiments, in: Zadeh, Neustadt, and Balakrishnan (1969b), pp. 143-157
Miele, A., J.W. Cantrell (1969), Study on a memory gradient method for the minimization of functions, JOTA 3, 459-470
Miele, A., J.W. Cantrell (1970), Memory gradient method for the minimization of functions, in: Balakrishnan et al. (1970), pp. 252-263
Miele, A., J.N. Damoulakis, J.R. Cloutier, J.L. Tietze (1974), Sequential gradient-restoration algorithm for optimal control problems with nondifferential constraints, JOTA 13, 218-255


Miele, A., H.Y. Huang, J.C. Heidemann (1969), Sequential gradient-restoration algorithm for the minimization of constrained functions - ordinary and conjugate gradient versions, JOTA 4, 213-243
Miele, A., A.V. Levy, E.E. Cragg (1971), Modifications and extensions of the conjugate gradient-restoration algorithm for mathematical programming problems, JOTA 7, 450-472
Miele, A., J.L. Tietze, A.V. Levy (1972), Summary and comparison of gradient-restoration algorithms for optimal control problems, JOTA 10, 381-403
Miller, R.E. (1973), A comparison of some theoretical models of parallel computation, IEEE Trans. C-22, 710-717
Millstein, R.E. (1973), Control structures in Illiac IV Fortran, CACM 16, 621-627
Minot, O.N. (1969), Artificial intelligence and new simulations, Simulation 13, 214-215
Minsky, M. (1961), Steps toward artificial intelligence, IRE Proc. 49, 8-30
Miranker, W.L. (1969), Parallel methods for approximating the root of a function, IBM J. Res. Dev. 13, 297-301
Miranker, W.L. (1971), A survey of parallelism in numerical analysis, SIAM Review 13, 524-547
Mitchell, B.A., Jr. (1964), A hybrid analog-digital parameter optimizer for Astrac II, AFIPS Conf. Proc. 25, 271-285
Mitchell, R.A., J.L. Kaplan (1968), Nonlinear constrained optimization by a non-random complex method, NBS J. Res. C, Engng. Instr. 72, 249-258
Mlynski, D. (1964a), Der Wirkungsgrad experimenteller Optimierungsstrategien, Dr.-Ing. Diss., Technical University (RWTH) of Aachen, Germany, Dec. 1964
Mlynski, D. (1964b), Maximalisierung durch logische Suchprozesse, in: Steinbuch and Wagner (1964), pp. 82-94
Mlynski, D. (1966a), Ein Beitrag zur statistischen Theorie der Optimierungsstrategien I and II, Regelungstechnik 14, 209-215 and 325-330
Mlynski, D. (1966b), Efficiency of experimental strategies for optimising feedback control of disturbed processes, Proceedings of the IIIrd IFAC Congress, London, June 1966, paper 29-G
Mockus, J.B. see also under Motskus, I.B.
Mockus, J.B. (1971), On the optimization of power distribution systems, in: Schwarz (1971), technical papers, vol. 3, pp. 6.3.2-1 to 6.3.2-14


Moiseev, N.N. (Ed.) (1970), Colloquium on methods of optimization, Springer, Berlin
Moran, P.A.P. (1967), Unsolved problems in evolutionary theory, in: LeCam and Neyman (1967), pp. 457-480
More, J.J., S.J. Wright (1993), Optimization software guide, vol. 14 of Frontiers in Applied Mathematics, SIAM, Philadelphia
Morrison, D.D. (1968), Optimization by least squares, SIAM J. Numer. Anal. 5, 83-88
Motskus, I.B. see also under Mockus, J.B.
Motskus, I.B. (1965), Some experiments related to the capabilities of man in solving multiextremal problems heuristically, Engng. Cybern. 3(3), 40-44
Motskus, I.B. (1967), Mnogoekstremalnye sadachi v projektirovanii, Nauka, Moscow
Motskus, I.B., A.A. Feldbaum (1963), Symposium on multiextremal problems, Trakay, June 1963, Engng. Cybern. 1(5), 154-155
Movshovich, S.M. (1966), Random search and the gradient method in optimization problems, Engng. Cybern. 4(6), 39-48
Mufti, I.H. (1970), Computational methods in optimal control problems, Springer, Berlin
Mugele, R.A. (1961), A nonlinear digital optimizing program for process control systems, AFIPS Conf. Proc. 19, 15-32
Mugele, R.A. (1962), A program for optimal control of nonlinear processes, IBM Systems J. 1, 2-17
Mugele, R.A. (1966), The probe and edge theorems for non-linear optimization, in: Lavi and Vogl (1966), pp. 131-144
Muhlenbein, H., D. Schlierkamp-Voosen (1993a), Predictive models for the breeder genetic algorithm I. Continuous Parameter Optimization, Evolutionary Computation 1, 25-49
Muhlenbein, H., D. Schlierkamp-Voosen (1993b), Optimal interaction of mutation and crossover in the breeder genetic algorithm, in: Forrest (1993), p. 648
Muller-Merbach, H. (1971), Operations Research - Methoden und Modelle der Optimalplanung, 2nd ed., F. Vahlen, Berlin
Munson, J.K., A.I. Rubin (1959), Optimization by random search on the analog computer, IRE Trans. EC-8, 200-203
Murata, T. (1963), The use of adaptive constrained descent in systems design, University of Illinois, Coordinated Science Laboratory, report R-189, Urbana IL, Dec. 1963


Murray, W. (Ed.) (1972a), Numerical methods for unconstrained optimization, Academic Press, London
Murray, W. (1972b), Second derivative methods, in: Murray (1972a), pp. 57-71
Murray, W. (1972c), Failure, the causes and cures, in: Murray (1972a), pp. 107-122
Murtagh, B.A. (1970), A short description of the variable-metric method, in: Abadie (1970), pp. 525-528
Murtagh, B.A., R.W.H. Sargent (1970), Computational experience with quadratically convergent minimisation methods, Comp. J. 13, 185-194
Mutseniyeks, V.A., L.A. Rastrigin (1964), Extremal control of continuous multiparameter systems by the method of random search, Engng. Cybern. 2(1), 82-90
Myers, G.E. (1968), Properties of the conjugate-gradient and Davidon methods, JOTA 2, 209-219
Nachtigall, W. (1971), Biotechnik - statische Konstruktionen in der Natur, Quelle und Meyer, Heidelberg, Germany
Nachtigall, W. (Ed.) (1992), Technische Biologie und Bionik 1, Proceedings of the 1st Congress on Bionics, Wiesbaden, June 11-13, 1992, BIONA report 8, G. Fischer, Stuttgart
Nake, F. (1966), Zertifikat zu Algorithmus 2 - Orthonormierung von Vektoren nach E. Schmidt, Computing 1, 281
Neave, H.R. (1973), On using the Box-Muller transformation with multiplicative congruential pseudo-random number generators, Appl. Stat. 22, 92-97
Nelder, J.A., R. Mead (1965), A simplex method for function minimization, Comp. J. 7, 308-313
Nenonen, L.K., B. Pagurek (1969), Conjugate gradient optimization applied to a copper converter model, Automatica 5, 801-810
Neumann, J. von (1960), Die Rechenmaschine und das Gehirn, Oldenbourg, Munich
Neumann, J. von (1966), Theory of self-reproducing automata, University of Illinois Press, Urbana-Champaign IL
Neumann, J. von, O. Morgenstern (1961), Spieltheorie und wirtschaftliches Verhalten, Physica-Verlag, Wurzburg, Germany
Newman, D.J. (1965), Location of the maximum on unimodal surfaces, JACM 12, 395-398


Neyman, J. (Ed.) (1951), Proceedings of the IInd Berkeley Symposium on Mathematical Statistics and Probability, 1950, University of California Press, Berkeley CA
Neyman, J. (Ed.) (1956), Proceedings of the IIIrd Berkeley Symposium on Mathematical Statistics and Probability, 1954/55, University of California Press, Berkeley CA
Neyman, J. (Ed.) (1961), Proceedings of the IVth Berkeley Symposium on Mathematical Statistics and Probability, 1960, University of California Press, Berkeley CA
Nickel, K. (1967), Allgemeine Forderungen an einen numerischen Algorithmus, ZAMM 47(Sonderheft), T67-T68
Nickel, K., K. Ritter (1972), Termination criterion and numerical convergence, SIAM J. Numer. Anal. 9, 277-283
Niederreiter, H. (1992), Random number generation and quasi-Monte Carlo methods, vol. 63 of CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM, Philadelphia
Niemann, H. (1974), Methoden der Mustererkennung, Akademische Verlagsgesellschaft, Frankfort/Main
Nikolic, Z.J., K.S. Fu (1966), An algorithm for learning without external supervision and its application to learning control systems, IEEE Trans. AC-11, 414-442
Nissen, V. (1993), Evolutionary algorithms in management science, report 9303 of the European Study Group for Evolutionary Economics
Nissen, V. (1994), Evolutionare Algorithmen - Darstellung, Beispiele, betriebswirtschaftliche Anwendungsmoglichkeiten, DUV Deutscher Universitatsverlag, Wiesbaden
Norkin, K.B. (1961), On one method for automatic search for the extremum of a function of many variables, ARC 22, 534-538
North, M. (1980), Time-dependent stochastic model of floods, Proc. ASCE, J. Hydraulics Div. 106-HY5, 649-665
Nurminski, E.A. (Ed.) (1982), Progress in nondifferentiable optimization, IIASA Collaborative Proceedings Series CP-82-58, International Institute for Applied Systems Analysis, Laxenburg, Austria
Odell, P.L. (1961), An empirical study of three stochastic approximation techniques applicable to sensitivity testing, report NAVWEPS-7837
Oestreicher, H.L., D.R. Moore (Eds.) (1968), Cybernetic problems in bionics, Gordon Breach, New York
Oi, K., H. Sayama, T. Takamatsu (1973), Computational schemes of the Davidon-Fletcher-Powell method in infinite-dimensional space, JOTA 12, 447-458


Oldenburger, R. (Ed.) (1966), Optimal and self optimizing control, MIT Press, Cambridge MA
Oliver, L.T., D.J. Wilde (1964), Symmetric sequential minimax search for a maximum, Fibonacci Quart. 2, 169-175
O'Neill, R. (1971), Algorithm AS 47 – function minimization using a simplex procedure, Appl. Stat. 20, 338-345
Opacic, J. (1973), A heuristic method for finding most extrema of a nonlinear functional, IEEE Trans. SMC-3, 102-107
Oren, S.S. (1973), Self-scaling variable metric algorithms without line search for unconstrained minimization, Math. Comp. 27, 873-885
Ortega, J.M., W.C. Rheinboldt (1967), Monotone iterations for nonlinear equations with application to Gauss-Seidel methods, SIAM J. Numer. Anal. 4, 171-190
Ortega, J.M., W.C. Rheinboldt (1970), Iterative solution of nonlinear equations in several variables, Academic Press, New York
Ortega, J.M., W.C. Rheinboldt (1972), A general convergence result for unconstrained minimization methods, SIAM J. Numer. Anal. 9, 40-43
Ortega, J.M., M.L. Rockoff (1966), Nonlinear difference equations and Gauss-Seidel type iterative methods, SIAM J. Numer. Anal. 3, 497-513
Osborne, M.R. (1972), Some aspects of nonlinear least squares calculations, in: Lootsma (1972a), pp. 171-189
Osche, G. (1972), Evolution – Grundlagen, Erkenntnisse, Entwicklungen der Abstammungslehre, Herder, Freiburg, Germany
Ostermeier, A. (1992), An evolution strategy with momentum adaptation of the random number distribution, in: Männer and Manderick (1992), pp. 197-206
Ostrowski, A.M. (1966), Solution of equations and systems of equations, 2nd ed., Academic Press, New York
Ostrowski, A.M. (1967), Contributions to the theory of the method of steepest descent, Arch. Ration. Mech. Anal. 26, 257-280
Overholt, K.J. (1965), An instability in the Fibonacci and golden section search methods, BIT 5, 284-286
Overholt, K.J. (1967a), Note on algorithm 2 – Fibonacci search, and algorithm 7 – Minx, and the golden section search, Comp. J. 9, 414
Overholt, K.J. (1967b), Algorithm 16 – Gold, Comp. J. 9, 415
Overholt, K.J. (1967c), Algorithm 17 – Goldsec, Comp. J. 9, 415
Overholt, K.J. (1973), Efficiency of the Fibonacci search method, BIT 13, 92-96
Page, S.E., D.W. Richardson (1992), Walsh functions, schema variance, and deception, Complex Systems 6, 125-135
Pagurek, B., C.M. Woodside (1968), The conjugate gradient method for optimal control problems with bounded control variables, Automatica 4, 337-349
Palmer, J.R. (1969), An improved procedure for orthogonalising the search vectors in Rosenbrock's and Swann's direct search optimisation methods, Comp. J. 12, 69-71
Papageorgiou, M. (1991), Optimierung – Statische, dynamische, stochastische Verfahren für die Anwendung, Oldenbourg, Munich
Papentin, F. (1972), A Darwinian evolutionary system, Dr. rer. nat. Diss., University of Tübingen, Germany
Pardalos, P.M., J.B. Rosen (1987), Constrained global optimization – algorithms and applications, vol. 268 of Lecture Notes in Computer Science, Springer, Berlin
Parkinson, J.M., D. Hutchinson (1972a), A consideration of non-gradient algorithms for the unconstrained optimization of functions of high dimensionality, in: Lootsma (1972a), pp. 99-113
Parkinson, J.M., D. Hutchinson (1972b), An investigation into the efficiency of variants on the simplex method, in: Lootsma (1972a), pp. 115-135
Pask, G. (1962), Physical and linguistic evolution in self-organizing systems, Proceedings of the 1st IFAC Symposium on Optimization and Adaptive Control, Rome, April 1962, pp. 199-227
Pask, G. (1971), A cybernetic experimental method and its underlying philosophy, Int'l J. Man-Machine Stud. 3, 279-337
Patrick, M.L. (1972), A highly parallel algorithm for approximating all zeros of a polynomial with only real zeros, CACM 15, 952-955
Pattee, H.H., E.A. Edelsack, L. Fein, A.B. Callahan (Eds.) (1966), Natural automata and useful simulations, Spartan, Washington, DC
Paviani, D.A., D.M. Himmelblau (1969), Constrained nonlinear optimization by heuristic programming, Oper. Res. 17, 872-882
Pearson, J.D. (1969), Variable metric methods of minimization, Comp. J. 12, 171-178
Peckham, G. (1970), A new method for minimising a sum of squares without calculating gradients, Comp. J. 13, 418-420
Peschel, M. (1980), Ingenieurtechnische Entscheidungen – Modellbildung und Steuerung mit Hilfe der Polyoptimierung, Verlag Technik, Berlin
Peters, E. (1989), OptimiEst – an optimizing expert system using topologies, in: Brebbia and Hernandez (1989), pp. 222-232
Peters, E. (1991), Ein Beitrag zur wissensbasierten Auswahl und Steuerung von Optimierverfahren, Dr. rer. nat. Diss., University of Dortmund, Department of Computer Science, May 1991
Pierre, D.A. (1969), Optimization theory with applications, Wiley, New York
Pierson, B.L., S.G. Rajtora (1970), Computational experience with the Davidon method applied to optimal control problems, IEEE Trans. SSC-6, 240-242
Pike, M.C., I.D. Hill, F.D. James (1967), Note on algorithm 2 – Fibonacci search, and on algorithm 7 – Minx, and algorithm 2 modified – Fibonacci search, Comp. J. 9, 416-417
Pike, M.C., J. Pixner (1965), Algorithm 2 – Fibonacci search, Comp. Bull. 8, 147
Pincus, M. (1970), A Monte Carlo method for the approximate solution of certain types of constrained optimization problems, Oper. Res. 18, 1225-1228
Pinkham, R.S. (1964), Random root location, SIAM J. 12, 855-864
Pinsker, I.Sh., B.M. Tseitlin (1962), A nonlinear optimization problem, ARC 23, 1510-1518
Plane, D.R., C. McMillan, Jr. (1971), Discrete optimization – integer programming and network analysis for management decisions, Prentice-Hall, Englewood Cliffs NJ
Plaschko, P., K. Wagner (1973), Evolutions-Linearisierungs-Programm zur Darstellung von numerischen Daten durch beliebige Funktionen, report DLR-FB-73-55, DFVLR Porz-Wahn, Germany
Pluznikov, L.N., V.O. Andreyev, E.S. Klimenko (1971), Use of random search method in industrial planning, Engng. Cybern. 9, 229-235
Polak, E. (1971), Computational methods in optimization – a unified approach, Academic Press, New York
Polak, E. (1972), A survey of methods of feasible directions for the solution of optimal control problems, IEEE Trans. AC-17, 591-596
Polak, E. (1973), An historical survey of computational methods in optimal control, SIAM Review 15, 553-584
Polak, E., G. Ribière (1969), Note sur la convergence de méthodes de directions conjuguées, Rev. Franç. Inf. Rech. Opér. 3(16), 35-43
Polyak, B.T. (1969), The conjugate gradient method in extremal problems, USSR Comp. Math. and Math. Phys. 9(4), 94-112
Ponstein, J. (1967), Seven kinds of convexity, SIAM Review 9, 115-119
Pontrjagin, L.S., V.G. Boltjanskij, R.V. Gamkrelidze, E.F. Miscenko (1967), Mathematische Theorie optimaler Prozesse, 2nd ed., Oldenbourg, Munich
Powell, D.R., J.R. MacDonald (1972), A rapidly convergent iterative method for the solution of the generalized nonlinear least squares problem, Comp. J. 15, 148-155
Powell, M.J.D. (1962), An iterative method for finding stationary values of a function of several variables, Comp. J. 5, 147-151
Powell, M.J.D. (1964), An efficient method for finding the minimum of a function of several variables without calculating derivatives, Comp. J. 7, 155-162
Powell, M.J.D. (1965), A method for minimizing a sum of squares of nonlinear functions without calculating derivatives, Comp. J. 7, 303-307
Powell, M.J.D. (1966), Minimization of functions of several variables, in: Walsh (1966), pp. 143-158
Powell, M.J.D. (1968a), On the calculation of orthogonal vectors, Comp. J. 11, 302-304
Powell, M.J.D. (1968b), A Fortran subroutine for solving systems of non-linear algebraic equations, UKAEA Research Group, report AERE-R-5947, Harwell, Oxon
Powell, M.J.D. (1969), A theorem on rank one modifications to a matrix and its inverse, Comp. J. 12, 288-290
Powell, M.J.D. (1970a), Rank one methods for unconstrained optimization, in: Abadie (1970), pp. 139-156
Powell, M.J.D. (1970b), A survey of numerical methods for unconstrained optimization, SIAM Review 12, 79-97
Powell, M.J.D. (1970c), A Fortran subroutine for unconstrained minimization, requiring first derivatives of the objective function, UKAEA Research Group, report AERE-R-6469, Harwell, Oxon
Powell, M.J.D. (1970d), A hybrid method for nonlinear equations, in: Rabinowitz (1970), pp. 87-114
Powell, M.J.D. (1970e), A Fortran subroutine for solving systems of nonlinear algebraic equations, in: Rabinowitz (1970), pp. 115-161
Powell, M.J.D. (1970f), Subroutine VA04A (Fortran), updated May 20th, 1970, in: Hopper (1971), p. 72
Powell, M.J.D. (1970g), Recent advances in unconstrained optimization, UKAEA Research Group, technical paper AERE-TP-430, Harwell, Oxon, Nov. 1970
Powell, M.J.D. (1971), On the convergence of the variable metric algorithm, JIMA 7, 21-36
Powell, M.J.D. (1972a), Some properties of the variable metric algorithm, in: Lootsma (1972a), pp. 1-17
Powell, M.J.D. (1972b), Quadratic termination properties of minimization algorithms I – Statement and discussion of results, JIMA 10, 333-342
Powell, M.J.D. (1972c), Quadratic termination properties of minimization algorithms II – Proofs and theorems, JIMA 10, 343-357
Powell, M.J.D. (1972d), A survey of numerical methods for unconstrained optimization, in: Geoffrion (1972), pp. 3-21
Powell, M.J.D. (1972e), Problems related to unconstrained optimization, in: Murray (1972a), pp. 29-55
Powell, M.J.D. (1972f), Unconstrained minimization algorithms without computation of derivatives, UKAEA Research Group, technical paper AERE-TP-483, Harwell, Oxon, April 1972
Poznyak, A.S. (1972), Use of learning automata for the control of random search, ARC 33, 1992-2000
Press, W.H., S.A. Teukolsky, W.T. Vetterling, B.P. Flannery (1992), Numerical recipes in Fortran, 2nd ed., Cambridge University Press, Cambridge UK (especially Chap. 7, Random numbers, pp. 266-319)
Prusinkiewicz, P., A. Lindenmayer (1990), The algorithmic beauty of plants – the virtual laboratory, Springer, Berlin
Pugachev, V.N. (1970), Determination of the characteristics of complex systems using statistical trials and analytical investigation, Engng. Cybern. 8, 1109-1117
Pugh, E.L. (1966), A gradient technique of adaptive Monte Carlo, SIAM Review 8, 346-355
Pun, L. (1969), Introduction to optimization practice, Wiley, New York
Rabinowitz, P. (Ed.) (1970), Numerical methods for nonlinear algebraic equations, Gordon Breach, London
Ralston, A., H.S. Wilf (Eds.) (1967), Mathematische Methoden für Digitalrechner, Oldenbourg, Munich
Ralston, A., H.S. Wilf (Eds.) (1969), Mathematische Methoden für Digitalrechner II, Oldenbourg, Munich
Rappl, G. (1984), Konvergenzraten von Random-Search-Verfahren zur globalen Optimierung, Dr. rer. nat. Diss., Hochschule der Bundeswehr, Munich-Neubiberg, Department of Computer Science, Nov. 1984
Rastrigin, L.A. (1960), Extremal control by the method of random scanning, ARC 21, 891-896
Rastrigin, L.A. (1963), The convergence of the random search method in the extremal control of a many-parameter system, ARC 24, 1337-1342
Rastrigin, L.A. (1965a), Sluchainyi poisk v zadachakh optimisatsii mnogoparametricheskikh sistem, Zinatne, Riga (for a translation into English see next item)
Rastrigin, L.A. (1965b), Random search in optimization problems for multiparameter systems, Air Force System Command, Foreign Technical Division, FTD-HT-67-363
Rastrigin, L.A. (1966), Stochastic methods of complicated multi-parameter system optimization, Proceedings of the IIIrd IFAC Congress, London, June 1966, paper 3-F
Rastrigin, L.A. (1967), Raboty po teorii i primeneniyu statisticheskikh metodov optimisatsii v institute elektroniki i vychislitelnoi tekhniki Akademii Nauk Latviiskoi SSR, Avtomatika i Vychislitelnaya Tekhnika (1967, 5), 31-40
Rastrigin, L.A. (1968), Statisticheskiye metody poiska, Nauka, Moscow
Rastrigin, L.A. (1969), Teorija i primenenije sluchainovo poiska, Zinatne, Riga
Rastrigin, L.A. (1972), Adaptivnye sistemy, vol. 1, Zinatne, Riga
Rauch, S.W. (1973), A convergence theory for a class of nonlinear programming problems, SIAM J. Numer. Anal. 10, 207-228
Rawlins, G.J.E. (Ed.) (1991), Foundations of genetic algorithms, Morgan Kaufmann, San Mateo CA
Rechenberg, I. (1964), Cybernetic solution path of an experimental problem, Royal Aircraft Establishment, Library Translation 1122, Farnborough, Hants, Aug. 1965, English translation of the unpublished written summary of the lecture "Kybernetische Lösungsansteuerung einer experimentellen Forschungsaufgabe", delivered at the joint annual meeting of the WGLR and DGRR, Berlin, 1964
Rechenberg, I. (1973), Evolutionsstrategie – Optimierung technischer Systeme nach Prinzipien der biologischen Evolution, Frommann-Holzboog, Stuttgart
Rechenberg, I. (1978), Evolutionsstrategien, in: Schneider and Ranft (1978), pp. 83-114
Rechenberg, I. (1989), Evolution strategy – nature's way of optimization, in: Bergmann (1989), pp. 106-126
Rechenberg, I. (1994), Evolutionsstrategie '94, Frommann-Holzboog, Stuttgart
Rein, H., M. Schneider (1971), Einführung in die Physiologie des Menschen, Springer, Berlin
Rhead, D.G. (1971), Some numerical experiments on Zangwill's method for unconstrained minimization, University of London, Institute of Computer Science, working paper ICSI-319
Ribière, G. (1970), Sur la méthode de Davidon-Fletcher-Powell pour la minimisation des fonctions, Mgmt. Sci. 16, 572-592
Rice, J.R. (1966), Experiments on Gram-Schmidt orthogonalization, Math. Comp. 20, 325-328
Richardson, J.A., J.L. Kuester (1973), Algorithm 454 (E4) – the complex method for constrained optimization, CACM 16, 487-489
Riedl, R. (1976), Die Strategie der Genesis, Piper, Munich
Robbins, H., S. Monro (1951), A stochastic approximation method, Ann. Math. Stat. 22, 400-407
Roberts, P.D., R.H. Davis (1969), Conjugate gradients, Control 13, 206-210
Roberts, S.M., H.I. Lyvers (1961), The gradient method in process control, Ind. Engng. Chem. 53, 877-882
Rodloff, R.K. (1976), Bestimmung der Geschwindigkeit von Versetzungsgruppen in neutronen-bestrahlten Kupfer-Einkristallen, Dr. rer. nat. Diss., Technical University of Braunschweig, Germany, Sept. 1976
Rosen, J.B. (1960), The gradient projection method for nonlinear programming I – Linear constraints, SIAM J. 8, 181-217
Rosen, J.B. (1961), The gradient projection method for nonlinear programming II – Nonlinear constraints, SIAM J. 9, 514-532
Rosen, J.B. (1966), Iterative solution of nonlinear optimal control problems, SIAM J. Contr. 4, 223-244
Rosen, J.B., O.L. Mangasarian, K. Ritter (Eds.) (1970), Nonlinear programming, Academic Press, New York
Rosen, J.B., S. Suzuki (1965), Construction of nonlinear programming test problems, CACM 8, 113
Rosen, R. (1967), Optimality principles in biology, Butterworths, London
Rosenblatt, F. (1958), The perceptron – a probabilistic model for information storage and organization in the brain, Psychol. Rev. 65, 386-408
Rosenbrock, H.H. (1960), An automatic method for finding the greatest or least value of a function, Comp. J. 3, 175-184
Rosenbrock, H.H., C. Storey (1966), Computational techniques for chemical engineers, Pergamon Press, Oxford UK
Ross, G.J.S. (1971), The efficient use of function minimization in non-linear maximum-likelihood estimation, Appl. Stat. 19, 205-221
Rothe, R. (1959), Höhere Mathematik für Mathematiker, Physiker und Ingenieure, I – Differentialrechnung und Grundformeln der Integralrechnung nebst Anwendungen, 18th ed., Teubner, Leipzig, Germany
Roughgarden, J.W. (1979), Theory of population genetics and evolutionary ecology, Macmillan, New York
Rozvany, G. (Ed.) (1994), J. on Structural Optimization, Springer, Berlin
Rudolph, G. (1991), Global optimization by means of distributed evolution strategies, in: Schwefel and Männer (1991), pp. 209-213
Rudolph, G. (1992a), On correlated mutation in evolution strategies, in: Männer and Manderick (1992), pp. 105-114
Rudolph, G. (1992b), Parallel approaches to stochastic global optimization, in: Joosen and Milgrom (1992), pp. 256-267
Rudolph, G. (1993), Massively parallel simulated annealing and its relation to evolutionary algorithms, Evolutionary Computation 1(4), 361-383
Rudolph, G. (1994a), Convergence analysis of canonical genetic algorithms, IEEE Trans. NN-5, 96-101
Rudolph, G. (1994b), An evolutionary algorithm for integer programming, in: Davidor, Schwefel, and Männer (1994), pp. 139-148
Rutishauser, H. (1966), Algorithmus 2 – Orthonormierung von Vektoren nach E. Schmidt, Computing 1, 159-161
Rybashov, M.V. (1965a), The gradient method of solving convex programming problems on electronic analog computers, ARC 26, 1886-1898
Rybashov, M.V. (1965b), Gradient method of solving linear and quadratic programming problems on electronic analog computers, ARC 26, 2079-2089
Rybashov, M.V. (1969), Insensitivity of gradient systems in the solution of linear problems on analog computers, ARC 30, 1679-1687
Ryshik, I.M., I.S. Gradstein (1963), Summen-, Produkt- und Integraltafeln, 2nd ed., Deutscher Verlag der Wissenschaften, Berlin
Saaty, T.L. (1955), The number of vertices of a polyhedron, Amer. Math. Monthly 62, 326-331
Saaty, T.L. (1963), A conjecture concerning the smallest bound on the iterations in linear programming, Oper. Res. 11, 151-153
Saaty, T.L. (1970), Optimization in integers and related extremal problems, McGraw-Hill, New York
Saaty, T.L., J. Bram (1964), Nonlinear mathematics, McGraw-Hill, New York
Sacks, J. (1958), Asymptotic distribution of stochastic approximation procedures, Ann. Math. Stat. 29, 373-405
Sameh, A.H. (1971), On Jacobi and Jacobi-like algorithms for a parallel computer, Math. Comp. 25, 579-590
Samuel, A.L. (1963), Some studies in machine learning using the game of checkers, in: Feigenbaum and Feldman (1963), pp. 71-105
Sargent, R.W.H., D.J. Sebastian (1972), Numerical experience with algorithms for unconstrained minimization, in: Lootsma (1972a), pp. 45-68
Sargent, R.W.H., D.J. Sebastian (1973), On the convergence of sequential minimization algorithms, JOTA 12, 567-575
Saridis, G.N. (1968), Learning applied to successive approximation algorithms, Proceedings of the 1968 Joint Automatic Control Conference, Ann Arbor MI, pp. 1007-1013
Saridis, G.N. (1970), Learning applied to successive approximation algorithms, IEEE Trans. SSC-6, 97-103
Saridis, G.N., H.D. Gilbert (1970), Self-organizing approach to the stochastic fuel regulator problem, IEEE Trans. SSC-6, 186-191
Satterthwaite, F.E. (1959a), REVOP or random evolutionary operation, Merrimack College, report 10-10-59, North Andover MA
Satterthwaite, F.E. (1959b), Random balance experimentation, Technometrics 1, 111-137
Satterthwaite, F.E., D. Shainin (1959), Pinpoint important process variable with polyvariable experimentation, J. Soc. Plast. Engrs. 15, 225-230
Savage, J.M. (1966), Evolution, Bayerischer Landwirtschafts-Verlag, Munich
Sawaragi, Y., T. Takamatsu, K. Fukunaga, E. Nakanishi, H. Tamura (1971), Dynamic version of steady state optimizing control of distillation column by trial method, Automatica 7, 509-516
Schaffer, J.D. (Ed.) (1989), Proceedings of the 3rd International Conference on Genetic Algorithms, George Mason University, Fairfax VA, June 4-7, 1989, Morgan Kaufmann, San Mateo CA
Schechter, R.S. (1962), Iteration methods for nonlinear problems, Trans. Amer. Math. Soc. 104, 179-189
Schechter, R.S. (1968), Relaxation methods for convex problems, SIAM J. Numer. Anal. 5, 601-612
Schechter, R.S. (1970), Minimization of a convex function by relaxation, in: Abadie (1970), pp. 177-190
Scheeffer, L. (1886), Über die Bedeutung der Begriffe "Maximum und Minimum" in der Variationsrechnung, Mathematische Annalen 26, 197-208
Scheel, A. (1985), Beitrag zur Theorie der Evolutionsstrategie, Dr.-Ing. Diss., Technical University of Berlin, Department of Process Engineering
Scheuer, E.M., D.S. Stoller (1962), On the generation of normal random vectors, Technometrics 4, 278-281
Schinzinger, R. (1966), Optimization in electromagnetic system design, in: Lavi and Vogl (1966), pp. 163-214
Schittkowski, K. (1980), Nonlinear programming codes, vol. 183 of Lecture Notes in Economics and Mathematical Systems, Springer, Berlin
Schley, C.H., Jr. (1968), Conjugate gradient methods for optimization, General Electric Research and Development Center, report 68-C-008, Schenectady NY, Jan. 1968
Schmalhausen, I.I. (1964), Grundlagen des Evolutionsprozesses vom kybernetischen Standpunkt, in: Ljapunov, Kämmerer, and Thiele (1964a), pp. 151-188
Schmetterer, L. (1961), Stochastic approximation, in: Neyman (1961), vol. 1, pp. 587-609
Schmidt, J.W., H. Schwetlick (1968), Ableitungsfreie Verfahren mit höherer Konvergenzgeschwindigkeit, Computing 3, 215-226
Schmidt, J.W., H.F. Trinkaus (1966), Extremwertermittlung mit Funktionswerten bei Funktionen von mehreren Veränderlichen, Computing 1, 224-232
Schmidt, J.W., K. Vetters (1970), Ableitungsfreie Verfahren für nichtlineare Optimierungsprobleme, Numer. Math. 15, 263-282
Schmitt, E. (1969), Adaptive computer algorithms for optimization and root-finding, NTZ-report 6, VDE Verlag, Berlin
Schneider, B., U. Ranft (Eds.) (1978), Simulationsmethoden in der Medizin und Biologie, Springer, Berlin
Schrack, G., N. Borowski (1972), An experimental comparison of three random searches, in: Lootsma (1972a), pp. 137-147
Schumer, M.A. (1967), Optimization by adaptive random search, Ph.D. thesis, Princeton University, Princeton NJ, Nov. 1967
Schumer, M.A. (1969), Hill climbing on a sample function of a Gaussian Markov process, JOTA 4, 413-418
Schumer, M.A., K. Steiglitz (1968), Adaptive step size random search, IEEE Trans. AC-13, 270-276
Schuster, P. (1972), Vom Makromolekül zur primitiven Zelle – die Entstehung biologischer Funktion, Chemie in unserer Zeit 6(1), 1-16
Schwarz, H. (Ed.) (1971), Multivariable technical control systems, North-Holland, Amsterdam
Schwarz, H.R., H. Rutishauser, E. Stiefel (1968), Numerik symmetrischer Matrizen, Teubner, Stuttgart
Schwefel, D. et al. (1972), Gesundheitsplanung im Departamento del Valle del Cauca, report of the German Development Institute, Berlin, July 1972
Schwefel, H.-P. (1968), Experimentelle Optimierung einer Zweiphasendüse, Teil I, report 35 for the project MHD-Staustrahlrohr, AEG Research Institute, Berlin, Oct. 1968
Schwefel, H.-P. (1974), Adaptive Mechanismen in der biologischen Evolution und ihr Einfluß auf die Evolutionsgeschwindigkeit, Internal report of the Working Group of Bionics and Evolution Techniques at the Institute for Measurement and Control Technology, Technical University of Berlin, Department of Process Engineering, July 1974
Schwefel, H.-P. (1975a), Evolutionsstrategie und numerische Optimierung, Dr.-Ing. Diss., Technical University of Berlin, Department of Process Engineering
Schwefel, H.-P. (1975b), Binäre Optimierung durch somatische Mutation, Internal report of the Working Group of Bionics and Evolution Techniques at the Institute for Measurement and Control Technology, Technical University of Berlin (and the Central Animal Laboratory of the Medical High School of Hannover, SFB 146 Versuchstierforschung of the Veterinary High School of Hannover), May 1975
Schwefel, H.-P. (1980), Subroutines EVOL, GRUP, KORR – Listings and User's Guides, Internal report of the Programme Group of Systems Analysis and Technological Development, KFA-STE-IB-2/80, April 1980, Nuclear Research Center (KFA) Jülich, Germany
Schwefel, H.-P. (1981), Optimum Seeking Methods – User's Guides, Internal report of the Programme Group of Systems Analysis and Technological Development, KFA-STE-IB-7/81, Oct. 1981, Nuclear Research Center (KFA) Jülich, Germany
Schwefel, H.-P. (1987), Collective phenomena in evolutionary systems, in: Checkland and Kiss (1987), vol. 2, pp. 1025-1033
Schwefel, H.-P. (1988), Towards large-scale long-term systems analysis, in: Cheng (1988), pp. 375-381
Schwefel, H.-P., F. Kursawe (1992), Künstliche Evolution als Modell für natürliche Intelligenz, in: Nachtigall (1992), pp. 73-91
Schwefel, H.-P., R. Männer (Eds.) (1991), Parallel problem solving from nature, Proceedings of the 1st PPSN Workshop, Dortmund, Oct. 1-3, 1990, vol. 496 of Lecture Notes in Computer Science, Springer, Berlin
Schwetlick, H. (1970), Algorithmus 12 – Ein ableitungsfreies Verfahren zur Lösung endlich-dimensionaler Gleichungssysteme, Computing 5, 82-88 and 393
Sebald, A.V., L.J. Fogel (Eds.) (1994), Proceedings of the 3rd Annual Conference on Evolutionary Programming, San Diego CA, Feb. 24-26, 1994, World Scientific, Singapore
Sebastian, H.-J., K. Tammer (Eds.) (1990), System Modelling and Optimization, vol. 143 of Lecture Notes in Control and Information Sciences, Springer, Berlin
Sergiyevskiy, G.M., A.P. Ter-Saakov (1970), Factor experiments in many-dimensional stochastic approximation of an extremum, Engng. Cybern. 8, 949-954
Shah, B.V., R.J. Buehler, O. Kempthorne (1964), Some algorithms for minimizing a function of several variables, SIAM J. 12, 74-92
Shanno, D.F. (1970a), Parameter selection for modified Newton methods for function minimization, SIAM J. Numer. Anal. 7, 366-372
Shanno, D.F. (1970b), Conditioning of quasi-Newton methods for function minimization, Math. Comp. 24, 647-656
Shanno, D.F., P.C. Kettler (1970), Optimal conditioning of quasi-Newton methods, Math. Comp. 24, 657-664
Shapiro, I.J., K.S. Narendra (1969), Use of stochastic automata for parameter self-optimization with multimodal performance criteria, IEEE Trans. SSC-5, 352-360
Shedler, G.S. (1967), Parallel numerical methods for the solution of equations, CACM 10, 286-291
Shimizu, T. (1969), A stochastic approximation method for optimization problems, JACM 16, 511-516
Shubert, B.O. (1972), A sequential method seeking the global maximum of a function, SIAM J. Numer. Anal. 9, 379-388
Sigmund, K. (1993), Games of life – explorations in ecology, evolution, and behavior, Oxford University Press, Oxford UK
Silverman, G. (1969), Remark on algorithm 315 (E4) – the damped Taylor's series method for minimizing a sum of squares and for solving systems of non-linear equations, CACM 12, 513
Singer, E. (1962), Simulation and optimization of oil refinery design, in: Cooper (1962), pp. 62-74
Sirisena, H.R. (1973), Computation of optimal controls using a piecewise polynomial parameterization, IEEE Trans. AC-18, 409-411
Slagle, J.R. (1972), Einführung in die heuristische Programmierung – künstliche Intelligenz und intelligente Maschinen, Verlag Moderne Industrie, Munich
Smith, C.S. (1962), The automatic computation of maximum likelihood estimates, National Coal Board, Scientific Department, report SC-846-MR-40, London, June 1962
Smith, D.E. (1973), An empirical investigation of optimum-seeking in the computer simulation situation, Oper. Res. 21, 475-497
Smith, F.B., Jr., D.F. Shanno (1971), An improved Marquardt procedure for non-linear regressions, Technometrics 13, 63-74
Smith, J. Maynard (1982), Evolution and the theory of games, Cambridge University Press, Cambridge UK
Smith, J. Maynard (1989), Evolutionary genetics, Oxford University Press, Oxford UK
Smith, L.B. (1969), Remark on algorithm 178 (E4) – direct search, CACM 12, 638
Smith, N.H., D.F. Rudd (1964), The feasibility of directed random search, University of Wisconsin, Department of Chemical Engineering, report
Snell, F.M. (Ed.) (1967), Progress in theoretical biology, vol. 1, Academic Press, New York
Sorenson, H.W. (1969), Comparison of some conjugate direction procedures for function minimization, J. Franklin Inst. 288, 421-441
Soucek, B. and the IRIS Group (Eds.) (1992), Dynamic, genetic, and chaotic programming, vol. 5 of Sixth-Generation Computer Technology Series, Wiley-Interscience, New York
Southwell, R.V. (1940), Relaxation methods in engineering science – a treatise on approximate computation, Oxford University Press, Oxford UK
Southwell, R.V. (1946), Relaxation methods in theoretical physics, Clarendon Press, Oxford UK
Späth, H. (1967), Algorithm 315 (E4, C5) – the damped Taylor's series method for minimizing a sum of squares and for solving systems of nonlinear equations, CACM 10, 726-728
Spang, H.A., III (1962), A review of minimization techniques for nonlinear functions, SIAM Review 4, 343-365
Spears, W.M., K.A. De Jong, T. Bäck, D.B. Fogel, H. de Garis (1993), An overview of evolutionary computation, in: Brazdil (1993), pp. 442-459
Spedicato, E. (1973), Stability of Huang's update for the conjugate gradient method, JOTA 11, 469-479
Spendley, W. (1969), Nonlinear least squares fitting using a modified simplex minimization method, in: Fletcher (1969a), pp. 259-270
Spendley, W., G.R. Hext, F.R. Himsworth (1962), Sequential application of simplex designs in optimisation and evolutionary operation, Technometrics 4, 441-461
Speyer, J.L., H.J. Kelley, N. Levine, W.F. Denham (1971), Accelerated gradient projection technique with application to rocket trajectory optimization, Automatica 7, 37-43
Sprave, J. (1993), Zelluläre Evolutionäre Algorithmen zur Parameteroptimierung, in: Hofestädt, Krückeberg, and Lengauer (1993), pp. 111-120
Sprave, J. (1994), Linear neighborhood evolution strategy, in: Sebald and Fogel (1994), pp. 42-51
Stanton, E.L. (1969), A discrete element analysis of elasto-plastic plates by energy minimization, Ph.D. thesis, Case Western Reserve University, Jan. 1969
Stark, R.M., R.L. Nicholls (1972), Mathematical foundations for design – civil engineering systems, McGraw-Hill, New York
Stebbins, G.L. (1968), Evolutionsprozesse, G. Fischer, Stuttgart
Stein, M.L. (1952), Gradient methods in the solution of systems of linear equations, NBS J. Research 48, 407-413
Steinbuch, K. (1971), Automat und Mensch, 4th ed., Springer, Berlin
Steinbuch, K., S.W. Wagner (Eds.) (1964), Neuere Ergebnisse der Kybernetik, Oldenbourg, Munich
Stender, J. (Ed.) (1993), Parallel genetic algorithms – theory and applications, IOS Press, Amsterdam
Steuer, R.E. (1986), Multiple criteria optimization – theory, computation, and application, Wiley, New York
Stewart, E.C., W.P. Kavanaugh, D.H. Brocker (1967), Study of a global search algorithm for optimal control, Proceedings of the Vth International Analogue Computation Meeting, Lausanne, Aug.-Sept. 1967, pp. 207-230
Stewart, G.W. (1967), A modification of Davidon's minimization method to accept difference approximations of derivatives, JACM 14, 72-83
Stewart, G.W. (1973), Conjugate direction methods for solving systems of linear equations, Numer. Math. 21, 285-297
Stiefel, E. (1952), Über einige Methoden der Relaxationsrechnung, ZAMP 3, 1-33
Stiefel, E. (1965), Einführung in die numerische Mathematik, 4th ed., Teubner, Stuttgart
Stoer, J., C. Witzgall (1970), Convexity and optimization in finite dimensions I, Springer, Berlin
Stolz, O. (1893), Grundzüge der Differential- und Integralrechnung, erster Teil – reelle Veränderliche und Functionen, Abschnitt V – die größten und kleinsten Werte der Functionen, pp. 199-258, Teubner, Leipzig, Germany
Stone, H.S. (1973a), Parallel computation – an introduction, IEEE Trans. C-22, 709-710
Stone, H.S. (1973b), An efficient parallel algorithm for the solution of a tri-diagonal linear system of equations, JACM 20, 27-38
Storey, C. (1962), Applications of a hill climbing method of optimization, Chem. Engng. Sci. 17(1), 45-52
Storey, C., H.H. Rosenbrock (1964), On the computation of the optimal temperature profile in a tubular reaction vessel, in: Balakrishnan and Neustadt (1964), pp. 23-64
Stratonovich, R.L. (1968), Does there exist a theory of synthesis of optimal adaptive, self-learning and self-adaptive systems? ARC 29, 83-92
Stratonovich, R.L. (1970), Optimal algorithms of the stochastic approximation type, Engng. Cybern. 8, 20-27
Strongin, R.G. (1970), Multi-extremal minimization, ARC 31, 1085-1088
Strongin, R.G. (1971), Minimization of many-extremal functions of several variables, Engng. Cybern. 9, 1004-1010
Suchowitzki, S.I., L.I. Awdejewa (1969), Lineare und konvexe Programmierung, Oldenbourg, Munich
Sugie, N. (1964), An extension of Fibonaccian searching to multi-dimensional cases, IEEE Trans. AC-9, 105
Sutti, C., L. Trabattoni, P. Brughiera (1972), A method for minimization of a one-dimensional nonunimodal function, in: Szegö (1972), pp. 181-192
Svechinskii, V.B. (1971), Random search in probabilistic iterative algorithms, ARC 32, 76-80
Swann, W.H. (1964), Report on the development of a new direct searching method of optimization, ICI Central Instrument Laboratory, research note 64-3, Middlesborough, Yorks, June 1964
Swann, W.H. (1969), A survey of non-linear optimization techniques, FEBS-Letters 2(Suppl.), S39-S55
Swann, W.H. (1972), Direct search methods, in: Murray (1972a), pp. 13-28
Sweschnikow, A.A. (Ed.) (1970), Wahrscheinlichkeitsrechnung und mathematische Statistik in Aufgaben, Teubner, Leipzig, Germany
Sydow, A. (1968), Eine Methode zur exakten Realisierung des Gradientenverfahrens auf dem iterativ-arbeitenden Analogrechner, messen-steuern-regeln 11, 462-465
Sydow, A. (Ed.) (1993), Simulationstechnik, 8th Symposium Simulationstechnik, Berlin, Sept. 1993, Vieweg, Braunschweig, Germany
Synge, J.L. (1944), A geometrical interpretation of the relaxation method, Quart. Appl. Math. 2, 87-89
Szczerbicka, H., P. Ziegler (Eds.) (1993), 6th Workshop Simulation und Künstliche Intelligenz, Karlsruhe, Germany, April 22-23, 1993, Mitteilungen aus den Arbeitskreisen der ASIM, Arbeitsgemeinschaft Simulation in der Gesellschaft für Informatik (GI), Bonn
Szegö, G.P. (Ed.) (1972), Minimization algorithms, mathematical theories, and computer results, Academic Press, New York
Szegö, G.P., G. Treccani (1972), Axiomatization of minimization algorithms and a new conjugate gradient method, in: Szegö (1972), pp. 193-216
Tabak, D. (1969), Comparative study of various minimization techniques used in mathematical programming, IEEE Trans. AC-14, 572
Tabak, D. (1970), Applications of mathematical programming techniques in optimal control – a survey, IEEE Trans. AC-15, 688-690
Talkin, A.I. (1964), The negative gradient method extended to the computer programming of simultaneous systems of differential and finite equations, AFIPS Conf. Proc. 26, 539-543
Tapley, B.D., J.M. Lewallen (1967), Comparison of several numerical optimization methods, JOTA 1, 1-32
Taran, V.A. (1968a), A discrete adaptive system with random search for the optimum, Engng. Cybern. 6(4), 142-150
Taran, V.A. (1968b), Adaptive systems with random extremum search, ARC 29, 1447-1455
Tazaki, E., A. Shindo, T. Umeda (1970), Decentralized optimization of a chemical process by a feasible method, IFAC Kyoto Symposium on Systems Engineering Approach to Computer Control, Kyoto, Japan, Aug. 1970, paper 25.1
Thom, R. (1969), Topological models in biology, Topology 8, 313-336
Thomas, M.E., D.J. Wilde (1964), Feed-forward control of over-determined systems by stochastic relaxation, in: Blakemore and Davis (1964), pp. 16-22
Todd, J. (1949), The condition of certain matrices I, Quart. J. Mech. Appl. Math. 2, 469-472
Tokumaru, H., N. Adachi, K. Goto (1970), Davidon's method for minimization problems in Hilbert space with an application to control problems, SIAM J. Contr. 8, 163-178
Tolle, H. (1971), Optimierungsverfahren für Variationsaufgaben mit gewöhnlichen Differentialgleichungen als Nebenbedingungen, Springer, Berlin
Tomlin, F.K., L.B. Smith (1969), Remark on algorithm 178 (E4) – direct search, CACM 12, 637-638
Törn, A., A. Žilinskas (1989), Global optimization, vol. 350 of Lecture Notes in Computer Science, Springer, Berlin
Tovstucha, T.I. (1960), The effect of random noise on the steady-state operation of a step-type extremal system for an object with a parabolic characteristic, ARC 21, 398-404
Traub, J.F. (1964), Iterative methods for the solution of equations, Prentice-Hall, Englewood Cliffs NJ
Treccani, G., L. Trabattoni, G.P. Szegö (1972), A numerical method for the isolation of minima, in: Szegö (1972), pp. 239-289
Tsypkin, Ya.Z., see also under Zypkin, Ja.S.
Tsypkin, Ya.Z. (1968a), All the same, does a theory of synthesis of optimal adaptive systems exist? ARC 29, 93-98
Tsypkin, Ya.Z. (1968b), Optimal hybrid adaptation and learning algorithms, ARC 29, 1271-1276
Tsypkin, Ya.Z. (1968c), Self-learning, what is it? IEEE Trans. AC-13, 608-612
Tsypkin, Ya.Z. (1970a), On learning systems, IFAC Kyoto Symposium on Systems Engineering Approach to Computer Control, Kyoto, Japan, Aug. 1970, paper 34.1
Tsypkin, Ya.Z. (1970b), Generalized learning algorithms, ARC 31, 86-92
Tsypkin, Ya.Z. (1971), Smoothed randomized functionals and algorithms in adaptation and learning theory, ARC 32, 1190-1209
Tsypkin, Ya.Z., A.S. Poznyak (1972), Finite learning automata, Engng. Cybern. 10, 478-490
Tzeng, G.-H., P.L. Yu (Eds.) (1992), Proceedings of the 10th International Conference on Multiple Criteria Decision Making, Taipei, July 19-24, 1992, National Chiao Tung University, Taipei, Taiwan
Ueing, U. (1971), Zwei Lösungsmethoden für nichtkonvexe Programmierungsprobleme, Springer, Berlin
Ueing, U. (1972), A combinatorial method to compute a global solution of certain nonconvex optimization problems, in: Lootsma (1972a), pp. 223-230
Unbehauen, H. (1971), On the parameter optimization of multivariable control systems, in: Schwarz (1971), technical papers, vol. 2, pp. 2.2.10-1 to 2.2.10-11
Vagin, V.N., L.Ye. Rudelson (1968), An example of a self-organizing system, Engng. Cybern. 6(6), 33-40
Vajda, S. (1961), Mathematical programming, Addison-Wesley, Reading MA
Vajda, S. (1967), The mathematics of experimental design, Griffin, London
Vanderplaats, G.N. (1984), Numerical optimization techniques for engineering design – with applications, McGraw-Hill, New York
VanNorton, R. (1967), Lösung linearer Gleichungssysteme nach dem Verfahren von Gauss-Seidel, in: Ralston and Wilf (1967), pp. 92-105
Varah, J.M. (1965), Certification of algorithm 203 (E4) – steep 1, CACM 8, 171
Varela, F.J., P. Bourgine (Eds.) (1992), Toward a practice of autonomous systems, Proceedings of the 1st European Conference on Artificial Life (ECAL), Paris, Dec. 11-13, 1991, MIT Press, Cambridge MA
Varga, J. (1974), Praktische Optimierung – Verfahren und Anwendungen der linearen und nichtlinearen Optimierung, Oldenbourg, Munich
Varga, R.S. (1962), Matrix iterative analysis, Prentice-Hall, Englewood Cliffs NJ
Vaysbord, E.M. (1967), Asymptotic estimates of the rate of convergence of random search, Engng. Cybern. 5(4), 22-32
Vaysbord, E.M. (1968), Convergence of a method of random search, Engng. Cybern. 6(3), 44-48
Vaysbord, E.M. (1969), Convergence of a certain method of random search for a global extremum of a random function, Engng. Cybern. 7(1), 46-50
Vaysbord, E.M., D.B. Yudin (1968), Multiextremal stochastic approximation, Engng. Cybern. 6(5), 1-11
Venter, J.H. (1967), An extension of the Robbins-Monro procedure, Ann. Math. Stat. 38, 181-190
Viswanathan, R., K.S. Narendra (1972), A note on the linear reinforcement scheme for variable-structure stochastic automata, IEEE Trans. SMC-2, 292-294
Vitale, P., G. Taylor (1968), A note on the application of Davidon's method to nonlinear regression problems, Technometrics 10, 843-849
Vogelsang, R. (1963), Die mathematische Theorie der Spiele, Dümmler, Bonn
Voigt, H.-M. (1989), Evolution and optimization – an introduction to solving complex problems by replicator networks, Akademie-Verlag, Berlin
Voigt, H.-M., H. Mühlenbein, H.-P. Schwefel (Eds.) (1990), Evolution and optimization '89 – Selected papers on evolution theory, combinatorial optimization and related topics, Wartburg Castle, Eisenach, April 2-4, 1989, Akademie-Verlag, Berlin
Voltaire, F.M. Arouet de (1759), Candide oder der Optimismus, Insel Verlag, Frankfort/Main, 1973
Volz, R.A. (1965), The minimization of a function by weighted gradients, IEEE Proc. 53, 646-647
Volz, R.A. (1973), Example of function optimization via hybrid computation, Simulation 21, 43-48
Waddington, C.H. (Ed.) (1968), Towards a theoretical biology I – prolegomena, Edinburgh University Press, Edinburgh
Wald, A. (1966), Sequential analysis, 8th ed., Wiley, New York
Wallack, P. (1964), Certification of algorithm 203 (E4) – steep 1, CACM 7, 585
Walsh, J. (Ed.) (1966), Numerical analysis – an introduction, Academic Press, London
Ward, L., A. Nag, L.C.W. Dixon (1969), Hill-climbing techniques as a method of calculating the optical constants and thickness of a thin metallic film, Brit. J. Appl. Phys. (J. Phys. D), Ser. 2, 2, 301-304
Wasan, M.T. (1969), Stochastic approximation, Cambridge University Press, Cambridge UK
Wasscher, E.J. (1963a), Algorithm 203 (E4) – steep 1, CACM 6, 517-519
Wasscher, E.J. (1963b), Algorithm 204 (E4) – steep 2, CACM 6, 519
Wasscher, E.J. (1963c), Remark on algorithm 129 (E4) – minifun, CACM 6, 521
Wasscher, E.J. (1965), Remark on algorithm 205 (E4) – ative, CACM 8, 171
Weber, H.H. (1972), Einführung in Operations Research, Akademische Verlagsgesellschaft, Frankfort/Main
Wegge, L. (1966), On a discrete version of the Newton-Raphson method, SIAM J. Numer. Anal. 3, 134-142
Weinberg, F. (Ed.) (1968), Einführung in die Methode Branch and Bound, Springer, Berlin
Weinberg, F., C.A. Zehnder (Eds.) (1969), Heuristische Planungsmethoden, Springer, Berlin
Weisman, J., C.F. Wood (1966), The use of optimal search for engineering design, in: Lavi and Vogl (1966), pp. 219-228
Weisman, J., C.F. Wood, L. Rivlin (1965), Optimal design of chemical process systems, AIChE Engineering Progress Symposium Series 61, no. 55, pp. 50-63
Weiss, E.A., D.H. Archer, D.A. Burt (1961), Computer sets tower for best run, Petrol Refiner 40(10), 169-174
Wells, M. (1965), Algorithm 251 (E4) – function minimization (Flepomin), CACM 8, 169-170
Werner, J. (1974), Über die Konvergenz des Davidon-Fletcher-Powell-Verfahrens für streng konvexe Minimierungsaufgaben im Hilbert-Raum, Computing 12, 167-176
Wheeling, R.F. (1960), Optimizers – their structure, CACM 3, 632-638
White, L.J., R.G. Day (1971), An evaluation of adaptive step-size random search, IEEE Trans. AC-16, 475-478
White, R.C., Jr. (1970), Hybrid-computer optimization of systems with random parameters, Ph.D. thesis, University of Arizona, Tucson AZ, June 1970
White, R.C., Jr. (1971), A survey of random methods for parameter optimization, Simulation 17, 197-205
Whitley, L.D. (1991), Fundamental principles of deception in genetic search, in: Rawlins (1991), pp. 221-241
Whitley, L.D. (Ed.) (1993), Foundations of Genetic Algorithms 2, Morgan Kaufmann, San Mateo CA
Whitley, V.W. (1962), Algorithm 129 (E4) – minifun, CACM 5, 550-551
Whittle, P. (1971), Optimization under constraints – theory and applications of nonlinear programming, Wiley-Interscience, London
Wiener, N. (1963), Kybernetik – Regelung und Nachrichtenübertragung in Lebewesen und Maschine, Econ-Verlag, Düsseldorf, Germany
Wiener, N., J.P. Schade (Eds.) (1965), Progress in biocybernetics, vol. 2, Elsevier, Amsterdam
Wilde, D.J. (1963), Optimization by the method of contour tangents, AIChE J. 9(2), 186-190
Wilde, D.J. (1964), Optimum seeking methods, Prentice-Hall, Englewood Cliffs NJ
Wilde, D.J. (1965), A multivariable dichotomous optimum-seeking method, IEEE Trans. AC-10, 85-87
Wilde, D.J. (1966), Objective function indistinguishability in unimodal optimization, in: Lavi and Vogl (1966), pp. 341-350
Wilde, D.J., C.S. Beightler (1967), Foundations of optimization, Prentice-Hall, Englewood Cliffs NJ
Wilkinson, J.H. (1965), The algebraic eigenvalue problem, Oxford University Press, London
Wilkinson, J.H., C. Reinsch (1971), Handbook for automatic computation, vol. 2 – Linear algebra, Springer, Berlin
Wilson, E.O., W.H. Bossert (1973), Einführung in die Populationsbiologie, Springer, Berlin
Witt, U. (1992), Explaining process and change – approaches to evolutionary economics, University of Michigan Press, Ann Arbor MI
Witte, B.F.W., W.R. Holst (1964), Two new direct minimum search procedures for functions of several variables, AFIPS Conf. Proc. 25, 195-209
Witten, I.H. (1972), Comments on "Use of stochastic automata for parameter self-optimization with multimodal performance criteria", IEEE Trans. SMC-2, 289-292
Wolf, G., T. Legendi, U. Schendel (Eds.) (1990), Parcella '90, Proceedings of the 5th International Workshop on Parallel Processing by Cellular Automata and Arrays, Berlin, Sept. 17-21, 1990, vol. 2 of Research in Informatics, Akademie-Verlag, Berlin
Wolfe, P. (1959a), The simplex method for quadratic programming, Econometrica 27, 382-398
Wolfe, P. (1959b), The secant method for simultaneous nonlinear equations, CACM 2, 12-13
Wolfe, P. (1966), On the convergence of gradient methods under constraints, IBM Zurich, Switzerland, Research Laboratory report RZ-204, March 1966
Wolfe, P. (1967), Another variable metric method, IBM working paper
Wolfe, P. (1969), Convergence conditions for ascent methods, SIAM Review 11, 226-235
Wolfe, P. (1970), Convergence theory in nonlinear programming, in: Abadie (1970), pp. 1-36
Wolfe, P. (1971), Convergence conditions for ascent methods II – some corrections, SIAM Review 13, 185-188
Wolff, W., C.-J. Soeder, F.R. Drepper (Eds.) (1988), Ecodynamics – Contributions to theoretical ecology, Springer, Berlin
Wood, C.F. (1960), Application of direct search to the solution of engineering problems, Westinghouse Research Laboratory, scientific paper 6-41210-1-P1, Pittsburgh PA, Oct. 1960
Wood, C.F. (1962), Recent developments in direct search techniques, Westinghouse Research Laboratory, research paper 62-159-522-R1, Pittsburgh PA
Wood, C.F. (1965), Review of design optimization techniques, IEEE Trans. SSC-1, 14-20
Yates, F. (1967), A fresh look at the basic principles of the design and analysis of experiments, in: LeCam and Neyman (1967), pp. 777-790
Youden, W.J., O. Kempthorne, J.W. Tukey, G.E.P. Box, J.S. Hunter, F.E. Satterthwaite, T.A. Budne (1959), Discussion of the papers of Messrs. Satterthwaite and Budne, Technometrics 1, 157-193
Yovits, M.C., S. Cameron (Eds.) (1960), Self-organizing systems, Pergamon Press, Oxford UK
Yovits, M.C., G.T. Jacobi, D.G. Goldstein (Eds.) (1962), Self-organizing systems, Spartan, Washington, DC
Yudin, D.B. (1965), Quantitative analysis of complex systems I, Engng. Cybern. 3(1), 1-9
Yudin, D.B. (1966), Quantitative analysis of complex systems II, Engng. Cybern. 4(1), 1-13
Yudin, D.B. (1972), New approaches to formalizing the choice of decisions in complex situations, ARC 33, 747-756
Yvon, J.P. (1972), On some random search methods, in: Szegö (1972), pp. 313-335
Zach, F. (1974), Technisches Optimieren, Springer, Vienna
Zadeh, L.A., L.W. Neustadt, A.V. Balakrishnan (Eds.) (1969a), Computing methods in optimization problems 2, Academic Press, London
Zadeh, L.A., L.W. Neustadt, A.V. Balakrishnan (Eds.) (1969b), Computing methods in optimization problems, Springer, Berlin
Zadeh, N. (1970), A note on the cyclic coordinate ascent method, Mgmt. Sci. 16, 642-644
Zahradnik, R.L. (1971), Theory and techniques of optimization for practicing engineers, Barnes and Noble, New York
Zakharov, V.V. (1969), A random search method, Engng. Cybern. 7(2), 26-30
Zakharov, V.V. (1970), The method of integral smoothing in many-extremal and stochastic problems, Engng. Cybern. 8, 637-642
Zangwill, W.I. (1967), Minimizing a function without calculating derivatives, Comp. J. 10, 293-296
Zangwill, W.I. (1969), Nonlinear programming – a unified approach, Prentice-Hall, Englewood Cliffs NJ
Zeleznik, F.J. (1968), Quasi-Newton methods for nonlinear equations, JACM 15, 265-271
Zellnik, H.E., N.E. Sondak, R.S. Davis (1962), Gradient search optimization, Chem. Engng. Progr. 58(8), 35-41
References 321<br />

Zerbst, E.W. (1987), Bionik, Teubner, Stuttgart<br />

Zettl, G. (1970), Ein Verfahren zum Minimieren einer Funktion bei eingeschranktem<br />

Variationsbereich der Parameter, Numer. Math. 15, 415-432<br />

Zhigljavsky, A.A. (1991), Theory of global r<strong>and</strong>om search, Kluwer, Dordrecht, The<br />

Netherl<strong>and</strong>s<br />

Zigangirov, K.S. (1965), Optimal search in the presence of noise, Engng. Cybern. 3(4),<br />

112-116<br />

Zoutendijk, G. (1960), Methods of feasible directions|a study in linear <strong>and</strong> nonlinear<br />

programming, Elsevier, Amsterdam<br />

Zoutendijk, G. (1970), Nonlinear programming|computational methods, in: Abadie<br />

(1970), pp. 37-86<br />

Zurmuhl, R. (1965), Praktische Mathematik fur Ingenieure und Physiker, 5th ed., Springer,<br />

Berlin<br />

Zwart, P.B. (1970), Nonlinear programming|a quadratic analysis of ridge paralysis,<br />

JOTA 6, 331-339<br />

Zwart, P.B. (1973), Nonlinear programming|counterexample to two global optimization<br />

algorithms, Oper. Res. 21, 1260-1266<br />

Zypkin, Ja.S. see also under Tsypkin, Ya.Z.<br />

Zypkin, Ja.S. (1966), Adaption und Lernen in automatischen Systemen, Oldenbourg,<br />

Munich<br />

Zypkin, Ja.S. (1967), Probleme der Adaption in automatischen Systemen, messen-steuernregeln<br />

10, 362-365<br />

Zypkin, Ja.S. (1970), Adaption und Lernen in kybernetischen Systemen, Oldenbourg,<br />

Munich



Glossary of Abbreviations
AAAS American Association for the Advancement of Science
ACM Association for Computing Machinery
AEG Allgemeine Elektricitäts-Gesellschaft
AERE Atomic Energy Research Establishment
AFIPS American Federation of Information Processing Societies
AGARD Advisory Group for Aerospace Research and Development
AIAA American Institute of Aeronautics and Astronautics
AIChE American Institute of Chemical Engineers
AIEE American Institute of Electrical Engineers
ANL Argonne National Laboratory
ARC Automation and Remote Control (cover-to-cover translation of Avtomatika i Telemechanika)
ASME American Society of Mechanical Engineers
BIT Nordisk Tidskrift for Informationsbehandling
CACM Communications of the ACM
DFVLR Deutsche Forschungs- und Versuchsanstalt für Luft- und Raumfahrt
DGRR Deutsche Gesellschaft für Raketentechnik und Raumfahrt
DLR Deutsche Luft- und Raumfahrt
FEBS Federation of European Biochemical Societies
GI Gesellschaft für Informatik
GMD Gesellschaft für Mathematik und Datenverarbeitung
IBM International Business Machines Corporation
ICI Imperial Chemical Industries Limited
IEE Institution of Electrical Engineers
IEEE Institute of Electrical and Electronics Engineers
  Transactions AC on Automatic Control
  BME on Bio-Medical Engineering
  C on Computers
  MIL on Military Electronics
  MTT on Microwave Theory and Techniques
  NN on Neural Networks
  SMC on Systems, Man, and Cybernetics
  SSC on Systems Science and Cybernetics
IFAC International Federation of Automatic Control
IIASA International Institute for Applied Systems Analysis
IMACS International Association for Mathematics and Computers in Simulation
IRE Institute of Radio Engineers
  Transactions EC on Electronic Computers
  EM on Engineering Management
ISA Instrument Society of America
JACM Journal of the ACM
JIMA Journal of the Institute of Mathematics and Its Applications
JOTA Journal of Optimization Theory and Applications
KFA Kernforschungsanlage (Nuclear Research Center) Jülich
KfK Kernforschungszentrum (Nuclear Research Center) Karlsruhe
MIT Massachusetts Institute of Technology
NASA National Aeronautics and Space Administration
NBS National Bureau of Standards
NTZ Nachrichtentechnische Zeitschrift
PPSN Parallel Problem Solving from Nature
SIAM Society for Industrial and Applied Mathematics
UKAEA United Kingdom Atomic Energy Authority
VDE Verband Deutscher Elektrotechniker
VDI Verein Deutscher Ingenieure
WGLR Wissenschaftliche Gesellschaft für Luft- und Raumfahrt
ZAMM Zeitschrift für angewandte Mathematik und Mechanik
ZAMP Zeitschrift für angewandte Mathematik und Physik




Appendix A
Catalogue of Problems
The catalogue is divided into three groups of test problems corresponding to the three divisions of the numerical strategy comparison. The optimization problems are all formulated as minimum problems with a specified objective function F(x) and solution x*. For the second set of problems, the initial conditions x^(0) are also given. Occasionally, further local minima and other stationary points of the objective function are also indicated. Inequality constraints are formulated such that the constraint functions G_j(x) are all greater than zero within the allowed or feasible region. If a solution lies on the edge of the feasible region, then the active constraints are mentioned. The values of these constraint functions must be just equal to zero at the optimum. Where possible the structure of the minimum problem is depicted geometrically by means of a two dimensional contour diagram with lines F(x_1, x_2) = const. and as a three dimensional picture in which values of F(x_1, x_2) are plotted as elevation over the (x_1, x_2) plane. Additionally, the values of the objective function on the contour lines are specified. Constraints are shown as bold lines in the contour diagrams. In the 3D plots the objective function is mostly floored to minimal values within non-feasible regions. In some cases there is a brief mention of any especially characteristic behavior shown by individual strategies during their iterative search for the minimum.
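Read as data, such a catalogue entry maps directly onto a small program structure. The following Python sketch is illustrative only (the original test series used the FORTRAN programs of Appendix B); the class and method names are invented for this illustration, and Problem 2.46 from below serves as the example entry.

```python
# Illustrative container for one catalogue entry: a minimization problem
# with objective F, constraint functions G_j(x) >= 0, and known data.
class TestProblem:
    def __init__(self, F, constraints=(), x_star=None, x0=None):
        self.F = F                  # objective, to be minimized
        self.G = list(constraints)  # each G_j(x) >= 0 inside the feasible region
        self.x_star = x_star        # known solution (if any)
        self.x0 = x0                # prescribed starting point (if any)

    def feasible(self, x):
        return all(g(x) >= 0.0 for g in self.G)

    def active(self, x, eps=1e-10):
        # indices j of constraints with G_j(x) = 0 (numerically)
        return [j for j, g in enumerate(self.G, start=1) if abs(g(x)) <= eps]

# Example entry: Problem 2.46 below, F(x) = x1^2 + x2^2, G1 = x1 + 2*x2 - 2
p = TestProblem(lambda x: x[0]**2 + x[1]**2,
                [lambda x: x[0] + 2*x[1] - 2],
                x_star=(0.4, 0.8), x0=(10.0, 10.0))
print(p.F(p.x0), p.feasible(p.x0), p.active(p.x_star))  # 200.0 True [1]
```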

A.1 Test Problems for the First Part of the Strategy Comparison
Problem 1.1 (sphere model)
Objective function:
$$F(x) = \sum_{i=1}^{n} x_i^2$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
For n = 2 a contour diagram as well as a 3D plot are sketched under Problem 2.17. For this, the simplest of all quadratic problems, none of the strategies fails.


Problem 1.2
Objective function:
$$F(x) = \sum_{i=1}^{n} \left( \sum_{j=1}^{i} x_j \right)^2$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
A contour diagram as well as a 3D plot for n = 2 are given under Problem 2.9. The objective function of this true quadratic minimum problem can be written in matrix notation as:
$$F(x) = x^T A\, x$$
The n x n matrix of coefficients A is symmetric and positive-definite. According to Schwarz, Rutishauser, and Stiefel (1968) its condition number K is a measure of the numerical difficulty of the problem. Among other definitions, that of Todd (1949) is useful, namely:
$$K = \frac{\lambda_{\max}}{\lambda_{\min}} = \frac{a_{\max}^2}{a_{\min}^2}, \qquad \lambda_{\max} = \max_i \{ |\lambda_i|,\; i = 1(1)n \}$$
and similarly for lambda_min. The lambda_i are the eigenvalues of the matrix A, and the a_i are the lengths of the semi-axes of an n-dimensional elliptic contour surface F(x) = const.
Condition numbers for the present matrix
$$A = (a_{ij}) = \begin{pmatrix}
n & n-1 & n-2 & \cdots & n-j+1 & \cdots & 1 \\
n-1 & n-1 & n-2 & \cdots & n-j+1 & \cdots & 1 \\
n-2 & n-2 & n-2 & \cdots & n-j+1 & \cdots & 1 \\
\vdots & & & & & & \vdots \\
n-i+1 & \cdots & n-i+1 & \cdots & n-i+1 & \cdots & 1 \\
\vdots & & & & & & \vdots \\
1 & 1 & 1 & \cdots & 1 & \cdots & 1
\end{pmatrix}$$
i.e., a_ij = n - max(i, j) + 1, were calculated for various values of n by means of an algorithm of Greenstadt (1967b), which uses the Jacobi method of diagonalization. As can be seen from the following table, K increases with the number of variables as O(n^2).


n       K        K/n^2
1       1        1
2       6.85     1.71
3       16.4     1.82
6       64.9     1.80
10      175      1.75
20      678      1.69
30      1500     1.67
60      5930     1.65
100     16400    1.64

Not all the search methods achieved the required accuracy. For many variables the coordinate strategies and the complex method of Box terminated the search prematurely. Powell's method of conjugate gradients even got stuck without the termination criterion taking effect.
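The tabulated growth of K can be cross-checked numerically. A minimal sketch, assuming NumPy as a modern stand-in for the Jacobi diagonalization routine of Greenstadt used originally:

```python
import numpy as np

def condition_number(n):
    # Matrix of Problem 1.2: a_ij = n - max(i, j) + 1 (1-based indices).
    i, j = np.indices((n, n))                 # 0-based index grids
    A = (n - np.maximum(i, j)).astype(float)  # equals n - max(i,j) + 1 in 1-based terms
    lam = np.linalg.eigvalsh(A)               # A is symmetric; eigenvalues ascending
    return lam[-1] / lam[0]                   # K = lambda_max / lambda_min

for n in (2, 3, 10, 100):
    K = condition_number(n)
    print(f"n = {n:4d}   K = {K:10.4g}   K/n^2 = {K / n**2:.3f}")
# Reproduces the tabulated values, e.g. K = 16.4 for n = 3 and 16400 for n = 100.
```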

A.2 Test Problems for the Second Part of the Strategy Comparison
Problem 2.1 after Beale (1958)
Objective function:
$$F(x) = \left[ 1.5 - x_1 (1 - x_2) \right]^2 + \left[ 2.25 - x_1 (1 - x_2^2) \right]^2 + \left[ 2.625 - x_1 (1 - x_2^3) \right]^2$$
Figure A.1: Graphical representation of Problem 2.1; contour levels F(x) = 0.1, 1, 4, ~14.20, 36, 100


Minimum:
x* = (3, 0.5), F(x*) = 0
Besides the strong minimum x* there is a weak minimum at infinity:
x' -> (-infinity, 1), F(x') -> 0
Saddle point:
x'' = (0, 1), F(x'') ~ 14.20
Start:
x^(0) = (0, 0), F(x^(0)) ~ 14.20
For very large initial step lengths the (1+1) evolution strategy converged once to the weak minimum x'.

Problem 2.2
As Problem 2.1, but with:
Start:
x^(0) = (0.1, 0.1), F(x^(0)) ~ 12.99
Problem 2.3
Objective function:
$$F(x) = -\left| x \sin\left(\sqrt{|x|}\right) \right|$$
Figure A.2: Diagram of F(x) for Problem 2.3


There are infinitely many local minima, the positions of which are specified by a transcendental equation:
$$\sqrt{|x^*|} = -2 \tan\left(\sqrt{|x^*|}\right)$$
For |x*| >> 1 we have approximately
$$x^* \approx \left( \pi\,(0.5 + k) \right)^2 \quad \text{for } k = 1, 2, 3, \ldots \text{ integer}$$
and
$$F(x^*) \approx -|x^*|$$
Whereas in reality none of the finite local minima is at the same time a global minimum, the finite word length of the digital computer used, together with the system-specific method of evaluating the sine function, gives rise to an apparent global minimum at
x* = 4.44487453 * 10^16, F(x*) = -4.44487453 * 10^16
Counting from the origin it is the 67 108 864th local minimum in each direction. If x is increased above this value, the objective function value is always set to zero. (Note that this behavior is machine dependent.)
Start:
x^(0) = 0, F(x^(0)) = 0
Most strategies located the first or highest local minimum left or right of the starting point (the origin). Depending on the sequence of random numbers, the two membered evolution method found (for example) the 2nd, 9th, and 34th local minima. Only the (10, 100) evolution strategy almost always reached the apparent global minimum.
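A short illustrative sketch (plain Python) of the objective and of the approximate minimum positions derived above; as noted, the exact location of the apparent global minimum is machine dependent:

```python
import math

def F(x):
    # Objective of Problem 2.3
    return -abs(x * math.sin(math.sqrt(abs(x))))

# Approximate position of the k-th local minimum for |x| >> 1:
for k in (1, 2, 9, 34):
    xk = (math.pi * (0.5 + k)) ** 2
    print(f"k = {k:2d}   x_k ~ {xk:11.2f}   F(x_k) ~ {F(xk):12.2f}")
# F(x_k) approaches -x_k as k grows; in exact arithmetic no finite
# local minimum is global, cf. the apparent minimum near 4.44e16.
```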

Problem 2.4
Objective function:
$$F(x) = \sum_{i=1}^{n} \left[ (x_1 - x_i^2)^2 + (x_i - 1)^2 \right] \quad \text{for } n = 5$$
Minimum:
x_i* = 1 for i = 1(1)n, F(x*) = 0
Start:
x_i^(0) = 10 for i = 1(1)n, F(x^(0)) = 40905
Problem 2.5 after Booth (1949)
Objective function:
$$F(x) = (x_1 + 2 x_2 - 7)^2 + (2 x_1 + x_2 - 5)^2$$


Figure A.3: Graphical representation of Problem 2.4 for n = 2; contour levels F(x) = 10^0, 10^1, 10^2, 10^3, 10^4, 10^5, 10^6, 10^7
This minimum problem is equivalent to solving the following pair of linear equations:
$$x_1 + 2 x_2 = 7$$
$$2 x_1 + x_2 = 5$$
Figure A.4: Graphical representation of Problem 2.5; contour levels F(x) = 1, 9, 25, 49, 81, 121, 169, 225


An approach to the latter problem is to determine those values of x_1 and x_2 that minimize the error in the equations. The error is defined here in the sense of a Gaussian approximation as the sum of the squares of the components of the residual vector.
Minimum:
x* = (1, 3), F(x*) = 0
Start:
x^(0) = (0, 0), F(x^(0)) = 74
Problem 2.6
Objective function:
$$F(x) = \max\left\{ |x_1 + 2 x_2 - 7|,\; |2 x_1 + x_2 - 5| \right\}$$
This represents an attempt to solve the previous system of linear equations of Problem 2.5 in the sense of a Tchebycheff approximation. Accordingly, the error is defined as the absolute maximum component of the residual vector.
Minimum:
x* = (1, 3), F(x*) = 0
Start:
x^(0) = (0, 0), F(x^(0)) = 7
Figure A.5: Graphical representation of Problem 2.6; contour levels F(x) = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11


Several of the search procedures were unable to find the minimum. They converge to a point on the line x_1 + x_2 = 4, which joins together the sharpest corners of the rhombohedral contours. The partial derivatives of the objective function are discontinuous there; in the unit vector directions, parallel to the coordinate axes, no improvement can be made. Besides the coordinate strategies, the methods of Hooke and Jeeves and of Powell are thwarted by this property.
Problem 2.7 after Box (1966)
Objective function:
$$F(x) = \sum_{j=1}^{10} \left( e^{-0.1 j x_1} - e^{-0.1 j x_2} - x_3 \left[ e^{-0.1 j} - e^{-j} \right] \right)^2$$
Minima:
x* = (1, 10, 1), F(x*) = 0
x* = (10, 1, -1), F(x*) = 0
Besides these two equivalent, strong minima there is a weak minimum along the line:
x_1' = x_2', x_3' = 0, F(x') = 0
Because of the finite computational accuracy the weak minimum is actually broadened into a region:
x_1'' ~ x_2'', x_3'' ~ 0, F(x'') = 0 if x_1 >> 1
Figure A.6: Graphical representation of Problem 2.7 on the plane x_3 = 1; contour levels F(x) = 0.03, 0.3, 1, ~3.064, 10, 30


Figure A.7: Graphical representation of Problem 2.7 on the planes x_3 = 0 (left) and x_3 = -1 (right); contour levels F(x) = 0.03, 0.3, 1, ~3.064, 10, 30
Start:
x^(0) = (0, 20, 20), F(x^(0)) ~ 1022
Many strategies only roughly located the first of the strong minima defined above. The evolution strategies tended to converge to the weak minimum, since the minima are at equal values of the objective function. The second strong minimum, which is never referred to in the relevant literature, was sometimes found by the multimembered evolution strategy.
Problem 2.8
As Problem 2.7, but with
Start:
x^(0) = (0, 10, 20), F(x^(0)) ~ 1031
Problem 2.9
Objective function:
$$F(x) = \sum_{i=1}^{n} \left( \sum_{j=1}^{i} x_j \right)^2 \quad \text{for } n = 5$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
Start:
x_i^(0) = 10 for i = 1(1)n, F(x^(0)) = 5500


Figure A.8: Graphical representation of Problem 2.9 for n = 2; contour levels F(x) = 4, 36, 100, 196, 324, 484
Problem 2.10 after Kowalik (1967; see also Kowalik and Morrison, 1968)
Objective function:
$$F(x) = \sum_{i=1}^{11} \left( a_i - \frac{x_1 (b_i^2 + b_i x_2)}{b_i^2 + b_i x_3 + x_4} \right)^2$$
Numerical values of the constants a_i and b_i for i = 1(1)11 can be taken from the following table:

i     a_i      1/b_i
1     0.1957   0.25
2     0.1947   0.5
3     0.1735   1
4     0.1600   2
5     0.0844   4
6     0.0627   6
7     0.0456   8
8     0.0342   10
9     0.0323   12
10    0.0235   14
11    0.0246   16

In this non-linear fitting problem, formulated as a minimum problem, the free parameters alpha_j, j = 1(1)4, of a function
$$y(z) = \frac{\alpha_1 (z^2 + \alpha_2 z)}{z^2 + \alpha_3 z + \alpha_4}$$
have to be determined with reference to eleven data points {y_i, z_i} such that the error, as measured by the Euclidean norm, is minimized (Gaussian or least squares approximation).
Minimum:
x* ~ (0.1928, 0.1908, 0.1231, 0.1358), F(x*) ~ 0.0003075
Start:
x^(0) = (0, 0, 0, 0), F(x^(0)) ~ 0.1484
Near the optimum, if the variables are changed in the last decimal place (with respect to the machine accuracy), rounding errors cause the objective function to behave almost stochastically. The multimembered evolution strategy with recombination yields the best solution. It deviates significantly from the optimum solution as defined by Kowalik and Osborne (1968). Since this best value has a quasi-singular nature, it is repeatedly lost by the population of a (10, 100) evolution strategy, with the result that the termination criterion of the search sometimes only takes effect after a long time, if at all.
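Note that the table lists 1/b_i rather than b_i, which is an easy thing to get wrong in an implementation. A minimal Python sketch reproducing the quoted function values:

```python
# Data of Problem 2.10; the table above gives 1/b_i, so invert it here.
a = [0.1957, 0.1947, 0.1735, 0.1600, 0.0844, 0.0627,
     0.0456, 0.0342, 0.0323, 0.0235, 0.0246]
b = [1.0 / u for u in (0.25, 0.5, 1, 2, 4, 6, 8, 10, 12, 14, 16)]

def F(x):
    x1, x2, x3, x4 = x
    return sum((ai - x1 * (bi**2 + bi * x2) / (bi**2 + bi * x3 + x4))**2
               for ai, bi in zip(a, b))

print(F((0, 0, 0, 0)))                          # ~0.1484, the starting value
print(F((0.1928, 0.1908, 0.1231, 0.1358)))      # ~3.1e-4, near the quoted minimum
```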

Problem 2.11
As Problem 2.10, but with:
Start:
x^(0) = (0.25, 0.39, 0.415, 0.39), F(x^(0)) ~ 0.005316
Problem 2.12
As Problem 2.10, but with:
Start:
x^(0) = (0.25, 0.40, 0.40, 0.40), F(x^(0)) ~ 0.005566
Problem 2.13 after Fletcher and Powell (1963)
Objective function:
$$F(x) = \sum_{i=1}^{n} \left( A_i - B_i(x) \right)^2 \quad \text{for } n = 5$$
where
$$A_i = \sum_{j=1}^{n} (a_{ij} \sin\alpha_j + b_{ij} \cos\alpha_j), \qquad B_i(x) = \sum_{j=1}^{n} (a_{ij} \sin x_j + b_{ij} \cos x_j) \quad \text{for } i = 1(1)n$$
a_ij and b_ij are integer random numbers in the range [-100, 100], and alpha_i are random numbers in the range [-pi, pi]. A minimum of this problem is simultaneously a solution of the equivalent system of n simultaneous non-linear (transcendental) equations:


Figure A.9: Graphical representation of Problem 2.13 for n = 2, with a11 = -2, a12 = 27, a21 = -70, a22 = -48, b11 = -76, b12 = -51, b21 = 63, b22 = -50, alpha_1 = -3.0882, alpha_2 = 2.0559; contour levels F(x) = 238.864, 581.372, 1403.11, 3283.14, 7153.45, 13635.3, 21479.6, 27961.4, 31831.7, 33711.8, 34533.5
$$\sum_{j=1}^{n} (a_{ij} \sin x_j + b_{ij} \cos x_j) = A_i \quad \text{for } i = 1(1)n$$
The solution is again approximated in the least squares sense.
Minimum:
x_i* = alpha_i for i = 1(1)n, F(x*) = 0
Because the trigonometric functions are multivalued there are infinitely many equivalent minima (real solutions of the system of equations), of which up to 2^n lie in the interval
alpha_i - pi <= x_i* <= alpha_i + pi for i = 1(1)n
Start:
x_i^(0) = alpha_i + delta_i for i = 1(1)n
where delta_i are random numbers in the range [-pi/10, pi/10]. To provide the same conditions for all the search methods the same sequence of random numbers was used in each case, and hence F(x^(0)) ~ 1182.
Because of the proximity of the starting point to the one solution, x_i* = alpha_i for i = 1(1)n, all the strategies approached this minimum only.
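A sketch of how such a random instance can be generated (the seed is arbitrary and purely illustrative; the original experiments used their own fixed random sequences):

```python
import math, random

def make_problem(n, seed=1):
    # Random data as specified above: integer a_ij, b_ij in [-100, 100],
    # alpha_j in [-pi, pi]; the seed choice is an assumption for this sketch.
    rng = random.Random(seed)
    a = [[rng.randint(-100, 100) for _ in range(n)] for _ in range(n)]
    b = [[rng.randint(-100, 100) for _ in range(n)] for _ in range(n)]
    alpha = [rng.uniform(-math.pi, math.pi) for _ in range(n)]
    A = [sum(a[i][j] * math.sin(alpha[j]) + b[i][j] * math.cos(alpha[j])
             for j in range(n)) for i in range(n)]
    def F(x):
        return sum((A[i] - sum(a[i][j] * math.sin(x[j]) + b[i][j] * math.cos(x[j])
                               for j in range(n)))**2 for i in range(n))
    return F, alpha

F, alpha = make_problem(5)
print(F(alpha))   # 0.0, since x* = alpha solves the system exactly
```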


Problem 2.14 after Powell (1962)
Objective function:
$$F(x) = (x_1 + 10 x_2)^2 + 5 (x_3 - x_4)^2 + (x_2 - 2 x_3)^4 + 10 (x_1 - x_4)^4$$
Minimum:
x* = (0, 0, 0, 0), F(x*) = 0
Start:
x^(0) = (3, -1, 0, 1), F(x^(0)) = 215
The matrix of second partial derivatives of the objective function goes singular at the minimum. Thus it is not surprising that a quasi-Newton method like the variable metric method of Davidon, Fletcher, and Powell (applied here in Stewart's derivative-free form) got stuck a long way from the minimum. Geometrically speaking, there is a valley which becomes extremely narrow as it approaches the minimum. The evolution strategies therefore ended up converging very slowly with a minimum step length, and the search had to be terminated for reasons of time.
Problem 2.15
As Problem 2.14, except:
Start:
x^(0) = (1, 2, 3, 4), F(x^(0)) = 1512
Problem 2.16 after Leon (1966a)
Objective function:
$$F(x) = 100 (x_2 - x_1^3)^2 + (x_1 - 1)^2$$
Figure A.10: Graphical representation of Problem 2.16; contour levels F(x) = 0.25, 4, 64, 250, 1000, 5000, 10000


Minimum:
x* = (1, 1), F(x*) = 0
Start:
x^(0) = (-1.2, 1), F(x^(0)) ~ 749
Problem 2.17 (sphere model)
Objective function:
$$F(x) = \sum_{i=1}^{n} x_i^2 \quad \text{for } n = 5$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
Start:
x_i^(0) = 10 for i = 1(1)n, F(x^(0)) = 500
Problem 2.18 after Matyas (1965)
Objective function:
$$F(x) = 0.26 (x_1^2 + x_2^2) - 0.48 x_1 x_2$$
Minimum:
x* = (0, 0), F(x*) = 0
Start:
x^(0) = (15, 30), F(x^(0)) = 76.5
Figure A.11: Graphical representation of Problem 2.17 for n = 2; contour levels F(x) = 4, 16, 36, 64, 100, 144, 196


Figure A.12: Graphical representation of Problem 2.18; contour levels F(x) = 1, 3, 10, 30, 100, 300
The coordinate strategies terminated the search prematurely because of the lower bounds on the step lengths (as determined by the machine), which precluded making any more successful line searches in the coordinate directions.
Problem 2.19 by Wood (after Colville, 1968)
Objective function:
$$F(x) = 100 (x_1 - x_2^2)^2 + (x_2 - 1)^2 + 90 (x_3 - x_4^2)^2 + (x_4 - 1)^2 + 10.1 \left[ (x_1 - 1)^2 + (x_3 - 1)^2 \right] + 19.8 (x_1 - 1)(x_3 - 1)$$
Minimum:
x* = (1, 1, 1, 1), F(x*) = 0
There is another stationary point near
x' ~ (1, -1, 1, -1), F(x') ~ 8
According to Himmelblau (1972a,b) there are still further local minima.
Start:
x^(0) = (-1, -3, -1, -3), F(x^(0)) = 19192
A very narrow valley appears to run from the stationary point x' to the minimum. All the coordinate strategies together with the methods of Hooke and Jeeves and of Powell ended the search in this region.
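The quoted function values are easy to reproduce and useful for checking one's own implementation; a minimal Python sketch of the objective as written above:

```python
def wood(x):
    # Problem 2.19 in the form given in the text
    x1, x2, x3, x4 = x
    return (100 * (x1 - x2**2)**2 + (x2 - 1)**2
            + 90 * (x3 - x4**2)**2 + (x4 - 1)**2
            + 10.1 * ((x1 - 1)**2 + (x3 - 1)**2)
            + 19.8 * (x1 - 1) * (x3 - 1))

print(wood((-1, -3, -1, -3)))  # 19192.0, the starting value
print(wood(( 1, -1,  1, -1)))  # 8.0, at the point near the stationary point x'
print(wood(( 1,  1,  1,  1)))  # 0.0, the minimum
```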


Problem 2.20
Objective function:
$$F(x) = \sum_{i=1}^{n} |x_i| \quad \text{for } n = 5$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
Start:
x_i^(0) = 10 for i = 1(1)n, F(x^(0)) = 50
Problem 2.21
Objective function:
$$F(x) = \max_i \left\{ |x_i|,\; i = 1(1)n \right\} \quad \text{for } n = 5$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
Start:
x_i^(0) = 10 for i = 1(1)n, F(x^(0)) = 10
Since the starting point is at a corner of the cubic contour surface, none of the coordinate strategies could find a point with a lower value of the objective function.
Figure A.13: Graphical representation of Problem 2.20 for n = 2; contour levels F(x) = 2, 4, 6, 8, 10, 12, 14, 16, 18, 20


Figure A.14: Graphical representation of Problem 2.21 for n = 2; contour levels F(x) = 2, 4, 6, 8, 10
The method of Powell also ended the search without making any significant improvement on the initial condition. Both the simplex method of Nelder and Mead and the complex method of Box also had trouble in the minimum search; in their cases the initially constructed simplex or complex collapsed long before reaching the minimum, again near one of the corners.

Problem 2.22
Objective function:
$$F(x) = \sum_{i=1}^{n} |x_i| + \prod_{i=1}^{n} |x_i| \quad \text{for } n = 5$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
Start:
x_i^(0) = 10 for i = 1(1)n, F(x^(0)) = 100050
The simplex and complex methods did not find the minimum. As in the previous Problem 2.21, this is due to the sharply pointed corners of the contours. The variable metric strategy also finally got stuck at one of these corners and converged no further. In this case the discontinuity in the partial derivatives of the objective function at the corners is to blame for its failure.

to blame for its failure.


Figure A.15: Graphical representation of Problem 2.22 for n = 2; contour levels F(x) = 3, 8, 15, 24, 35, 48, 63, 80, 99
Problem 2.23
Objective function:
$$F(x) = \sum_{i=1}^{n} x_i^{10} \quad \text{for } n = 5$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
Figure A.16: Graphical representation of Problem 2.23 for n = 2; contour levels F(x) = 2, 10^4, 10^6, 10^8, 10^10


Start:
x_i^(0) = 10 for i = 1(1)n, F(x^(0)) = 5 * 10^10

Only the two strategies that have a quadratic internal model of the objective function, namely the variable metric and conjugate directions methods, failed to converge, because the function F(x) is of much higher (10th) order.
Problem 2.24 after Rosenbrock (1960)
Objective function:
$$F(x) = 100 (x_2 - x_1^2)^2 + (x_1 - 1)^2$$
Minimum:
x* = (1, 1), F(x*) = 0
Start:
x^(0) = (-1.2, 1), F(x^(0)) = 24.2
Problem 2.25
Objective function:
$$F(x) = \sum_{i=2}^{n} \left[ (x_1 - x_i^2)^2 + (x_i - 1)^2 \right] \quad \text{for } n = 5$$
Minimum:
x_i* = 1 for i = 1(1)n, F(x*) = 0
Figure A.17: Graphical representation of Problem 2.24; contour levels F(x) = 0.5, 4, 20, 100, 250, 500, 1000, 2000, 5000


Start:
x_i^(0) = 10 for i = 1(1)n, F(x^(0)) = 32724
For n = 2 this becomes nearly the same as Problem 2.24.
Problem 2.26
Objective function:
$$F(x) = -x \sin\left(\sqrt{|x|}\right)$$
This problem is the same as Problem 2.3 except for the modulus. The difference has the effect that the neighboring minima are further apart here. The positions of the local minima and maxima are described under Problem 2.3.
Start:
x^(0) = 0, F(x^(0)) = 0
Again, only the multimembered evolution strategy converged to the apparent global minimum; all the other methods only converged to the first (nearest) local minimum.
Problem 2.27 after Zettl (1970)
Objective function:
$$F(x) = (x_1^2 + x_2^2 - 2 x_1)^2 + 0.25 x_1$$
Minimum:
x* ~ (-0.02990, 0), F(x*) ~ -0.003791
Figure A.18: Diagram of F(x) for Problem 2.26


Figure A.19: Graphical representation of Problem 2.27; contour levels F(x) = 0.03, 0.3, 1, 3, 10, 30
Because of rounding errors this same objective function value is reached for various pairs of values of x_1, x_2.
Local maximum:
x' ~ (1.063, 0), F(x') ~ 1.258
Saddle point:
x'' ~ (1.967, 0), F(x'') ~ 0.4962
Start:
x^(0) = (2, 0), F(x^(0)) = 0.5

Problem 2.28 of Watson (after Kowalik and Osborne, 1968)
Objective function:
$$F(x) = \sum_{i=1}^{30} \left( \sum_{j=1}^{5} j\, a_i^{j-1} x_{j+1} - \left[ \sum_{j=1}^{6} a_i^{j-1} x_j \right]^2 - 1 \right)^2 + x_1^2$$
where
$$a_i = \frac{i - 1}{29}$$
The origin of this problem is the approximate solution of the ordinary differential equation
$$\frac{dz}{dy} - z^2 = 1$$


on the interval 0 <= y <= 1 with the boundary condition z(y = 0) = 0. The function sought, z(y), is to be approximated by a polynomial
$$\tilde{z}(c, y) = \sum_{j=1}^{n} c_j\, y^{j-1}$$
In the present case only the first six terms are considered. Suitable values of the polynomial coefficients c_j, j = 1(1)6, are to be determined. The deviation from the exact solution of the differential equation is measured in the Gaussian sense as the sum of the squares of the errors at m = 30 argument values y_i, uniformly distributed in the range [0, 1]:
$$F_1(c) = \sum_{i=1}^{m} \left( \frac{\partial \tilde{z}(c, y)}{\partial y}\bigg|_{y_i} - \tilde{z}^2(c, y_i) - 1 \right)^2$$
The boundary condition is treated as a second simultaneous equation by means of a similarly constructed term:
$$F_2(c) = \tilde{z}^2(c, y)\big|_{y=0}$$
By inserting the polynomial and redefining the parameters c_i as variables x_i we obtain the objective function F(x) = F_1(x) + F_2(x), the minimum of which is an approximate solution of the parameterized functional problem.
Minimum:
x* ~ (-0.0158, 1.012, -0.2329, 1.260, -1.513, 0.9928), F(x*) ~ 0.002288
Start:
x^(0) = (0, 0, 0, 0, 0, 0), F(x^(0)) = 30
Judging by the number of objective function evaluations, all the search methods found this a difficult problem to solve. The best solution was provided by the complex strategy.
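A minimal Python sketch of the assembled objective F = F_1 + F_2 in the variables x_1, ..., x_6:

```python
def watson(x, m=30):
    # Problem 2.28 with n = 6 polynomial coefficients x[0..5]
    total = x[0]**2                      # boundary-condition term z~(c, 0)^2
    for i in range(1, m + 1):
        ai = (i - 1) / 29.0
        deriv = sum(j * ai**(j - 1) * x[j] for j in range(1, 6))
        poly  = sum(ai**(j - 1) * x[j - 1] for j in range(1, 7))
        total += (deriv - poly**2 - 1.0)**2
    return total

print(watson([0.0] * 6))   # 30.0, the starting value
x_star = (-0.0158, 1.012, -0.2329, 1.260, -1.513, 0.9928)
print(watson(x_star))      # ~2.3e-3, near the quoted minimum 0.002288
```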

Problem 2.29 after Beale (1967)
Objective function:
$$F(x) = 2 x_1^2 + 2 x_2^2 + x_3^2 + 2 x_1 x_2 + 2 x_1 x_3 - 8 x_1 - 6 x_2 - 4 x_3 + 9$$
Constraints:
G_j(x) = x_j >= 0 for j = 1(1)3
G_4(x) = -x_1 - x_2 - 2 x_3 + 3 >= 0
Minimum:
x* = (4/3, 7/9, 4/9), F(x*) = 1/9; only G_4 active, i.e., G_4(x*) = 0
Start:
x^(0) = (0.1, 0.1, 0.1), F(x^(0)) = 7.29


Problem 2.30
As Problem 2.3, but with the constraints:
G_1(x) = -x + 300 >= 0, G_2(x) = x + 300 >= 0
The introduction of constraints gives rise to two equivalent, global minima at the edge of the feasible region:
Minima:
x* = +-300, F(x*) ~ -299.7; G_1 or G_2 active
In addition there are five local minima within the feasible region. Here too, the absolute minima were only located by the multimembered evolution strategy.
Problem 2.31
As Problem 2.4, but with constraints:
G_j(x) = x_j - 1 >= 0 for j = 1(1)n, n = 5
Minimum:
x_i* = 1 for i = 1(1)n, F(x*) = 0; all G_j active
Start:
x_i^(0) = -10 for i = 1(1)n, F(x^(0)) = 61105
The starting point is located outside of the feasible region.
Figure A.20: Diagram of F(x) for Problem 2.30
Figure A.20: Diagram F (x) for Problem 2.30


Problem 2.32 after Bracken and McCormick (1970)
Objective function:
$$F(x) = -x_1^2 - x_2^2$$
Constraints:
G_j(x) = x_j >= 0 for j = 1, 2
G_3(x) = -x_1 + 1 >= 0
G_4(x) = -x_1 - 4 x_2 + 5 >= 0
Minimum:
x* = (1, 1), F(x*) = -2; G_3 and G_4 active
Besides this global minimum there is another local one:
x' = (0, 5/4), F(x') = -25/16; G_1 and G_4 active
Start:
x^(0) = (0, 0), F(x^(0)) = 0
All the search methods converged to the global minimum.
Problem 2.33 after Zettl (1970)
As Problems 2.14 and 2.15, but with the constraints:
G_j(x) = x_{j+2} - 2 >= 0 for j = 1, 2
Figure A.21: Graphical representation of Problem 2.32; contour levels F(x) = 0.04, 0.16, 0.36, 0.64, 1.0, 1.44, 1.96, 2.56, 3.24, 4


Minimum:
x* = (1.275, 0.6348, 2.0, 2.0), F(x*) ~ 189.1; all G_j active
Start:
x^(0) = (1, 2, 3, 4), F(x^(0)) = 1512
The (1+1) evolution strategy only solved the problem very inaccurately. Due to the 1/5 success rule the mutation variances vanish prematurely.
Problem 2.34 after Fletcher and Powell (1963)
Objective function:
$$F(x) = 100 \left[ (x_3 - 10\,\theta)^2 + (R - 1)^2 \right] + x_3^2$$
where
$$x_1 = R \cos(2\pi\theta), \qquad x_2 = R \sin(2\pi\theta), \qquad R = \sqrt{x_1^2 + x_2^2}$$
or
$$\theta = \begin{cases} \frac{1}{2\pi} \arctan\frac{x_2}{x_1} & \text{if } x_2 \neq 0 \text{ and } x_1 > 0 \\[4pt] \frac{1}{2} & \text{if } x_2 = 0 \\[4pt] \frac{1}{2} + \frac{1}{2\pi} \arctan\frac{x_2}{x_1} & \text{if } x_2 \neq 0 \text{ and } x_1 < 0 \end{cases}$$
Constraints:
G_1(x) = -x_3 + 7.5 >= 0, G_2(x) = x_3 + 2.5 >= 0
Minimum:
x* = (1, ~0, 0), F(x*) = 0; no constraint is active
The objective function itself has a discontinuity at x_2 = 0, right at the minimum sought. Thus x_2 should only be allowed to approach closely to zero. Because of the multivalued trigonometric functions there are infinitely many solutions to the problem, of which only one, however, lies within the feasible region.
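The case analysis for theta translates directly into code. A minimal sketch (the case x_1 = 0 with x_2 != 0 is left undefined, as in the text):

```python
import math

def theta(x1, x2):
    # Case analysis of Problem 2.34; note the jump at x2 = 0.
    if x2 == 0.0:
        return 0.5
    t = math.atan(x2 / x1) / (2.0 * math.pi)   # assumes x1 != 0
    return t if x1 > 0 else t + 0.5

def F(x):
    x1, x2, x3 = x
    R = math.hypot(x1, x2)
    return 100.0 * ((x3 - 10.0 * theta(x1, x2))**2 + (R - 1.0)**2) + x3**2

print(F((-1.0, 0.0, 0.0)))   # 2500.0, the starting value (theta = 1/2 there)
print(F((1.0, 1e-9, 0.0)))   # ~0, approaching the minimum as x2 -> 0
```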

Start:
x^(0) = (-1, 0, 0), F(x^(0)) = 2500
Problem 2.35 after Rosenbrock (1960)
Objective function:
$$F(x) = -x_1 x_2 x_3$$


Constraints:
G_j(x) = x_j >= 0 for j = 1(1)3
G_4(x) = -x_1 - 2 x_2 - 2 x_3 + 72 >= 0
The underlying question here was: What dimensions should a parcel of maximum volume have, if the sum of its length and transverse circumference is bounded?
Minimum:
x* = (24, 12, 12), F(x*) = -3456; G_4 active
Start:
x^(0) = (0, 0, 0), F(x^(0)) = 0
All variants of the evolution strategy converged only to within the neighborhood of the minimum sought, because in the end only a fraction of all trials were feasible.
Problem 2.36
This is derived from Problem 2.35 by treating the constraint G_4, which is active at the minimum, as an equation, and thereby eliminating one of the free variables. With
$$x_1' + 2 x_2' + 2 x_3' = 72$$
we obtain
$$F'(x) = -(72 - 2 x_2' - 2 x_3')\, x_2' x_3'$$
or, by renumbering of the variables, a new objective function:
$$F(x) = -x_1 x_2 (72 - 2 x_1 - 2 x_2)$$
Figure A.22: Graphical representation of Problem 2.36; contour levels F(x) = -3400, -3000, -2000, -1000, -300, 300, 1000


Constraints:
G_j(x) = x_j >= 0 for j = 1, 2
Minimum:
x* = (12, 12), F(x*) = -3456; no constraints are active
Start:
x^(0) = (1, 1), F(x^(0)) = -68
Problem 2.37 (corridor model)
Objective function:
$$F(x) = -\sum_{i=1}^{n} x_i \quad \text{for } n = 3$$
Constraints:
$$G_j(x) = \begin{cases} -x_j + 100 \geq 0 & \text{for } j = 1(1)n \\[4pt] x_{j-n+1} - \frac{1}{j-n} \sum_{i=1}^{j-n} x_i + \sqrt{\frac{j-n+1}{j-n}} \geq 0 & \text{for } n+1 \leq j \leq 2n-1 \\[4pt] -x_{j-2n+2} + \frac{1}{j-2n+1} \sum_{i=1}^{j-2n+1} x_i + \sqrt{\frac{j-2n+2}{j-2n+1}} \geq 0 & \text{for } 2n \leq j \leq 3n-2 \end{cases}$$
Figure A.23: Graphical representation of Problem 2.37 for n = 2; contour levels F(x) = -220, -215, -210, -205, -200, -195, -190, -185, -180, -175, -170, -165, -160


The constraints form a feasible region which could be described as a corridor with a square cross section (three dimensionally speaking). The axis of the corridor runs along the diagonal in the space:
$$x_1 = x_2 = x_3 = \ldots = x_n$$
The contours of the linear objective function run perpendicular to this axis. In order to obtain a finite minimum, further constraints were added, whereby a kind of pencil point is placed on the end of the corridor. In the absence of these additional constraints the problem corresponds to the corridor model used by Rechenberg (1973), for which he derived theoretically the rate of progress (a measure of the convergence rate) of the two membered evolution strategy.
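Written out for general n, the side constraints are easier to see in code; a minimal illustrative Python sketch:

```python
import math

def corridor_constraints(n):
    # Constraints of Problem 2.37 as functions G_j(x) >= 0: the bounds
    # -x_j + 100 >= 0 plus, for k = 1..n-1, the two side walls enforcing
    # |x_{k+1} - mean(x_1..x_k)| <= sqrt((k+1)/k).
    G = [lambda x, j=j: 100.0 - x[j] for j in range(n)]
    for k in range(1, n):
        r = math.sqrt((k + 1.0) / k)
        G.append(lambda x, k=k, r=r: x[k] - sum(x[:k]) / k + r)
        G.append(lambda x, k=k, r=r: -x[k] + sum(x[:k]) / k + r)
    return G

G = corridor_constraints(3)
print(all(g([0.0, 0.0, 0.0]) >= 0 for g in G))        # True: the origin is feasible
print(all(g([100.0, 100.0, 100.0]) >= 0 for g in G))  # True: the minimum x*
```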

Minimum:
x_i* = 100 for i = 1(1)n, F(x*) = -300; G_1 to G_n active
Start:
x_i^(0) = 0 for i = 1(1)n, F(x^(0)) = 0
Problem 2.38
As Problem 2.25, but with the additional constraints:
G_j(x) = x_j - 1 >= 0 for j = 1(1)n, n = 5
Minimum:
x_i* = 1 for i = 1(1)n, F(x*) = 0; all G_j active
Start:
x_i^(0) = -10 for i = 1(1)n, F(x^(0)) = 48884
The starting point is not in the feasible region.

Problem 2.39 after Rosen and Suzuki (1965)
Objective function:
$$F(x) = x_1^2 + x_2^2 + 2 x_3^2 + x_4^2 - 5 x_1 - 5 x_2 - 21 x_3 + 7 x_4$$
Constraints:
G_1(x) = -2 x_1^2 - x_2^2 - x_3^2 - 2 x_1 + x_2 + x_4 + 5 >= 0
G_2(x) = -x_1^2 - x_2^2 - x_3^2 - x_4^2 - x_1 + x_2 - x_3 + x_4 + 8 >= 0
G_3(x) = -x_1^2 - 2 x_2^2 - x_3^2 - 2 x_4^2 + x_1 + x_4 + 10 >= 0
Minimum:
x* = (0, 1, 2, -1), F(x*) = -44; G_1 active
x =(0 1 2 ;1) F (x )=;44 G 1 active


Start:
x^(0) = (0, 0, 0, 0), F(x^(0)) = 0
None of the search methods that operate directly with constraints, i.e., without reformulating the objective functions, managed to solve the problem to satisfactory accuracy.
Problem 2.40
Objective function:
$$F(x) = -\sum_{i=1}^{5} x_i$$
Constraints:
G_j(x) = x_j >= 0 for j = 1(1)5
G_6(x) = -Sum_{i=1..5} (9 + i) x_i + 50000 >= 0
This is a simple linear programming problem. The solution is in a corner of the allowed region defined by the constraints (simplex).
Minimum:
x* = (5000, 0, 0, 0, 0), F(x*) = -5000; G_2 to G_6 active
Figure A.24: Graphical representation of Problem 2.40 on the plane x_3 = x_4 = x_5 = 0; contour levels F(x) = -10500, -9500, -8500, -7500, -6500, -5500, -4500, -3500, -2500, -1500, -500, 500


Start:
x^(0) = (250, 250, 250, 250, 250), F(x^(0)) = -1250
In terms of the values of the variables, none of the strategies tested achieved accuracies better than 10^-2. The two variants of the (10, 100) evolution strategy came closest to the exact solution.
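The corner solution can be cross-checked with any LP solver, for example with SciPy's linprog; this is a modern convenience that was of course not available for the original comparison:

```python
from scipy.optimize import linprog

# Problem 2.40: minimize -sum(x) subject to sum((9+i)*x_i) <= 50000, x >= 0
c = [-1, -1, -1, -1, -1]
A_ub = [[10, 11, 12, 13, 14]]              # coefficients 9 + i for i = 1..5
res = linprog(c, A_ub=A_ub, b_ub=[50000])  # default bounds are x >= 0
print(res.x, res.fun)                      # approx. [5000. 0. 0. 0. 0.] -5000.0
```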

Problem 2.41
Objective function:
$$F(x) = -\sum_{i=1}^{5} i\, x_i$$
Constraints: as for Problem 2.40
Minimum:
x* = (0, 0, 0, 0, 50000/14), F(x*) = -250000/14; G_j active for j = 1, 2, 3, 4, 6
Start:
x^(0) = (250, 250, 250, 250, 250), F(x^(0)) = -3750
This problem differs from the previous one only in the numerical values; regarding the accuracies achieved, the same remarks apply as for Problem 2.40.
Figure A.25: Graphical representation of Problem 2.41 on the plane x_2 = x_3 = x_4 = 0; contour levels F(x) = -30000, -25000, -20000, -15000, -10000, -5000, 0


Problem 2.42
Objective function:
$$F(x) = \sum_{i=1}^{5} i\, x_i$$
Constraints: as for Problems 2.40 and 2.41
Minimum:
x* = (0, 0, 0, 0, 0), F(x*) = 0; G_1 to G_5 active
Start:
x^(0) = (250, 250, 250, 250, 250), F(x^(0)) = 3750
The minimum is at the origin of coordinates. The evolution strategies were thus better able to approach the solution by adjusting the individual step lengths. The multimembered strategy with recombination yielded an exact solution with variable values less than 10^-38.
Problem 2.43
As Problem 2.42, except:
Start:
x^(0) = (-250, -250, -250, -250, -250), F(x^(0)) = -3750
The starting point is not in the feasible region. The solutions are the same as in Problem 2.42.
Figure A.26: Graphical representation of Problem 2.42 on the plane x_3 = x_4 = x_5 = 0; contour levels F(x) = -1000, 1000, 3000, 5000, 7000, 9000, 11000, 13000, 15000


Problem 2.44
As Problem 2.26, but with additional constraints:
G_1(x) = -x + 300 >= 0, G_2(x) = x + 300 >= 0
Minimum:
x* = -300, F(x*) ~ -299.7; G_2 active
Besides this global minimum there are five more local minima within the feasible region.
Start:
x^(0) = 0, F(x^(0)) = 0
The global minimum could only be located by the multimembered evolution. All the other search strategies converged to the local minimum nearest to the starting point.
Problem 2.45 of Smith and Rudd (after Leon, 1966a)
Objective function:
$$F(x) = \sum_{i=1}^{n} x_i^i\, e^{-x_i} \quad \text{for } n = 5$$
Constraints:
G_j(x) = x_j >= 0 for j = 1(1)n
G_j(x) = 2 - x_{j-n} >= 0 for j = n+1(1)2n
Figure A.27: Diagram of F(x) for Problem 2.44
Figure A.27: Diagram F (x) for Problem 2.44


Figure A.28: Graphical representation of Problem 2.45 for n = 2; contour levels F(x) = -1.0, 0.0, 0.3, 0.4, 0.6, 0.8, 0.9
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0; all G_1 to G_n active
Besides this global minimum there is another local one:
x' = (2, 0, ..., 0), F(x') = 2 e^-2; G_2 to G_{n+1} active
Start:
x_i^(0) = 1 for i = 1(1)n, F(x^(0)) ~ 1.84
In the neighborhood of the minimum sought, the rate of convergence of a search strategy depends strongly on its ability to make widely different individual adjustments to the step lengths for the changes in the variables. The multimembered evolution solved this problem best when working with recombination. Rosenbrock's method converged to the local minimum, as did the complex method and the simple evolution strategies.
Problem 2.46
Objective function:
$$F(x) = x_1^2 + x_2^2$$
Constraints:
G_1(x) = x_1 + 2 x_2 - 2 >= 0
Minimum:
x* = (0.4, 0.8), F(x*) = 0.8; G_1 active
Start:
x^(0) = (10, 10), F(x^(0)) = 200


Figure A.29: Graphical representation of Problem 2.46; contour levels F(x) = 0.04, 0.36, 1.00, 1.96, 3.24, 4.84, 6.76
Problem 2.47 after Ueing (1971)
Objective function:
$$F(x) = -x_1^2 - x_2^2$$
Constraints:
G_j(x) = x_j >= 0 for j = 1, 2
G_3(x) = x_1^2 + x_2^2 - 17 x_1 - 5 x_2 + 66 >= 0
G_4(x) = x_1^2 + x_2^2 - 10 x_1 - 10 x_2 + 41 >= 0
G_5(x) = x_1^2 + x_2^2 - 4 x_1 - 14 x_2 + 45 >= 0
G_6(x) = -x_1 + 7 >= 0
G_7(x) = -x_2 + 7 >= 0
Minimum:
x* = (6, 0), F(x*) = -36; G_2 and G_3 active
Besides the global minimum x* there are three other local minima:
x' ~ (2.116, 4.174), F(x') ~ -21.90
x'' = (0, 5), F(x'') = -25
x''' = (5, 2), F(x''') = -29
Start:
x^(0) = (0, 0), F(x^(0)) = 0


Figure A.30: Graphical representation of Problem 2.47; contour levels F(x) = -4, -16, -36, -64, -100, -144, -196, -256
To the original problem have been added the two constraints G_6 and G_7. Without them there are two separate feasible regions and the global minimum is at infinity, in the external, open region. Depending on the initial step lengths, the evolution strategies were sometimes able to go out from the starting point within the inner, closed region into the external region. After adding G_6 and G_7, the multimembered strategies converged to the global minimum; all other search methods located other local minima. Which of these was located by the two membered evolution strategy depended on the sequence of random numbers.
Problem 2.48 after Ueing (1971)
Objective function:
$$F(x) = -x_1^2 - x_2^2$$
Constraints:
G_j(x) = x_j >= 0 for j = 1, 2
G_3(x) = -x_1 + x_2 + 4 >= 0
G_4(x) = x_1/3 - x_2 + 4 >= 0
G_5(x) = x_1^2 + x_2^2 - 10 x_1 - 10 x_2 + 41 >= 0
Minimum:
x* = (12, 8), F(x*) = -208; G_3 and G_4 active
Besides this global minimum there are two more local minima:
x' ~ (2.018, 4.673), F(x') ~ -25.91


Figure A.31: Graphical representation of Problem 2.48; contour levels F(x) = -4, -16, -36, -64, -100, -144, -196, -256
x'' ~ (6.293, 2.293), F(x'') ~ -44.86
Start:
x^(0) = (0, 0), F(x^(0)) = 0
There are two feasible regions which are unconnected and closed. The starting point and the global minimum are separated by a non-feasible region. Only the (10, 100) evolution strategy converged to the global minimum. It sometimes happened with this strategy that one descendant of a generation would jump from one feasible region to the other; however, the group of remaining individuals would converge to one of the other local minima. All other strategies did not converge to the global minimum.
Problem 2.49 after Wolfe (1966)
Objective function:
$$F(x) = \frac{4}{3} \left( x_1^2 + x_2^2 - x_1 x_2 \right)^{3/4} + x_3$$
Constraints:
G_j(x) = x_j >= 0 for j = 1(1)3
Minimum:
x* = (0, 0, 0), F(x*) = 0; all G_j active
Start:
x^(0) = (10, 10, 10), F(x^(0)) ~ 52.16


Problem 2.50
As Problem 2.37, but with some other constraints:
G_j(x) = -x_j + 100 >= 0 for j = 1(1)n
$$G_{n+1}(x) = 1 - \sum_{i=1}^{n} \left( \frac{1}{n} \sum_{j=1}^{n} x_j - x_i \right)^2 \geq 0$$
Minimum:
x_i* = 100 for i = 1(1)n, F(x*) = -300 for n = 3; G_1 to G_n active
Start:
x_i^(0) = 0 for i = 1(1)n, F(x^(0)) = 0
Instead of the 2n - 2 linear constraints of Problem 2.37, a non-linear constraint served here to bound the corridor at its sides. From a geometrical point of view, the cross section of the corridor for n = 3 variables is now circular instead of square. For n = 2 variables the two problems become equivalent.

A.3 Test Problems for the Third Part of the Strategy Comparison
These are usually n-dimensional extensions of problems from the second set of tests, whose numbers are given in brackets after the new problem number.
Problem 3.1 (analogous to Problem 2.4)
Objective function:
$$F(x) = \sum_{i=1}^{n} \left[ (x_1 - x_i^2)^2 + (1 - x_i)^2 \right]$$
Minimum:
x_i* = 1 for i = 1(1)n, F(x*) = 0
No noteworthy difficulties arose in the solution of this and the following biquadratic problem with any of the comparison strategies. Away from the minimum, the contour patterns of the objective functions resemble those of the n-dimensional sphere problem (Problem 1.1). Nevertheless, the slight differences caused most search methods to converge much more slowly (typically by a factor 1/5). The simplex strategy was particularly affected. The computation times it required were about 10 to 30 times as long as for the sphere problem with the same number of variables. With n = 100 and greater, the required accuracy was only achieved in Problem 3.1 after at least one collapse and subsequent reconstruction of the simplex. The evolution strategies on the other hand were all practically unaffected by the difference with respect to Problem 1.1. Also for the complex method the cost was only slightly higher, although with this strategy the computation time increased very rapidly with the number of variables for all problems.


Problem 3.2 (analogous to Problem 2.25)
Objective function:
$$F(x) = \sum_{i=2}^{n} \left[ (x_1 - x_i^2)^2 + (1 - x_i)^2 \right]$$
Minimum:
x_i* = 1 for i = 1(1)n, F(x*) = 0
Problem 3.3 (analogous to Problem 2.13)
Objective function:
$$F(x) = \sum_{i=1}^{n} \left( \sum_{j=1}^{n} (a_{ij} \sin\alpha_j + b_{ij} \cos\alpha_j) - \sum_{j=1}^{n} (a_{ij} \sin x_j + b_{ij} \cos x_j) \right)^2$$
where a_ij, b_ij for i, j = 1(1)n are integer random numbers from the range [-100, 100], and alpha_j, j = 1(1)n are random numbers from the range [-pi, pi].
Minimum:
x_i* = alpha_i for i = 1(1)n, F(x*) = 0
Besides this desired minimum there are numerous others that have the same value (see Problem 2.13). The a_ij and b_ij require storage space of order O(n^2). For this reason the maximum number of variables for which this problem could be set up had to be limited to n_max = 30. The computation time per function call also increases as O(n^2). The coordinate strategies ended the search for the minimum before reaching the required accuracy when 10 or more variables were involved. The method of Davies, Swann, and Campey (DSC) with Gram-Schmidt orthogonalization and the complex method failed in the same way for 30 variables. For n = 30 the search simplex of the Nelder-Mead strategy also collapsed prematurely, but after a restart the minimum was sufficiently well approximated. Depending on the sequence of random numbers, the two membered evolution strategy converged either to the desired minimum or to one of the others. This was not seen to occur with the multimembered strategies; however, only one attempt could be made in each case because of the long computation times.

Problem 3.4 (analogous to Problem 2.20)
Objective function:
$$F(x) = \sum_{i=1}^{n} |x_i|$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
This problem presented no difficulties to those strategies having a line (one dimensional) search subroutine, since the axes-parallel minimizations are always successful. The simplex method on the other hand required several restarts even for just 30 variables, and


for n = 100 variables it had to be interrupted, as it exceeded the maximum permitted computation time (8 hours) without achieving the required accuracy. The success or failure of the (1+1) evolution strategy and the complex method depended upon the actual random numbers. Therefore, in this and the following problems, whenever there was any doubt about convergence, several (at least three) attempts were made with different sequences of random numbers. It was seen that the two membered evolution strategy sometimes spent longer near one of the corners formed by the contours of the objective function, where it converged only slowly; however, it finally escaped from this situation. Thus, although the computation times were very varied, the search was never terminated prematurely. The success of the multimembered evolution strategy depended on whether or not recombination was implemented. Without recombination the method sometimes failed for just 30 variables, whereas with recombination it converged safely and with no periods of stagnation. In the latter case the computation times taken were actually no longer than for the sphere problem with the same number of variables.
Problem 3.5 (analogous to Problem 2.21)
Objective function:
$$F(x) = \max_i \left\{ |x_i|,\; i = 1(1)n \right\}$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
Most of the methods using a one dimensional search failed here, because the value of the objective function is piecewise constant along the coordinate directions. The methods of Rosenbrock and of Davies, Swann, and Campey (whatever the method of orthogonalization) converged safely, since they consider trial steps that do not change the objective function value as successful. If only true improvements are accepted, as in the conjugate gradient, variable metric, and coordinate strategies, the search never even leaves the chosen starting point at one of the corners of the contour surface. The simplex and complex strategies failed for n > 30 variables. Even for just 10 variables the search simplex of the Nelder-Mead method had to be constructed anew after collapsing 185 times, before the desired accuracy could be achieved. For the evolution strategy with only one parent and one descendant, the probability of finding from the starting point a point with a better value of the objective function is
$$w_e = 2^{-n}$$
For this reason the (1+1) strategy failed for n >= 10. The multimembered version without recombination could solve the problem for up to n = 10 variables. With recombination, convergence was sometimes still achieved for n = 30 variables, but no longer for n = 100 in the three attempts made.
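The value w_e = 2^-n is easy to confirm by simulation, under the assumption of isotropic Gaussian mutations with a small step size; a minimal sketch:

```python
import random

def success_probability(n, sigma=0.01, trials=200000, seed=7):
    # Probability that one isotropic Gaussian mutation improves
    # F(x) = max|x_i| starting from the corner x = (10, ..., 10).
    # For small sigma every coordinate must decrease, hence p -> 2^-n.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if all(abs(10.0 + rng.gauss(0.0, sigma)) < 10.0 for _ in range(n)):
            hits += 1
    return hits / trials

for n in (2, 5, 10):
    print(n, success_probability(n), 2.0**-n)  # estimate vs. theory
```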


Problem 3.6 (analogous to Problem 2.22)
Objective function:
$$F(x) = \sum_{i=1}^{n} |x_i| + \prod_{i=1}^{n} |x_i|$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
In spite of the even sharper corners on the contour surfaces of the objective function, all the strategies behaved in much the same way as they did in the minimum search of Problem 3.4. The only notable difference was with the (10, 100) evolution strategy without recombination. For n = 30 variables the minimum search always converged; for n = 100 and above the search was no longer successful.
Problem 3.7 (analogous to Problem 2.23)
Objective function:
$$F(x) = \sum_{i=1}^{n} x_i^{10}$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
The strategy of Powell failed for n >= 10 variables. Since all the step lengths were set to zero, the search stagnated and the internal termination criterion did not take effect. The optimization had to be interrupted externally. From n = 30, the variable metric method was also ineffective. The quadratic model of the objective function on which it is based led to completely false predictions of suitable search directions. For n = 10 the simplex method required 48 restarts, and for n = 30 as many as 181 in order to achieve the desired accuracy. None of the evolution strategies had any convergence difficulties in solving the problem. They were not tested further for n > 300 simply for reasons of computation time.

Problem 3.8 (similar to Problem 2.37) (corridor model)
Objective function:
$$F(x) = -\sum_{i=1}^{n} x_i$$
Constraints:
$$G_j(x) = \begin{cases} \sqrt{\frac{j+1}{j}} + x_{j+1} - \frac{1}{j} \sum_{i=1}^{j} x_i \geq 0 & \text{for } j = 1(1)n-1 \\[4pt] \sqrt{\frac{j-n+2}{j-n+1}} - x_{j-n+2} + \frac{1}{j-n+1} \sum_{i=1}^{j-n+1} x_i \geq 0 & \text{for } j = n(1)2n-2 \end{cases}$$
The other constraints of Problem 2.37, which bound the corridor in the direction of the minimum being sought, were omitted here. The minimum is thus at infinity.


In comparing the results of this and the following circularly bounded corridor problem with the theoretical rates of progress for this model function, the quantity of interest was the cost, not of reaching a given approximation to an objective, but of covering a given distance along the corridor axis. For the half-width of the corridor, b = 1 was taken. The search was started at the origin and terminated as soon as a distance s >= 10 b had been covered, or the objective function had reached a value F <= -10 sqrt(n):
Start:
x_i^(0) = 0 for i = 1(1)n, F(x^(0)) = 0
All the tested strategies converged satisfactorily. The number of mutations or generations required by the evolution strategies increased linearly with the number of variables, as expected. Since the number of constraints, as well as the computation time per function call, increased as O(n), the total computation time increased as O(n^3). Because of the maximum of 8 hours per search adopted as a limit on the computation time, the two membered evolution strategy could only be tested to n = 300, and the multimembered strategies to n = 100. Intermediate results for n = 300, however, confirm that the expected trend is maintained.

Problem 3.9 (similar to Problem 2.50)

Objective function:

$$F(x) = -\sum_{i=1}^{n} x_i$$

Constraint:

$$G(x) = 1 - \sum_{i=1}^{n} \left( \frac{1}{n}\sum_{j=1}^{n} x_j - x_i \right)^{2} \ge 0$$

Minimum, starting point, and convergence criterion as in Problem 3.8.

The complex method failed for n ≥ 30, but the Rosenbrock strategy simply required more objective function evaluations and orthogonalizations compared to the rectangular corridor. The evolution strategies converged safely. They too required more mutations or generations than in the previous problem. However, since only one constraint instead of 2n − 2 was to be tested and respected, the time they took only increased as O(n²). Recombination in the multimembered version was only a very slight advantage for this and the linearly bounded corridor problem.

Problem 3.10 (analogous to Problem 2.45)

Objective function:

$$F(x) = \sum_{i=1}^{n} x_i^{\,i}\, e^{-x_i}$$

Constraints:

$$G_j(x) = \begin{cases} x_j \ge 0 & \text{for } j = 1(1)n \\ 2 - x_{j-n} \ge 0 & \text{for } j = n+1(1)2n \end{cases}$$



Minimum:

$$x_i^* = 0 \ \text{ for } i = 1(1)n, \qquad F(x^*) = 0, \qquad G_1 \text{ to } G_n \text{ all active}$$

Besides this global minimum there is a local one within the feasible region:

$$x_i' = \begin{cases} 2 & \text{for } i = 1 \\ 0 & \text{for } i = 2(1)n \end{cases}, \qquad F(x') = 2 e^{-2}$$

As in the solution of Problem 2.45 with five variables, the search methods only converged if they could adjust the step lengths individually. The strategy of Rosenbrock failed even for n = 10. The complex method sometimes converged for the same number of variables after about 1,000 seconds of computation time, but occasionally not even within the allotted 8 hours. For n = 30 variables, none of the strategies reached the objective before the time limit expired. The results obtained after 8 hours showed clearly that better progress was being made by the two membered evolution strategy and the multimembered strategy with recombination. The following table gives the best objective function values obtained by each of the strategies compared.

Rosenbrock                                   $10^{-4}$ †
Complex                                      $10^{-7}$
(1+1) evolution                              $10^{-30}$
(10, 100) evolution without recombination    $10^{-12}$
(10, 100) evolution with recombination       $10^{-26}$

† The Rosenbrock strategy ended the search prematurely after about 5 hours. All the other values are intermediate results after 8 hours of computation time, when the strategy's own termination criteria were not yet satisfied. The searches could therefore still have come to a successful conclusion.


Appendix B

Program Codes

This appendix contains the two FORTRAN programs EVOL and GRUP (with option REKO) used for the test series described in Chapter 6, plus the extension KORR as of 1976, which covers all features of GRUP/REKO as well as correlated mutations (Schwefel, 1974; see also Chap. 7) introduced shortly after the first German version of this work was finished in 1974 (and reproduced as monograph by Birkhäuser, Basle, Switzerland, in 1977). GRUP and REKO thus should no longer be used or imitated.

B.1 (1+1) Evolution Strategy EVOL

1. Purpose

The EVOL subroutine is a FORTRAN coding of the two membered evolution strategy. It is an iterative direct search strategy for a parameter optimization problem. A search is made for the minimum in a non-linear function of an arbitrary but finite number of continuously variable parameters. Derivatives of the objective function are not required. Constraints in the form of inequalities can be incorporated (right hand side ≥ 0). The user must supply initial values for the variables and for the appropriate step sizes. If the initial state is not feasible, a search is made for a feasible point by minimizing the sum of the negative values for the constraints that have been violated.
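Written as a formula (a restatement of the rule just stated, not an equation printed in the original), the auxiliary objective that replaces F in this preliminary phase is

$$F_{\mathrm{aux}}(x) = -\sum_{j:\,G_j(x) < 0} G_j(x),$$

which is positive as long as at least one constraint is violated and reaches zero exactly when a feasible point has been found.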

2. Subroutine parameter list

EVOL (N,M,LF,LR,LS,TM,EA,EB,EC,ED,SN,FB,XB,SM,X,F,G,T,Z,R)

All parameters apart from LF, FB, X, and Z must be assigned values or names either before or when the subroutine is called. The variables XB and SM do not retain the values initially assigned to them.

N (integer) Number of parameters (> 0).

M (integer) Number of constraints (≥ 0).


LF (integer) Return code with the following meaning:
LF = -2: Starting point not feasible and search for a feasible state unsuccessful. Feasible region probably empty.
LF = -1: Starting point not feasible and search for a feasible state terminated because the time limit was reached.
LF = 0: Starting point not feasible, search for a feasible state successful. The final values of XB can be used as starting point for a subsequent search for a minimum if EVOL is called again.
LF = 1: Search for minimum terminated because the time limit was reached.
LF = 2: Search for minimum terminated in an orderly fashion. No further improvement in the value of the objective function could be achieved in the context of the given accuracy parameters. Probably the final state XB (extreme point) having FB (extreme value) lies near a local minimum, perhaps the global minimum.

LR (integer) Auxiliary quantity used in step size management. Normal value 1. The step sizes are adjusted so that on average one success (improvement in the value of the objective function) is obtained in 5·LR trials (objective function calls). This is computed over the last 10·N·LR trials.

LS (integer) Auxiliary quantity used in convergence testing. Minimum value 2. The search is terminated if the value of the objective function has improved by less than EC (absolute) or ED (relative) in the course of 10·N·LR·LS trials. Note: the step sizes are reduced by at most a factor SN^(10·LS) during this period. The factor is 0.2^LS if SN = 0.85 is selected.

TM (real) Parameter used in controlling the computation time, e.g., the maximum CPU time in seconds, depending on the function designated T (see below). The search is terminated if T > TM. This check is performed after each N·LR mutations (objective function calls).

EA (real) Lower bound to step sizes, absolute. EA > 0.0 must be chosen large enough to be treated as different from 0.0 within the accuracy of the computer used.

EB (real) Lower bound to step sizes relative to values of variables. EB > 0.0 must be chosen large enough for 1.0 + EB to be treated as different from 1.0 within the accuracy of the computer used.

EC (real) Parameter in convergence test, absolute. See under LS. (EC > 0.0, see EA.)

ED (real) Parameter in convergence test, relative. See under LS. (1.0 + ED > 1.0, see EB.) Convergence is assumed if the data pass one or both tests. If it is desired to suppress a test, it is possible either to set EC = 0.0 or to choose a value for ED such that 1.0 + ED = 1.0 but ED > 0.0 within the accuracy of the machine.

SN (real) Auxiliary variable for step size adjustment. Normal value 0.85. The step size can be kept constant during the trials by setting SN = 1.0. The success rate indicated by LR is used to adjust the step size by a factor SN or 1.0/SN after every N·LR trials.



FB (real) Best value of objective function obtained during the search.

XB (one dimensional real array of length N) On call: holds initial values of variables. On exit: holds best values of variables corresponding to FB.

SM (one dimensional real array of length N) On call: holds initial values of step sizes (more precisely, standard deviations of components of the mutation vector). On exit: holds current step sizes of the last (not necessarily successful) mutation. Optimum initialization: somewhere near the local radius of curvature of the objective function hypersurface divided by the number of variables. More practical suggestion: SM(I) = DX(I)/SQRT(N), where DX(I) is a crude estimate of either the distance between start and expected optimum or the maximum uncertainty range for the variable X(I). If the SM(I) are initially set too large, a certain time elapses before they are appropriately adjusted. This is advantageous as regards the probability of locating the global optimum in the presence of several local optima.

X (one dimensional real array of length N) Space for holding a variable vector.

F (real function) Name of the objective function, which is to be provided by the user.

G (real function) Name of the function used in calculating the values of the constraint functions, to be provided by the user.

T (real function) Name of the function used in controlling the computation time.

Z (real function) Name of the function used in transforming a uniform random number distribution to a normal distribution. If the nomenclature Z is retained, the function Z appended to the EVOL subroutine can be used for this purpose.

R (real function) Name of the function that generates a uniform random number distribution.

3. Method

See I. Rechenberg, Evolution Strategy: Optimization of Technical Systems in Accordance with the Principles of Biological Evolution (in German), vol. 15 of Problemata series, Verlag Frommann-Holzboog, Stuttgart, 1973; also H.-P. Schwefel, Numerical Optimization of Computer Models, Wiley, Chichester, 1981 (translated by M. W. Finnis from Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie, vol. 26 of Interdisciplinary Systems Research, Birkhäuser, Basle, Switzerland, 1977).

The method is based on a very much simplified simulation of biological evolution using the principles of mutation (random changes in variables, normal distribution for change vector) and selection (elimination of deteriorations and retention of improvements). The widths of the normal distribution (or step sizes) are controlled by reference to the ratio of the number of improvements to the number of mutations.



4. Convergence criterion

Based on the change in the value of the objective function (see under LS, EC, and ED).

5. Peripheral I/O: none.

6. Notes

If there are several (local) minima, only one is located. Which one actually is found depends on the initial values of variables and step sizes as well as on the random number sequence. In such cases it is recommended to repeat the search several times with different sets of initial values and/or random numbers. The approximation to the minimum is usually poor if the search terminates at the boundary of the feasible region defined by the constraints. Better results can then be obtained by setting LR > 1, LS > 2, and/or SN > 0.85 (maximum value 1.0). In addition, the bounds EA and EB should not be made too small. The same applies if the objective function has discontinuous first partial derivatives (e.g., in the case of Tchebycheff approximation).

7. Subroutines or functions used

The function names should be declared as external in the segment that calls EVOL.

7.1 Objective function

This is to be written by the user in the form:

-----------------------------------------------------
      FUNCTION F(N,X)
      DIMENSION X(N)
      ...
      F=...
      RETURN
      END
-----------------------------------------------------

N represents the number of parameters, and X represents the formal parameter vector. The function should be written on the basis that EVOL searches for a minimum; if a maximum is to be sought, F must be supplied with a negative sign.
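As a minimal illustration (an example of mine, not part of the original listings), the sphere function F(x) = Σ x_i² would be coded as:

-----------------------------------------------------
C     EXAMPLE (NOT FROM THE ORIGINAL LISTINGS):
C     SPHERE FUNCTION AS OBJECTIVE
      FUNCTION F(N,X)
      DIMENSION X(N)
      F=0.
      DO 1 I=1,N
    1 F=F+X(I)*X(I)
      RETURN
      END
-----------------------------------------------------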

7.2 Constraints function

This is to be written by the user in the general style:

-----------------------------------------------------
      FUNCTION G(J,N,X)
      DIMENSION X(N)
      GOTO(1,2,3,...,(M)),J
    1 G=...
      RETURN
    2 G=...



      RETURN
      ...
  (M) G=...
      RETURN
      END
-----------------------------------------------------

N and X have the meanings described for the objective function, while J (integer) is the serial number of the constraint. The statements should be written on the basis that EVOL will accept vector X as feasible if all the G values are larger than or equal to 0.0.
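For instance (again my illustration, not from the original listings), M = 2 constraints x₁ ≥ 0 and x₂ ≤ 1, both brought into the form G ≥ 0, could be coded as:

-----------------------------------------------------
C     EXAMPLE (NOT FROM THE ORIGINAL LISTINGS):
C     G1: X(1) >= 0 AND G2: X(2) <= 1, EACH WRITTEN SO
C     THAT FEASIBILITY MEANS G >= 0
      FUNCTION G(J,N,X)
      DIMENSION X(N)
      GOTO(1,2),J
    1 G=X(1)
      RETURN
    2 G=1.-X(2)
      RETURN
      END
-----------------------------------------------------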

7.3 Function for controlling the computation time

This may be defined by the user or called from the subroutine library in the particular machine. The following structure is assumed:

REAL FUNCTION T(D)

where D is a dummy parameter. T should be assigned the monitored quantity, e.g., the CPU time in seconds limited by TM. Many computers are supplied with ready-made timing software. If this is given as a function, only its name needs to be supplied to EVOL instead of T as a parameter. If it is a subroutine, the user can program the required function. For example, the subroutine might be called SECOND(I), where parameter I is an integer representing the CPU time in microseconds, in which case one could program:

-----------------------------------------------------
      FUNCTION T(D)
      CALL SECOND(I)
      T=1.E-6*FLOAT(I)
      RETURN
      END
-----------------------------------------------------

7.4 Function for transforming a uniformly distributed random number to a normally distributed one

See under Section 8.

7.5 Function for generating a uniform random number distribution in the range (0,1]

The structure must be

REAL FUNCTION R(D)

where D is dummy. R is the value of the random number. Note: The smallest value of R must be large enough for the natural logarithm to be generated without floating-point overflow. The standard library usually includes a suitable program, in which case only the appropriate name has to be supplied to EVOL.
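If no library generator is available, one portable possibility (my suggestion, not part of the original appendix) is the minimal-standard multiplicative congruential generator with Schrage's factorization to avoid integer overflow; the seed IR, held here in an assumed COMMON block /SEED/, must be initialized to a value between 1 and 2147483646. Its smallest possible value, about 5·10⁻¹⁰, is safely large enough for the logarithm in Z:

-----------------------------------------------------
C     EXAMPLE (NOT FROM THE ORIGINAL LISTINGS):
C     MINIMAL-STANDARD GENERATOR IR = 16807*IR MOD (2**31-1),
C     IMPLEMENTED WITH SCHRAGE'S FACTORIZATION; RETURNS A
C     UNIFORM RANDOM NUMBER IN (0,1)
      FUNCTION R(D)
      COMMON/SEED/IR
      K=IR/127773
      IR=16807*(IR-K*127773)-K*2836
      IF(IR.LT.0)IR=IR+2147483647
      R=FLOAT(IR)*4.6566129E-10
      RETURN
      END
-----------------------------------------------------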



8. Function Z(S,R)

This function converts a uniform random number distribution to a normal distribution pairwise by means of the Box-Muller rules. The standard deviation is supplied as parameter S, while the expectation value for the mean is always 0.0. The quantity LZ is common to EVOL and Z by virtue of a COMMON block and acts as a switch to transmit only one of the two random numbers generated in response to each second call.
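Written out (a restatement of what the listing below computes, not a formula printed in the original), each pair of uniform random numbers u₁, u₂ drawn from R yields two independent normally distributed numbers with standard deviation S:

$$z_1 = S\,\sqrt{-2\ln u_1}\,\sin(2\pi u_2), \qquad z_2 = S\,\sqrt{-2\ln u_1}\,\cos(2\pi u_2)$$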

---------------------------------------------------------
      SUBROUTINE EVOL(N,M,LF,LR,LS,TM,EA,EB,EC,ED,SN,FB,
     1XB,SM,X,F,G,T,Z,R)
      DIMENSION XB(1),SM(1),X(1),L(10)
      COMMON/EVZ/LZ
      EXTERNAL R
      TN=TM+T(D)
      LZ=1
      IF(M)4,4,1
    1 LF=-1
C
C     FEASIBILITY CHECK
C
      FB=0.
      DO 3 J=1,M
      FG=G(J,N,XB)
      IF(FG)2,3,3
    2 FB=FB-FG
    3 CONTINUE
      IF(FB)4,4,5
C
C     ALL CONSTRAINTS SATISFIED IF FB <= 0
C



    7 DO 8 I=1,N
    8 X(I)=XB(I)+Z(SM(I),R)
      IF(LF)9,9,12
C
C     AUXILIARY OBJECTIVE
C
    9 FF=0.
      DO 11 J=1,M
      FG=G(J,N,X)
      IF(FG)10,11,11
   10 FF=FF-FG
   11 CONTINUE
      IF(FF)32,32,16
C
C     ALL CONSTRAINTS SATISFIED IF FF <= 0
C



   25 L(K)=L(K+1)
      L(10)=LE
      LM=0
      LC=LC+1
      IF(LC-10*LS)31,26,26
C
C     CONVERGENCE CRITERION
C
   26 IF(FC-FB-EC)28,28,27
   27 IF((FC-FB)/ED-ABS(FC))28,28,30
   28 LF=ISIGN(2,LF)
   29 RETURN
   30 LC=0
      FC=FB
C
C     TIME CONTROL
C
   31 IF(T(D)-TN)7,29,29
   32 DO 33 I=1,N
   33 XB(I)=X(I)
      FB=F(N,XB)
      LF=0
      GOTO 29
      END
---------------------------------------------------------
      FUNCTION Z(S,R)
      COMMON/EVZ/LZ
      DATA ZP/6.28318531/
      GOTO(1,2),LZ
    1 A=SQRT(-2.*ALOG(R(D)))
      B=ZP*R(D)
      Z=S*A*SIN(B)
      LZ=2
      RETURN
    2 Z=S*A*COS(B)
      LZ=1
      RETURN
      END
---------------------------------------------------------
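To make the calling conventions concrete, a minimal driver might look as follows (a sketch of mine, not from the original listings; it assumes the example functions F, G, T, and R given above, with the random seed in the assumed COMMON block /SEED/, and minimizes over N = 2 variables under M = 2 constraints with a 60-second time limit):

---------------------------------------------------------
C     EXAMPLE DRIVER (NOT FROM THE ORIGINAL LISTINGS)
      EXTERNAL F,G,T,Z,R
      DIMENSION XB(2),SM(2),X(2)
      COMMON/SEED/IR
      IR=123456789
C     FEASIBLE STARTING POINT AND INITIAL STEP SIZES,
C     FOLLOWING THE SUGGESTION SM(I)=DX(I)/SQRT(N)
      XB(1)=0.5
      XB(2)=0.5
      SM(1)=0.35
      SM(2)=0.35
C     N=2, M=2, LR=1, LS=2, TM=60., SN=0.85
      CALL EVOL(2,2,LF,1,2,60.,1.E-6,1.E-6,1.E-7,1.E-7,0.85,
     1FB,XB,SM,X,F,G,T,Z,R)
      WRITE(6,*) LF,FB,XB
      END
---------------------------------------------------------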



B.2 (μ, λ) Evolution Strategies GRUP and REKO

1. Purpose

The GRUP subroutine is a FORTRAN program to handle a multimembered (L, LL) evolution strategy with L parents and LL descendants per generation. It is an iterative direct search strategy for handling parameter optimization problems. A search is made for the minimum in a non-linear function of an arbitrary but finite number of continuously variable parameters. Derivatives of the objective function are not required. Constraints in the form of inequalities can be incorporated (right hand side ≥ 0). The user must supply initial values for the variables and for the appropriate step sizes. If the initial state is not feasible, a search is made for a feasible point that minimizes the sum of the negative values for the constraints that have been violated.

2. Subroutine parameter list

GRUP (REKO,L,LL,N,M,LF,TM,EA,EB,EC,ED,SN,FA,FB,XB,SM,X,FK,XK,SK,F,G,T,Z,R)

All parameters apart from LF, FA, FB, X, FK, XK, SK, and Z must be assigned values or names before or when the subroutine is called. The variables XB and SM do not retain the values initially assigned to them.

REKO (logical) Switch for alternative with/without recombination.
REKO = .FALSE.: No recombination. The step sizes retain the relationship initially assigned to them.
REKO = .TRUE.: Recombination occurs. The relationships between the step sizes alter during the search.

L (integer) Number of parents (≥ 1). This parameter should not be chosen too small if recombination is to occur.

LL (integer) Number of descendants (> L). Recommended to choose a value ≥ 6·L.

N (integer) Number of parameters (> 0).

M (integer) Number of constraints (≥ 0).

LF (integer) Return code with the following meanings:
LF = -2: Starting point not feasible and search for a feasible state unsuccessful. Feasible region probably empty.
LF = -1: Starting point not feasible and search for a feasible state terminated because the time limit was reached.
LF = 0: Starting point not feasible, search for a feasible state successful. The final values of XB can be used as starting point for the search for a minimum if GRUP is called again.
LF = 1: Search for minimum terminated because the time limit was reached.
LF = 2: Search for minimum terminated in an orderly fashion.



No further improvement in the value of the objective function could be achieved within the framework of the given accuracy parameters. Probably the final state XB (extreme point) lies near a local minimum, perhaps the global minimum.

TM (real) Parameter used in monitoring the computation time, e.g., the maximum CPU time in seconds, depending on the function designated T (see below). The search is terminated if T > TM. This check is performed after every generation, i.e., LL objective function calls.

EA (real) Lower bound to step sizes, absolute. EA > 0.0 must be chosen large enough to be treated as different from 0.0 within the accuracy of the computer used.

EB (real) Lower bound to step sizes relative to values of variables. EB > 0.0 must be chosen large enough for 1.0 + EB to be treated as different from 1.0 within the accuracy of the computer used.

EC (real) Parameter in convergence test, absolute. The search is terminated if the difference between the best and worst values of the objective function within a generation is less than or equal to EC (EC > 0.0, see EA).

ED (real) Parameter in convergence test, relative. The search is terminated if the difference between the best and worst values of the objective function within a generation is less than or equal to ED multiplied by the absolute value of the mean of the objective function as taken over all L parents in a generation (1.0 + ED > 1.0, see EB). Convergence is assumed if the data pass one or both tests. If it is desired to delete a test, it is possible either to set EC = 0.0 or to choose a value for ED such that 1.0 + ED = 1.0 but ED > 0.0 within the accuracy of the machine.

SN (real) Auxiliary quantity used in step size adjustment. Normal value C/SQRT(N), with C > 0.0, e.g., C = 1.0 for L = 10 and LL = 100. C can be increased as LL increases, but it must be reduced as L increases. An approximation for L = 1 is LL proportional to SQRT(C)·EXP(C).

FA (real) Current best objective function value for the population.

FB (real) Best value of objective function attained during the whole search. The minimum found may not be unique if FB differs from FA because: (1) there is a state with an even smaller value for the objective function (e.g., near a local minimum or even near the global minimum) that has been lost over the generations, or (2) the minimum consists of several quasi-singular peaks on account of the finite accuracy of the computer used. Usually, the difference between FA and FB is larger in the first case than in the second, if EC and ED have been assigned small values.



XB (one dimensional real array of length N) On call: holds initial values of variables. On exit: holds best values of variables corresponding to FB.

SM (one dimensional real array of length N) On call: holds initial values of step sizes (more precisely, standard deviations of components of the mutation vector). On exit: holds current step sizes of the last (not necessarily successful) mutation. Optimum initialization: somewhere near the local radius of curvature of the objective function hypersurface divided by the number of variables. More practical suggestion: SM(I) = DX(I)/SQRT(N), where DX(I) is a crude estimate of either the distance between start and expected optimum or the maximum uncertainty range for the variable X(I). If the SM(I) are initially set too large, it may happen that a good starting point is lost in the first generation. This is advantageous as regards the probability of locating the global optimum in the presence of several local optima.

X (one dimensional real array of length N) Space for holding a variable vector.

FK (one dimensional real array of length 2·L) Holds objective function values for the L best individuals in each of the last two generations.

XK (one dimensional real array of length 2·L·N) Holds the variable values for N components for each of the L parents in each of the last two generations. XK(1) to XK(N) hold the state vector X for the first individual, the next N locations do the same for the second, and so on.

SK (one dimensional real array of length 2·L·N) Holds the standard deviations, structure as for XK.

F (real function) Name of the objective function, which is to be programmed by the user.

G (real function) Name of the function for calculating the values of the constraints, to be programmed by the user.

T (real function) Name of the function used in monitoring the computation time.

Z (real function) Name of the function used in transforming a uniform random number distribution to a normal distribution. If the name Z is retained, the function Z listed after the GRUP subroutine can be used for this purpose.

R (real function) Name of the function that generates a uniform random number distribution.

3. Method

GRUP has been developed from EVOL. The method is based on a very much simplified simulation of biological evolution. See I. Rechenberg, Evolution Strategy: Optimization of Technical Systems in Accordance with the Principles of Biological Evolution (in German),



vol. 15 of Problemata series, Verlag Frommann-Holzboog, Stuttgart, 1973; also H.-P. Schwefel, Numerical Optimization of Computer Models, Wiley, Chichester, 1981 (translated by M. W. Finnis from Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie, vol. 26 of Interdisciplinary Systems Research, Birkhäuser, Basle, Switzerland, 1977).

The current L parameter vectors are used to generate LL new ones by means of small random changes. The best L of these become the initial ones for the next generation (iteration). At the same time, the step sizes (standard deviations) for the changes in the variables (strategy parameters) are altered. The selection leads to adaptation to the local topology if LL/L is assigned a suitably large value, e.g., > 6. The random changes in the parameters are produced by the addition of normally distributed random numbers, while those in the step sizes are produced from random numbers with a log-normal distribution by multiplication.
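In symbols (a compact restatement of the two rules just described, not a formula printed in the original; σᵢ denotes the step size attached to variable xᵢ and N(0, v) a normal variate with variance v):

$$\sigma_i' = \sigma_i \cdot e^{\xi}, \quad \xi \sim N(0, \mathrm{SN}^2); \qquad x_i' = x_i + z_i, \quad z_i \sim N(0, {\sigma_i'}^{2}),$$

so the step sizes are varied multiplicatively by a log-normal factor, the object variables additively by normal increments.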

4. Convergence criterion

Based on the differences in value of the objective function (see under EC and ED).

5. Peripheral I/O: none.

6. Notes

The multimembered strategy represents an improvement in reliability over the two membered strategy. On the other hand, the run time is greater when an ordinary (serial) digital computer is used. The run time increases less rapidly than in proportion to LL (the number of descendants per generation), because increasing LL increases the convergence rate (over the generations). However, minima at a boundary of the feasible region or at a vertex are attained only slowly or inexactly. In any case, although the certainty of global convergence cannot be guaranteed, numerical tests have shown that the multimembered strategy is far better than other search procedures in this respect. It is capable of handling separated feasible regions provided that the number of parameters is not large and that the initial step sizes are set suitably large. In doubtful cases it is recommended to repeat the search each time with a different set of initial values and/or random numbers. If the optimum being sought lies at a boundary of the feasible region, it is probably better to choose a value for SN (the parameter governing the rates of change of the standard deviations) less than the (maximal) value suggested above.

7. Subroutines or functions used

The function names are to be declared as external in the segment that calls GRUP.

7.1 Objective function

To be written by the user in the form:



-----------------------------------------------------
      FUNCTION F(N,X)
      DIMENSION X(N)
      ...
      F=...
      RETURN
      END
-----------------------------------------------------

N represents the number of parameters, and X represents the formal parameter vector. GRUP supplies the actual values. The function should be written on the basis that GRUP searches for a minimum; if a maximum is to be sought, F must be supplied with a negative sign.

7.2 Constraints function

To be written by the user in the general style:

-----------------------------------------------------
      FUNCTION G(J,N,X)
      DIMENSION X(N)
      GOTO(1,2,3,...,(M)),J
    1 G=...
      RETURN
    2 G=...
      RETURN
      ...
  (M) G=...
      RETURN
      END
-----------------------------------------------------

N and X have the meanings described for the objective function, while J (integer) is the serial number of the constraint. The statements should be written on the basis that GRUP will accept vector X as feasible if all the G values are larger than or equal to zero.

7.3 Function for monitoring the computation time

This may be defined by the user or called from the subroutine library in the particular machine. The following structure is assumed:

REAL FUNCTION T(D)

where D is a dummy parameter. T should be assigned the monitored quantity, e.g., the CPU time in seconds limited by TM. Many computers are supplied with ready-made timing software. If this is given as a function, only its name needs to be supplied to GRUP,



instead of T, as a parameter. If it is a subroutine, the user can program the required function. For example, the subroutine might be called SECOND(I), where parameter I is an integer representing the CPU time in microseconds, in which case one could program:

-----------------------------------------------------
      FUNCTION T(D)
      CALL SECOND(I)
      T=1.E-6*FLOAT(I)
      RETURN
      END
-----------------------------------------------------

7.4 Function for transforming a uniformly distributed random number to a normally distributed one

See under 8.

7.5 Function for generating a uniform random number distribution in the range (0,1]

The structure must be

REAL FUNCTION R(D)

where D is dummy. R is the value of the random number. Note: The smallest value of R must be large enough for the natural logarithm to be generated without floating-point overflow. The standard library usually includes a suitable program, in which case only the appropriate name has to be supplied to GRUP.

8. Function Z(S,R)

This function converts a uniform random number distribution to a normal distribution pairwise by means of the Box-Muller rules. The standard deviation is supplied as parameter S, while the expectation value for the mean is always zero. The quantity LZ is common to GRUP and Z by virtue of a COMMON block and acts as a switch to transmit only one of the two random numbers generated in response to each second call.

---------------------------------------------------------
      SUBROUTINE GRUP(REKO,L,LL,N,M,LF,TM,EA,EB,EC,ED,SN,
     1FA,FB,XB,SM,X,FK,XK,SK,F,G,T,Z,R)
      LOGICAL REKO
      DIMENSION XB(1),SM(1),X(1),FK(1),XK(1),SK(1)
      COMMON/GRZ/LZ
      EXTERNAL R
      KK(RR)=(LA+IFIX(FLOAT(L)*RR))*N
C
C     THE PRECEDING LINE CONTAINS A STATEMENT FUNCTION
C
      TN=TM+T(D)



      LZ=1
      IF(M)4,4,1
    1 LF=-1
C
C     FEASIBILITY CHECK
C
      FB=0.
      DO 3 J=1,M
      FG=G(J,N,XB)
      IF(FG)2,3,3
    2 FB=FB-FG
    3 CONTINUE
      IF(FB)4,4,5
C
C     ALL CONSTRAINTS SATISFIED IF FB <= 0
C



   15 CONTINUE
   16 FF=F(N,X)
   17 IF(FF-FB)18,19,19
C
C     STORING OF BEST INTERMEDIATE RESULT
C
   18 FB=FF
      KB=K
   19 DO 20 I=1,N
      KA=KA+1
      SK(KA)=AMAX1(SM(I)*SA,ABS(X(I))*EB,EA)
   20 XK(KA)=X(I)
   21 FK(K)=FF
      IF(KB)24,24,22
   22 KB=(KB-1)*N
      DO 23 I=1,N
   23 XB(I)=XK(KB+I)
C
C     START OF MAIN LOOP
C
   24 LA=L
      LB=0
C
C     LA AND LB FORM A ROTATING COUNTER TO AVOID SHUFFLING
C     GENOTYPES WITHIN THE ARRAYS CONTAINING PARENTS AND
C     DESCENDANTS
C
   25 LC=LB
      LB=LA
      LA=LC
      LC=0
      LD=0
   26 SA=EXP(Z(SN,R))
C
C     LOG-NORMAL STEP SIZE FACTOR
C
      IF(REKO)GOTO 28
      KI=KK(R(D))
      DO 27 I=1,N
      KI=KI+1
      SM(I)=SK(KI)*SA
   27 X(I)=XK(KI)+Z(SM(I),R)
C
C     MUTATION WITHOUT RECOMBINATION ABOVE



C
      GOTO 30
   28 SA=SA*.5
C
C     MUTATION WITH RECOMBINATION BELOW
C
      DO 29 I=1,N
      SM(I)=(SK(KK(R(D))+I)+SK(KK(R(D))+I))*SA
   29 X(I)=XK(KK(R(D))+I)+Z(SM(I),R)
   30 IF(LF)31,31,34
C
C     AUXILIARY OBJECTIVE
C
   31 FF=0.
      DO 33 J=1,M
      FG=G(J,N,X)
      IF(FG)32,33,33
   32 FF=FF-FG
   33 CONTINUE
      IF(FF)60,60,38
C
C     ALL CONSTRAINTS SATISFIED IF FF <= 0
C



      SK(KS)=AMAX1(SM(I),ABS(X(I))*EB,EA)
   42 XK(KS)=X(I)
      IF(LD-L)46,43,43
C
C     DETERMINING THE CURRENT WORST
C
   43 KS=LB+1
      FS=FK(KS)
      DO 45 K=2,L
      KA=LB+K
      FF=FK(KA)
      IF(FF-FS)45,45,44
   44 FS=FF
      KS=KA
   45 CONTINUE
   46 LC=LC+1
      IF(LC-LL)26,47,47
   47 IF(LD-L)26,48,48
C
C     END OF A GENERATION
C
   48 KA=LB+1
      FA=FK(KA)
      FC=FA
C
C     DETERMINING THE CURRENT BEST AND SUM
C
      DO 50 K=2,L
      KB=LB+K
      FF=FK(KB)
      FC=FC+FF
      IF(FF-FA)49,50,50
   49 FA=FF
      KA=KB
   50 CONTINUE
      IF(FA-FB)51,51,53
C
C     DETERMINING WHETHER THE CURRENT BEST IS BETTER THAN
C     THE SO FAR OVERALL BEST
C
   51 FB=FA
      KB=(KA-1)*N
      DO 52 I=1,N
   52 XB(I)=XK(KB+I)



C
C     CONVERGENCE CRITERION
C
   53 IF(FS-FA-EC)55,55,54
   54 IF((FS-FA)*FLOAT(L)/ED-ABS(FC))55,55,59
   55 LF=ISIGN(2,LF)
   56 KB=(KA-1)*N
      DO 57 I=1,N
   57 X(I)=XK(KB+I)
   58 RETURN
C
C     TIME CONTROL
C
   59 IF(T(D)-TN)25,56,56
   60 DO 61 I=1,N
   61 XB(I)=X(I)
      FB=F(N,XB)
      FA=FB
      LF=0
      GOTO 58
      END
---------------------------------------------------------
      FUNCTION Z(S,R)
      COMMON/GRZ/LZ
      DATA ZP/6.28318531/
      GOTO(1,2),LZ
    1 A=SQRT(-2.*ALOG(R(D)))
      B=ZP*R(D)
      Z=S*A*SIN(B)
      LZ=2
      RETURN
    2 Z=S*A*COS(B)
      LZ=1
      RETURN
      END
---------------------------------------------------------



B.3 (μ + λ) Evolution Strategy KORR

Plus additional subroutines: PRUEFG, SPEICH, MUTATI, DREHNG, UMSPEI, MINMAX, GNPOOL, ABSCHA, and functions: ZULASS, GAUSSN, BLETAL.

1. Purpose

The KORR subroutine is a FORTRAN coding of a multimembered evolution strategy. It is an iterative direct search strategy for a parameter optimization problem. A search is made for the minimum in a non-linear function of an arbitrary but finite number of continuously variable parameters. Derivatives of the objective function are not required. Constraints in the form of inequalities can be incorporated (right hand side ≥ 0). The user must supply initial values for the variables and for the appropriate step sizes. If the initial state is not feasible, a search is made for a feasible point by minimizing the sum of the negative values for the constraints that have been violated.

2. Parameter list for subroutine KORR

KORR (IELTER, BKOMMA, NACHKO, IREKOM, BKORRL, KONVKR, IFALLK, TGRENZ, EPSILO, DELTAS, DELTAI, DELTAP, N, M, NS, NP, NY, ZSTERN, XSTERN, ZBEST, X, S, P, Y, ZIELFU, RESTRI, GAUSSN, GLEICH, TKONTR, KANAL)

All parameters apart from IFALLK, ZSTERN, ZBEST, X, and Y must be assigned values or names before or when the subroutine is called. The variables XSTERN, S, and P do not retain the values initially assigned to them.

IELTER (integer) Number of parents of a generation. IELTER ≥ 1 if IREKOM = 111; IELTER > 1 if IREKOM > 111.

BKOMMA (logical) Switch for comma or plus version.
BKOMMA = .FALSE.: Selection criterion applied to parents and descendants, i.e., (IELTER + NACHKO) evolution strategy.
BKOMMA = .TRUE.: Selection criterion applied only to descendants, i.e., (IELTER, NACHKO) evolution strategy.

NACHKO (integer) Number of descendants in a generation. NACHKO ≥ 1 if BKOMMA = .FALSE.; NACHKO > IELTER if BKOMMA = .TRUE.

IREKOM (integer) Switch for recombination type consisting of three digits, each of which has values between 1 and 5. The first digit applies to the object variables X, the second one to the step sizes S, and the third one to the correlation angles P. Thus 111 ≤ IREKOM ≤ 555. Each digit controls the recombination in the following way (a worked example follows the list):



1 No recombination
2 Discrete recombination of pairs of parents
3 Intermediary recombination of pairs of parents
4 Discrete recombination of all parents
5 Intermediary recombination of all parents in pairs
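For illustration (an invented setting, not an example from the text): IREKOM = 351 would select intermediary recombination of pairs of parents for the object variables X (first digit 3), intermediary recombination of all parents in pairs for the step sizes S (second digit 5), and no recombination for the correlation angles P (third digit 1).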

BKORRL (logical) Switch for variability of the mutation hyperellipsoid (locus of equal probability density).
BKORRL = .FALSE.: The ellipsoid cannot rotate.
BKORRL = .TRUE.: The ellipsoid can extend and rotate.

KONVKR (integer) Switch for the convergence criterion:
KONVKR = 1: The difference in the objective function values between the best and worst parents at the start of each generation is used to determine whether to terminate the search before the time limit is reached. It is assumed that IELTER > 1.
KONVKR > 1 (best ≥ 2·N): The change in the mean of all the parental objective function values in KONVKR generations is used as the search termination criterion.
In both cases EPSILO(3) serves as the absolute and EPSILO(4) as the relative bound for deciding to terminate the search.

IFALLK (integer) Return code with the following meaning:
IFALLK = -2: Starting point not feasible, search terminated on finding a minimal value of the auxiliary objective function without satisfying all the constraints.
IFALLK = -1: Starting point not feasible, search for a feasible parameter vector terminated because the time limit was reached.
IFALLK = 0: Starting point not feasible, search for a feasible XSTERN vector successful; the search for a minimum can be restarted with this.
IFALLK = 1: Search for a minimum terminated because the time limit was reached.
IFALLK = 2: Search for minimum terminated regularly. The convergence criterion was satisfied.
IFALLK = 3: As for IFALLK = 1, but time limit reached not at the end of a generation but in an attempt to generate NACHKO viable descendants.

TGRENZ (real) Parameter used in monitoring the computation time, e.g., the maximum CPU time in seconds. Search terminated at the latest at the end of the generation for which TKONTR ≥ TGRENZ.



EPSILO (one dimensional real array of length 4) Holds parameters that affect the attainable accuracy of approximation. The lowest possible values are machine-dependent.
EPSILO(1): Lower bound to step sizes, absolute.
EPSILO(2): Lower bound to step sizes relative to values of variables (not implemented in this program).
EPSILO(3): Limit to absolute value of objective function differences for convergence test.
EPSILO(4): As EPSILO(3), but relative.

DELTAS (real) Factor used in step-size change. All standard deviations (= step sizes) S(I) are multiplied by a common random number EXP(GAUSSN(DELTAS)), where GAUSSN(DELTAS) is a normally distributed random number with zero mean and standard deviation DELTAS. EXP(DELTAS) ≥ 1.0.

DELTAI (real) As for DELTAS, but each S(I) is multiplied by its own random factor EXP(GAUSSN(DELTAI)). EXP(DELTAI) ≥ 1.0. The S(I) retain their initial values if DELTAS = 0.0 and DELTAI = 0.0. The variables can be scaled only by recombination (IREKOM > 1) if DELTAI = 0.0. The following rules are suggested to provide the most rapid convergence for sphere models (a numerical illustration follows this entry):
DELTAS = C/SQRT(2.0·N),
DELTAI = C/SQRT(2.0·N/SQRT(NS)).
The constant C can increase sublinearly with NACHKO, but it must be reduced as IELTER increases. The empirical value C = 1.0 has been found applicable for IELTER = 10, NACHKO = 100, and BKOMMA = .TRUE., which is a (10, 100) evolution strategy.
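As a numerical illustration (example values of mine, not from the text): for N = 30, NS = 30, and C = 1.0 these rules give DELTAS = 1.0/SQRT(60.0) ≈ 0.13 and DELTAI = 1.0/SQRT(60.0/SQRT(30.0)) ≈ 0.30.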

DELTAP (real) Standard deviation in random variation of the position angles P(I) for the mutation ellipsoid. DELTAP > 0.0 if BKORRL = .TRUE. Data in radians. A suitable value has been found to be DELTAP = 5.0·0.01745 (5 degrees) in certain cases.

N (integer) Number of parameters, N > 0.

M (integer) Number of constraints, M ≥ 0.

NS (integer) Field length in array S, or number of distinct step-size parameters that can be used, 1 ≤ NS ≤ N. The mutation ellipsoid becomes a hypersphere for NS = 1. All the principal axes of the ellipsoid may be different for NS = N, whereas for 1 < NS < N the mutation ellipsoid is a hyperellipsoid of rotation (see the Notes below).



NP (integer) Field length of array P. NP = N·(NS − 1) − ((NS − 1)·NS)/2 if BKORRL = .TRUE.; NP = 1 if BKORRL = .FALSE.

NY (integer) Field length of array Y. NY = (N+NS+NP+1)·IELTER·2 if BKORRL = .TRUE.; NY = (N+NS+1)·IELTER·2 if BKORRL = .FALSE. (A worked example follows this entry.)
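For example (my arithmetic, applying the formulas above): with BKORRL = .TRUE. and full correlation, N = NS = 10, one obtains NP = 10·9 − (9·10)/2 = 45, which is just the N·(N−1)/2 rotation angles of a completely general mutation ellipsoid; with IELTER = 10 in addition, the array Y must hold NY = (10 + 10 + 45 + 1)·10·2 = 1320 real values.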

ZSTERN (real) Best value of objective function found during search for minimum.

XSTERN (one dimensional real array of length N) On call: initial parameter vector. At end of search: best values for parameters corresponding to ZSTERN, or feasible vector found for the special case IFALLK = 0.

ZBEST (real) Current best value of objective function for the population; may be different from ZSTERN if BKOMMA = .TRUE.

X (one dimensional real array of length N) Holds the variables for a descendant.

S (one dimensional real array of length NS) Holds the step sizes for a descendant. The user must supply initial values. Universally valid rules for selecting the best S(I) are not available. If the step sizes are too large, a very good starting point can be wasted (BKOMMA = .TRUE.) or the step size adjustment may be very much delayed (BKOMMA = .FALSE.). If the initial step sizes are too small, there is only a slight chance of locating the global optimum in the presence of several local ones. In general, the optimum overall step sizes vary with the number N of parameters as C/SQRT(N), so the individual standard deviations vary as C/N with C = const.

P (one dimensional real array of length NP) Holds the positional angles of the ellipsoid for a descendant. The user must supply initial values if BKORRL = .TRUE. has been selected. If no better values are known initially, one can set P(I) = ATAN(1.0) for all I = 1(1)NP.

Y (one dimensional real array of length NY) Holds the vectors X, S, P, and the objective function values for the parents of the current generation and the next generation as well.

ZIELFU (real function) Name of the objective function, to be programmed by the user.

RESTRI (real function) Name of the function for evaluating all constraints, to be programmed by the user.

GAUSSN (real function) Name of the function used in transforming a uniform random number distribution to a Gaussian one.



GLEICH (real function) Name of the function for generating uniformly distributed random numbers.

TKONTR (real function) Name of the run-time monitoring function.

KANAL (integer) Channel number for output; relates only to messages output by subroutine PRUEFG concerning formal errors detected in the parameter list of subroutine KORR when the latter is called.

3. Method

KORR is a development from EVOL, Rechenberg's two membered strategy, and GRUP, the older version of Schwefel's multimembered evolution strategy. The method is based on a very much simplified simulation of biological evolution. See I. Rechenberg, Evolution Strategy: Optimization of Technical Systems in Accordance with the Principles of Biological Evolution (in German), vol. 15 of Problemata series, Verlag Frommann-Holzboog, Stuttgart, 1973; also H.-P. Schwefel, Numerical Optimization of Computer Models, Wiley, Chichester, 1981 (translated by M. W. Finnis from Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie, vol. 26 of Interdisciplinary Systems Research, Birkhäuser, Basle, Switzerland, 1977).

The IELTER parameter vectors are used to generate NACHKO new ones by introducing small normally distributed random changes. The IELTER best of these are used as starting points for the next generation (iteration). At the same time the strategy parameters are altered. These are the parameters of the current normal distributions for the lengths of the principal axes (standard deviations = step sizes) and the angular position of the mutation ellipsoid in N-dimensional space. Selection results in adaptation to local topology if the ratio NACHKO/IELTER is set large enough, e.g., at least 6. The random variations in the angles are produced by the addition of normally distributed random numbers, while those in the step sizes are produced from random numbers with a log-normal distribution by multiplication.

4. Convergence criterion

The termination criterion is based on value differences in the objective function; see under KONVKR, EPSILO(3), and EPSILO(4).

5. Peripheral I/O

Input: none.
Output: via channel KANAL, but only if there are formal errors in the parameter list of KORR. See under KANAL.

6. Notes

The two membered strategy (EVOL) usually has the shortest run time of all these evolution strategies (the EVOL, GRUP, and KORR codings so far developed) because ordinary (serial) computers can test the descendants only one after another in a generation, whereas in nature they are tested in parallel. The run times of the multimembered strategies increase less rapidly than in proportion to NACHKO because the convergence rate taken



over the generations tends to increase with NACHKO. However, there are frequent instances where even the simpler multimembered scheme (GRUP) has a run time less than that of EVOL, because GRUP and KORR in principle allow one to adapt the step sizes individually to the local topology, which is not possible with EVOL, and this permits one to scale the variables in a flexible fashion. For this reason, the reliability and attainable accuracy are appreciably better than those given by EVOL.

The new KORR program represents further improvements on GRUP in this respect on account of the increased flexibility in the mutation ellipsoid, which improves the variability of the object variables. In addition to the lengths of the principal axes (standard deviations = step sizes), the positions of the principal axes in N-dimensional space are strategy parameters that are adjustable within the population. This, together with the scaling, provides directional adaptation to any valleys or ridges in the objective function surface. The changes in the object variables are no longer independent but linearly correlated, and this improves the convergence rate (with respect to the number of generations) quite appreciably in many instances. In special cases, however, there may be an increase in the run time arising from the storage and modification of the positional angles, and also from coordinate transformation. KORR enables the user to test how many strategy parameters (= degrees of freedom in the mutation ellipsoid) may be adequate to solve his special problem. The correlation can be suppressed completely, in which case KORR becomes equivalent to GRUP. Intermediary stages can be implemented by means of the NS parameter, the number of mutually independent step sizes. For example, for NS = 2 < N we have a hyperellipsoid of rotation with N − NS rotation axes. KORR differs from the older EVOL and GRUP in being divided into numerous small subroutines. This modular structure is disadvantageous as regards core requirement and run time, but it provides insight into the mode of operation of the program as a whole, so that it is easier for the user to modify the algorithm.

Although KORR in general allows one to improve the reliability of the optimum search, there are still two critical situations. Minima at the boundary of the feasible region or in a vertex are attained only slowly or inaccurately. In any case, certainty of global convergence cannot be guaranteed; however, numerical tests have shown that the multimembered strategy is far better than other search procedures in this respect. It is capable of handling separated feasible regions provided that the number of parameters is not large and that the initial step sizes are set suitably large. In doubtful cases it is recommended to repeat the search each time with a different set of initial values and/or random numbers.

7. Subroutines or functions used

---------------------------------------------------------
SUBROUTINES: PRUEFG, SPEICH, MUTATI, DREHNG, UMSPEI,
             MINMAX, GNPOOL, ABSCHA
FUNCTIONS  : ZIELFU, RESTRI, GAUSSN, GLEICH, TKONTR,
             ZULASS, BLETAL
---------------------------------------------------------

The segment that calls KORR should have the names of the functions ZIELFU, RESTRI,



GLEICH, and TKONTR declared as external. This applies also to the name of any function used instead of GAUSSN to convert a uniform distribution to a normal one.

ZIELFU Objective function, to be programmed by the user in the form:

-------------------------------------------------
      FUNCTION ZIELFU(N,X)
      DIMENSION X(N)
      ...
      ZIELFU=...
      RETURN
      END
-------------------------------------------------

N represents the number of parameters, and X represents the formal parameter vector. The actual values are supplied by KORR. The function should be written on the basis that KORR searches for a minimum; if a maximum is to be sought, ZIELFU must be supplied with a negative sign.

RESTRI Constraints function, to be programmed by the user in the general style:

-------------------------------------------------
      FUNCTION RESTRI(J,N,X)
      DIMENSION X(N)
      GOTO(1,2,3,...,(M)),J
    1 RESTRI=...
      RETURN
    2 RESTRI=...
      RETURN
      ...
  (M) RESTRI=...
      RETURN
      END
-------------------------------------------------

N and X have the meanings described for the objective function, while J (integer) is the serial number of the constraint. The statements should be written on the basis that KORR will accept vector X as feasible if RESTRI ≥ 0.0 for all J = 1(1)M.

TKONTR The function for monitoring the computation time may be defined by the user or called from the subroutine library in the particular machine. The following structure is assumed:

REAL FUNCTION TKONTR(D)

where D is a dummy parameter. TKONTR should be assigned the monitored quantity, e.g., the CPU time in seconds limited by TGRENZ. Many computers are supplied with



ready-made timing software. If this is given as a function, only its name needs to be<br />

supplied to KORR instead of TKONTR as a parameter.<br />

GLEICH Function for generating a uniform r<strong>and</strong>om number distribution in the range<br />

(0,1]. The structure must be:<br />

REAL FUNCTION GLEICH(D)<br />

where D is arbitrary. GLEICH returns the value of the random number. The standard library usually includes a suitable program, in which case only the appropriate name has to be supplied to KORR. The other subroutines and functions are explained briefly in the program itself.
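If no library generator is available, a portable sketch along the lines of the Park-Miller minimal standard generator (our illustration; the text above assumes a library routine) might be:

---------------------------------------------------------
REAL FUNCTION GLEICH(D)
C PARK-MILLER GENERATOR, CODED WITH SCHRAGE'S TRICK TO
C AVOID INTEGER OVERFLOW ON 32-BIT MACHINES. IT YIELDS
C VALUES IN THE OPEN INTERVAL (0,1). SKETCH ONLY; A
C TESTED LIBRARY ROUTINE IS PREFERABLE.
INTEGER ISEED,K
SAVE ISEED
DATA ISEED/123457/
K=ISEED/127773
ISEED=16807*(ISEED-K*127773)-K*2836
IF(ISEED.LT.0) ISEED=ISEED+2147483647
GLEICH=REAL(ISEED)/2147483647.
RETURN
END
---------------------------------------------------------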

---------------------------------------------------------<br />

SUBROUTINE KORR<br />

1(IELTER,BKOMMA,NACHKO,IREKOM,BKORRL,KONVKR,IFALLK,<br />

2TGRENZ,EPSILO,DELTAS,DELTAI,DELTAP,N,M,NS,NP,NY,<br />

3ZSTERN,XSTERN,ZBEST,X,S,P,Y,ZIELFU,RESTRI,GAUSSN,<br />

4GLEICH,TKONTR,KANAL)<br />

LOGICAL BKOMMA,BKORRL,BFATAL,BKONVG,BLETAL<br />

DIMENSION EPSILO(4),XSTERN(N),X(N),S(NS),P(NP),<br />

1Y(NY)<br />

COMMON/PIDATA/PIHALB,PIEINS,PIZWEI<br />

EXTERNAL RESTRI,GAUSSN,GLEICH<br />

IREKOX = IREKOM / 100<br />

IREKOS = (IREKOM - IREKOX*100) / 10<br />

IREKOP = IREKOM - IREKOX*100 - IREKOS*10<br />

D = 0.<br />

CALL PRUEFG<br />

1(IELTER,BKOMMA,NACHKO,IREKOM,BKORRL,KONVKR,TGRENZ,<br />

2EPSILO,DELTAS,DELTAI,DELTAP,N,M,NS,NP,NY,KANAL,<br />

3BFATAL)<br />

C<br />

C CHECK INPUT PARAMETERS FOR FORMAL ERRORS.<br />

C<br />

IF(BFATAL) RETURN<br />

C<br />

C PREPARE AUXILIARY QUANTITIES. TIMING MONITORED IN<br />

C ACCORDANCE WITH THE TKONTR FUNCTION FROM HERE<br />

C ONWARDS.<br />

C<br />

TMAXIM=TGRENZ+TKONTR(D)<br />

IF(.NOT.BKORRL) GOTO 1<br />

PIHALB=2.*ATAN(1.)<br />

PIEINS=PIHALB+PIHALB<br />

PIZWEI=PIEINS+PIEINS



1 NL=1+N-NS

NM=N-1<br />

NZ=NY/(IELTER+IELTER)<br />

IF(M.EQ.0) GOTO 2<br />

C
C CHECK FEASIBILITY OF INITIAL VECTOR XSTERN.
C

IFALLK=-1<br />

ZSTERN=ZULASS(N,M,XSTERN,RESTRI)<br />

IF(ZSTERN.GT.0.) GOTO 3<br />

2 IFALLK=1<br />

ZSTERN=ZIELFU(N,XSTERN)<br />

3 CALL SPEICH<br />

1(0,BKORRL,EPSILO,N,NS,NP,NY,ZSTERN,XSTERN,S,P,Y)<br />

C
C THE INITIAL VALUES SUPPLIED BY THE USER ARE STORED
C IN FIELD Y AS THE DATA OF THE FIRST PARENT.
C

IF(KONVKR.GT.1) Z1=ZSTERN<br />

ZBEST=ZSTERN<br />

LBEST=0<br />

IF(IELTER.EQ.1) GOTO 16<br />

DSMAXI=DELTAS<br />

DPMAXI=AMIN1(DELTAP*10.,PIHALB)<br />

DO 14 L=2,IELTER<br />

C IF IELTER > 1, THE OTHER IELTER - 1 INITIAL VECTORS
C ARE DERIVED FROM THE VECTOR FOR THE FIRST PARENT BY
C MUTATION (WITHOUT SELECTION). THE STRATEGY
C PARAMETERS ARE WIDELY SPREAD.
C

DO 4 I=1,NS<br />

4 S(I)=Y(N+I)<br />

5 IF(TKONTR(D).LT.TMAXIM) GOTO 501<br />

IFALLK=-3<br />

GOTO 42<br />

501 IF(.NOT.BKORRL) GOTO 7<br />

DO 6 I=1,NP<br />

6 P(I)=Y(N+NS+I)<br />

7 CALL MUTATI
1(NL,NM,BKORRL,DSMAXI,DELTAI,DPMAXI,N,NS,NP,X,S,P,
2GAUSSN,GLEICH)
C

C MUTATION IN ALL OBJECT AND STRATEGY PARAMETERS.



C<br />

DO 8 I=1,N
8 X(I)=X(I)+Y(I)

IF(IFALLK.GT.0) GOTO 9<br />

C
C IF THE STARTING POINT IS NOT FEASIBLE, EACH
C MUTATION IS CHECKED AT ONCE TO SEE WHETHER A
C FEASIBLE VECTOR HAS BEEN FOUND. THE SEARCH ENDS
C WITH IFALLK = 0 IF THIS IS SO.
C

Z=ZULASS(N,M,X,RESTRI)<br />

IF(Z)40,40,12<br />

9 IF(M.EQ.0) GOTO 11<br />

IF(.NOT.BLETAL(N,M,X,RESTRI)) GOTO 11<br />

C
C IF A MUTATION FROM A FEASIBLE STARTING POINT
C RESULTS IN A NON-FEASIBLE X VECTOR, THEN THE STEP
C SIZES ARE REDUCED (ON THE ASSUMPTION THAT THEY WERE
C INITIALLY TOO LARGE) IN ORDER TO AVOID THE
C CONSUMPTION OF EXCESSIVE TIME IN DEFINING THE
C FIRST PARENT GENERATION.
C

DO 10 I=1,NS<br />

10 S(I)=S(I)*.5<br />

GOTO 5<br />

11 Z=ZIELFU(N,X)<br />

12 IF(Z.GT.ZBEST) GOTO 13<br />

ZBEST=Z<br />

LBEST=L-1<br />

DSMAXI=DSMAXI*ALOG(2.)<br />

13 CALL SPEICH<br />

1((L-1)*NZ,BKORRL,EPSILO,N,NS,NP,NY,Z,X,S,P,Y)<br />

C
C STORE PARENT DATA IN ARRAY Y.
C

IF(KONVKR.GT.1) Z1=Z1+Z<br />

14 CONTINUE
C

C THE INITIAL PARENT GENERATION IS NOW COMPLETE.
C ZSTERN AND XSTERN, WHICH HOLD THE BEST VALUES, ARE
C OVERWRITTEN WHEN AN IMPROVEMENT OF THE INITIAL
C SITUATION IS OBTAINED.
C

IF(LBEST.EQ.0) GOTO 16



ZSTERN=ZBEST<br />

K=LBEST*NZ<br />

DO 15 I=1,N<br />

15 XSTERN(I)=Y(K+I)<br />

16 L1=IELTER<br />

L2=0<br />

IF(KONVKR.GT.1) KONVZ=0<br />

C<br />

C ALL INITIALIZATION STEPS COMPLETED AT THIS POINT.<br />

C EACH FRESH GENERATION NOW STARTS AT LABEL 17.<br />

C<br />

17 L3=L2<br />

L2=L1<br />

L1=L3<br />

IF(M.GT.0) L3=0<br />

LMUTAT=0<br />

C<br />

C LMUTAT IS THE MUTATION COUNTER WITHIN A GENERATION,<br />

C WHILE L3 IS THE COUNTER FOR LETHAL MUTATIONS WHEN<br />

C THE PROBLEM INVOLVES CONSTRAINTS.<br />

C<br />

IF(BKOMMA) GOTO 18<br />

C<br />

C IF BKOMMA=.FALSE. HAS BEEN SELECTED, THE PARENTS<br />

C MUST BE INCORPORATED IN THE SELECTION. THE DATA FOR<br />

C THESE ARE TRANSFERRED FROM THE FIRST (OR SECOND)<br />

C PART OF THE ARRAY Y TO THE SECOND (OR FIRST) PART.<br />

C IN THIS CASE THE WORST INDIVIDUAL MUST ALSO BE<br />

C KNOWN, THIS IS REPLACED BY THE FIRST BETTER<br />

C DESCENDANT.<br />

C<br />

CALL UMSPEI<br />

1(L1*NZ,L2*NZ,IELTER*NZ,NY,Y)<br />

CALL MINMAX<br />

1(-1.,L2,NZ,ZSCHL,LSCHL,IELTER,NY,Y)<br />

C<br />

C THE GENERATION OF EACH DESCENDANT STARTS AT LABEL 18<br />

C<br />

18 K1=L1+IELTER*GLEICH(D)<br />

C<br />

C RANDOM CHOICE OF A PARENT OR OF A PAIR OF PARENTS<br />

C IN ACCORDANCE WITH THE VALUE CHOSEN FOR IREKOM. IF<br />

C IREKOM=3 OR IREKOM=5, THE CHOICE OF PARENTS IS MADE<br />

C WITHIN GNPOOL.



C
19 K2=L1+IELTER*GLEICH(D)
CALL GNPOOL
1(1,L1,K1,K2,NZ,N,IELTER,IREKOS,NS,NY,S,Y,GLEICH)
C
C STEP SIZES SUPPLIED FOR THE DESCENDANT FROM THE
C POOL OF GENES.
C

IF(BKORRL) CALL GNPOOL<br />

1(2,L1,K1,K2,NZ,N+NS,IELTER,IREKOP,NP,NY,P,Y,GLEICH)<br />

C POSITIONAL ANGLES OF ELLIPSOID SUPPLIED FOR THE
C DESCENDANT FROM THE POOL OF GENES WHEN CORRELATION
C IS REQUIRED.
C

CALL MUTATI<br />

1(NL,NM,BKORRL,DELTAS,DELTAI,DELTAP,N,NS,NP,X,S,P,<br />

2GAUSSN,GLEICH)<br />

C<br />

C CALL TO MUTATION SUBROUTINE FOR ALL VARIABLES,<br />

C INCLUDING POSSIBLY COORDINATE TRANSFORMATION. S<br />

C (AND P) ARE ALREADY THE NEW ATTRIBUTES OF THE<br />

C DESCENDANT, WHILE X REPRESENTS THE CHANGES TO BE<br />

C MADE IN THE OBJECT VARIABLES.
C

CALL GNPOOL<br />

1(3,L1,K1,K2,NZ,0,IELTER,IREKOX,N,NY,X,Y,GLEICH)<br />

C OBJECT VARIABLES SUPPLIED FOR THE DESCENDANT FROM<br />

C THE POOL OF GENES AND ADDITION OF THE MODIFICATION<br />

C VECTOR. X NOW REPRESENTS THE NEW STATE OF THE<br />

C DESCENDANT.
C

LMUTAT=LMUTAT+1<br />

IF(IFALLK.GT.0) GOTO 20<br />

C
C EVALUATION OF THE AUXILIARY OBJECTIVE FUNCTION FOR
C THE SEARCH FOR A FEASIBLE VECTOR.
C

Z=ZULASS(N,M,X,RESTRI)<br />

IF(Z)40,40,22<br />

20 IF(M.EQ.0) GOTO 21
C

C CHECK FEASIBILITY OF DESCENDANT. IF THE RESULT IS



C NEGATIVE (LETHAL MUTATION), THE MUTATION IS NOT<br />

C COUNTED AS REGARDS THE NACHKO PARAMETER.<br />

C<br />

IF(.NOT.BLETAL(N,M,X,RESTRI)) GOTO 21<br />

IF(.NOT.BKOMMA) GOTO 25<br />

LMUTAT=LMUTAT-1<br />

L3=L3+1<br />

IF(L3.LT.NACHKO) GOTO 18<br />

L3=0<br />

C<br />

C TIME CHECK MADE NOT ONLY AFTER EACH GENERATION BUT<br />

C ALSO AFTER EVERY NACHKO LETHAL MUTATIONS FOR<br />

C CERTAINTY.<br />

C<br />

IF(TKONTR(D).LT.TMAXIM) GOTO 18<br />

IFALLK=3<br />

GOTO 26<br />

21 Z=ZIELFU(N,X)<br />

C<br />

C EVALUATION OF OBJECTIVE FUNCTION VALUE FOR THE<br />

C DESCENDANT.<br />

C<br />

22 IF(BKOMMA.AND.LMUTAT.LE.IELTER) GOTO 23<br />

IF(Z-ZSCHL)24,24,25<br />

23 LSCHL=L2+LMUTAT-1<br />

24 CALL SPEICH<br />

1(LSCHL*NZ,BKORRL,EPSILO,N,NS,NP,NY,Z,X,S,P,Y)<br />

C<br />

C TRANSFER OF DATA OF DESCENDANT TO PART OF ARRAY Y<br />

C HOLDING THE PARENTS FOR THE NEXT GENERATION.<br />

C<br />

IF(.NOT.BKOMMA.OR.LMUTAT.GE.IELTER) CALL MINMAX<br />

1(-1.,L2,NZ,ZSCHL,LSCHL,IELTER,NY,Y)<br />

C<br />

C LOOK FOR THE CURRENTLY WORST INDIVIDUAL STORED IN<br />

C ARRAY Y WITHOUT CONSIDERING THE PARENTS THAT STILL<br />

C CAN PRODUCE DESCENDANTS IN THIS GENERATION.<br />

C<br />

25 IF(LMUTAT.LT.NACHKO) GOTO 18<br />

C<br />

C END OF GENERATION.<br />

C<br />

26 CALL MINMAX<br />

1(1.,L2,NZ,ZBEST,LBEST,IELTER,NY,Y)



C<br />

C LOOK FOR THE BEST OF THE INDIVIDUALS HELD AS<br />

C PARENTS FOR THE NEXT GENERATION. IF THIS IS BETTER<br />

C THAN ANY DESCENDANT PREVIOUSLY GENERATED, THE DATA<br />

C ARE WRITTEN INTO ZSTERN AND XSTERN.<br />

C<br />

IF(ZBEST.GT.ZSTERN) GOTO 28<br />

ZSTERN=ZBEST<br />

K=LBEST*NZ<br />

DO 27 I=1,N<br />

27 XSTERN(I)=Y(K+I)<br />

28 IF(IFALLK.EQ.3) GOTO 30<br />

Z2=0.<br />

K=L2*NZ<br />

DO 29 L=1,IELTER<br />

K=K+NZ<br />

29 Z2=Z2+Y(K)<br />

CALL ABSCHA<br />

1(IELTER,KONVKR,IFALLK,EPSILO,ZBEST,ZSCHL,Z1,Z2,<br />

2KONVZ,BKONVG)<br />

C<br />

C TEST CONVERGENCE CRITERION.<br />

C<br />

IF(BKONVG) GOTO 30<br />

C<br />

C CHECK TIME ELAPSED.<br />

C<br />

IF(TKONTR(D).LT.TMAXIM) GOTO 17<br />

C<br />

C PREPARE FINAL DATA FOR RETURN FROM KORR IF THE<br />

C STARTING POINT WAS FEASIBLE.<br />

C<br />

30 K=LBEST*NZ<br />

DO 31 I=1,N<br />

K=K+1<br />

31 X(I)=Y(K)<br />

DO 32 I=1,NS<br />

K=K+1<br />

32 S(I)=Y(K)<br />

IF(.NOT.BKORRL) RETURN<br />

DO 33 I=1,NP<br />

K=K+1<br />

33 P(I)=Y(K)<br />

RETURN



C<br />

C PREPARE FINAL DATA FOR RETURN FROM KORR IF THE<br />

C STARTING POINT WAS NOT FEASIBLE.<br />

C<br />

40 DO 41 I=1,N<br />

41 XSTERN(I)=X(I)<br />

ZSTERN=ZIELFU(N,XSTERN)<br />

ZBEST=ZSTERN<br />

IFALLK=0<br />

42 RETURN<br />

END<br />

---------------------------------------------------------<br />

Subroutine PRUEFG<br />

PRUEFG checks the values given in the parameter list on calling KORR. If discrepancies are found, an attempt is made to eliminate them. If this is not possible, e.g., because required arrays are not appropriately dimensioned, the search for the minimum is not initiated. PRUEFG then outputs to the peripheral unit denoted by KANAL either a message on the correction of the error or a warning message. BFATAL supplies KORR with information on the outcome of the check as a Boolean value.

---------------------------------------------------------<br />

SUBROUTINE PRUEFG<br />

1(IELTER,BKOMMA,NACHKO,IREKOM,BKORRL,KONVKR,TGRENZ,<br />

2EPSILO,DELTAS,DELTAI,DELTAP,N,M,NS,NP,NY,KANAL,<br />

3BFATAL)<br />

LOGICAL BKOMMA,BKORRL,BFATAL<br />

DIMENSION EPSILO(4)<br />

IREKOX = IREKOM / 100<br />

IREKOS = (IREKOM - IREKOX*100) / 10<br />

IREKOP = IREKOM - IREKOX*100 - IREKOS*10<br />

100 FORMAT(1H ,' CORRECTION. IELTER > 0 . ASSUMED: 2 AND<br />

1 KONVKR = ',I5)<br />

101 FORMAT(1H ,' CORRECTION. NACHKO > 0 . ASSUMED: ',I5)<br />

102 FORMAT(1H ,' WARNING. BETTER VALUE NACHKO >= 6*IELTER')<br />

103 FORMAT(1H ,' CORRECTION. IF BKOMMA = .TRUE., THEN<br />

1 NACHKO > IELTER . ASSUMED: ',I3)<br />

1041 FORMAT(1H ,' CORRECTION. 0 < IREKOX < 6 . ASSUMED: 1')<br />

1042 FORMAT(1H ,' CORRECTION. 0 < IREKOS < 6 . ASSUMED: 1')<br />

1043 FORMAT(1H ,' CORRECTION. 0 < IREKOP < 6 . ASSUMED: 1')<br />

105 FORMAT(1H ,' CORRECTION. IF IELTER = 1, THEN<br />

1 IREKOM = 111 . ASSUMED: 111')<br />

106 FORMAT(1H ,' CORRECTION. IF N = 1 OR NS = 1, THEN<br />

1 BKORRL = .FALSE. . ASSUMED: .FALSE.')<br />

107 FORMAT(1H ,' CORRECTION. KONVKR > 0 . ASSUMED: ',I5)



108 FORMAT(1H ,' CORRECTION. IF IELTER = 1, THEN<br />

1 KONVKR > 1 . ASSUMED: ',I5)<br />

109 FORMAT(1H ,' CORRECTION. EPSILO(',I1,') > 0. .<br />

1 SIGN REVERSED')<br />

110 FORMAT(1H ,' WARNING. EPSILO(',I1,') TOO SMALL.<br />

1 TREATED AS 0. .')<br />

111 FORMAT(1H ,' CORRECTION. DELTAS >= 0. .<br />

1 SIGN REVERSED')<br />

112 FORMAT(1H ,' WARNING. EXP(DELTAS) = 1.<br />

1 OVER-ALL STEP SIZE CONSTANT')<br />

113 FORMAT(1H ,' CORRECTION. DELTAI >= 0. .<br />

1 SIGN REVERSED')<br />

114 FORMAT(1H ,' WARNING. EXP(DELTAI) = 1.<br />

1 STEP-SIZE RELATIONS CONSTANT')<br />

115 FORMAT(1H ,' CORRECTION. DELTAP >= 0. .<br />

1 SIGN REVERSED')<br />

116 FORMAT(1H ,' WARNING. DELTAP = 0.<br />

1 CORRELATION REMAINS FIXED')<br />

117 FORMAT(1H ,' WARNING. TGRENZ = 0 . ASSUMED: 0')<br />

119 FORMAT(1H ,' FATAL ERROR. N



2 IF(.NOT.BKOMMA.OR.NACHKO.GE.6*IELTER) GOTO 3<br />

WRITE(KANAL,102)<br />

IF(NACHKO.GT.IELTER) GOTO 3<br />

NACHKO=6*IELTER<br />

WRITE(KANAL,103)NACHKO<br />

3 IF(IREKOX.GT.0.AND.IREKOX.LT.6) GOTO 301<br />

IREKOX=1<br />

WRITE(KANAL,1041)<br />

301 IF(IREKOS.GT.0.AND.IREKOS.LT.6) GOTO 302<br />

IREKOS=1<br />

WRITE(KANAL,1042)<br />

302 IF(IREKOP.GT.0.AND.IREKOP.LT.6) GOTO 4<br />

IREKOP=1<br />

WRITE(KANAL,1043)<br />

4 IF(IREKOM.EQ.111.OR.IELTER.NE.1) GOTO 5<br />

IREKOM=111<br />

IREKOX=1<br />

IREKOS=1<br />

IREKOP=1<br />

WRITE(KANAL,105)<br />

5 IF(.NOT.BKORRL.OR.(N.GT.1.AND.NS.GT.1)) GOTO 6<br />

BKORRL=.FALSE.<br />

WRITE(KANAL,106)<br />

6 IF(KONVKR.GT.0) GOTO 7<br />

IF(IELTER.EQ.1) KONVKR=N+N<br />

IF(IELTER.GT.1) KONVKR=1<br />

WRITE(KANAL,107)KONVKR<br />

GOTO 8<br />

7 IF(KONVKR.GT.1.OR.IELTER.GT.1) GOTO 8<br />

KONVKR=N+N<br />

WRITE(KANAL,108)KONVKR<br />

8 DO 12 I=1,4<br />

IF(I.EQ.2.OR.I.EQ.4) GOTO 9<br />

IF(EPSILO(I))10,11,12<br />

9 IF((1.+EPSILO(I))-1.)10,11,12<br />

10 EPSILO(I)=-EPSILO(I)<br />

WRITE(KANAL,109)I<br />

GOTO 12<br />

11 WRITE(KANAL,110)I<br />

12 CONTINUE<br />

IF(EXP(DELTAS)-1.)13,14,15<br />

13 DELTAS=-DELTAS<br />

WRITE(KANAL,111)<br />

GOTO 15



14 IF(EXP(DELTAI).NE.1.) GOTO 15<br />

WRITE(KANAL,112)<br />

15 IF(EXP(DELTAI)-1.)16,17,18<br />

16 DELTAI=-DELTAI<br />

WRITE(KANAL,113)<br />

GOTO 18<br />

17 IF(IREKOS.GT.1.AND.EXP(DELTAS).GT.1.) GOTO 18<br />

WRITE(KANAL,114)<br />

18 IF(.NOT.BKORRL) GOTO 21<br />

IF(DELTAP)19,20,21<br />

19 DELTAP=-DELTAP<br />

WRITE(KANAL,115)<br />

GOTO 21<br />

20 WRITE(KANAL,116)<br />

21 IF(TGRENZ.GT.0.) GOTO 22<br />

WRITE(KANAL,117)<br />

22 IF(M.GE.0) GOTO 23<br />

M=0<br />

WRITE(KANAL,118)<br />

23 IF(N.GT.0) GOTO 24<br />

WRITE(KANAL,119)<br />

RETURN<br />

24 IF(NS.GT.0) GOTO 25<br />

WRITE(KANAL,120)<br />

RETURN<br />

25 IF(NP.GT.0) GOTO 26<br />

WRITE(KANAL,121)<br />

RETURN<br />

26 IF(NS.LE.N) GOTO 27<br />

NS=N<br />

WRITE(KANAL,122)N<br />

27 IF(BKORRL) GOTO 31<br />

IF(NP.EQ.1) GOTO 28<br />

NP=1<br />

WRITE(KANAL,123)<br />

28 NYY=(N+NS+1)*IELTER*2<br />

IF(NY-NYY)29,37,30<br />

29 WRITE(KANAL,124)<br />

RETURN<br />

30 NY=NYY<br />

WRITE(KANAL,125)NY<br />

GOTO 37<br />

31 NPP=N*(NS-1)-((NS-1)*NS)/2<br />

IF(NP-NPP)32,34,33



32 WRITE(KANAL,126)<br />

RETURN<br />

33 NP=NPP<br />

WRITE(KANAL,127)NP<br />

34 NYY=(N+NS+NP+1)*IELTER*2<br />

IF(NY-NYY)35,37,36<br />

35 WRITE(KANAL,128)<br />

RETURN<br />

36 NY=NYY<br />

WRITE(KANAL,129)NY<br />

37 BFATAL=.FALSE.<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Function ZULASS<br />

This function is required only if there are constraints. If the starting point does not lie<br />

in the feasible region, ZULASS generates an auxiliary objective function that is used to<br />

search for a feasible initial vector.<br />

If ZULASS, the negative sum of the values for the functions representing constraints<br />

that have been violated, is zero, then X represents a feasible vector that can be used in<br />

restarting the search with KORR.<br />

XX represents XSTERN or X.<br />
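In formula form (read off from the listing below), the auxiliary objective is

ZULASS(x) = - \sum_{j : R_j(x) < 0} R_j(x),  with R_j(x) = RESTRI(J,N,x),

so that ZULASS(x) = 0 holds exactly for feasible x.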

---------------------------------------------------------<br />

FUNCTION ZULASS<br />

1(N,M,XX,RESTRI)<br />

DIMENSION XX(N)<br />

ZULASS=0.<br />

DO 1 J=1,M<br />

R=RESTRI(J,N,XX)<br />

IF(R.LT.0.) ZULASS=ZULASS-R<br />

1 CONTINUE<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Subroutine UMSPEI<br />

UMSPEI is required only if BKOMMA = .FALSE., whereupon the parents in the source<br />

generation have to be subject to selection. UMSPEI transposes the data on the parents<br />

within array Y.<br />

K1, K2, and KK are auxiliary quantities transmitted from KORR that define the number and addresses of the data to be transposed.



---------------------------------------------------------<br />

SUBROUTINE UMSPEI<br />

1(K1,K2,KK,NY,Y)<br />

DIMENSION Y(NY)<br />

DO 1 K=1,KK<br />

1 Y(K2+K)=Y(K1+K)<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Subroutine GNPOOL<br />

GNPOOL supplies a set of variables for a descendant by drawing on the pool of parents taken together in accordance with the type of recombination selected. This subroutine is called once each for the object variables X, the strategy variables S, and possibly also P. To minimize storage demand, the changes in the object variables by mutation are added immediately (J = 3). In intermediary recombination for the positional angles (J = 2), a check must be made on the difference between the parental angles to establish suitable mean values. J = 1 denotes the case where step sizes are involved.
L1 denotes the part of the gene pool from which the parent data are to be drawn if IREKO = 4 or IREKO = 5 (in these cases the parents are chosen within GNPOOL itself). K1 denotes the parent selected by KORR whose data are to be used when IREKO = 1 (no recombination). K1 and K2 denote the two parents whose data are to be recombined if IREKO = 2 or IREKO = 3 has been selected.
NZ and NN are auxiliary quantities for deriving the addresses in array Y.
NX represents N or NS or NP, XX represents X or S or P, and IREKO represents one of the digits of IREKOM, i.e., IREKOX or IREKOS or IREKOP.

---------------------------------------------------------<br />

SUBROUTINE GNPOOL<br />

1(J,L1,K1,K2,NZ,NN,IELTER,IREKO,NX,NY,XX,Y,GLEICH)<br />

DIMENSION XX(NX),Y(NY)<br />

COMMON/PIDATA/PIHALB,PIEINS,PIZWEI<br />

EXTERNAL GLEICH<br />

IF(J.EQ.3) GOTO 11<br />

GOTO(1,1,1,7,9),IREKO<br />

1 KI1=K1*NZ+NN<br />

IF(IREKO.GT.1) GOTO 3<br />

DO 2 I=1,NX<br />

2 XX(I)=Y(KI1+I)<br />

RETURN<br />

3 KI2=K2*NZ+NN<br />

IF(IREKO.EQ.3) GOTO 5



DO 4 I=1,NX<br />

KI=KI1<br />

IF(GLEICH(D).GE..5) KI=KI2<br />

4 XX(I)=Y(KI+I)<br />

RETURN<br />

5 DO 6 I=1,NX<br />

XX1=Y(KI1+I)<br />

XX2=Y(KI2+I)<br />

XXI=(XX1+XX2)*.5<br />

IF(J.EQ.1) GOTO 6<br />

DXX=XX1-XX2<br />

IF(ABS(DXX).GT.PIEINS) XXI=XXI+SIGN(PIEINS,DXX)<br />

6 XX(I)=XXI<br />

RETURN<br />

7 DO 8 I=1,NX<br />

8 XX(I)=Y((L1+IFIX(IELTER*GLEICH(D)))*NZ+NN+I)<br />

RETURN<br />

9 DO 10 I=1,NX<br />

XX1=Y((L1+IFIX(IELTER*GLEICH(D)))*NZ+NN+I)<br />

XX2=Y((L1+IFIX(IELTER*GLEICH(D)))*NZ+NN+I)<br />

XXI=(XX1+XX2)*.5<br />

IF(J.EQ.1) GOTO 10<br />

DXX=XX1-XX2<br />

IF(ABS(DXX).GT.PIEINS) XXI=XXI+SIGN(PIEINS,DXX)<br />

10 XX(I)=XXI<br />

RETURN<br />

11 GOTO(12,12,12,18,20),IREKO<br />

12 KI1=K1*NZ+NN<br />

IF(IREKO.GT.1) GOTO 14<br />

DO 13 I=1,NX<br />

13 XX(I)=XX(I)+Y(KI1+I)<br />

RETURN<br />

14 KI2=K2*NZ+NN<br />

IF(IREKO.EQ.3) GOTO 16<br />

DO 15 I=1,NX<br />

KI=KI1<br />

IF(GLEICH(D).GE..5) KI=KI2<br />

15 XX(I)=XX(I)+Y(KI+I)<br />

RETURN<br />

16 DO 17 I=1,NX<br />

17 XX(I)=XX(I)+(Y(KI1+I)+Y(KI2+I))*.5<br />

RETURN<br />

18 DO 19 I=1,NX<br />

19 XX(I)=XX(I)+Y((L1+IFIX(IELTER*GLEICH(D)))*NZ+NN+I)



RETURN<br />

20 DO 21 I=1,NX<br />

21 XX(I)=XX(I)+(Y((L1+IFIX(IELTER*GLEICH(D)))*NZ+NN+I)<br />

1+Y((L1+IFIX(IELTER*GLEICH(D)))*NZ+NN+I))*.5<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Subroutine SPEICH<br />

SPEICH transfers to the data pool Y for the parents of the next generation the data of a descendant representing a successful mutation: the object variables X and the strategy parameters S (and P, if used), together with the corresponding value of the objective function. A check is made that S (and P) fall within specified bounds.
J is the address in array Y from which point onwards the data are to be written and is provided by KORR.
ZZ represents ZSTERN or Z, XX represents XSTERN or X.
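The resulting storage layout of one individual in array Y, as can be read off from the listing below (NZ = NY/(2*IELTER) is computed in KORR), is:

Y(J+1) ... Y(J+N)             object variables X
Y(J+N+1) ... Y(J+N+NS)        step sizes S (bounded below by EPSILO(1))
Y(J+N+NS+1) ... Y(J+N+NS+NP)  angles P (only if BKORRL = .TRUE.)
Y(J+NZ)                       objective function value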

---------------------------------------------------------<br />

SUBROUTINE SPEICH<br />

1(J,BKORRL,EPSILO,N,NS,NP,NY,ZZ,XX,S,P,Y)<br />

LOGICAL BKORRL<br />

DIMENSION EPSILO(4),XX(N),S(NS),P(NP),Y(NY)<br />

COMMON/PIDATA/PIHALB,PIEINS,PIZWEI<br />

K=J<br />

DO 1 I=1,N<br />

K=K+1<br />

1 Y(K)=XX(I)<br />

DO 2 I=1,NS<br />

K=K+1<br />

2 Y(K)=AMAX1(S(I),EPSILO(1))<br />

IF(.NOT.BKORRL) GOTO 4<br />

DO 3 I=1,NP<br />

K=K+1<br />

PI=P(I)<br />

IF(ABS(PI).GT.PIEINS) PI=PI-SIGN(PIZWEI,PI)<br />

3 Y(K)=PI<br />

4 K=K+1<br />

Y(K)=ZZ<br />

RETURN<br />

END<br />

---------------------------------------------------------



Subroutine MINMAX<br />

MINMAX searches for the smallest or largest value in a series of values of the objective function held in an array. KORR calls this subroutine to determine the best or worst parent, in the first case in order to transfer its data to the location ZBEST (and perhaps also ZSTERN and XSTERN) and in the other case in order to give space for a better descendant. C = 1.0 initiates a search for the best (smallest) value of the function, while C = -1.0 does the same for the worst (largest) value.
LL and NZ are auxiliary quantities used to transmit information on the position of the required values within array Y. ZM and LM contain the best (or worst) value of the objective function and the number of the corresponding parent minus one.

---------------------------------------------------------<br />

SUBROUTINE MINMAX<br />

1(C,LL,NZ,ZM,LM,IELTER,NY,Y)<br />

DIMENSION Y(NY)<br />

LM=LL<br />

K1=LL*NZ+NZ<br />

ZM=Y(K1)<br />

IF(IELTER.EQ.1) RETURN<br />

K1=K1+NZ<br />

K2=(LL+IELTER)*NZ<br />

KM=LL<br />

DO 1 K=K1,K2,NZ<br />

KM=KM+1<br />

ZZ=Y(K)<br />

IF((ZZ-ZM)*C.GT.0.) GOTO 1<br />

ZM=ZZ<br />

LM=KM<br />

1 CONTINUE<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Subroutine ABSCHA<br />

ABSCHA tests the convergence criterion. If KONVKR = 1 has been selected, the difference between the objective function values representing the best and worst parents (ZBEST and ZSCHL) must be less than the limits set by EPSILO(3) (absolute) and EPSILO(4) (relative). Then the assignment BKONVG = .TRUE. is made.
Alternatively, the current difference ZSCHL - ZBEST is replaced by the change Z1 - Z2 in the sum of all the parent objective function values occurring after KONVKR generations, divided by IELTER.
The Boolean variable BKONVG transmits the result of the convergence test to KORR.
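Summarized from the listing below, the quantity tested is

DELTAF = (ZSCHL - ZBEST) * IELTER    if KONVKR = 1,
DELTAF = Z1 - Z2                     if KONVKR > 1 (checked every KONVKR generations),

and convergence is signalled when both DELTAF <= EPSILO(3) * IELTER and DELTAF <= EPSILO(4) * |Z2| hold.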



KONVZ is the generation counter if KONVKR > 1.<br />

---------------------------------------------------------<br />

SUBROUTINE ABSCHA<br />

1(IELTER,KONVKR,IFALLK,EPSILO,ZBEST,ZSCHL,Z1,Z2,<br />

2KONVZ,BKONVG)<br />

LOGICAL BKONVG<br />

DIMENSION EPSILO(4)<br />

IF(KONVKR.EQ.1) GOTO 1<br />

KONVZ=KONVZ+1<br />

IF(KONVZ.LT.KONVKR) GOTO 3<br />

KONVZ=0<br />

DELTAF=Z1-Z2<br />

Z1=Z2<br />

GOTO 2<br />

1 DELTAF=(ZSCHL-ZBEST)*IELTER<br />

2 IF(DELTAF.GT.EPSILO(3)*IELTER) GOTO 3<br />

IF(DELTAF.GT.EPSILO(4)*ABS(Z2)) GOTO 3<br />

IFALLK=ISIGN(2,IFALLK)<br />

BKONVG=.TRUE.<br />

RETURN<br />

3 BKONVG=.FALSE.<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Function GAUSSN<br />

GAUSSN converts a uniform random number distribution to a normal one. The function has been programmed for the trapezium algorithm (J. H. Ahrens and U. Dieter, Computer Methods for Sampling from the Exponential and Normal Distributions, Communications of the Association for Computing Machinery, vol. 15 (1972), pp. 873-882 and 1047). The Box-Muller rules require in many cases (machine-dependent) a longer run time, even if both numbers of the generated pair can be used.
SIGMA is the standard deviation, which is multiplied by the random number derived from a (0.0,1.0) normal distribution.

---------------------------------------------------------<br />

FUNCTION GAUSSN<br />

1(SIGMA,GLEICH)<br />

1 U=GLEICH(D)<br />

U0=GLEICH(D)<br />

IF(U.GE..919544406) GOTO 2<br />

X=2.40375766*(U0+U*.825339283)-2.11402808<br />

GOTO 10



2 IF(U.LT..965487131) GOTO 4<br />

3 U1=GLEICH(D)<br />

Y=SQRT(4.46911474-2.*ALOG(U1))<br />

U2=GLEICH(D)<br />

IF(Y*U2.GT.2.11402808) GOTO 3<br />

GOTO 9<br />

4 IF(U.LT..949990709) GOTO 6<br />

5 U1=GLEICH(D)<br />

Y=1.84039875+U1*.273629336<br />

U2=GLEICH(D)<br />

IF(.398942280*EXP(-.5*Y*Y)-.443299126+Y*.209694057<br />

1.LT.U2*.0427025816) GOTO 5<br />

GOTO 9<br />

6 IF(U.LT..925852334) GOTO 8<br />

7 U1=GLEICH(D)<br />

Y=.289729574+U1*1.55066917<br />

U2=GLEICH(D)<br />

IF(.398942280*EXP(-.5*Y*Y)-.443299126+Y*.209694057<br />

1.LT.U2*.0159745227) GOTO 7<br />

GOTO 9<br />

8 U1=GLEICH(D)<br />

Y=U1*.289729574<br />

U2=GLEICH(D)<br />

IF(.398942280*EXP(-.5*Y*Y)-.382544556<br />

1.LT.U2*.0163977244) GOTO 8<br />

9 X=Y<br />

IF(U0.GE..5) X=-Y<br />

10 GAUSSN=SIGMA*X<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Subroutine DREHNG<br />

DREHNG is called from MUTATI only if BKORRL = .TRUE. and N > 1. DREHNG performs the coordinate transformation of the modification vector for the object variables. Although the components of this vector are initially mutually independent, they become linearly related on account of the rotation specified by the positional angles P and so are correlated. The transformation involves NP partial rotations, in each of which only two of the components of the modification vector are involved.
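Each partial rotation acts on just two components; in the notation of the listing below, with p = P(NQ),

X(N1) <- X(N1) * cos(p) - X(N2) * sin(p)
X(N2) <- X(N1) * sin(p) + X(N2) * cos(p)

where the right-hand sides use the old values of X(N1) and X(N2).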



---------------------------------------------------------<br />

SUBROUTINE DREHNG<br />

1(NL,NM,N,NP,X,P)<br />

DIMENSION X(N),P(NP)<br />

NQ=NP<br />

DO 1 II=NL,NM<br />

N1=N-II<br />

N2=N<br />

DO 1 I=1,II<br />

X1=X(N1)<br />

X2=X(N2)<br />

SI=SIN(P(NQ))<br />

CO=COS(P(NQ))<br />

X(N2)=X1*SI+X2*CO<br />

X(N1)=X1*CO-X2*SI<br />

N2=N2-1<br />

1 NQ=NQ-1<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Logical function BLETAL<br />

BLETAL tests the feasibility of an object variable vector immediately on production if constraints are imposed. The first constraint to be violated causes BLETAL to signal to KORR via the function name (declared as a Boolean variable) that the mutation was lethal.

---------------------------------------------------------<br />

LOGICAL FUNCTION BLETAL<br />

1(N,M,X,RESTRI)<br />

DIMENSION X(N)<br />

DO 1 J=1,M<br />

IF(RESTRI(J,N,X).LT.0.) GOTO 2<br />

1 CONTINUE<br />

BLETAL=.FALSE.<br />

RETURN<br />

2 BLETAL=.TRUE.<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Subroutine MUTATI<br />

MUTATI handles the random alteration of the strategy variables and the object variables. First, the step sizes are altered in accordance with the DELTAS and DELTAI parameters by multiplication by two random factors with log-normal distributions. The resulting normal distribution is used to generate a random vector X that represents the changes in the object variables. If BKORRL = .TRUE. is set when KORR is called, i.e., linear correlation is required, the positional angles P are also mutated, with random numbers from a (0.0,DELTAP) normal distribution added to the original values. Also, DREHNG is called in that case to transform the vector of modifications to the object variables.
NL and NM are auxiliary quantities transmitted from KORR via MUTATI to DREHNG.
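In formula form (summarized from the listing below), with z_0 drawn once per call from a (0.0,DELTAS) normal distribution and z_i, w_j drawn anew for each component:

s_i <- s_i * exp(z_0 + z_i),  z_i from a (0.0,DELTAI) normal distribution,  i = 1(1)NS
x_i <- random number from a (0.0, s_min(i,NS)) normal distribution,  i = 1(1)N
p_j <- p_j + w_j,  w_j from a (0.0,DELTAP) normal distribution,  j = 1(1)NP (only if BKORRL)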

---------------------------------------------------------<br />

SUBROUTINE MUTATI<br />

1(NL,NM,BKORRL,DELTAS,DELTAI,DELTAP,N,NS,NP,X,S,P,<br />

2GAUSSN,GLEICH)<br />

LOGICAL BKORRL<br />

DIMENSION X(N),S(NS),P(NP)<br />

EXTERNAL GLEICH<br />

DS=GAUSSN(DELTAS,GLEICH)<br />

DO 1 I=1,NS<br />

1 S(I)=S(I)*EXP(DS+GAUSSN(DELTAI,GLEICH))<br />

DO 2 I=1,N<br />

2 X(I)=GAUSSN(S(MIN0(I,NS)),GLEICH)<br />

IF(.NOT.BKORRL) RETURN<br />

DO 3 I=1,NP<br />

3 P(I)=P(I)+GAUSSN(DELTAP,GLEICH)<br />

CALL DREHNG<br />

1(NL,NM,N,NP,X,P)<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Note<br />

Without modifications, the subroutines EVOL, GRUP, and KORR may be used to solve optimization problems with integer (or discrete) and mixed-integer variables. The search for an optimum then, however, will only lead into the vicinity of the exact solution.
The discreteness may be induced by the user when formulating the objective function, by merely rounding the corresponding variables to integers or by attributing discrete values to them.
The following two examples merely hint at possible formulations. In order to get the results in the desired form, the variables will have to be transformed at the end of the optimum search with EVOL, GRUP, or KORR in the same manner as is done within the objective function; a sketch of such a transformation follows.
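For Example 1 below, such a final transformation might be performed by a small subroutine like the following sketch (the name RUNDEN is our invention; it simply repeats the rounding used inside the objective function):

---------------------------------------------------------
SUBROUTINE RUNDEN(N,X)
C MAP THE REAL-VALUED RESULT X ONTO THE INTEGER GRID
C EXACTLY AS THE OBJECTIVE FUNCTION OF EXAMPLE 1 DOES
C INTERNALLY.
DIMENSION X(N)
DO 1 I=1,N
1 X(I)=FLOAT(IFIX(ABS(X(I))))
RETURN
END
---------------------------------------------------------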



Example 1

Minimize

F(x) = \sum_{i=1}^{n} (x_i - i)^2

with x_i >= 0, integer, for all i = 1(1)n

---------------------------------------------------------<br />

FUNCTION F(N,X)<br />

DIMENSION X(N)<br />

F=0.<br />

DO 1 I=1,N<br />

IX=IFIX(ABS(X(I)))<br />

XI=FLOAT(IX-I)<br />

F=F+XI*XI<br />

1 CONTINUE<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Example 2

Minimize

F(x) = (x_1 - 2)^2 + (x_1 - 2 x_2)^2

with x_1 from {1.3, 1.5, 2.2, 2.8} only

---------------------------------------------------------<br />

FUNCTION F(N,X)<br />

DIMENSION X(N), Y(4)<br />

DATA Y /1.3,1.5,2.2,2.8/<br />

DO 1 I=1,4<br />

X1=Y(I)<br />

IF (X(1)-X1) 2,2,1<br />

1 CONTINUE<br />

2 F1=X1-2.<br />

F2=X1-X(2)-X(2)<br />

F =F1*F1+F2*F2<br />

RETURN<br />

END<br />

---------------------------------------------------------




Appendix C<br />

Programs<br />

C.1 Contents of the Floppy Disk<br />

The floppy disk that accompanies this book contains:

Sources of FORTRAN subroutines of the following direct optimization procedures as described in Chapters 3, 5, and 7 of the book.

- FIBO Coordinate strategy with Fibonacci division
fiboh (fiboh.f) calls subroutine fibo (fibo.f)
- GOLD Coordinate strategy with Golden section
goldh (goldh.f) calls subroutine gold (gold.f)
- LAGR Coordinate strategy with Lagrangian interpolation
lagrh (lagrh.f) calls subroutine lagr (lagr.f)
- HOJE Strategy of Hooke and Jeeves (pattern search)
hoje (hoje.f) calls subroutine hilf (hilf.f)
- ROSE Strategy of Rosenbrock (rotating coordinates search)
rose (rose.f) calls subroutine grsmr (grsmr.f)
- DSCG Strategy of Davies, Swann, and Campey
with Gram-Schmidt orthogonalization
dscg (dscg.f) calls subroutine lineg (lineg.f)
subroutine grsmd (grsmd.f)
- DSCP Strategy of Davies, Swann, and Campey
with Palmer orthogonalization
dscp (dscp.f) calls subroutine linep (linep.f)
subroutine palm (palm.f)
- POWE Powell's strategy of conjugate directions
powe (powe.f) calls -
- DFPS Davidon, Fletcher, Powell strategy (Variable metric)
dfps (dfps.f) calls subroutine seth (seth.f)
subroutine grad (grad.f)
function updot (updot.f) calls dot (dot.f)
function dot (dot.f)
- SIMP Simplex strategy of Nelder and Mead
simp (simp.f) calls -
- COMP Complex strategy of M. J. Box
comp (comp.f) calls -
- EVOL Two membered evolution strategy
evol (evol.f) calls function z (included in evol.f)
- KORR Multimembered evolution strategy
korr2 (korr2.f) calls function zulass (included in korr2.f)

function gaussn (included in korr2.f)<br />

function bletal (included in korr2.f)<br />

subroutine pruefg (included in korr2.f)<br />

subroutine speich (included in korr2.f)<br />

subroutine mutati (included in korr2.f)<br />

subroutine umspei (included in korr2.f)<br />

subroutine minmax (included in korr2.f)<br />

subroutine gnpool (included in korr2.f)<br />

subroutine abscha (included in korr2.f)<br />

subroutine drehng (included in korr2.f)<br />

Additionally, FORTRAN function sources of the 50 test problems are included:<br />

- ZIELFU(N,X) one objective function with a computed GOTO for 50 entries.
- RESTRI(J,N,X) one constraints function with a computed GOTO for 50 entries and J as the current number of the single restriction.
No runtime package is provided for this set, however.

C sources for all strategies mentioned above and C sources for the 50 test problems (GRUP with option REKO is missing since it has become one special case within KORR).

A set of simple interfaces to run 13 of the above-mentioned optimization routines with the above-mentioned 50 test problems on a PC or workstation.

C.2 About the Program Disk<br />

The floppy disk contains both FORTRAN and C sources for each of the strategies described in the book. All test problems presented in the catalogue of problems (see Appendix A) exist as C code. A set of simple interfaces, easy to understand and to expand, combines the strategies and functions into OptimA, a ready-to-use program package.
The programs are designed to run on a minimally configured PC using a math coprocessor or having an 80486 CPU and running the DOS or LINUX operating system. To accomplish semantic equivalence with the well tested original FORTRAN codes, all strategies have been translated via f2c, a Fortran-to-C converter of AT&T Bell Laboratories. All C codes can be compiled and linked via gcc (GNU C compiler, version 2.4). Of course, any other ANSI C compiler such as Borland C++ that supports 4-byte integers should produce correct results as well.

LINUX and gcc are freely available under the conditions of the GNU General Public License. Information about ordering the GNU C compiler in the United States is available through the Free Software Foundation by calling 617 876 3296.

All C programs should compile and run on any UNIX workstation having gcc or another ANSI C compiler installed.

C.3 Running the C Programs<br />

The following instructions are appropriate for installing and running the C programs on your PC or workstation. Installation as well as compilation and linking can be carried out automatically.

C.3.1 How to Install OptimA on a PC Using LINUX<br />

or on a UNIX Workstation<br />

First, enter the directory where you want OptimA to be installed. Then copy the installation file via mtools by typing the command:

mcopy a:install.sh .<br />

If you don't have mtools, copy wb-1p?.tar from the floppy to your workspace and untar it. The instruction

sh install.sh<br />

will copy the whole tree of directories from the disk to your local directory. The following directories and subdirectories will be created:

fortran<br />

funct<br />

include<br />

lib<br />

rstrct<br />

strat<br />

util<br />

To compile, link, and run OptimA, go to the workbench directory and type

make<br />

to start a recursive compilation and linking of all C sources.



C.3.2 How to Install OptimA on a PC Under DOS<br />

First, enter the directory where you want OptimA to be installed. The instruction<br />

a:INSTALL<br />

or<br />

b:INSTALLB<br />

will copy the whole tree of directories from the disk to your local directory. The same directories and subdirectories as mentioned above will be created. To compile, link, and run OptimA, go to the workbench directory and type

mkOptimA<br />

to start a recursive compilation and linking of all C sources. This will take a while, depending on how fast your machine works.

C.3.3 Running OptimA<br />

After the successful execution of make or mkOptimA, respectively, the executable file OptimA is located in the subdirectory bin. Here you can run the program package by issuing the command

OptimA<br />

First, the program will list the available strategies. After choosing a strategy by typing its number, a list of test problems is displayed. Type a number or continue the listing by hitting the return key. Depending on the method and the problem, the program will ask for the parameters to configure the strategy. Please refer to Chapter 6 and Appendix A to choose appropriate values. Of course, you are free to define your own parameter values, but please remember that the behavior of each strategy strongly depends on its parameter settings.

Warnings during the process will inform the user of inappropriate parameter definitions or abnormal program behavior. For example, the message timeout reached warns the user that the strategy might find a better result if the user-defined maximal time were set to a larger value. The strategies COMP, EVOL, and KORR will try at most five restarts after the first timeout has occurred.

If a strategy that can process only unrestricted problems is applied to a restricted problem, a warning will be displayed, too. After acknowledging this message by hitting the return key, the user can choose another function.

C.4 Description of the Programs<br />

The following pages briefly describe the programs on which this package is based. A short description of how to incorporate self-defined problem functions into OptimA follows.



The directory FORTRAN lists all the original codes described in the book. The reader may<br />

write his own interfaces to these programs. For further information please refer to the C<br />

sources or to Schwefel (1980, 1981).<br />

All C source codes of the strategies have been translated from FORTRAN to C via f2c. Some modifications were made in the C sources to gain higher portability and to achieve homogeneous program behavior. For example, all strategies are minimizing, use standard output functions, and perform operations on the same data types. None of the modifications changed the semantics of any strategy.

To each optimization method a dialogue interface has been added. Here the strategy's specific parameter definition takes place. The meaning and usage of each parameter is briefly described in the comments within the program listings. All names of the dialogue interfaces end with the suffix "_mod.c". The strategies together with the interfaces are listed in the directory named strat.

The whole catalogue of problems (see Appendix A) has been coded as C functions. They<br />

are collected in the subdirectory funct.<br />

The problems 2.29 to 2.50 (see Appendix A) are restricted. Therefore, constraints functions for these problems were written and listed in the directory rstrct. Because in some problems the number of constraints to be applied depends on the dimension of the function to be optimized, this number has to be calculated. This task is performed by the programs with prefix "rsn_". The evaluation of the constraint itself is done in the modules with prefix "rst_". A restriction is violated if its value is negative.

All strategies perform operations on vectors of varying dimensions. Therefore a set of tools to allocate and to define vectors is compiled in the package vec_util, which is located in the subdirectory util. The procedures from this package are used only in the dialogue interfaces. All other programs perform operations on vectors as if they were using arrays of arbitrary but fixed length.

The main program "OptimA.c" performs only initialization tasks and runs the dialogue within which the user can choose a strategy and a function number.

The strategies and functions are listed in tables, namely "func_tab.c" and "strt_tab.c".

If the user wants to incorporate new problems into OptimA, the table "func_tab.c" has to be extended. This task is relatively simple for a programmer with a little C knowledge if he follows the next instructions carefully.

C.4.1 How to Incorporate New Functions<br />

The following template is typical for every function definition:

#include "f2c.h"<br />

#include "math.h"<br />

doublereal probl_2_18(int n,doublereal *x)
{
    return(0.26*(x[0]*x[0] + x[1]*x[1]) - 0.48*x[0]*x[1]);
}

Please add your own function into the directory funct. Here you will find the file "func_tab.c". Include the formal description of your problem into this table. A typical template looks like:

{
    5,
    rs_nm_x_x,
    restr_x_x,
    "Problem x_x (restricted problem):\n\t x[1]+x[2]+... ",
    probl_x_x
},

with the data type definition:

struct functions {
    long int dim;                 /* Problem's dimension   */
    long int (*rs_num)();         /* Calculates the number */
                                  /* of constraints        */
    doublereal (*restrictions)(); /* Constraints function  */
    char* name;                   /* Mathem. description   */
    doublereal (*function)();     /* Objective function    */
};

typedef struct functions funct_t;

The first item denotes the number of dimensions of the problem. A problem with variable dimension is denoted by a -1. In this case the program should inquire the dimension from the user.

The second entry denotes the function that calculates the number of constraints to be applied to the problem. If no constraints are needed, a NULL pointer has to be inserted.

The next line will be displayed to the user during an OptimA session. This string<br />

provides a short description of the problem, typically in mathematical notation.<br />

The last item is a function-pointer to the objective function.<br />

Please do not add a new formal problem description into the func_tab behind the last table entry. The latter denotes the end of the table and should not be displaced.



To make the new function known to all modules, its prototype must be included in the header file func_names.h.

As a last step the Makefile has to be extended. The lists FUNCTSRCS and FUNCTOBJS denote the files that make up the list of problems. These lists have to be extended by the filename of your program code.

Now step back to the directory C and issue the command make or mkOptimA, respectively, to compile "OptimA".

Restrictions can be incorporated into OptimA like functions. Every C code from the directory rstrct can be taken as a template. The name of the constraints function and the name of the function that calculates the number of constraints have to be included in the formal problem description.

C.5 Examples<br />

Here two examples of how OptimA works in real life will be presented. The first one describes an application of the multimembered evolution strategy KORR to the corridor model (problem 2.37, function number 32). The second example demonstrates a batch run. The batch mode enables the user to apply a set of methods to a set of functions in one task.

C.5.1 An Application of the Multimembered Evolution Strategy to the Corridor Model

After calling OptimA and choosing problem 2.37 by typing 32, a typical dialogue will look like:

Multimembered evolution strategy applied to function:<br />

Problem 2.37 (Corridor model) (restricted problem):<br />

Sum[-x[i],{i,1,n}]<br />

Please enter the parameters for the algorithm:<br />

Dimension of the problem : 3<br />

Number of restrictions : 7<br />

Number of parents : 10<br />

Number of descendants : 100<br />

Plus (p) or the comma (c) strategy : c<br />

Should the ellipsoid be able to rotate (y/n) : y<br />

You can choose under several recombination types:



1 No recombination<br />

2 Discrete recombination of pairs of parents<br />

3 Intermediary recombination of pairs of parents<br />

4 Discrete recombination of all parents<br />

5 Intermediary recombination of all parents in pairs<br />

Recombination type for the parameter vector : 2<br />

Recombination type for the sigma vector : 3<br />

Recombination type for the alpha vector : 1<br />

Check for convergence after how many generations (> 2*Dim.) : 10<br />

Maximal computation time in sec. : 30<br />

Lower bound to step sizes, absolute : 1e-6<br />

Lower bound to step sizes, relative : 1e-7<br />

Parameter in convergence test, absolute : 1e-6<br />

Parameter in convergence test, relative : 1e-7<br />

Common factor used in step-size changes (e.g. 1) : 1<br />

Standard deviation for the angles

of the mutation ellipsoid (degrees) : 5.0<br />

Number of distinct step-sizes : 3<br />

Initial values of the variables :<br />

0<br />

0<br />

0<br />

Initial step lengths :<br />

1<br />

1<br />

1<br />

Common factor used in step-size changes : 0.408248<br />

Individual factor used in step-size changes : 0.537285<br />

Starting at : F(x) = 0<br />

Time elapsed : 18.099276<br />

Minimum found : -300.000000<br />

at point : 99.999992 100.000000 99.999992<br />

Current best value of population: -300.000000<br />

C.5.2 OptimA Working in Batch Mode<br />

OptimA also supports a batch mode option. This option was introduced to enable a user to test the behavior of any strategy by varying parameter settings automatically. Of course, any function or method may be changed during a run as well. The batch file that will be processed should contain the list of input data you would type in manually during a whole session in non-batch mode. OptimA in batch mode suppresses the listing of the strategies and functions. That greatly reduces the output and makes it more readable.

A typical batch run looks like:

OptimA -b < bat_file > results<br />

With a "bat_file" like:

8<br />

1<br />

100.100<br />

0.98e-6<br />

0.0e+0<br />

5<br />

5<br />

1<br />

1<br />

0.8e-6<br />

0.8e-6<br />

0.111<br />

0.111<br />

y<br />

the file "results" may look like:

Method # : 8<br />

Function # : 1<br />

DFPS strategy (Variable metric) applied to function:<br />

Problem 2.1 (Beale):<br />

(1.5-x*(1-y))^2 + (2.25-x*(1-y^2))^2 + (2.625-x*(1-y^3))^2<br />

Dimension of the problem : 2<br />

Maximal computation time in sec. : 100.100000<br />

Accuracy required : 9.8e-07<br />

Expected value of the objective function<br />

at the optimum : 0<br />

Initial values of the variables :<br />

5<br />

5<br />

Initial step lengths :<br />

1<br />

1<br />

Lower bounds of the step lengths :



8e-07<br />

8e-07<br />

Initial step lengths for construction of derivatives :<br />

0.111<br />

0.111<br />

Starting at : F(x) = 403069<br />

Time elapsed : 0.033332<br />

Minimum found : 0.000000<br />

at point : 3.000000 0.500000<br />

Both examples have been run on a SUN SPARC S10/40 workstation.

The floppy disk included with this book may not be copied, sold, or redistributed without the permission of John Wiley & Sons, Inc., New York.



Index<br />

Aarts, E.H.L., 161<br />

Abadie, J., 17, 24<br />

Abe, K., 239<br />

Ablay, P., 163<br />

Absolute minimum, see global minimum<br />

Accuracy of approximation, 26, 27, 29,<br />

32, 38, 41, 70, 76, 78, 81, 91, 92,<br />

94, 116, 146, 167, 168, 173, 175,<br />

206-208, 213, 214, 235

Accuracy of computation, 12, 14, 32, 35,<br />

54, 57, 66, 67, 71, 78, 81, 83, 88,<br />

89, 99, 112-114, 145, 170, 173-175, 206, 209, 236, 329

Ackley, D.H., 152<br />

Adachi, N., 77, 81, 82<br />

Adams, R.J., 96<br />

Adaptation, 5, 6, 9, 100, 102, 105, 142,<br />

147, 152<br />

Adaptive step size random search, 96, 97,

200<br />

AESOP program package, 68<br />

Ahrens, J.H., 116<br />

AID program package, 68<br />

Aizerman, M.A., 90<br />

Akaike, H., 66, 67, 203<br />

Alander, J.T., 152, 246

Aleksandrov, V.M., 95

Algebra, 5, 14, 41, 69, 75, 239<br />

Alland, A., Jr., 244

Allen, P., 102<br />

Allometry, 243<br />

Allowed region, see feasible region<br />

Altman, M., 68<br />

Amann, H., 93<br />

Analogue computers, 12, 15, 65, 68, 89,<br />

99, 236<br />

Analytic optimization, see indirect optimization<br />

Anders, U., 246<br />

Anderson, N., 35<br />

Anderson, R.L., 91<br />

Andrews, H.C., 5<br />

Andreyev, V.O., 94<br />

Animats, 103<br />

Anscombe, F.J., 101<br />

Antonov, G.E., 90<br />

Aoki, M., 23, 93<br />

Apostol, T.M., 17<br />

Appelbaum, J., 48<br />

Applications, 48, 53, 64, 68, 69, 99, 151,<br />

245-246

Approximation problems, 5, 14, see also<br />

sum of squares minimization<br />

Archer, D.H., 48

Arrow, K.J., 17, 18, 165<br />

Artificial intelligence, 102, 103
Artificial life, 103

Asai, K., 94<br />

Ashby, W.R., 9, 91, 100, 105<br />

Atmar, J.W., 151<br />

Automata, 6, 9, 44, 48, 94, 99, 102<br />

Avriel, M., 29, 31, 33<br />

Awdejewa, L.I., 18<br />

Axelrod, R., 21<br />

Azencott, R., 161<br />

Bach, H., 23<br />

Bäck, T., 118, 134, 147, 151, 155, 159,

245, 246, 248<br />

Baer, R.M., 67<br />

Balakrishnan, A.V., 11, 18<br />

Balas, E., 18<br />

Balinski, M.L., 19


426 Index<br />

Banach, S., 10<br />

Bandler, J.W., 48, 115

Banzhaf, W., 103<br />

Bard, Y., 78, 83, 205<br />

Barnes, G.H., 233, 239<br />

Barnes, J.G.P., 84<br />

Barnes, J.L., 102<br />

Barr, D.R., 241<br />

Barrier penalty functions (barrier methods),<br />

16, 107<br />

Bass, R., 81<br />

Bauer, F.L., 84<br />

Bauer, W.F., 93<br />

Beale, E.M.L., 18, 70, 84, 166, 327, 346<br />

Beamer, J.H., 26, 29, 39<br />

Beckman, F.S., 69<br />

Beckmann, M., 19<br />

Behnken, D.W., 65<br />

Beier, W., 105<br />

Beightler, C.S., 1, 23, 27, 32, 38, 87<br />

Bekey, G.A., 12, 65, 89, 95, 96, 98, 99<br />

Belew, R.K., 152<br />

Bell, D.E., 20<br />

Bell, M., 44, 178<br />

Bellman, R.W., 11, 38, 102<br />

Beltrami, E.J., 87<br />

Bendin, F., 248<br />

Berg, R.L., 101<br />

Berlin, V.G., 90<br />

Berman, G., 29, 39<br />

Bernard, J.W., 48<br />

Bernoulli, Joh., 2<br />

Bertram, J.E., 20<br />

Bessel function, 129, 130<br />

Beveridge, G.S.G., 15, 23, 28, 32, 37, 64,<br />

65<br />

Beyer, H.-G., 118, 134, 149, 159<br />

Biasing, 98, 156, 174<br />

Biggs, M.C., 76<br />

Binary optimization, 18, 247<br />

Binomial distribution, 7, 108, 243<br />

Bionics, 99, 102, 105, 238<br />

Birkhoff, G., 48

Bisection method, 33, 34<br />

Bjorck, A., 35<br />

Blakemore, J.W., 23<br />

Bledsoe, W.W., 239<br />

Blind random search, see pure random search

Blum, J.R., 19, 20<br />

Boas, A.H., 26<br />

Bocharov, I.N., 89, 90<br />

Boltjanski, W.G. (Boltjanskij, V.G.), 18<br />

Boltzmann, L., 160<br />

Bolzano method, 33, 34, 38<br />

Booker, L.B., 152<br />

Booth, A.D., 67, 329<br />

Booth, R.S., 27<br />

Boothroyd, J., 33, 77, 178<br />

Born, J., 118, 149<br />

Borowski, N., 98, 240<br />

Bossert, W.H., 146<br />

Bourgine, P., 103<br />

Box, G.E.P., 6, 7, 65, 68, 69, 89, 101, 115,<br />

see also EVOP method<br />

Box, M.J., 17, 23, 28, 54, 56-58, 61, 68,

89, 115, 174, 332, see also complex<br />

strategy<br />

Boxing in the minimum, 28, 29, 32, 36,<br />

41, 56, 209<br />

Brachistochrone problem, 11<br />

Bracken, J., 348<br />

Brajnes, S.N., 102<br />

Bram, J., 27<br />

Branch and bound methods, 18

Brandl, V., 93

Branin, F.H., Jr., 88<br />

Braverman, E.M., 90<br />

Bremermann, H.J., 100, 101, 105, 238<br />

Brent, R.P., 23, 27, 34, 35, 74, 84, 88, 89,<br />

174<br />

Brocker, D.H., 95, 98, 99<br />

Broken rational programming, 20<br />

Bromberg, N.S., 89<br />

Brooks, S.H., 58, 87, 89, 91-95, 100, 174

Brown, K.M., 75, 81, 84<br />

Brown, R.R., 66<br />

Broyden, C.G., 14, 77, 81-84, 172, 205



Broyden-Fletcher-Shanno formula, 83<br />

Brudermann, U., 246<br />

Brughiera, P., 88<br />

Bryson, A.E., Jr., 68<br />

Budne, T.A., 101<br />

Buehler, R.J., 67, 68<br />

Bunny-hop search, 48<br />

Burkard, R.E., 18<br />

Burt, D.A., 48<br />

Calculus of observations, see observational<br />

calculus<br />

Campbell, D.T., 102<br />

Campey, I.G., 54, see also DSC strategy<br />

Campos, I., 248<br />

Canon, M.D., 18<br />

Cantrell, J.W., 70<br />

Carroll, C.W., 16, 57, 115<br />

Cartesian coordinates, 10<br />

Casey, J.K., 68, 89<br />

Casti, J., 239<br />

Catalogue of problems, 110, 205, 325-366

Cauchy, A., 66<br />

Causality, 237<br />

Cea, J., 23, 47, 68<br />

Cembrowicz, R.G., 246<br />

Cerny, V., 160<br />

Chambliss, J.P., 81<br />

Chandler, C.B., 48
Chandler, W.J., 239

Chang, S.S.L., 11, 90<br />

Charalambous, C., 115<br />

Chatterji, B.N. and Chatterjee, B., 99

Chazan, D., 239<br />

Chernoff, H., 75

Chichinadze, V.K., 88, 91<br />

χ2 distribution, 108

Cholesky, matrix decomposition, 14, 75<br />

Chromosome mutations, 106, 148<br />

Circumferential distribution, 95-97, 109

Cizek, F., 106<br />

Clayton, D.G., 54<br />

Clegg, J.C., 11<br />

Cochran, W.G., 7<br />

Cockrell, L.D., 93, 99<br />

Cohen, A.I., 70<br />

Cohn, D.L., 100<br />

Collatz, L., 5<br />

Colville, A.R., 68, 174, 175, 339<br />

Combinatorial optimization, 152<br />

Complex strategy, 17, 61-65, 89, 115, 177,
179, 185, 190, 191, 201, 202, 210,
212, 213, 216, 217, 228-230, 232,
327, 341, 346, 357, 361-363, 365, 366

Computational intelligence, 152<br />

Computer-aided design (CAD), 5, 6, 23<br />

Computers, see analogue, digital, hybrid,<br />

parallel, <strong>and</strong> process computers<br />

Concave, see convex<br />

Conceptual algorithms, 167<br />

Condition of a matrix, 67, 180, 203, 242,<br />

326<br />

Conjugate directions, 54, 69, 74, 82, 88,<br />

170{172, 202, see also Powell<br />

strategy<br />

Conjugate gradients, 38, 68, 69, 77, 81,<br />

169{172, 204, 235, see also<br />

Fletcher-Reeves strategy<br />

Conrad, M., 103<br />

Constraints, 8, 12, 14{18, 24, 44, 48, 49,<br />

57, 62, 87, 90{93, 105, 107, 115,<br />

119, 134, 150, 176, 212{214, 216,<br />

236<br />

Constraints, active, 17, 44, 62, 116, 118,<br />

213, 215<br />

Constraints satisfaction problem (CSP),<br />

91<br />

Contour tangent method, 39<br />

Control theory, 9, 11, 18, 23, 70, 88, 89,<br />

99, 112<br />

Convergence criterion, 113{114, 145{146,<br />

see also termination of the search<br />

Converse, A.O., 23<br />

Convex, 17, 34, 39, 47, 66, 101, 166, 169,<br />

236, 239<br />

Cooper, L., 23, 38, 48, 87


Coordinate strategy, 41-44, 47, 48, 67, 87, 100, 164, 167, 172, 177, 200, 202-204, 207, 209, 228-230, 233, 327, 332, 339, 340, 362, 363, see also Fibonacci division, golden section, and Lagrangian interpolation
Coordinate transformation, 241
Cornick, D.E., 70
Correlation, 118, 240, 241, 243, 246
Corridor model objective function, 110, 116, 120, 123, 124, 134-142, 215, 231, 232, 351, 352, 361, 364, 365
Cost of computation, 12, 38, 39, 64, 66, 74, 89, 90, 92, 168, 170, 179, 204, 230, 232, 234, see also rate of convergence
Cottrell, B.J., 67
Courant, R., 11, 66
Covariances, 155, 204, 240, 241
Cowdrey, D.R., 93
Cox, D.R., 7
Cox, G.M., 7
Cragg, E.E., 70
Created response surface technique, 16, 57
Creeping random search, 94, 95, 99, 100, 236, 237
Crippen, G.M., 89
Criterion of merit, 2, 7
Crockett, J.B., 75
Crossover, 154
Crowder, H., 70
Cryer, C.W., 43
Cubic interpolation, 34, 37, see also Lagrangian and Hermitian interpolation
Cullum, C.D., Jr., 18
Cullum, J., 83
Curry, H.B., 66, 67
Curse of dimensions, Bellman's, 38
Curtis, A.R., 66
Curtiss, J.H., 93
Curve fitting, 35, 64, 84, 151, 246
Cybernetics, 9, 101, 102, 322
Dambrauskas, A.P., 58, 64
Daniel, J.W., 15, 23, 68, 70
Dantzig, G.B., 17, 57, 88, 166
Darwin, C., 106, 109, 244
Davidon, W.C., 77, 81, 82, 170
Davidon-Fletcher-Powell strategy, see DFP strategy
Davidor, Y., 152
Davies, D., 23, 28, 54, 56, 57, 76, 81, see also Davies-Swann-Campey strategy
Davies, M., 84
Davies, O.L., 7, 58, 68
Davies-Swann-Campey strategy, see DSC strategy
Davis, L., 152
Davis, R.H., 70
Davis, R.S., 66, 89
Davis, S.H., Jr., 23
Day, R.G., 97
Debye series, 130
Decision theory, 94
Decision tree methods, 18
De Graag, D.P., 95, 98
De Jong, K., 152
Dekker, T.J., 34
Demyanov, V.F., 11
Denn, M.M., 11
Dennis, J.E., Jr., 75, 81, 84
Derivative-free methods, 15, 40, 80, 83, 172, 174, see also direct search strategies
Derivatives, numerical evaluation of, 19, 23, 35, 66, 68, 71, 76, 78, 81, 83, 95, 97, 170-172
Descendants, number of, 126, 142-144
Descent, theory of, 100, 109
Design and analysis of experiments, 6, 58, 65, 89
D'Esopo, D.A., 41
DeVogelaere, R., 44, 178
DFP strategy, 77-78, 83, 97, 170-172, 243
DFP-Stewart strategy, 78-81, 177, 178, 184, 189, 195, 200, 201, 209, 210, 219, 228-231, 337, 341, 343, 363, 364
Diblock search, 33
Dichotomous search, 27, 29, 33, 39
Dickinson, A.W., 93, 98, 174
Dieter, U., 116
Differential calculus, 2, 11
Digital computers, 6, 10-12, 14, 15, 32, 33, 92, 99, 110, 173, 236
Dijkhuis, B., 37
Dinkelbach, W., 17
Diploidy, 106, 148
Direct optimization, 13-15, 20
Direct search strategies, 40-65, 68, 90
Directed random search, 98
Discontinuity, 13, 23, 25, 42, 88, 91, 116, 176, 211, 214, 231, 236, 341, 349
Discovery, 2
Discrete distribution, 110, 243
Discrete optimization, 11, 18, 32, 39, 44, 64, 88, 91, 108, 152, 160, 243, 247
Discrete recombination, 148, 153, 156
Discretization, see parameterization
Divergence, 35, 76, 169
Dixon, L.C.W., 15, 23, 29, 34, 35, 58, 71, 76, 78, 81-83
Dobzhansky, T., 101
Dominance and recessiveness, 101, 106, 148
Dowell, M., 35
Draper, N.R., 7, 65, 69
Drenick, R.F., 48
Drepper, F.R., 103, 246
Drucker, H., 61
DSC strategy, 54-57, 74, 89, 177, 183, 188, 194, 200-202, 209, 228-230, 362, 363
Dubovitskii, A.Ya., 11
Dueck, G., 98, 164
Duffin, R.J., 14
Dunham, B., 102
Dvoretzky, A., 20
Dynamic optimization, 7, 9, 10, 48, 64, 89-91, 94, 99, 102, 245, 248
Dynamic programming, 11, 12, 18, 149
Ebeling, W., 102, 163
Edelbaum, T.N., 13
Edelman, G.B., 103
Effectivity of a method, see robustness
Efficiency of a method, see rate of convergence
Eigen, M., 101
Eigenvalue problems, 5
Eigenvalues of a matrix, 76, 83, 326
Eisenberg, M.A., 239
Eldredge, N., 148
Elimination methods, see interval division methods
Elitist strategy, 157
Elkin, R.M., 44, 66, 67
Elliott, D.F., 83
Ellipsoid method, 166
Emad, F.P., 98
Emery, F.E., 48, 87
Engelhardt, M., 20
Engeli, M., 43
Enumeration methods, see grid method
Epigenetic apparatus, 153, 154
Equation, differential, 15, 65, 68, 93, 246, 345, 346
Equations, system of, 5, 13, 14, 23, 39, 65, 66, 75, 83, 93, 172, 235, 336
Equidistant search, see grid method
Erlicki, M.S., 48
Ermakov, S., 19
Ermoliev, Yu., 19, 90
Errors, computational, 47, 174, 205, 209, 210, 212, 219, 228, 229, 236
Euclid of Alexandria, 32
Euclidean norm, 167, 335
Euclidean space, 10, 24, 49, 97
Euler, L., 2, 15
Even block search, 27
Evolution, cultural, 244
Evolution, organic, 1, 3, 100, 102, 105, 106, 109, 142, 153, 237, 238
Evolution strategy, 3, 6, 7, 16, 105-151, 168, 173, 175, 177, 179, 200, 203, 210, 213, 219, 228-230, 232-235, 248, 333, 337, 350, 354, 355, 359, 361, 364, 365, 367, 413, see also two membered and multimembered evolution strategies
Evolution strategy, asynchronous parallel, 248
Evolution strategy, parallel, 248
Evolution strategy, 1/5 success rule, 110, 112, 114, 116, 118, 142, 200, 213-215, 237, 349, 361
Evolution strategy (1+1), 105-119, 125, 163, 177, 185, 191, 200, 203, 212, 213, 216, 217, 228, 231-233, 328, 349, 363
Evolution strategy (1+λ), 123, 134, 145
Evolution strategy (1 , λ), 145
Evolution strategy (10 , 100), 177, 186, 191, 200, 203, 211-215, 217, 228, 231-233
Evolution strategy (μ+1), 119
Evolution strategy (μ+λ), 119
Evolution strategy (μ , λ), 119, 145, 148, 238, 244, 248
Evolution strategy ( ), 247
Evolution, synthetic theory, 106
Evolutionary algorithms, 151, 152, 161
Evolutionary computation, 152
Evolutionary operation, see EVOP method
Evolutionary principles, 3, 100, 106, 118, 146, 244
Evolutionary programming, 151
Evolutionism, 244
EVOP method, 6, 7, 9, 64, 68, 69, 89, 101
Experimental optimization, 6-9, 36, 44, 68, 89, 91, 92, 95, 110, 113, 245, 247, see also design and analysis of experiments
Expert system, 248
Extreme value controller, see optimizer
Extremum, see minimum
Faber, M.M., 18
Fabian, V., 20, 90
Factorial design, 38, 58, 65, 68, 246
Faddejew, D.K. and Faddejewa, W.N., 27, 67, 240
Fagiuoli, E., 96
Falkenhausen, K. von, 246
Favreau, R.F., 95, 96, 98, 100
Feasible region, 8, 9, 12, 16, 17, 25, 101
Feasible region, not connected, 217, 239, 360
Feasible starting point, search for, 62, 91, 115
Feistel, R., 102, 163
Feldbaum, A.A., 6, 9, 88-90, 99
Fend, F.A., 48
Fiacco, A.V., 16, 76, 81, 115, see also SUMT method
Fibonacci division, 29-32, 38, 177, 178, 181, 187, 192, 200, 202
Fielding, K., 83
Finiteness of a sequence of iterations, 68, 166, 172
Finkelstein, J.J., 18
Fisher, R.A., 7
Fletcher, R., 24, 38, 68-71, 74, 77, 80-84, 97, 170, 171, 204, 205, 335, 349
Fletcher-Powell strategy, see DFP strategy
Fletcher-Reeves strategy, 69, 70, 78, 170-172, 204, 233, see also conjugate gradients
Flood, M.M., 68, 89
Floudas, C.A., 91
Fogarty, L.E., 68
Fogel, D.B., 151
Fogel, L.J., 102, 105, 151
Forrest, S., 152
Forsythe, G.E., 34, 66, 67
Fox, R.L., 23, 34, 205
Frankhauser, P., 246
Frankovic, B., 9
Franks, R., 95, 96, 98, 100
Fraser, A.S., 152
Friedberg, R.M., 102, 152
Friedmann, M., 41
Fu, K.S., 94, 99
Function space, 10
Functional analysis theory, 11
Functional optimization, 10-12, 15, 23, 54, 68, 70, 85, 89, 90, 151, 174
Fürst, H., 98
Gaede, K.W., 8, 108, 144
Gaidukov, A.L., 98
Gal, S., 31
Galar, R., 102
Game theory, 5, 6, 20
Garfinkel, R.S., 18
Gauss, C.F., 41, 84
Gauss-Newton method, 84
Gauss-Seidel strategy, see coordinate strategy
Gaussian approximation, see sum of squares minimization
Gaussian distribution, see normal distribution
Gaussian elimination, 14, 75, 172
Gaviano, M., 96
Gelatt, C.D., 160
Gelfand, I.M., 89
Gene duplication and deletion, 247
Gene pool, 146, 148
Generalized least squares, 84
Genetic algorithms, 151-160
Genetic code, 153, 154, 243
Genotype, 106, 152, 153, 157
Geoffrion, A.M., 24
Geometric programming, 14
Gerardin, L., 105
Gersht, A.M., 90
Gessner, P., 11
Gibson, J.E., 88, 90
Gilbert, E.G., 68, 89
Gilbert, H.D., 90, 98
Gilbert, P., 239
Gill, P.E., 81
Ginsburg, T., 43, 69
Girsanov, I.V., 11
Glass, H., 48, 87
Glaß, K., 105
Glatt, C.R., 68
Global convergence, 39, 88, 94, 96, 98, 117, 118, 149, 216, 217, 238, 239
Global minimum, 24-26, 90, 168, 329, 344, 348, 356, 357, 359, 360
Global optimization, 19, 29, 84, 88-91, 236, 244
Global penalty function, 16
Glover, F., 162, 163
Gnedenko, B.W., 137
Goldberg, D.E., 152, 154
Golden section, 32, 33, 177, 178, 181, 187, 192, 200, 202
Goldfarb, D., 81
Goldfeld, S.M., 76
Goldstein, A.A., 66, 67, 76, 81, 88
Golinski, J., 92
Goll, R., 244
Golub, G.H., 57, 84
Gomory, R.E., 18
Gonzalez, R.S., 95
Gorges-Schleuter, M., 159, 247
Gorvits, G.G., 174
GOSPEL program package, 68
Goto, K., 82
Gottfried, B.S., 23
Gould, S.J., 148
Gradient strategies, 6, 15, 19, 37, 40, 65-69, 88-90, 94, 95, 98, 166, 167, 171, 172, 174, 235
Gradient strategies, second order, see Newton strategies
Gradstein, I.S., 136
Gram-Schmidt orthogonalization, 48, 53, 54, 57, 69, 177, 178, 183, 188, 194, 201, 202, 209, 229, 230, 362
Gran, R., 88
Graphical methods, 20
Grassé, P.P., 243
Grassmann, P., 100
Grauer, M., 20
Graves, R.L., 23
Great deluge algorithm, 164
Greedy algorithm, 162, 248
Greenberg, H., 18
Greenberg, H.-J., 162
Greenstadt, J., 70, 76, 81, 83, 326
Grefenstette, J.J., 152
Grid method, 12, 26, 27, 32, 38, 39, 65, 92, 93, 100, 149, 168, 236
GROPE program package, 68
Guilfoyle, G., 38
Guin, J.A., 64
Gurin, L.S., 89, 97, 98
Hadamard, J., 66
Hadley, G., 12, 17, 166
Haeckel strategy, 163
Haefner, K., 103
Hague, D.S., 68
Haimes, Y.Y., 10
Hamilton, P.A., 77
Hamilton, W.R., 15
Hammel, U., 245, 248
Hammer, P.L., 19
Hammersley, J.M., 93
Hamming cliffs, 154, 155
Hancock, H., 14
Handscomb, D.C., 93
Hansen, P.B., 239
Haploidy, 148
Harkins, A., 89
Harmonic division, 32
Hartmann, D., 151, 246
Haubrich, J.G.A., 68
Heckler, R., 246, 248
Heidemann, J.C., 70
Heinhold, J., 8, 108, 144
Hemstitching, 16
Henn, R., 20
Herdy, M., 164
Hermitian interpolation, 37, 38, 69, 77, 88
Herschel, R., 99
Hertel, H., 105
Hesse, R., 88
Hessian matrix (Hesse, L.O.), 13, 69, 75, 169, 170
Hestenes, M.R., 11, 14, 69, 70, 81, 172
Heuristic methods, 7, 18, 40, 88, 91, 98, 102, 162, 173
Heusener, G., 245
Hext, G.R., 57, 58, 64, 68, 89
Heydt, G.T., 93, 98, 99
Heynert, H., 105
Hilbert, D., 10, 11
Hildebrand, F.B., 66
Hill climbing strategies, 23 ff., 85, 87
Hill, I.D., 33, 178
Hill, J.C., 88, 90
Hill, J.D., 94
Himmelblau, D.M., 23, 48, 81, 87, 174, 176, 229, 339
Himsworth, F.R., 57, 58, 64, 68, 89
History vector method, 98
Hit-or-miss method, 93
Ho, Y.C., 68
Hock, W., 174
Hodanova, D., 106
Hoffmann, U., 23, 74
Hoffmeister, F., 151, 234, 246, 248
Höfler, A., 151, 246
Hofmann, H., 23, 74
Holland, J.H., 105, 152, 154
Hollstien, R.B., 152
Holst, W.R., 67
Homeostat, 9, 91, 100
Hoo, S.K., 88
Hooke, R., 44, 87, 90, 92
Hooke-Jeeves strategy, 44-48, 87, 90, 177, 178, 182, 188, 193, 200, 202, 210, 228, 230, 233, 332, 339
Hopper, M.J., 178
Horner, computational scheme of, 14
Horst, R., 91
Hoshino, S., 57, 81
Hotelling, H., 36
House, F.R., 77
Householder, A.S., 27, 75
Householder method, 57
Houston, B.F., 48
Howe, R.M., 68
Hu, T.C., 18
Huang, H.Y., 70, 78, 81, 82
Huberman, B.A., 103
Huelsman, L.P., 68
Huffman, R.A., 48
Hull, T.E., 93
Human brain, 6, 102
Humphrey, W.E., 67
Hunter, J.S., 65
Hupfer, P., 92, 94, 98
Hurwicz, L., 17, 18, 165
Hutchinson, D., 61
Hwang, C.L., 20
Hybrid computers, 12, 15, 68, 89, 99, 236
Hybrid methods, 38, 162-164, 169
Hyperplane annealing, 162
Hyslop, J., 206
Idelsohn, J.M., 93, 94
Illiac IV, 239
Imamura, H., 89
Indirect optimization, 13-15, 27, 35, 75, 170, 235
Indusi, J.P., 87
Infimum, 9
Information theory, 5
Integer optimization, 18, 247
Interior point method, 166
Intermediary recombination, 148, 153, 156
Interpolation methods, 14, 27, 33-38
Interval division methods, 27, 29-33, 41
Invention, 2
Inverse Hessian matrix, 77, 78
Inversion of a matrix, 76, 170, 175
Isolation, 106, 244
Iterative methods, 11, 13
Ivakhnenko, A.G., 102
Jacobi, C.G.J., 15
Jacobi method, 65, 326
Jacobian matrix, 16, 84
Jacobson, D.H., 12
Jacoby, S.L.S., 23, 67, 174
James, F.D., 33, 178
Janac, K., 90
Jarratt, P., 34, 35, 84
Jarvis, R.A., 91, 93, 94, 99
Jeeves, T.A., 44, 84, 87, 90, 92, see also Hooke-Jeeves strategy
Johannsen, G., 99
John, F., 166
John, P.W.M., 7
Jöhnk, M.D., 115
Johnson, I., 38
Johnson, M.P., 81
Johnson, S.M., 31, 32
Jones, A., 84
Jones, D.S., 81
Jordan, P., 109
Kamiya, A., 100
Kammerer, W.J., 70
Kantorovich, L.V., 66, 67
Kaplan, J.L., 64
Kaplinskii, A.I., 90
Kappler, H., 18, 166
Karmarkar, N., 166, 167
Karnopp, D.C., 93, 94, 96
Karp, R.M., 239
Karplus, W.I., 12, 89
Karr, C.L., 160
Karreman, H.F., 11
Karumidze, G.V., 94
Katkovnik, V.Ya., 88, 90
Kaupe, A.F., Jr., 39, 44, 178
Kavanaugh, W.P., 95, 98, 99
Kawamura, K., 70
Keeney, R.E., 20
Kelley, H.J., 15, 68, 70, 81
Kempthorne, O., 7, 67, 68
Kenworthy, I.C., 69
Kesten, H., 20
Kettler, P.C., 82, 83
Khachiyan, L.G., 166, 167
Khovanov, N.V., 96, 102
Khurgin, Ya.I., 89
Kiefer, J., 19, 29, 31, 32, 178
Kimura, M., 239
King, R.F., 35
Kirkpatrick, S., 160
Kitajima, S., 94
Kivelidi, V.Kh., 89
Kiwiel, K.C., 19
Kjellström, G., 98
Klerer, M., 24
Klessig, R., 15, 70
Klimenko, E.S., 94
Klingman, W.R., 48, 87
Klockgether, J., 7, 245
Klötzler, R., 11
Kobelt, D., 246
Koch, H.W., 244
Kopp, R.E., 18
Korbut, A.A., 18
Korn, G.A., 12, 24, 89, 93, 99
Korn, T.M., 12, 89
Korst, J., 161
Kosako, H., 99
Kovacs, Z., 80, 179
Kowalik, J.S., 23, 42, 67-69, 84, 174, 334, 335, 345
Koza, J., 152
Krallmann, H., 246
Krasnushkin, E.V., 99
Krasovskii, A.A., 89
Krasulina, T.P., 20
Krauter, G.E., 246
Kregting, J., 93
Krelle, W., 17, 18, 166
Krolak, P.D., 38
Kuester, J.L., 18, 58, 179
Kuhn, H.W., 17, 166
Kuhn-Tucker theorem, 17, 166
Kulchitskii, O.Yu., 90
Kumar, K.K., 160
Künzi, H.P., 17, 18, 20, 166
Kursawe, F., 102, 148, 245, 248
Kushner, H.J., 20, 90
Kussul, E., 101
Kwakernaak, H., 89
Kwasnicka, H. and Kwasnicki, W., 102
Kwatny, H.G., 90
Laarhoven, P.J.M. van, 161
Lagrange multipliers, 15, 17
Lagrange, J.L., 2, 15
Lagrangian interpolation, 27, 35-37, 41, 56, 64, 73, 80, 89, 101, 177, 182, 187, 193, 200, 202
Lam, L.S.-B., 100
Lance, G.M., 54
Land, A.H., 18
Lange-Nielsen, T., 54
Langguth, V., 89
Langton, C.G., 103
Lapidus, L., 68
Larichev, O.I., 174
Larson, R.E., 239
Lasdon, L.S., 70
Lattice search, see grid method
Lauffermair, T., 162
Lavi, A., 23, 48, 93
Lawler, E.L., 160
Lawrence, J.P., 87, 98
Learning (and forgetting), 9, 54, 70, 78, 98, 101, 103, 162, 236
Least squares method, see sum of squares minimization
LeCam, L.M., 102
Lee, R.C.K., 11
Lehner, K., 248
Leibniz, G.W., 1
Leitmann, G., 11, 18
Lemaréchal, C., 19
Leon, A., 68, 89, 174, 337, 356
Leonardo of Pisa, 29
Lerner, A.Ja., 11
Lesniak, Z.K., 92
Lethal mutation, 115, 136, 137, 158
Levenberg, K., 66, 84
Levenberg-Marquardt method, 84
Levine, L., 65
Levine, M.D., 10
Levy, A.V., 70, 78, 81
Lew, A.Y., 96
Lew, H.S., 100
Lewallen, J.M., 174
Lewandowski, A., 20
Leyßner, U., 151, 246
Lilienthal, O., 238
Lill, S.A., 80, 178, 179
Lindenmayer, A., 103
Line search, 25-38, 42, 54, 66, 70, 71, 77, 89, 101, 167, 170, 171, 173, 180, 214, 228, see also interval division and interpolation methods
Linear convergence, 34, 168, 169, 172, 173, 236, 365
Linear model objective function, 96, 124-127
Linear programming, 17, 57, 88, 100, 101, 151, 166, 212, 235, 353
Little, W.D., 93, 244
Lobac, V.P., 89
Local minimum, 13, 23-26, 88, 90, 329
Locker, A., 102
Log-normal distribution, 143, 144, 150
Loginov, N.V., 90
Lohmann, R., 164
Long step methods, 66
Longest step procedure, 66
Lootsma, F.A., 24, 81, 174
Lowe, C.W., 69, 101
Lucas, E., 32
Luce, A.D., 21
Luenberger, D.G., 18
Luk, A., 101
Lyvers, H.I., 16
MacDonald, J.R., 84
MacDonald, P.A., 48
Machura, M., 54, 179
MacLane, S., 48
MacLaurin, C., 13
Madsen, K., 35
Mamen, R., 81
Manderick, B., 152
Mandischer, M., 160
Mangasarian, O.L., 18, 24
Männer, R., 152
Marfeld, A.F., 6
Markwich, P., 246
Marquardt, D.W., 84
Marti, K., 118
Masters, C.O., 61
Masud, A.S.M., 20
Mathematical biosciences, 102
Mathematical optimization, 6-9
Mathematical programming, 15-17, 23, 85, see also linear, quadratic, and non-linear programming
Mathematization, 102
Matthews, A., 76, 81
Matyas, J., 97-99, 240, 338
Maximum likelihood method, 8
Maximum, see minimum
Maybach, R.L., 97
Mayne, D.Q., 12, 81
Maze method, 44
McArthur, D.S., 92, 94, 98
McCormick, G.P., 16, 67, 70, 76, 78, 81, 82, 88, 115, 348, see also SUMT method
McGhee, R.B., 65, 68, 89, 93
McGlade, J.M., 102
McGrew, D.R., 10
McGuire, M.R., 239
McMillan, C., Jr., 18
McMurtry, G.J., 94, 99
Mead, R., 58, 84, 97, see also simplex strategy
Medvedev, G.A., 89, 99
Meerkov, S.M., 94
Meissinger, H.F., 99
Meliorization, 1
Memory gradient method, 70
Meredith, D.L., 160
Merzenich, W., 101
Metropolis, N., 160
Meyer, J.-A., 103
Michalewicz, Z., 152, 159
Michel, A.N., 70
Michie, D., 102
Mickey, M.R., 58, 89, 95
Midpoint method, 33
Miele, A., 68, 70
Mifflin, R., 19
Migration, 106, 248
Miller, R.E., 239
Millstein, R.E., 239
Milyutin, A.A., 11
Minima and maxima, theory of, see optimality conditions
Minimax concept, 26, 27, 31, 34, 92
Minimum, 8, 13, 16, 24, 36
Minimum χ² method, 8
Minot, O.N., 102
Minsky, M., 102
Miranker, W.L., 233, 239
Missing links, 1
Mitchell, B.A., Jr., 99
Mitchell, R.A., 64
Mixed integer optimization, 18, 164, 243
Mize, J.H., 18
Mlynski, D., 69, 89
Mockus, J.B., see Motskus, I.B.
Model, internal (of a strategy), 9, 10, 28, 38, 41, 90, 169, 204, 231, 235-237
Model, mathematical (of a system), 7, 8, 65, 68, 160, 235
Modified Newton methods, 76
Moler, C., 87
Moment rosetta search, 48
Monro, S., 19
Monte-Carlo methods, 92-94, 109, 149, 160, 168
Moran, P.A.P., 101
Moré, J.J., 81, 179
Morgenstern, O., 6
Morrison, D.D., 84
Morrison, J.F., 334
Motskus, I.B., 88, 94
Motzkin, T.S., 67
Movshovich, S.M., 96
Mufti, I.H., 18
Mugele, R.A., 44
Mühlenbein, H., 163
Mulawa, A., 54, 179
Muller, M.E., 115
Müller, P.H., 98
Müller-Merbach, H., 17, 166
Multicellular individuals, 247
Multidimensional optimization, 2, 38 ff., 85
Multimembered evolution strategy, 101, 103, 118-151, 153, 158, 235-248, 329, 333, 335, 344, 347, 355-357, 359, 360, 362, 363, 365, 366, 375, 413, see also evolution strategy (μ , λ) and (μ+λ)
Multimodality, 12, 24, 85, 88, 157, 159, 239, 245, 248
Multiple criteria decision making (MCDM), 2, 20, 148, 245
Munson, J.K., 95
Murata, T., 44
Murray, W., 24, 76, 81, 82
Murtagh, B.A., 78, 82
Mutation, 3, 100-102, 106-108, 154, 155, 237
Mutation rate, 100, 101, 154, 237
Mutator genes, 142, 238
Mutseniyeks, V.A., 99
Myers, G.E., 70, 78, 81
Nabla operator, 13
Nachtigall, W., 105
Nag, A., 58
Nake, F., 49
Narendra, K.S., 94
Nashed, M.Z., 70
Neave, H.R., 116
Neighborhood model, 247
Nelder, J.A., 58, 84, 97
Nelder-Mead strategy, see simplex strategy
Nemhauser, G.L., 18
Nenonen, L.K., 70
Network planning, 20
Neumann, J. von, 6
Neustadt, L.W., 11, 18
Newman, D.J., 39
Newton, I., 2, 14
Newton direction, 70, 75-77, 84
Newtonian interpolation, 27, 35
Newton-Raphson method, 35, 75, 76, 97, 167, 169-171
Newton strategies, 40, 71, 74-85, 89, 171, 235
Neyman, J., 102
Niching, 100, 106, 238, 248
Nicholls, R.L., 23
Nickel, K., 168
Niederreiter, H., 115
Niemann, H., 5
Nikolic, Z.J., 94
Nissen, V., 103
Nollau, V., 98
Non-linear programming, 17, 18, 166
Non-smooth or non-differentiable optimization, 19
Nonstationary optimum, 248
Norkin, K.B., 88
Normal distribution, 7, 90, 94, 95, 97, 101, 108, 116, 120, 128, 153, 236, 240, 243
North, J.H., 102
North, M., 246
Numerical mathematics, 5, 27, 239
Numerical optimization, see direct optimization
Nurminski, E.A., 19
Objective function, 2, 8
Observational calculus, 5, 7
Odd block search, 27
Odell, P.L., 20
Oettli, W., 18, 167
O'Hagan, M., 48, 87
Oi, K., 82
Oldenburger, R., 9
Oliver, L.T., 31
One dimensional optimization, 25-38, see also line search
One step methods, see relaxation methods
O'Neill, R., 58, 179
Ontogenetic learning, 163
Opacic, J., 88
Operations research, 5, 17, 20
Optimal control, see control theory
Optimality conditions, 2, 13-15, 23, 167 ff., 235
Optimality of organic systems, 99, 100, 105
Optimization, prerequisites for, 1
Optimization problem, 2, 5-8, 14, 20, 24
Optimizer, 9, 10, 48, 99, 248
Optimum, see minimum
Optimum, maintaining (and hunting), see dynamic optimization
Optimum gradient method, 66
Optimum principle of Bellman, 11, 12
Oren, S.S., 82
Ortega, J.M., 5, 27, 41, 42, 82, 84
Orthogonalization, see Gram-Schmidt and Palmer orthogonalization
Osborne, M.R., 23, 42, 68, 69, 84, 174, 335, 345
Osche, G., 106, 119
Ostermeier, A., 118
Ostrowski, A.M., 34, 66
Overadaptation, 148
Overholt, K.J., 31-33, 178
Overrelaxation and underrelaxation, 43, 67
Owens, A.J., 102, 105, 151
Page, S.E., 155
Pagurek, B., 70
Palmer, J.R., 57, 178
Palmer orthogonalization, 57, 177, 178, 183, 188, 194, 202, 209, 230
Papageorgiou, M., 23
Papentin, F., 102
Parallel computers, 161, 163, 234, 239, 243, 245, 247, 248
Parameter optimization, 6, 8, 10-13, 15, 16, 20, 23, 105
Parameterization, 15, 151, 346
Pardalos, P.M., 91
Pareto-optimal, 20, 245
Parkinson, J.M., 61
Partan (parallel tangents) method, 67-69
Pask, G., 101
Path-oriented strategies, 98, 160, 236, 248
Patrick, M.L., 239
Pattern recognition, 5
Pattern search, see Hooke-Jeeves strategy
Paviani, D.A., 87
Pearson, J.D., 38, 70, 76, 78, 81, 82, 205
Peckham, G., 84
Penalty function, 15, 16, 48, 49, 57, 207
Perceptron, 102
Peschel, M., 20
Peters, E., 163, 248
Peterson, E.L., 14
Phenotype, 106, 153-155, 157, 158
Pierre, D.A., 23, 48, 68, 95
Pierson, B.L., 82
Pike, M.C., 33, 44, 178
Pincus, M., 93
Pinkham, R.S., 93
Pinsker, I.Sh., 44
Pixner, J., 33, 178
Pizzo, J.T., 23, 67, 174
Plane, D.R., 18
Plaschko, P., 151, 246
Pleiotropy, 243
Pluznikov, L.N., 94
Polak, E., 15, 18, 70, 76, 77, 167, 169
Policy, 11
Polyak, B.T., 70
Polygeny, 243
Polyhedron strategies, see simplex and complex strategies
Ponstein, J., 17
Pontrjagin, L.S., 18
Poor man's optimizer, 44
Population principle, 101, 119, 238
Posynomes, 14
Powell, D.R., 84
Powell, M.J.D., 57, 70, 71, 74, 77, 82, 84, 88, 97, 170, 202, 205, 335, 337, 349, see also DFP, DFP-Stewart, and Powell strategies
Powell, S., 18
Powell strategy, 69-74, 88, 163, 170-172, 177, 178, 183, 189, 195, 200, 202, 204, 209, 210, 219, 228-230, 327, 332, 339, 341, 343, 364
Poznyak, A.S., 90
Practical algorithms, 167
Predator-prey model, 247
Press, W.H., 115
Price, J.F., 76, 81, 88
Probabilistic automaton, 94
Problem catalogue, see catalogue of problems
Process computers, 10
Projected gradient method, 57, 70
Proofs of convergence, 42, 47, 66, 77, 97, 167, 168
Propoi, A.I., 90
Prusinkiewicz, P., 103
Pseudo-random numbers, see random number generation
Pugachev, V.N., 95
Pugh, E.L., 89
Pun, L., 23
Punctuated equilibrium, 148
Pure random search, 91, 92, 100, 237
Q-properties, 169, 170, 172, 179, 243
Quadratic convergence, 68, 69, 74, 76, 78, 81-83, 168, 169, 200, 202, 236
Quadratic interpolation, see Lagrangian and Hermitian interpolation
Quadratic programming, 166, 233, 235
Quandt, R.E., 76
Quasi-Newton method, 37, 70, 76, 83, 89, 170, 172, 205, 233, 235, see also DFP and DFP-Stewart strategies
Rabinowitz, P., 84
Raiffa, H., 20, 21
Rajtora, S.G., 82
Ralston, A., 27
Random direction, 20, 88, 90, 98, 101, 202
Random evolutionary operation, see REVOP method
Random exchange step, 88, 166
Random number generation, 115, 150, 210, 212, 217, 237
Random sequence, 87, 93
Random step length, 95, 96, 108
Random strategies, 3, 12, 19, 87-103, 105, 240
Random walk, 247
Randomness, 87, 91, 93, 237
Rank one methods, 82, 83, 172
Raphson, J., see Newton-Raphson method
Rappl, G., 118
Raster method, see grid method
Rastrigin, L.A., 93, 95, 96, 98, 99
Rate of convergence, 7, 38, 39, 64, 66, 67, 69, 90, 94-98, 101, 110, 118, 120-141, 167-169, 179-204, 217-232, 234, 236, 239, 240, 242, see also linear and quadratic convergence
Rauch, S.W., 82
Rawlins, G.J.E., 152
Rayleigh-Ritz method, 15
Razor search, 48
Rechenberg, I., 6, 7, 97, 100, 105, 107, 118-120, 130, 142, 149, 164, 168, 172, 179, 231, 238, 245, 352
Recognition processes, 102
Recombination, 3, 101, 106, 146-148, 153-159, 186, 191, 200, 203, 204, 211-213, 215-217, 228, 231, 232, 240, 335, 355, 357, 363, 365, 366, see also discrete and intermediary recombination
Reeves, C.M., 38, 69, 93, 170, 204, see also Fletcher-Reeves strategy
References, 249-323
Regression, 8, 19, 84, 235, 246
Regression, non-linear, 84
Regula falsi (falsorum), 27, 34, 35, 39
Reid, J.K., 66
Rein, H., 100
Reinsch, C., 14
Relative minimum, 38, 42, 43, 66, 209
Relaxation methods, 14, 20, 41, 172, see also coordinate strategy
Reliability, see robustness
Repair enzymes, 142, 238
Replicator algorithm, 163
Restart of a search, 61, 67, 70, 71, 88, 89, 169, 201, 202, 205, 210, 219, 228-230, 362, 364
REVOP method, 101
Reynolds, O., 238
Rhead, D.G., 74
Rheinboldt, W.C., 5, 27, 41, 42, 82, 84
Ribière, G., 70, 82
Rice, J.R., 57
Richardson, D.W., 155
Richardson, J.A., 58, 179
Richardson, M., 239
Riding the constraints, 16
Riedl, R., 102, 153
Ritter, K., 24, 70, 82, 88, 168
Rivlin, L., 48
Robbins, H., 19
Roberts, P.D., 70
Roberts, S.M., 16
Robots, 6, 9, 103
Robustness, 3, 13, 34, 37-39, 53, 61, 64, 70, 90, 94, 118, 178, 204-217, 236, 238
Rockoff, M.L., 41
Rodloff, R.K., 246
Rogson, M., 100
Roitblat, H., 103
Rosen, J.B., 18, 24, 57, 91, 352
Rosen, R., 100
Rosenblatt, F., 102
Rosenbrock, H.H., 23, 29, 48, 50, 54, 343, 349
Rosenbrock strategy, 16, 48-54, 64, 177, 179, 184, 190, 196, 201, 202, 207, 209, 212, 213, 216, 228, 230-232, 357, 363, 365, 366
Rosenman, E.A., 11
Ross, G.J.S., 84
Rotating coordinates method, see Rosenbrock and DSC strategies
Rothe, R., 25
Roughgarden, J.W., 102
Rounding error, see accuracy of computation
Rozonoer, L.I., 90
Rozvany, G., 247
Ruban, A.I., 99
Rubin, A.I., 95
Rubinov, A.M., 11
Rudd, D.F., 98, 356
Rudelson, L.Ye., 102
Rudolph, G., 91, 118, 134, 151, 154, 161, 162, 241, 243, 248
Rustay, R.C., 68, 89
Rutishauser, H., 5, 41, 43, 48, 65, 75, 172, 326
Rybashov, M.V., 68
Ryshik, I.M., 136
Saaty, T.L., 20, 27, 166
Sacks, J., 20
Saddle point, 13, 14, 17, 23, 25, 35, 36, 39, 66, 76, 88, 168, 176, 209, 211, 345
Salaff, S., 100
Sameh, A.H., 239
Samuel, A.L., 102
Sargent, R.W.H., 78, 82
Saridis, G.N., 90, 98
Satterthwaite, F.E., 98, 101
Saunders, M.A., 57
Savage, J.M., 119
Savage, L.J., 41
Sawaragi, Y., 48
Sayama, H., 82
Scaling of the variables, 7, 44, 54, 58, 74, 146-148, 232, 239
Schaffer, J.D., 152
Schechter, R.S., 15, 23, 28, 32, 37, 41-43, 64, 65
Scheeffer, L., 14
Scheel, A., 118
Schema theorem, 154
Scheraga, H.A., 89
Scheuer, E.M., 241
Scheuer, T., 98, 164
Schinzinger, R., 67
Schittkowski, K., 174
Schley, C.H., Jr., 70
Schlierkamp-Voosen, D., 163
Schmalhausen, I.I., 101
Schmetterer, L., 90
Schmidt, E., see Gram-Schmidt orthogonalization
Schmidt, J.W., 35, 39
Schmitt, E., 20, 90
Schmutz, M., 103
Schneider, G., 246
Schneider, M., 100
Schrack, G., 98, 240
Schumer, M.A., 89, 93, 96-99, 101, 200, 240
Schuster, P., 101
Schwarz, H.R., 5, 41, 65, 75, 172, 326
Schwefel, D., 246
Schwefel, H.-P., 7, 102, 103, 118, 134, 148, 151, 152, 155, 163, 204, 234, 239, 242, 245-248
Schwetlick, H., 39
Scott, E.L., 102
Sebald, A.V., 151
Sebastian, D.J., 82
Sebastian, H.-J., 24
Secant method, 34, 39, 84
Second order gradient strategies, see Newton strategies
Sectioning algorithms, 14
Seidel, P.L., 41, see also coordinate strategy
Selection, 3, 100-102, 106, 142, 153, 157
Sensitivity analysis, 17
Separable objective function, 12, 42
Sequential methods, 27 ff., 38 ff., 88, 237
Sequential unconstrained minimization technique, see SUMT method
Sergiyevskiy, G.M., 69
Sexual propagation, 3, 101, 106, 146, 147
Shah, B.V., 67, 68
Shanno, D.F., 76, 82-84
Shapiro, I.J., 94
Shedler, G.S., 239
Shemeneva, V.V., 95
Shimelevich, L.I., 88
Shimizu, T., 92
Shindo, A., 64
Short step methods, 66
Shrinkage random search, 94
Shubert, B.O., 29
Sigmund, K., 21
Silverman, G., 84
Simplex method, see linear programming
Simplex strategy, 57-61, 64, 84, 89, 97, 177, 179, 184, 190, 196, 201, 202, 208, 210, 228-231, 341, 361-364
Simplex, 17, 58, 353
Simulated annealing, 160-162
Simulation, 13, 93, 102, 103, 152, 245, 246
Simultaneous methods, 26-27, 92, 168, 237
Singer, E., 44
Single step methods, see relaxation methods
Singularity, 70, 74, 78, 82, 205, 209
Sirisena, H.R., 15
Slagle, J.R., 102
Slezak, N.L., 241
Smith, C.S., 54, 71, 74
Smith, D.E., 174
Smith, F.B., Jr., 84
Smith, J. Maynard, 21, 102
Smith, L.B., 44, 178
Smith, N.H., 98, 356
Soeder, C.-J., 103
Somatic mutations, 247
Sondak, N.E., 66, 89
Sonderquist, F.J., 48
Sorenson, H.W., 68, 70
Southwell, R.V., 20, 41, 43, 65
Spang, H.A., 93, 174
Späth, H., 84
Spears, W., 152
Spedicato, E., 82
Spendley, W., 57, 58, 61, 64, 68, 84, 89
Speyer, J.L., 70
Sphere model objective function, 110, 117, 120, 123, 124, 127-134, 142, 173, 179, 203, 215, 325, 338
Spider method, 48
Sprave, J., 247, 248
Spremann, K., 11
Stagnation, 47, 58, 61, 64, 67, 87, 88, 100, 157, 201, 205, 238, 341
Standard deviation, see variance
Stanton, E.L., 34
Stark, R.M., 23
Static optimization, 9, 10
Stebbins, G.L., 106
Steepest descent/ascent, 66-68, 166, 169, 235
Steiglitz, K., 87, 96, 98, 99, 101, 200, 240
Stein, M.L., 14, 67
Steinberg, D., 23
Steinbuch, K., 6
Stender, J., 152
Step length control, 110-113, 142-145, 168, 172, 237, see also evolution strategy, 1/5 success rule
Steuer, R.E., 20
Stewart, E.C., 95, 98, 99
Stewart, G.W., 78, 84, see also DFP-Stewart strategy
Stiefel, E., 5, 41, 43, 65, 67, 69, 75, 172, 326
Stochastic approximation, 19, 20, 64, 83, 90, 94, 99, 236
Stochastic optimization, 18
Stochastic perturbations, 9, 20, 36, 58, 68, 69, 89, 91, 92, 94, 95, 97, 99, 236, 245
Stoer, J., 18
Stoller, D.S., 241
Stolz, O., 14
Stone, H.S., 239
Storage requirement, 47, 53, 57, 180, 232-234, 236
Storey, C., 23, 50, 54
Strategy, 2, 6, 100
Strategy comparison, 57, 64, 68, 71, 78, 80, 83, 84, 92, 97, 165-234
Strategy parameter, 144, 204, 238, 240-242
Stratonovich, R.L., 90
Strong minimum, 24, 328, 333
Strongin, R.G., 94
Structural optimization, 247
Struggle for existence, 100, 106
Suboptimum, 15
Subpopulations, 248
Success/failure routine, 29
Suchowitzki, S.I., 18
Sugie, N., 38
Sum of squares minimization, 5, 83, 331, 335, 346
SUMT method, 16
Supremum, 9
Sutti, C., 88
Suzuki, S., 352
Svechinskii, V.B. (Svecinskij, V.B.), 90, 102
Swann, W.H., 23, 28, 54, 56, 57, see also DSC strategy
Sweschnikow, A.A., 137
Sworder, D.D., 83
Sydow, A., 68
Sylvester, criterion of, 240
Synge, J.L., 44
Sysoyev, V.V., 95
Szegö, G.P., 24, 70, 88
Tabak, D., 18, 78, 82
Tabu search, 162-164
Tabulation method, see grid method
Takamatsu, T., 82
Talkin, A.I., 68
Tammer, K., 24
Tan, S.T., 17
Tapley, B.D., 174
Taran, V.A., 94
Taylor, G., 84
Taylor series (Taylor, B.), 75, 84
Tazaki, E., 64
Tchebycheff approximation (Tschebyschow, P.L.), 5, 331, 370
Termination of the search, 35, 38, 49, 54, 59, 64, 67, 71, 96, 113, 114, 117, 145, 146, 150, 167, 168, 175, 176, 180, 212, 238
Ter-Saakov, A.P., 69
Theodicee, 1
Theory of maxima and minima, 11
Thom, R., 102
Thomas, M.E., 20
Three point scheme, 29
Threshold strategy, 98, 164
Tietze, J.L., 70
Timofejew-Ressowski, N.W., 101
Todd, J., 326
Togawa, T., 100
Tokumaru, H., 82
Tolle, H., 18, 68
Tomlin, F.K., 44, 178
Törn, A., 91
Total step procedure, 65
Tovstucha, T.I., 89
Trabattoni, L., 88
Trajectory optimization, see functional optimization
Traub, J.F., 14, 27
Travelling salesperson problem (TSP), 159, 161
Treccani, G., 70, 88
Trial and error, 13, 41
Trial polynomial, 27, 33-35, 37, 68, 235
Trinkaus, H.F., 35
Trotter, H.F., 76
Tschebyschow, P.L., see Tchebycheff approximation
Tse, E., 239
Tseitlin, B.M., 44
Tsetlin, M.L., 89
Tsypkin, Ya.Z., 6, 9, 89, 90
Tucker, A.W., 17, 166
Tui, H., 88
Turning point, see saddle point
Two membered evolution strategy, 97, 101, 105-118, 172, 238, 329, 352, 357, 359, 363, 366, 367, 374, see also evolution strategy (1+1)
Tzschach, H.G., 18
Ueing, U., 88, 358, 359
Umeda, T., 64
Unbehauen, H., 18
Uncertainty, interval of, 26-28, 32, 39, 92, 180
Uniform distribution, 91, 92, 95, 115
Unimodality, 24, 27, 28, 39, 168, 236
Uzawa, H., 18
Vagin, V.N., 102
Vajda, S., 7, 18
Vanderplaats, G.N., 23
VanNice, R.I., 44
VanNorton, R., 41
Varah, J.M., 68
Varela, F.J., 103
Varga, J., 18
Varga, R.S., 43
Variable metric, 70, 77, 83, 169-172, 178, 233, 242, 243, 246, see also DFP and DFP-Stewart strategies
Variables, 2, 8, 11
Variance analysis, 8
Variance ellipse, 109
Variance methods, 82
Variational calculus, 2, 11, 15, 66
Vaysbord, E.M., 90, 94, 99
Vecchi, M.P., 160
Venter, J.H., 20
Vetters, K., 39
Vilis, T., 10
Viswanathan, R., 94
Vitale, P., 84
Vogelsang, R., 5
Vogl, T.P., 24, 48, 93
Voigt, H.-M., 163
Voltaire, F.M., 1
Volume-oriented strategies, 98, 160, 236, 248
Volz, R.A., 12, 70
Wacker, H., 12
Waddington, C.H., 102
Wagner, K., 151, 246
Wald, A., 7, 89
Walford, R.B., 93
Wallack, P., 68
Walsh, J., 27
Walsh, M.J., 102, 105, 151
Ward, L., 58
Wasan, M.T., 19
Wasscher, E.J., 68, 76
Weak minimum, 24, 25, 113, 328, 332, 333
Weber, H.H., 17
Wegge, L., 76
Weierstrass, K., theorem of, 25
Weinberg, F., 18, 91
Weisman, J., 23, 47, 89
Weiss, E.A., 48
Wells, M., 77
Werner, J., 82
Wets, R.J.-B., 19
Wetterling, W., 5
Wheatley, P., 38
Wheeling, R.F., 95, 98
White, L.J., 97
White, R.C., Jr., 93, 95
Whitley, L.D., 152, 155
Whitley, V.W., 76
Whitting, I.J., 84
Whittle, P., 18
Wiedemann, J., 151
Wiener, N., 6
Wierzbicki, A.P., 20
Wilde, D.J., 1, 20, 23, 26, 27, 29, 31-33, 38, 39, 87
Wilf, H.S., 27
Wilkinson, J.H., 14, 75
Wilson, E.O., 146
Wilson, K.B., 6, 65, 68, 89
Wilson, S.W., 103
Witt, U., 103
Witte, B.F.W., 67
Witten, I.H., 94
Witzgall, C., 18
Wolfe, P., 19, 23, 39, 66, 70, 82, 84, 166, 360
Wolff, W., 103
Wolfowitz, J., 19
Wood, C.F., 47, 48, 89
Woodside, C.M., 70
Wright, S.J., 179
Yates, F., 7
Youden, W.J., 101
Yudin, D.B., 90, 94, 99
Yvon, J.P., 96
Zach, F., 9
Zadeh, N., 41
Zahradnik, R.L., 23
Zakharov, V.V., 94
Zangwill, W.I., 18, 41, 66, 71, 74, 170, 202
Zehnder, C.A., 18, 91
Zeleznik, F.J., 84
Zellnik, H.E., 66, 89
Zener, C., 14
Zerbst, E.W., 105
Zero-one optimization, see binary optimization
Zettl, G., 344, 348
Zhigljavsky, A.A., 91
Zigangirov, K.S., 89
Zilinskas, A., 91
Zoutendijk, G., 18, 70
Zurmühl, R., 27, 35, 172
Zwart, P.B., 88
Zypkin, Ja.S., see Tsypkin, Ya.Z.
