
Evolution and Optimum Seeking

Hans-Paul Schwefel


Preface

In 1963 two students at the Technical University of Berlin met and were soon to collaborate on experiments which used the wind tunnel of the Institute of Flow Engineering. During the search for the optimal shapes of bodies in a flow, which was then a matter of laborious intuitive experimentation, the idea was conceived of proceeding strategically. However, attempts with the coordinate and simple gradient strategies were unsuccessful. Then one of the students, Ingo Rechenberg, now Professor of Bionics and Evolutionary Engineering, hit upon the idea of trying random changes in the parameters defining the shape, following the example of natural mutations. The evolution strategy was born. A third student, Peter Bienert, joined them and started the construction of an automatic experimenter, which would work according to the simple rules of mutation and selection. The second student, I myself, set about testing the efficiency of the new methods with the help of a Zuse Z23 computer, for there were plenty of objections to these random strategies. In spite of an occasional lack of financial support, the Evolutionary Engineering Group which had been formed held firmly together. Ingo Rechenberg received his doctorate in 1970 for the seminal thesis Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. It contains the theory of the two membered evolution strategy and a first proposal for a multimembered strategy, which, in the nomenclature introduced here, is of the (mu+1) type. In the same year financial support from the Deutsche Forschungsgemeinschaft (German Research Association) enabled the initiation of the work which comprises most of the present book. This work was concluded, at least temporarily, in 1974 with the thesis Evolutionsstrategie und numerische Optimierung and published by Birkhäuser, Basle, Switzerland, in 1977 under the title Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie as well as by Wiley, Chichester, in 1981 as the monograph Numerical optimization of computer models.

Between 1976 and 1985 the author was not able to continue his work in the field of Evolution Strategies (nowadays abbreviated: ESs). The general interest in this type of optimum seeking algorithms was not broad enough for there to be financial support. On the other hand, the number of articles, journals, and books devoted to (mathematical) optimization has increased tremendously.

Looking back upon the development from 1964 on, when the first ES version was devoted to experimental optimization, i.e., upon 30 years, or roughly one human generation, reveals three interesting facts:

First, ESs are not at all outdated. On the contrary, three consecutive conferences on Parallel Problem Solving from Nature (PPSN) in 1990 (see Schwefel and Männer, 1991), 1992 (Männer and Manderick, 1992), and 1994 (Davidor, Schwefel, and Männer, 1994) have demonstrated a revived and increasing interest.

Secondly, the computational environment has changed over time, not only with respect to the number of (also personal) computers and their data processing power, but even more with respect to new architectures. MIMD (Multiple Instructions Multiple Data) machines with many processors working in parallel for one task seem to wait for inherently parallel problem solving concepts like ESs. Biological metaphors prevail within the new branch of Artificial Intelligence, called Artificial Life (AL).

Third, updating this dissertation from 1974/1975 once more (after adding only a few pages to Chapter 7 in 1981) can be done without rewriting the bulk of the chapters on traditional approaches. Since the emphasis always has been centered on derivative-free direct optimum-seeking methods, it should be sufficient to add material on three concepts now, i.e., Genetic Algorithms (GAs), Simulated Annealing (SA), and Tabu Search (TS). This was done with the new Sections 5.3 to 5.5 in Chapter 5.

Another innovation is a floppy disk with all those procedures which had been used for the test series in the 1970s, along with a users' manual. Hopefully, some of the earlier errors have been eliminated now, too.

A first thank-you goes again to my friend Dr. Mike Finnis, whose translation of my German original into English still forms the core of this book. Thanks go also to those who helped me in completing this update, especially Ms. Heike Bracklo, who brought the scanned ASCII text into LaTeX formats; Mr. Ulrich Hermes, Mr. Jörn Mehnen, and Mr. Joachim Sprave for the many graphs and ready-to-use computer programs; as well as all those who helped in the process of proofreading the complete work. Finally, I would like to thank the Wiley team for the fruitful collaboration during the process of editing the camera-ready script.

Dortmund, Autumn 1994 Hans-Paul Schwefel


Contents

Preface

1 Introduction

2 Problems and Methods of Optimization
   2.1 General Statement of the Problems
   2.2 Particular Problems and Methods of Solution
      2.2.1 Experimental Versus Mathematical Optimization
      2.2.2 Static Versus Dynamic Optimization
      2.2.3 Parameter Versus Functional Optimization
      2.2.4 Direct (Numerical) Versus Indirect (Analytic) Optimization
      2.2.5 Constrained Versus Unconstrained Optimization
   2.3 Other Special Cases

3 Hill climbing Strategies
   3.1 One Dimensional Strategies
      3.1.1 Simultaneous Methods
      3.1.2 Sequential Methods
         3.1.2.1 Boxing in the Minimum
         3.1.2.2 Interval Division Methods
            3.1.2.2.1 Fibonacci Division
            3.1.2.2.2 The Golden Section
         3.1.2.3 Interpolation Methods
            3.1.2.3.1 Regula Falsi Iteration
            3.1.2.3.2 Newton-Raphson Iteration
            3.1.2.3.3 Lagrangian Interpolation
            3.1.2.3.4 Hermitian Interpolation
   3.2 Multidimensional Strategies
      3.2.1 Direct Search Strategies
         3.2.1.1 Coordinate Strategy
         3.2.1.2 Strategy of Hooke and Jeeves: Pattern Search
         3.2.1.3 Strategy of Rosenbrock: Rotating Coordinates
         3.2.1.4 Strategy of Davies, Swann, and Campey (DSC)
         3.2.1.5 Simplex Strategy of Nelder and Mead
         3.2.1.6 Complex Strategy of Box
      3.2.2 Gradient Strategies
         3.2.2.1 Strategy of Powell: Conjugate Directions
      3.2.3 Newton Strategies
         3.2.3.1 DFP: Davidon-Fletcher-Powell Method (Quasi-Newton Strategy, Variable Metric Strategy)
         3.2.3.2 Strategy of Stewart: Derivative-free Variable Metric Method
         3.2.3.3 Further Extensions

4 Random Strategies

5 Evolution Strategies for Numerical Optimization
   5.1 The Two Membered Evolution Strategy
      5.1.1 The Basic Algorithm
      5.1.2 The Step Length Control
      5.1.3 The Convergence Criterion
      5.1.4 The Treatment of Constraints
      5.1.5 Further Details of the Subroutine EVOL
   5.2 A Multimembered Evolution Strategy
      5.2.1 The Basic Algorithm
      5.2.2 The Rate of Progress of the (1,lambda) Evolution Strategy
         5.2.2.1 The Linear Model (Inclined Plane)
         5.2.2.2 The Sphere Model
         5.2.2.3 The Corridor Model
      5.2.3 The Step Length Control
      5.2.4 The Convergence Criterion for mu > 1 Parents
      5.2.5 Scaling of the Variables by Recombination
      5.2.6 Global Convergence
      5.2.7 Program Details of the (mu+lambda) ES Subroutines
   5.3 Genetic Algorithms
      5.3.1 The Canonical Genetic Algorithm for Parameter Optimization
      5.3.2 Representation of Individuals
      5.3.3 Recombination and Mutation
      5.3.4 Reproduction and Selection
      5.3.5 Further Remarks
   5.4 Simulated Annealing
   5.5 Tabu Search and Other Hybrid Concepts

6 Comparison of Direct Search Strategies for Parameter Optimization
   6.1 Difficulties
   6.2 Theoretical Results
      6.2.1 Proofs of Convergence
      6.2.2 Rates of Convergence
      6.2.3 Q-Properties
      6.2.4 Computing Demands
   6.3 Numerical Comparison of Strategies
      6.3.1 Computer Used
      6.3.2 Optimization Methods Tested
      6.3.3 Results of the Tests
         6.3.3.1 First Test: Convergence Rates for a Quadratic Objective Function
         6.3.3.2 Second Test: Reliability
         6.3.3.3 Third Test: Non-Quadratic Problems with Many Variables
   6.4 Core Storage Required

7 Summary and Outlook

8 References

Appendices

A Catalogue of Problems
   A.1 Test Problems for the First Part of the Strategy Comparison
   A.2 Test Problems for the Second Part of the Strategy Comparison
   A.3 Test Problems for the Third Part of the Strategy Comparison

B Program Codes
   B.1 (1+1) Evolution Strategy EVOL
   B.2 (mu,lambda) Evolution Strategies GRUP and REKO
   B.3 (mu+lambda) Evolution Strategy KORR

C Programs
   C.1 Contents of the Floppy Disk
   C.2 About the Program Disk
   C.3 Running the C Programs
      C.3.1 How to Install OptimA on a PC Using LINUX or on a UNIX Workstation
      C.3.2 How to Install OptimA on a PC Under DOS
      C.3.3 Running OptimA
   C.4 Description of the Programs
      C.4.1 How to Incorporate New Functions
   C.5 Examples
      C.5.1 An Application of the Multimembered Evolution Strategy to the Corridor Model
      C.5.2 OptimA Working in Batch Mode

Index


Chapter 1

Introduction

There is scarcely a modern journal, whether of engineering, economics, management, mathematics, physics, or the social sciences, in which the concept optimization is missing from the subject index. If one abstracts from all specialist points of view, the recurring problem is to select a better or best (Leibniz, 1710; eventually, he introduced the term optimal) alternative from among a number of possible states of affairs. However, if one were to follow the hypothesis of Leibniz, as presented in his Theodicee, that our world is the best of all possible worlds, one could justifiably sink into passive fatalism. There would be nothing to improve or to optimize.

Biology, especially since Darwin, has replaced the static world picture of Leibniz' time by a dynamic one, that of the more or less gradual development of the species culminating in the appearance of man. Paleontology is providing an increasingly complete picture of organic evolution. So-called missing links repeatedly turn out to be not missing, but rather hitherto undiscovered stages of this process. Very much older than the recognition that man is the result (or better, intermediate state) of a meliorization process is the seldom-questioned assumption that he is a perfect end product, the "pinnacle of creation." Furthermore, long before man conceived of himself as an active participant in the development of things, he had unconsciously influenced this evolution. There can be no doubt that his ability and efforts to make the environment meet his needs raised him above other forms of life and have enabled him, despite physical inferiority, to find, to hold, and to extend his place in the world, so far at least. As long as mankind has existed on our planet, spaceship earth, we, together with other species, have mutually influenced and changed our environment. Has this always been done in the sense of meliorization?

In 1759, the French philosopher Voltaire (1759), dissatisfied with the conditions of his age, was already taking up arms against Leibniz' philosophical optimism and calling for conscious effort to change the state of affairs. In the same way today, when we optimize, we find that we are both the subject and object of the history of development. In the desire to improve an object, a process, or a system, Wilde and Beightler (1967) see an expression of the human striving for perfection. Whether such a lofty goal can be attained depends on many conditions.

It is not possible to optimize when there is only one way to carry out a task; then one has no alternative. If it is not even known whether the problem at hand is soluble, the situation calls for an invention or discovery and not, at that stage, for any optimization.

But wherever two or more solutions exist and one must decide upon one of them, one should choose the best, that is to say optimize. Those independent features that distinguish the results from one another are called (independent) variables or parameters of the object or system under consideration; they may be represented as binary, integer, otherwise discrete, or real values. A rational decision between the real or imagined variants presupposes a value judgement, which requires a scale of values, a quantitative criterion of merit, according to which one solution can be classified as better, another as worse. This dependent variable is usually called an objective (function) because it depends on the objective of the system, the goal to be attained with it, and is functionally related to the parameters. There may even exist several objectives at the same time, the normal case in living systems, where the mix of objectives also changes over time and may, in fact, be induced by the actual course of the evolutionary paths themselves.

Sometimes the hardest part of optimization is to define clearly an objective function. For instance, if several subgoals are aimed at, a relative weight must be attached to each of the individual criteria. If these are contradictory, one can only hope to find a compromise on a trade-off subset of non-dominated solutions. Variability and a distinct order of merit are the unavoidable conditions of any optimization. One may sometimes also think one has found the right objective for a subsystem, only to realize later that, in doing so, one has provoked unwanted side effects, the ramifications of which have worsened the disregarded total objective function. We are just now experiencing how narrow-minded scales of value can steer us into dangerous plights, and how it is sometimes necessary to consider the whole Earth as a system, even if this is where differences of opinion about value criteria are the greatest.

The second difficulty in optimization, particularly of multiparameter objectives or processes, lies in the choice or design of a suitable strategy for proceeding. Even when the objective has been sufficiently clearly defined, indeed even when the functional dependence on the independent variables has been mathematically (or computationally) formulated, it often remains difficult enough, if not completely impossible, to find the optimum, especially in the time available.

The uninitiated often think that it must be an easy matter to solve a problem expressed in the language of mathematics, that most exact of all sciences. Far from it: The problem of how to solve problems is unsolved, and mathematicians have been working on it for centuries. For giving exact answers to questions of extremal values and corresponding positions (or conditions) we are indebted, for example, to the differential and variational calculus, of which the development in the 18th century is associated with such illustrious names as Newton, Euler, Lagrange, and Bernoulli. These constitute the foundations of the present methods referred to as classical, and of the further developments in the theory of optimization. Still, there is often a long way from the theory, which is concerned with establishing necessary (and sufficient) conditions for the existence of minima and maxima, to the practice, the determination of these most desirable conditions. Practically significant solutions of optimization problems first became possible with the arrival of (large and) fast programmable computers in the mid-20th century. Since then the flood of publications on the subject of optimization has been steadily rising in volume; it is a simple matter to collect several thousand published articles about optimization methods.

Even an interested party finds it difficult to keep pace nowadays with the development that is going on. It seems far from being over, for there still exists no all-embracing theory of optimization, nor is there any universal method of solution. Thus it is appropriate, in Chapter 2, to give a general survey of optimization problems and methods. The special rôle of direct, static, non-discrete, and non-stochastic parameter optimization emerges here, for many of these methods can be transferred to other fields; the converse is less often possible. In Chapter 3, some of these strategies are presented in more depth, principally those that extract the information they require only from values of the objective function, that is to say without recourse to analytic partial derivatives (derivative-free methods). Methods of a probabilistic nature are omitted here.

Methods which use chance as an aid to decision making are treated separately in Chapter 4. In numerical optimization, chance is simulated deterministically by means of a pseudorandom number generator able to produce some kind of deterministic chaos only. One of the random strategies proves to be extremely promising. It imitates, in a highly simplified manner, the mutation-selection game of nature. This concept, a two membered evolution strategy, is formulated in a manner suitable for numerical optimization in Chapter 5, Section 5.1. Following the hypothesis put forward by Rechenberg, that biological evolution is, or possesses, an especially advantageous optimization process and is therefore worthy of imitation, an extended multimembered scheme that imitates the population principle of evolution is introduced in Chapter 5, Section 5.2. It permits a more natural as well as more effective specification of the step lengths than the two membered scheme and actually invites the addition of further evolutionary principles, such as, for example, sexual propagation and recombination. An approximate theory of the rate of convergence can also be set up for the (1,lambda) evolution strategy, in which only the best of the lambda descendants of a generation become parents of the following one.

A short excursion, new to this edition, introduces nearly concurrent developments that the author was unaware of when compiling his dissertation in the early 1970s, i.e., genetic algorithms, simulated annealing, and tabu search.

Chapter 6 then makes a comparison of the evolution strategies with the direct search methods of zero, first, and second order, which were treated in detail in Chapter 3. Since the predictive power of theoretical proofs of convergence and statements of rates of convergence is limited to simple problem structures, the comparison includes mainly numerical tests employing various model objective functions. The results are evaluated from two points of view:

Efficiency, or speed of approach to the objective

Effectivity, or reliability under varying conditions

The evolution strategies are highly successful in the test of effectivity or robustness. Contrary to the widely held opinion that biological evolution is a very wasteful method of optimization, the convergence rate test shows that, in this respect too, the evolution methods can hold their own and are sometimes even more efficient than many purely deterministic methods. The circle is closed in Chapter 7, where the analogy between iterative optimization and evolution is raised once again for discussion, with a look at some natural improvements and extensions of the concept of the evolution strategy.

The list of test problems that were used can be found in Appendix A, and FORTRAN codes of the evolution strategies, with detailed guidance for users, are in Appendix B. Finally, Appendix C explains how to use the C and FORTRAN programs on the floppy disk.


Chapter 2

Problems and Methods of Optimization

2.1 General Statement of the Problems

According to whether one emphasizes the theoretical aspect (existence conditions of optimal solutions) or the practical (procedures for reaching optima), optimization nowadays is classified as a branch of applied or numerical mathematics, of operations research, or of computer-assisted systems (engineering) design. In fact many optimization methods are based on principles which were developed in linear and non-linear algebra. Whereas for equations, or systems of equations, the problem is to determine a quantity or set of quantities such that functions which depend on them have specified values, in the case of an optimization problem an initially unknown extremal value is sought. Many of the current methods of solution of systems of linear equations start with an approximation and successively improve it by minimizing the deviation from the required value. For non-linear equations and for incomplete or overdetermined systems this way of proceeding is actually essential (Ortega and Rheinboldt, 1970). Thus many seemingly quite different and apparently unrelated problems turn out, after a suitable reformulation, to be optimization problems.

Into this class come, for example, the solution of differential equations (boundary and initial value problems) and eigenvalue problems, as well as problems of observational calculus, adaptation, and approximation (Stiefel, 1965; Schwarz, Rutishauser, and Stiefel, 1968; Collatz and Wetterling, 1971). In the first case, the basic problem again is to solve equations; in the second, the problem is often reduced to minimizing deviations in the Gaussian sense (sum of squares of residues) or the Tchebycheff sense (maximum of the absolute residues). Even game theory (Vogelsang, 1963) and pattern or shape recognition as a branch of information theory (Andrews, 1972; Niemann, 1974) have features in common with the theory of optimization. In one case, from among a stored set of idealized types, a pattern will be sought that has the maximum similarity to the one presented; in another case, the search will be for optimal courses of action in conflict situations. Here, two or more interests are competing. Each player tries to maximize his chance of winning with regard to the way in which his opponent supposedly plays. Most optimization problems, however, are characterized by a single interest, to reach an objective that is not influenced by others.

The engineering aspect of optimization has manifested itself especially clearly with the design of learning robots, which have to adapt their operation to the prevailing conditions (see for example Feldbaum, 1962; Zypkin, 1970). The feedback between the environment and the behavior of the robot is effected here by a program, a strategy, which can perhaps even alter itself. Wiener (1963) goes even further and considers self-reproducing machines, thus arriving at a consideration of robots that are similar to living beings. Computers are often regarded as the most highly developed robots, and it is therefore tempting to make comparisons with the human brain and its neurons and synapses (von Neumann, 1960, 1966; Marfeld, 1970; Steinbuch, 1971). They are nowadays the most important aid to optimization, and many problems are intractable without them.

2.2 Particular Problems and Methods of Solution

The lack of a universal method of optimization has led to the present availability of numerous procedures that each have only limited application to special cases. No attempt will be made here to list them all. A short survey should help to distinguish the parameter optimization strategies, treated in detail later, from the other procedures, while at the same time exhibiting some features they have in common. The chosen scheme of presentation is to discuss two opposing concepts together.

2.2.1 Experimental Versus Mathematical Optimization

If the functional relation between the variables and the objective function is unknown, one is forced to experiment either on the real object or on a scale model. To do so one must be as free as possible to vary the independent variables and have access to measuring instruments with which the dependent variable, the quality, can be measured. Systematic investigation of all possible states of the system will be too costly if there are many variables, and random sampling of various combinations is too unreliable for achieving the desired result. A procedure must be significantly more effective if it systematically exploits whatever information is retained about preceding attempts. Such a plan is also called a strategy. The concept originated in game theory and was formulated by von Neumann and Morgenstern (1961).

Many of the search strategies of mathematical optimization to be discussed later were also applied under experimental conditions, not always successfully. An important characteristic of the experiment is the unavoidable effect of (stochastic) disturbances on the measured results. A good experimental optimization strategy has to take account of this fact and approach the desired extremum with the least possible cost in attempts.

Two methods in particular are most frequently mentioned in this connection:

The EVOP (evolutionary operation) method proposed by G. E. P. Box (1957), a development of the experimental gradient method of Box and Wilson (1951)

The strategy of artificial evolution designed by Rechenberg (1964)



The algorithm of Rechenberg's evolution strategy will be treated in detail in Chapter 5. In the experimental field it has often been applied successfully: for example, to the solution of multiparameter shaping problems (Rechenberg, 1964; Schwefel, 1968; Klockgether and Schwefel, 1970). All variables are simultaneously changed by a small random amount. The changes are (binomially or) normally distributed. The expected value of the random vector is zero (for all components). Failures leave the starting condition unchanged; only successes are adopted. Stochastic disturbances or perturbations, brought about by errors of measurement, do not affect the reliability but influence the speed of convergence according to their magnitude. Rechenberg (1973) gives rules for the optimal choice of a common variance of the probability density distribution of the random changes for both the unperturbed and the perturbed cases.
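A minimal sketch of this mutation-selection rule, in C, might look as follows. It is not the EVOL subroutine of Appendix B; the sphere objective, the fixed step size sigma, and the simple Box-Muller normal generator are illustrative assumptions made here for brevity.

/* Two membered (1+1) scheme: mutate all variables at once with zero-mean
 * normal deviates; adopt the mutant only on success. */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define N 3                                   /* number of variables */

static double objective(const double *x)      /* sphere model: sum of squares */
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += x[i] * x[i];
    return s;
}

static double normal(double sigma)            /* Box-Muller normal deviate */
{
    double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return sigma * sqrt(-2.0 * log(u1)) * cos(2.0 * 3.14159265358979 * u2);
}

int main(void)
{
    double parent[N] = {5.0, -3.0, 4.0}, child[N];
    double fp = objective(parent), sigma = 0.5;

    for (int t = 0; t < 10000; t++) {
        for (int i = 0; i < N; i++)           /* change every component simultaneously */
            child[i] = parent[i] + normal(sigma);
        double fc = objective(child);
        if (fc < fp) {                        /* success: the mutant is adopted  */
            for (int i = 0; i < N; i++)
                parent[i] = child[i];
            fp = fc;
        }                                     /* failure: the parent stays unchanged */
    }
    printf("best objective value found: %g\n", fp);
    return 0;
}

In a real application sigma would be adapted during the search, following the rules of Rechenberg (1973) referred to above; here it is simply held constant.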

The EVOP method of G. E. P. Box changes only two or three parameters at a time, if possible those which have the strongest influence. A square or cube is constructed with an initial condition at its midpoint; its 2^2 = 4 or 2^3 = 8 corners represent the points in a cycle of trials. These deterministically established states are tested sequentially, several times if perturbations are acting. The state with the best result becomes the midpoint of the next pattern of points. Under some conditions, one also changes the scaling of the variables or exchanges one or more parameters for others. Details of this altogether heuristic way of proceeding are described by Box and Draper (1969, 1987). The method has mainly been applied to the dynamic optimization of chemical processes. Experiments are performed on the real system, sometimes over a period of several years.
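One cycle of trials of such a pattern scheme, for n = 2, can be sketched as follows. The response function, the edge length, and the fixed cycle count are illustrative assumptions, and Box's repeated testing under noise, rescaling, and parameter-exchange rules are omitted.

/* One EVOP-style pattern: a square around the current midpoint whose
 * 2^2 = 4 corners form a cycle of trials; the best state becomes the
 * midpoint of the next pattern. */
#include <stdio.h>

static double f(double x1, double x2)         /* stand-in for a measured response */
{
    return (x1 - 1.0) * (x1 - 1.0) + (x2 - 2.0) * (x2 - 2.0);
}

int main(void)
{
    double c1 = 0.0, c2 = 0.0, d = 0.25;      /* midpoint and half edge length */

    for (int cycle = 0; cycle < 25; cycle++) {
        double best = f(c1, c2), b1 = c1, b2 = c2;
        for (int k = 0; k < 4; k++) {         /* the 2^2 = 4 corners */
            double x1 = c1 + ((k & 1) ? d : -d);
            double x2 = c2 + ((k & 2) ? d : -d);
            double v = f(x1, x2);
            if (v < best) { best = v; b1 = x1; b2 = x2; }
        }
        c1 = b1;                              /* best state becomes new midpoint */
        c2 = b2;
    }
    printf("midpoint after 25 cycles: (%g, %g)\n", c1, c2);
    return 0;
}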

The counterpart to experimental optimization is mathematical optimization. The functional relation between the criterion of merit or quality and the variables is known, at least approximately; to put it another way, a more or less simplified mathematical model of the object, process, or system is available. In place of experiments there appears the manipulation of variables and the objective function. It is sometimes easy to set up a mathematical model, for example if the laws governing the behavior of the physical processes involved are known. If, however, these are largely unresearched, as is often the case for ecological or economic processes, the work of model building can far exceed that of the subsequent optimization.

Depending on what deliberate influence one can have on the process, one is either restricted to the collection of available data or one can uncover the relationships between independent and dependent variables by judiciously planning and interpreting tests. Such methods (Cochran and Cox, 1950; Kempthorne, 1952; Davies, 1954; Cox, 1958; Fisher, 1966; Vajda, 1967; Yates, 1967; John, 1971) were first applied only to agricultural problems, but later spread into industry. Since the analyst is intent on building the best possible model with the fewest possible tests, such an analysis itself constitutes an optimization problem, just as does the synthesis that follows it. Wald (1966) therefore recommends proceeding sequentially, that is, to construct a model as a hypothesis from initial experiments or given a priori information, and then to improve it in a stepwise fashion by a further series of tests, or, alternatively, to sometimes reject the model completely. The fitting of model parameters to the measured data can be considered as an optimization problem insofar as the expected error or maximum risk is to be minimized. This is a special case of optimization, called calculus of observations, which involves statistical tests like regression and variance analyses on data subject to errors, for which the principle of maximum likelihood or minimum chi-square is used (see Heinhold and Gaede, 1972).

The cost of constructing a model of large systems with many variables, or of very complicated objects, can become so enormous that it is preferable to get to the desired optimal condition by direct variation of the parameters of the process, in other words to optimize experimentally. The fact that one tries to analyze the behavior of a model or system at all is founded on the hope of understanding the processes more fully and of being able to solve the synthesis problem in a more general way than is possible in the case of experimental optimization, which is tied to a particular situation.

If one has succeeded in setting up a mathematical model of the system under consideration, then the optimization problem can be expressed mathematically as follows:

F(\mathbf{x}) = F(x_1, x_2, \ldots, x_n) \rightarrow \mathrm{extr}

The round brackets symbolize the functional relationship between the n independent variables

\{x_i,\ i = 1(1)n\}^{1}

and the dependent variable F, the quality or objective function. In the following it is always a scalar quantity. The variables can be scalars or functions of one or more parameters. Whether a maximum or a minimum is sought for is of no consequence for the method of optimization because of the relation

\max\{F(\mathbf{x})\} = -\min\{-F(\mathbf{x})\}

Without loss of generality one can concentrate on one of the types of problem; usually the minimum problem is considered. Restrictions do arise, insofar as in many practical problems the variables cannot be chosen arbitrarily. They are called constraints. The simplest of these are the non-negative conditions:

x_i \geq 0 \quad \text{for all } i = 1(1)n

They are formulated more generally like the objective function:

G_j(\mathbf{x}) = G_j(x_1, x_2, \ldots, x_n) \left\{\begin{matrix} \geq \\ = \\ \leq \end{matrix}\right\} 0 \quad \text{for all } j = 1(1)m

The notation chosen here follows the convention of parameter optimization. One distinguishes between equalities and inequalities. Each equality constraint reduces the number of true variables of the problem by one. Inequalities, on the other hand, simply reduce the size of the allowed space of solutions without altering its dimensionality. The sense of the inequality is not critical. Like the interchanging of minimum and maximum problems, one can transform one type into the other by reversing the signs of the terms. It is sufficient to limit consideration to one formulation. For minimum problems this is normally the type G_j(x) >= 0. Points on the edge of the (closed) allowed space are thereby permitted. A different situation arises if the constraint is given as a strict inequality of the form G_j(x) > 0. Then the allowed space can be open if G_j(x) is continuous. If for G_j(x) >= 0, with other conditions the same, the minimum lies on the border G_j(x) = 0, then for G_j(x) > 0 there is no true minimum. One refers here to an infimum, the greatest lower bound, at which actually G_j(x) = 0. In analogous fashion one distinguishes between maxima and suprema (smallest upper bounds). Optimization in the following always means to find a maximum or a minimum, perhaps under inequality constraints.

1 The term 1(1)n stands for 1, 2, 3, ..., n.
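These conventions are easily illustrated in code. The following sketch handles a maximum problem by minimizing -F over the points satisfying one inequality constraint G(x) >= 0; the one-dimensional functions and the crude grid scan are illustrative assumptions, not one of the strategies treated later in this book.

/* Standard minimum formulation: max{F(x)} = -min{-F(x)},
 * feasibility given by the inequality G(x) >= 0. */
#include <stdio.h>

static double F(double x) { return -(x - 2.0) * (x - 2.0); } /* to be maximized  */
static double G(double x) { return 3.0 - x; }                /* feasible: G >= 0 */

int main(void)
{
    double best_x = 0.0, best = 1e30;

    for (double x = 0.0; x <= 5.0; x += 0.001) {
        if (G(x) < 0.0)                       /* outside the allowed space */
            continue;
        double v = -F(x);                     /* minimize -F instead of maximizing F */
        if (v < best) { best = v; best_x = x; }
    }
    printf("constrained maximizer: x = %g, F(x) = %g\n", best_x, -best);
    return 0;
}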

2.2.2 Static Versus Dynamic Optimization

The term static optimization means that the optimum is time invariant or stationary. It is sufficient to determine its position and size once and for all. Once the location of the extremum has been found, the search is over. In many cases one cannot control all the variables that influence the objective function. Then it can happen that these uncontrollable variables change with time and displace the optimum (non-stationary case). The goal of dynamic optimization^2 is therefore to maintain an optimal condition in the face of varying conditions of the environment. The search for the extremum becomes a more or less continuous process. According to the speed of movement of the optimum, it may be necessary, instead of making the slow adjustment of the independent variables by hand (as for example in the EVOP method; see Chap. 2, Sect. 2.2.1), to give the task to a robot or automaton.

2 Some authors use the term dynamic optimization in a different way than is done here.

The automaton and the process together form a control loop. However, unlike conventional control loops this one is not required to maintain a desired value of a quantity but to discover the most favorable value of an unknown and time-dependent quantity. Feldbaum (1962), Frankovic et al. (1970), and Zach (1974) investigate in detail such automatic optimization systems, known as extreme value controllers or optimizers. In each case they are built around a search process. For only one variable (adjustable setting) a variety of schemes can be designed. It is significantly more complicated for an optimal value loop when several parameters have to be adjusted.

Many of the search methods are so very costly because there is no a priori information about the process to be controlled. Hence nowadays one tries to build adaptive control systems that use information gathered over a period of time to set up an internal model of the system, or that, in a sense, learn. Oldenburger (1966) and, in more detail, Zypkin (1970) tackle the problems of learning and self-learning robots. Adaptation is said to take place if the change in the control characteristics is made on the basis of measurements of those input quantities to the process that cannot be altered, also known as disturbing variables. If the output quantities themselves are used (here the objective function) to adjust the control system, the process is called self-learning or self-adaptation. The latter possibility is more reliable but, because of the time lag, slower. Cybernetic engineering is concerned with learning processes in a more general form and always sees or even seeks links with natural analogues.

An example of a robot that adapts itself to the environment is the homeostat of Ashby (1960). Nowadays, however, one does not build one's own optimizer every time there is a given problem to be solved. Rather one makes use of so-called process computers, which for a new task only need another special program. They can handle large and complicated problems and are coupled to the process by sensors and transducers in a closed loop (on-line) (Levine and Vilis, 1973; McGrew and Haimes, 1974). The actual computer usually works digitally, so that analogue-digital and digital-analogue converters are required for input and output. Process computers are employed for keeping process quantities constant and maintaining required profiles as well as for optimization. In the latter case an internal model (a computer program) usually serves to determine the optimal process parameters, taking account of the latest measured data values in the calculation.

If the position of the optimum in a dynamic process is shifting very rapidly, the manner in which the search process follows the extremum takes on a greater significance for the overall quality. In this case one has to go about setting up a dynamic model and specifying all variables, including the controllable ones, as functions of time. The original parameter optimization goes over to functional optimization.

2.2.3 Parameter Versus Functional Optimization

The case when not only the objective function but also the independent variables are scalar quantities is called parameter optimization. Numerical values

\{x_i^*,\ i = 1(1)n\}

of the variables or parameters are sought for which the value of the objective function

F^* = F(\mathbf{x}^*) = \mathrm{extr}\{F(\mathbf{x})\}

is an optimum. The number of parameters describing a state of the object or system is finite. In the simplest case of only one variable (n = 1), the behavior of the objective function is readily visualized on a diagram with two orthogonal axes. The value of the parameter is plotted on the abscissa and that of the objective function on the ordinate. The functional dependence appears as a curve. For n = 2 a three dimensional Cartesian coordinate system is required. The state of the system is represented as a point in the horizontal plane and the value of the objective function as the vertical height above it. A mountain range is obtained, the surface of which expresses the relation of dependent to independent variables. To further simplify the representation, the curves of intersection between the mountain range and parallel horizontal planes are projected onto the base plane, which provides a contour diagram of the objective function. From this three dimensional picture and its two dimensional projection, concepts like peak, plateau, valley, ridge, and contour line are readily transferred to the n-dimensional case, which is otherwise beyond our powers of description and visualization.

In functional optimization, instead of optimal points in three dimensional Euclidean space, optimal trajectories in function spaces (such as Banach or Hilbert space) are to be determined. Thus one refers also to infinite dimensional optimization as opposed to the finite dimensional parameter optimization. Since the variables to be determined are themselves functions of one or more parameters, the objective function is a function of a function, or a functional.

A classical problem is to determine the smooth curve down which a point mass will slide between two points in the shortest time, acted upon by the force of gravity and without friction. Known as the brachistochrone problem, it can be solved by means of the ordinary variational calculus (Courant and Hilbert, 1968a,b; Denn, 1969; Clegg, 1970). If the functions to be determined depend on several variables, it is a multidimensional variational problem (Klötzler, 1970). In many cases the time t appears as the only parameter. The objective function is commonly an integral, in the integrand of which will appear not only the independent variables

\mathbf{x}(t) = \{x_1(t), x_2(t), \ldots, x_n(t)\}

but also their derivatives \dot{x}_i(t) = \partial x_i / \partial t and sometimes also the parameter t itself:

F(\mathbf{x}(t)) = \int_{t_1}^{t_2} f(\mathbf{x}(t), \dot{\mathbf{x}}(t), t)\; dt \;\rightarrow\; \mathrm{extr}

Such problems are typical in control theory, where one has to find optimal controlling functions for control processes (e.g., Chang, 1961; Lee, 1964; Leitmann, 1964; Hestenes, 1966; Balakrishnan and Neustadt, 1967; Karreman, 1968; Demyanov and Rubinov, 1970).
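As a concrete instance of such a functional, the brachistochrone problem mentioned above can be cast in exactly this form. The following display is the standard textbook formulation (with y measured downward from the starting point and g the gravitational acceleration), not a formula reproduced from this chapter:

% Time functional of the brachistochrone, obtained from energy
% conservation v = sqrt(2 g y):
T(y) \;=\; \int_{x_1}^{x_2} \sqrt{\frac{1 + y'(x)^2}{2\,g\,y(x)}}\; dx
\;\rightarrow\; \min
% The Euler-Lagrange equation of the variational calculus identifies
% the minimizing curve as a cycloid.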

Whereas the variational calculus and its extensions provide the mathematical basis of functional optimization (in the language of control engineering: optimization with distributed parameters), parameter optimization (with localized parameters) is based on the theory of maxima and minima from the elementary differential calculus. Consequently both branches have followed independent paths of development and become almost separate disciplines. The functional analysis theory of Dubovitskii and Milyutin (see Girsanov, 1972) has bridged the gap between the problems by allowing them to be treated as special cases of one fundamental problem, and it could thus lead to a general theory of optimization. However different their theoretical bases, in cases of practical significance the problems must be solved on a computer, and the iterative methods employed are then broadly the same.

One of these iterative methods is the dynamic programming or stepwise optimization of Bellman (1967). It was originally conceived for the solution of economic problems, in which time-dependent variables are changed in a stepwise way at fixed points in time. The method is a discrete form of functional optimization in which the trajectory sought appears as a steplike function. At each step a decision is taken, the sequence of which is called a policy. Assuming that the state at a given step depends only on the decision at that step and on the preceding state (i.e., there is no feedback), then dynamic programming can be applied. The Bellman optimum principle implies that each piece of the optimal trajectory that includes the end point is also optimal. Thus one begins by optimizing the final decision at the transition from the last-but-one to the last step. Nowadays dynamic programming is frequently applied to solving discrete problems of optimal control and regulation (Gessner and Spremann, 1972; Lerner and Rosenman, 1973). Its advantage compared to other, analytic methods is that its algorithm can be formulated as a program suitable for digital computers, allowing fairly large problems to be tackled (Gessner and Wacker, 1972). Bellman's optimum principle can, however, also be expressed in differential form and applied to an area of continuous functional optimization (Jacobson and Mayne, 1970).

The principle of stepwise optimization can be applied to problems of parameter optimization if the objective function is separable (Hadley, 1969): that is, it must be expressible as a sum of partial objective functions in which just one or a very few variables appear at a time. The number of steps (k) corresponds to the number of the partial functions; at each step a decision is made only on the (l) variables in the partial objective. They are also called control or decision variables. Subsidiary conditions (number m) in the form of inequalities can be taken into account. The constraint functions, like the variables, are allowed to take a finite number (b) of discrete values and are called state variables. The recursion formula for the stepwise optimization will not be discussed here. Only the number of required operations (N) in the calculation will be mentioned, which is of the order

N \sim k \, b^{\,m + \ell}

For this reason the usefulness of dynamic programming is mainly restricted to the case l = 1, k = n, and m = 1. Then at each of the n steps, just one control variable is specified with respect to one subsidiary condition. In the other limiting case, where all variables have to be determined at one step, the normal case of parameter optimization, the process goes over to a grid method (complete enumeration) with a computational requirement of order O(b^(n+m)). Herein lies its capability for locating global optima, even of complicated multimodal objective functions. However, it is only especially advantageous if the structure of the objective function permits the enumeration to be limited to a small part of the allowed region.
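A minimal sketch of the backward recursion for a separable objective follows, with k = 4 steps, b = 5 discrete state values, and one decision variable per step. The partial cost function and the identification of the decision with the next state are illustrative assumptions, not the book's recursion formula; the work here grows as k*b*b, in line with the order estimate above.

/* Bellman's stepwise optimization: begin by optimizing the final
 * decision, then work backwards through the steps. */
#include <stdio.h>

#define STEPS  4
#define STATES 5

static double partial_cost(int k, int s, int u)  /* cost of decision u in state s at step k */
{
    return (s - u) * (s - u) + 0.1 * (double)(u + k);
}

int main(void)
{
    double best_to_go[STEPS + 1][STATES] = {{0.0}}; /* terminal costs are zero */

    for (int k = STEPS - 1; k >= 0; k--)
        for (int s = 0; s < STATES; s++) {
            double best = 1e30;
            for (int u = 0; u < STATES; u++) {      /* decision u = next state */
                double v = partial_cost(k, s, u) + best_to_go[k + 1][u];
                if (v < best) best = v;
            }
            best_to_go[k][s] = best;                /* optimal cost-to-go from (k, s) */
        }
    printf("minimal total cost from state 0: %g\n", best_to_go[0][0]);
    return 0;
}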

Digital computers are poorly suited to solving continuous problems because they cannot operate directly with functions. Numerical integration procedures are possible, but costly. Analogue computers are more suitable because they can directly imitate dynamic processes. Compared to digital computers, however, they have a small numerical range and low accuracy and are not so easily programmable. Thus sometimes digital and analogue computers are coupled for certain tasks as so-called hybrid computers. With such systems a set of differential equations can be tackled to the same extent as a problem in functional optimization (Volz, 1965, 1973). The digital computer takes care of the iteration control, while on the analogue computer the differentiation and integration operations are carried out according to the parameters supplied by the digital computer. Korn and Korn (1964), and Bekey and Karplus (1971), describe the operations involved in trajectory optimization and the solution of differential equations by means of hybrid computers. The fact that random methods are often used for such problems has to do with the computational imprecision of the analogue part, with which deterministic processes usually fail to cope. If requirements for accuracy are very high, however, purely digital computation has to take over, with the consequent greater cost in computation time.


2.2.4 Direct (Numerical) Versus Indirect (Analytic) Optimization

The classification of mathematical methods of optimization into direct and indirect procedures is attributed to Edelbaum (1962). Especially if one has a computer model of a system, with which one can perform simulation experiments, the search for a certain set of exogenous parameters to generate excellent results asks for robust direct optimization methods. Direct or numerical methods are those that approach the solution in a stepwise manner (iteratively), at each step (hopefully) improving the value of the objective function. If this cannot be guaranteed, a trial and error process results.

An indirect or analytic procedure attempts to reach the optimum in a single (calculation) step, without tests or trials. It is based on the analysis of the special properties of the objective function at the position of the extremum. In the simplest case, parameter optimization without constraints, one proceeds on the assumption that the tangent plane at the optimum is horizontal, i.e., that the first partial derivatives of the objective function exist and vanish at x*:
\[ \left. \frac{\partial F}{\partial x_i} \right|_{x = x^*} = 0 \quad \text{for all } i = 1(1)n \tag{2.1} \]

This system of equations can be expressed with the so-called Nabla operator (∇) as a single vector equation for the stationary point x*:
\[ \nabla F(x^*) = 0 \tag{2.2} \]

Equation (2.1) or (2.2) transforms the original optimization problem into a problem of solving a set of, perhaps non-linear, simultaneous equations. If F(x) or one or more of its derivatives are not continuous, there may be extrema that do not satisfy the otherwise necessary conditions. On the other hand, not every point in ℝⁿ (the n-dimensional space of real variables) that satisfies conditions (2.1) need be a minimum; it could also be a maximum or a saddle point. Equation (2.2) is referred to as a necessary condition for the existence of a local minimum.

To give sufficient conditions requires further processes of differentiation. In fact, differentiations must be carried out until the determinant of the matrix of the second or higher partial derivatives at the point x* is non-zero. Things remain simple in the case of only one variable, when it is required that the lowest order non-vanishing derivative is positive and of even order. Then, and only then, is there a minimum. If the derivative is negative, x* represents a maximum. A saddle point exists if the order is odd.

For n variables, at least the n(n+1)/2 second partial derivatives
\[ \frac{\partial^2 F(x)}{\partial x_i\,\partial x_j} \quad \text{for all } i,j = 1(1)n \]
must exist at the point x*. The determinant of the Hessian matrix ∇²F(x*) must be positive, as well as the further n − 1 principal subdeterminants of this matrix.

While MacLaurin had already completely formulated the sufficient conditions for the existence of minima and maxima of one-parameter functions in 1742, the corresponding theory



for functions of several variables was only completed nearly 150 years later by Scheeffer (1886) and Stolz (1893) (see also Hancock, 1960).

Sufficient conditions can only be applied to check a solution that was obtained from the necessary conditions. The analytic path thus always leads the original optimization problem back to the problem of solving a system of simultaneous equations (Equation (2.2)). If the objective function is of second order, one is dealing with a linear system, which can be solved with the aid of one of the usual methods of linear algebra. Even if non-iterative procedures are used, such as the Gaussian elimination algorithm or the matrix decomposition method of Cholesky, this cannot be done with a single-step calculation. Rather, the number of operations grows as O(n³). With fast digital computers it is certainly a routine matter to solve systems of equations with even thousands of variables; however, the inevitable rounding errors mean that complete accuracy is never achieved (Broyden, 1973).
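To make the indirect route concrete: for a quadratic objective F(x) = ½xᵀAx + bᵀx + c, condition (2.2) becomes the linear system Ax* = −b. The following minimal Python sketch (the 2 × 2 matrix and vector are invented for illustration, not taken from the text) solves it via the Cholesky decomposition named above:

```python
import numpy as np

# Hypothetical quadratic objective F(x) = 0.5 x^T A x + b^T x + c with A
# symmetric positive definite; A and b are illustrative values only.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([-1.0, -2.0])

# Necessary condition (2.2): grad F(x*) = A x* + b = 0, a linear system.
L = np.linalg.cholesky(A)            # A = L L^T, exploits symmetry
y = np.linalg.solve(L, -b)           # forward substitution
x_star = np.linalg.solve(L.T, y)     # back substitution

print(x_star)                        # stationary point
print(A @ x_star + b)                # residual, ~0 up to rounding error
```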

One can normally be satisfied with a sufficiently good approximation. Here relaxation methods, which are iterative, show themselves to be comparable or superior; it depends in detail on the structure of the coefficient matrix. Starting from an initial approximation, the error as measured by the residues of the equations is minimized. Relaxation procedures are therefore basically optimization methods, but of a special kind, since the value of the objective function at the optimum is known beforehand. This a priori information can be exploited to make savings in the computations, as can the fact that each component of the residue vector must individually go to zero (e.g., Traub, 1964; Wilkinson and Reinsch, 1971; Hestenes, 1973; Hestenes and Stein, 1973).

Objective functions having terms or members of higher than second order lead to non-linear equations as the necessary conditions for the existence of extrema. In this case, the stepwise approach to the null position is essential, e.g., with the interpolation method, which was conceived in its original form by Newton (Chap. 3, Sect. 3.1.2.3.2). The equations are linearized about the current approximation point; linear relations for the correcting terms are then obtained. In this way a complete system of n linear equations has to be solved at each step of the iteration. Occasionally a more convenient approach is to search for the minimum of the function

\[ \tilde{F}(x) = \sum_{i=1}^{n} \left( \frac{\partial F}{\partial x_i} \right)^2 \]

with the help of a direct optimization method. Besides the fact that F̃(x) goes to zero not only at the sought-for minimum of F(x) but also at its maxima and saddle points, it can sometimes yield non-zero minima of no interest for the solution of the original problem. Thus it is often preferable not to proceed via the conditions of Equation (2.2) but to minimize F(x) directly. Only in special cases do indirect methods lead to faster, more elegant solutions than direct methods. Such is, for example, the case if the necessary existence condition for minima with one variable leads to an algebraic equation, and sectioning algorithms like the computational scheme of Horner can be used; or if objective functions are in the form of so-called posynomials, for which Duffin, Peterson, and Zener (1967) devised geometric programming, an entirely indirect method.




Subsidiary conditions, or constraints, complicate matters. In rare cases equality constraints can be solved for individual variables, which can then be eliminated from the objective function, or constraints in the form of inequalities can be made superfluous by a transformation of the variables. Otherwise there are the methods of bounded variation and Lagrange multipliers, in addition to penalty functions and the procedures of mathematical programming.

The situation is very similar for functional optimization, except that here the indirect methods are still dominant even today. The variational calculus provides as conditions for optima differential instead of ordinary equations: ordinary differential equations (Euler-Lagrange) or partial differential equations (Hamilton-Jacobi). In only a few cases can such a system be solved in a straightforward way for the unknown functions. One must usually resort again to the help of a computer. Whether it is advantageous to use a digital or an analogue computer depends on the problem; it is a matter of speed versus accuracy. A hybrid system often turns out to be especially useful. If, however, the solution cannot be found by a purely analytic route, why not choose from the start the direct procedure also for functional optimization? In fact, with the increasing complexity of practical problems in numerical optimization, this field is becoming more important, as illustrated by the work of Daniel (1969), who takes over methods without derivatives from parameter optimization and applies them to the optimization of functionals. An important point in this is the discretization or parameterization of the originally continuous problem, which can be achieved in at least two ways:

- By approximation of the desired functions using a sum of suitable known functions or polynomials, so that only the coefficients of these remain to be determined (Sirisena, 1973)
- By approximation of the desired functions using step functions or sides of polygons, so that only the heights and positions of the connecting points remain to be determined

Recasting a functional into a parameter optimization problem has the great advantage that a digital computer can be used straightaway to find the solution numerically. The disadvantage that the result only represents a suboptimum is often not serious in practice, because the assumed values of parameters of the process are themselves not exactly known (Dixon, 1972a). The experimentally determined numbers are prone to errors or to statistical uncertainties. In any case, large and complicated functional optimization problems cannot be completely solved by the indirect route.
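A minimal sketch of the second discretization route (polygon approximation), assuming an invented model functional and SciPy's derivative-free minimizer; every name and constant below is illustrative, not taken from the text:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative functional: J[u] = integral_0^1 (u'(t)^2 + u(t)^2) dt with
# boundary values u(0) = 0, u(1) = 1. The unknown function is replaced by a
# polygon over n interior nodes, so only the node heights remain as variables.
n = 9
t = np.linspace(0.0, 1.0, n + 2)
h = t[1] - t[0]

def J(heights):
    u = np.concatenate(([0.0], heights, [1.0]))      # fix the boundary values
    du = np.diff(u) / h                              # slopes of the polygon sides
    um = 0.5 * (u[:-1] + u[1:])                      # midpoint values per segment
    return np.sum((du**2 + um**2) * h)               # simple quadrature of J

res = minimize(J, x0=t[1:-1], method="Nelder-Mead")  # direct search, no derivatives
print(res.x)   # suboptimal polygon approximating the exact solution sinh(t)/sinh(1)
```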

The direct procedure can either start directly with the functional to be minimized, if the integration over the substituted function can be carried out (Rayleigh-Ritz method), or with the necessary conditions, the differential equations, which specify the optimum. In the latter case the integral is replaced by a finite sum of terms (Beveridge and Schechter, 1970). In this situation gradient methods are readily applied (Kelley, 1962; Klessig and Polak, 1973). The detailed way to proceed depends very much on the subsidiary conditions or constraints of the problem.



2.2.5 Constrained Versus Unconstrained Optimization

Special techniques have been developed for handling problems of optimization with constraints. In parameter optimization these are the methods of penalty functions and mathematical programming. In the first case a modified objective function is set up, which

- For the minimum problem takes the value F̃(x) = +∞ in the forbidden region, but which remains unchanged in the allowed (feasible) region (barrier method; e.g., used within the evolution strategies, see Chap. 5)
- Only near the boundary inside the allowed region yields values different from F(x), and thus keeps the search at a distance from the edge (partial penalty function; e.g., used within Rosenbrock's strategy, see Chap. 3, Sect. 3.2.1.3)
- Differs from F(x) over the whole space spanned by the variables (global penalty function)

This last is the most common way of treating constraints in the form of inequalities. The main ideas here are due to Carroll (1961; created response surface technique) and to Fiacco and McCormick (1964, 1990; SUMT, sequential unconstrained minimization technique). For the problem
\[ F(x) \to \min \]
\[ G_j(x) \ge 0 \quad \text{for all } j = 1(1)m \]
\[ H_k(x) = 0 \quad \text{for all } k = 1(1)\ell \]
the penalty function is of the form (with r, v_j, w_k > 0 and G_j > 0)
\[ \tilde{F}(x) = F(x) + r \sum_{j=1}^{m} v_j\,\frac{1}{G_j(x)} + \frac{1}{r} \sum_{k=1}^{\ell} w_k\,[H_k(x)]^2 \]

The coefficients v_j and w_k are weighting factors for the individual constraints, and r is a free parameter. The optimum of F̃(x) will depend on the choice of r, so it is necessary to alter r in a stepwise way. The original extreme value problem is thereby solved by a sequence of optimizations in which r is gradually reduced to zero. One can hope in this way at least to find good approximations for the required minimum problem within a finite sequence of optimizations.
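A minimal sketch of this sequential reduction of r, on an invented toy problem with a single inequality constraint (the functions, constants, and the use of SciPy's derivative-free minimizer are all illustrative assumptions, not from the book):

```python
import numpy as np
from scipy.optimize import minimize

F = lambda x: (x[0] - 2.0)**2 + (x[1] - 1.0)**2      # invented objective
G = lambda x: 1.0 - x[0] - x[1]                       # feasible iff G(x) >= 0

def F_tilde(x, r, v=1.0):
    g = G(x)
    if g <= 0.0:                                      # outside: reject (barrier)
        return np.inf
    return F(x) + r * v / g                           # interior penalty term

x = np.array([0.0, 0.0])                              # feasible starting point
for r in [1.0, 0.1, 0.01, 0.001]:                     # r is driven toward zero
    x = minimize(lambda z: F_tilde(z, r), x, method="Nelder-Mead").x
print(x)                                              # approaches the constrained optimum near (1, 0)
```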

The choice of suitable values for r is not, however, easy. Fiacco (1974) and Fiacco and McCormick (1968, 1990) give some indications, and also suggest further possibilities for penalty functions. These procedures are usually applied in conjunction with gradient methods. The hemstitching method and the riding the constraints method of Roberts and Lyvers (1961) work by changing the chosen direction whenever a constraint is violated, without using a modified objective function. They orient themselves with respect to the gradient of the objective and the derivatives of the constraint functions (Jacobian matrix). In hemstitching, there is always a return into the feasible region, while in riding the constraints the search runs along the active constraint boundaries. The variables are



reset into the allowed region by the complex method of M. J. Box (1965) (a direct search strategy) whenever explicit bounds are crossed. Implicit constraints, on the other hand, are treated as barriers (see Chap. 3, Sect. 3.2.1.6).

The methods of mathematical programming, both linear and non-linear, treat the constraints as the main aspect of the problem. They were specially evolved for operations research (Muller-Merbach, 1971) and assume that all variables must always be positive. Such non-negativity conditions allow special solution procedures to be developed. The simplest models of economic processes are linear; there are often no better ones available. For this purpose Dantzig (1966) developed the simplex method of linear programming (see also Krelle and Kunzi, 1958; Hadley, 1962; Weber, 1972).

The linear constraints, together with the condition on the signs of the variables, span the feasible region in the form of a polygon (for n = 2) or a polyhedron, sometimes called a simplex. Since the objective function is also linear, except in special cases the desired extremum must lie in a corner of the polyhedron. It is therefore sufficient just to examine the corners. The simplex method of Dantzig does this in a particularly economical way, since only those corners are considered in which the objective function takes progressively better values. It can even be thought of as a gradient method along the edges of the polyhedron. It can be applied in a straightforward way to many hundreds, even thousands, of variables and constraints. For very large problems, which may have a particular structure, special methods have also been developed (Kunzi and Tan, 1966; Kunzi, 1967). Into this category come the revised and the dual simplex methods, the multiphase and duplex methods, and decomposition algorithms. An unpleasant property of linear programs is that sometimes just small changes of the coefficients in the objective function or the constraints can cause a big alteration in the solution. To reveal such dependencies, methods of parametric linear programming and sensitivity analysis have been developed (Dinkelbach, 1969).
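As a small illustration of a linear program in the above sense, a sketch using SciPy's linprog (all numbers invented); note that linprog's default bounds already impose the non-negativity conditions just mentioned:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative LP: maximize 3*x1 + 5*x2 subject to x1 <= 4, 2*x2 <= 12,
# 3*x1 + 2*x2 <= 18, and x1, x2 >= 0 (the default bounds).
c = np.array([-3.0, -5.0])           # linprog minimizes, so negate the profit
A_ub = np.array([[1.0, 0.0],
                 [0.0, 2.0],
                 [3.0, 2.0]])
b_ub = np.array([4.0, 12.0, 18.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub)
print(res.x, -res.fun)                # optimal corner (2, 6) with value 36
```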

Most strategies of non-linear programming resemble the simplex method or use it as a subprogram (Abadie, 1972). This is the case in particular for the techniques of quadratic programming, which are conceived for quadratic objective functions and linear constraints. The theory of non-linear programming is based on the optimality conditions developed by Kuhn and Tucker (1951), an extension of the theory of maxima and minima to problems with constraints in the form of inequalities. These can be expressed geometrically as follows: at the optimum (in a corner of the allowed region) the gradient of the objective function lies within the cone formed by the gradients of the active constraints. To start with, this is only a necessary condition. It becomes sufficient under certain assumptions concerning the structure of the objective and constraint functions. For minimum problems, the objective function and the feasible region must be convex, that is, the constraints must be concave. Such a problem is also called a convex program. Finally, the Kuhn-Tucker theorem transforms a convex program into an equivalent saddle point problem (Arrow and Hurwicz, 1956), just as the Lagrange multiplier method does for constraints in the form of equalities. A complete theory of equality constraints is due to Apostol (1957).

Non-linear programming is therefore only applicable to convex optimization, in which, to be precise, one must distinguish at least seven types of convexity (Ponstein, 1967). In addition, all the functions are usually required to be continuously differentiable, with an



analytic specification of their partial derivatives. There is an extensive literature on this subject, of which the books by Arrow, Hurwicz, and Uzawa (1958), Zoutendijk (1960), Vajda (1961), Kunzi, Krelle, and Oettli (1962), Kunzi, Tzschach, and Zehnder (1966, 1970), Kunzi and Krelle (1969), Zangwill (1969), Suchowitzki and Awdejewa (1969), Mangasarian (1969), Stoer and Witzgall (1970), Whittle (1971), Luenberger (1973), and Varga (1974) are but a small sample. Kappler (1967) considers some of the procedures from the point of view of gradient methods. Kunzi and Oettli (1969) give a survey of the more extended procedures together with an extensive bibliography. FORTRAN programs are to be found in McMillan (1970), Kuester and Mize (1973), and Land and Powell (1973).

Of special importance in control theory are optimization problems in which the constraints are partly specified as differential equations; these are also called non-holonomous constraints. Pontrjagin et al. (1967) have given necessary conditions for the existence of optima in these problems. Their trick was to distinguish between the free control functions to be determined and the local or state functions, which are bound by constraints. Although the theory has given a strong foothold to the analytic treatment of optimal control processes, it must be regarded as a case of good luck if a practical problem can be made to yield an exact solution in this way. One must usually resort in the end to numerical approximation methods in order to obtain the desired optimum (e.g., Balakrishnan and Neustadt, 1964, 1967; Rosen, 1966; Leitmann, 1967; Kopp, 1967; Mufti, 1970; Tabak, 1970; Canon, Cullum, and Polak, 1970; Tolle, 1971; Unbehauen, 1971; Boltjanski, 1972; Luenberger, 1972; Polak, 1973).

2.3 Other Special Cases

According to the type of variables there are still other special areas of mathematical optimization. In parameter optimization, for example, the variables can sometimes be restricted to discrete or integer values. The extreme case is if a parameter may only take two distinct values, zero and unity. Mixed variable types can also appear in the same problem; hence the terms discrete, integer, binary (or zero-one), and mixed-integer programming. Most of the solution procedures that have been worked out deal with linear integer problems (e.g., those proposed by Gomory, Balas, and Beale). An important class of methods, the branch and bound methods, is described for example by Weinberg (1968). They are classed together with dynamic programming as decision tree strategies.

For the general non-linear case, a last resort can be to try out all possibilities. This kind of optimization is referred to as complete enumeration. Since the cost of such a procedure is usually prohibitive, heuristic approaches are also tried, with which usable, not necessarily optimal, solutions can be found (Weinberg and Zehnder, 1969). More clever ways of proceeding in special cases, for example by applying non-integer techniques of linear and non-linear programming, can be found in Korbut and Finkelstein (1971), Greenberg (1971), Plane and McMillan (1971), Burkard (1972), Hu (1972), and Garfinkel and Nemhauser (1972, 1973).

By stochastic programming is meant the solution of problems with objective functions, and sometimes also constraints, that are subject to statistical perturbations (Faber, 1970). It is simplest if such problems can be reduced to deterministic ones, for example by working



with expectation values. However, there are some problems in which the probability distributions significantly influence the optimal solution. Operational methods at first only existed for special cases such as, for example, warehouse problems (Beckmann, 1971). Their numbers as well as the fields of application are growing steadily (Hammer, 1984; Ermoliev and Wets, 1988; Ermakov, 1992). In general, one has to make a clear distinction between deterministic solution methods for more or less noisy or stochastic situations, and stochastic methods for deterministic but difficult situations like multimodal or fractal topologies. Here we refer to the former; in Chapter 4 we will do so for the latter, especially under the aspect of global optimization.

In a rather new branch within the mathematical programming field, called non-smooth or non-differentiable optimization, more or less classical gradient-type methods for finding solutions still persist (e.g., Balinski and Wolfe, 1975; Lemarechal and Mifflin, 1978; Nurminski, 1982; Kiwiel, 1985).

For successively approaching the zero or extremum of a function when the measured values are subject to uncertainties, a familiar strategy is that of stochastic approximation (Wasan, 1969). The original concept is due to Robbins and Monro (1951). Kiefer and Wolfowitz (1952) have adapted it for problems in which the maximum of a unimodal regression function is sought. Blum (1954a) has proved that the method is certain to converge. It distinguishes between test or trial steps and work steps. With one variable, starting at the point x^{(k)}, the value of the objective function is obtained at the two positions x^{(k)} ± c^{(k)}. The slope is then calculated as
\[ y^{(k)} = \frac{F(x^{(k)} + c^{(k)}) - F(x^{(k)} - c^{(k)})}{2\,c^{(k)}} \]

A work step follows from the recursion formula (for minimum searches)
\[ x^{(k+1)} = x^{(k)} - 2\,a^{(k)}\,y^{(k)} \]
The choice of the positive sequences c^{(k)} and a^{(k)} is important for convergence of the process. These should satisfy the relations
\[ \lim_{k\to\infty} c^{(k)} = 0, \qquad \sum_{k=1}^{\infty} a^{(k)} = \infty, \qquad \sum_{k=1}^{\infty} a^{(k)}\,c^{(k)} < \infty, \qquad \sum_{k=1}^{\infty} \left( \frac{a^{(k)}}{c^{(k)}} \right)^2 < \infty \]
One chooses for example the sequences
\[ a^{(k)} = \frac{a^{(0)}}{k}, \quad a^{(0)} > 0, \qquad c^{(k)} = \frac{c^{(0)}}{\sqrt[4]{k}}, \quad c^{(0)} > 0, \; k > 0 \]



This means that the work step length goes to zero very much faster than the test step length, in order to compensate for the growing influence of the perturbations.
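A minimal sketch of the Kiefer-Wolfowitz scheme with exactly these sequences, on an invented noisy one dimensional objective (the constants a0, c0 and the noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented noisy objective: true minimum at x = 3, with small perturbations.
def F_noisy(x):
    return (x - 3.0)**2 + 0.01 * rng.standard_normal()

x, a0, c0 = 0.0, 0.5, 0.5
for k in range(1, 2001):
    a_k = a0 / k                    # work step lengths: sum a_k diverges
    c_k = c0 / k**0.25              # test step lengths: c_k -> 0 more slowly
    y_k = (F_noisy(x + c_k) - F_noisy(x - c_k)) / (2.0 * c_k)   # slope estimate
    x = x - 2.0 * a_k * y_k         # work step (minimum search)

print(x)                            # close to the true minimum at x = 3
```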

Blum (1954b) and Dvoretzky (1956) describe how to apply this process to multidimensional problems. The increment in the objective function, hence an approximation to the gradient vector, is obtained from n + 1 observations. Sacks (1958) uses 2n trial steps. The stochastic approximation can thus be regarded, in a sense, as a particular gradient method.

Yet other basic strategies have been proposed; these adopt only the choice of step lengths from the stochastic approximation, while the directions are governed by other criteria. Thomas and Wilde (1964), for example, combine the stochastic approximation with the relaxation method of Southwell (1940, 1946). Kushner (1963) and Schmitt (1969) even take random directions into consideration. All the proofs of convergence of the stochastic approximation assume unimodal objective functions. A further disadvantage is that stability against perturbations is bought at a very high cost, especially if the number of variables is large. How many steps are required to achieve a given accuracy can only be stated if the probability density distribution of the stochastic perturbations is known. Many authors have tried to devise methods in which the basic procedure can be accelerated: e.g., Kesten (1958), who only reduces the step lengths after a change in direction of the search, or Odell (1961), who makes the lengths of the work steps dependent on measured values of the objective function. Other attempts are directed towards reducing the effect of the perturbations (Venter, 1967; Fabian, 1967), for example by making only the direction and not the size of the gradients determine the step lengths. Bertram (1960) describes various examples of applications. More of such work is that of Krasulina (1972) and Engelhardt (1973).

In this introduction many classes of possible or practically occurring optimization problems and methods have been sketched briefly, but the coverage is far from complete. No mention has been made, for example, of broken rational programming, nor of graphical methods of solution. In operations research especially (Henn and Kunzi, 1968) there are many special techniques for solving transport, allocation, routing, queuing, and warehouse problems, such as network planning and other graph theoretical methods. This excursion into the vast realm of optimization problems was undertaken because some of the algorithms to be studied in more depth in what follows, especially the random methods of Chapter 4, owe their origin and nomenclature to other fields. It should also be seen to what extent methods of direct parameter optimization permeate the other branches of the subject, and how they are related to each other. An overall scheme of how the various branches are interrelated can be found in Saaty (1970).

If there are two or more objectives at the same time and occasion, and especially if these are not conflict-free, single solution points in the decision variable space can no longer give the full answer to an optimization question, not even in the otherwise simplest situation. How to look for the whole subset of efficient, non-dominated, or Pareto-optimal solutions can be found under keywords like vector optimization, polyoptimization, or multiple criteria decision making (MCDM) (e.g., Bell, Keeney, and Raiffa, 1977; Hwang and Masud, 1979; Peschel, 1980; Grauer, Lewandowski, and Wierzbicki, 1982; Steuer, 1986). Game theory comes into play when several decision makers have access to different



parts of the decision variable set only (e.g., Luce and Raiffa, 1957; Maynard Smith, 1982; Axelrod, 1984; Sigmund, 1993). No consideration is given here to these special fields.




Chapter 3

Hill climbing Strategies

In this chapter some of the direct, mathematical parameter optimization methods will be treated in more detail for static, non-discrete, non-stochastic, mostly unconstrained functions. They come under the general heading of hill climbing strategies because their manner of searching for a maximum corresponds closely to the intuitive way a sightless climber might feel his way from a valley up to the highest peak of a mountain. For minimum problems the sense of the displacements is simply reversed; otherwise uphill or ascent and downhill or descent methods (Bach, 1969) are identical. Whereas methods of mathematical programming are dominant in operations research and the special methods of functional optimization in control theory, the hill climbing strategies are most frequently applied in engineering design. Analytic methods often prove unsuitable in this field:

- Because the assumptions are not satisfied under which necessary conditions for extrema can be stated (e.g., continuity of the objective function and its derivatives)
- Because there are difficulties in carrying out the necessary differentiations
- Because a solution of the equations describing the conditions does not always lead to the desired optimum (it can be a local minimum, maximum, or saddle point)
- Because the equations describing the conditions, in general a system of simultaneous non-linear equations, are not immediately soluble

To what extent hill climbing strategies take care of these particular characteristics depends on the individual method. Very thorough presentations covering some topics can be found in Wilde (1964), Rosenbrock and Storey (1966), Wilde and Beightler (1967), Kowalik and Osborne (1968), Box, Davies, and Swann (1969), Pierre (1969), Pun (1969), Converse (1970), Cooper and Steinberg (1970), Hoffmann and Hofmann (1970), Beveridge and Schechter (1970), Aoki (1971), Zahradnik (1971), Fox (1971), Cea (1971), Daniel (1971), Himmelblau (1972b), Dixon (1972a), Jacoby, Kowalik, and Pizzo (1972), Stark and Nicholls (1972), Brent (1973), Gottfried and Weisman (1973), Vanderplaats (1984), and Papageorgiou (1991). More variations or theoretical and numerical studies of older methods can be found as individual publications in a wide variety of journals, or in the volumes of collected articles such as Graves and Wolfe (1963), Blakemore and Davis (1964), Lavi




and Vogl (1966), Klerer and Korn (1967), Abadie (1967, 1970), Fletcher (1969a), Rosen, Mangasarian, and Ritter (1970), Geoffrion (1972), Murray (1972a), Lootsma (1972a), Szego (1972), and Sebastian and Tammer (1990).

Formulated as a minimum problem without constraints, the task can be stated as follows:
\[ \min_{x} \{ F(x) \mid x \in \mathbb{R}^n \} \tag{3.1} \]

The column vector x* (at the extreme position) is required,
\[ x^* = \begin{bmatrix} x_1^* \\ x_2^* \\ \vdots \\ x_n^* \end{bmatrix} = (x_1^*, x_2^*, \ldots, x_n^*)^T \]
and the associated extreme value F* = F(x*) of the objective function F(x), in this case the minimum. The expression x ∈ ℝⁿ means that the variables are allowed to take all real values; x can thus be represented by any point in an n-dimensional Euclidean space ℝⁿ. Different types of minima are distinguished: strong and weak, local and global.

For a local minimum x* the following relationship holds:
\[ F(x^*) \le F(x) \tag{3.2} \]
for x ∈ ℝⁿ and
\[ 0 \le \| x - x^* \| = \sqrt{ \sum_{i=1}^{n} (x_i - x_i^*)^2 } \le \varepsilon \]

This means that in the neighborhood of x*, defined by the size of ε, there is no vector x for which F(x) is smaller than F(x*). If the equality sign in Equation (3.2) only applies when x = x*, the minimum is called strong, otherwise it is weak. An objective function that displays only one minimum (or maximum) is referred to as unimodal. In many cases, however, F(x) has several local minima (and maxima), which may be of different heights. The smallest, absolute or global minimum (minimum minimorum) of a multimodal objective function satisfies the stronger condition

\[ F(x^*) \le F(x) \quad \text{for all } x \in \mathbb{R}^n \tag{3.3} \]
This is always the preferred object of the search.

If there are also constraints, in the form of inequalities
\[ G_j(x) \ge 0 \quad \text{for all } j = 1(1)m \tag{3.4} \]
or equalities
\[ H_k(x) = 0 \quad \text{for all } k = 1(1)\ell \tag{3.5} \]



then ℝⁿ in Equations (3.1) to (3.3) must either be replaced by the hopefully non-empty subset M ⊆ ℝⁿ, representing the feasible region in ℝⁿ defined by Equation (3.4), or by ℝ^{n−ℓ}, the subspace of lower dimensionality spanned by the variables that now depend on each other according to Equation (3.5). If solutions at infinity are excluded, then the theorem of Weierstrass holds (see for example Rothe, 1959): "In a closed compact region a ≤ x ≤ b every function which is continuous there has at least one (i.e., an absolute) minimum and maximum." This can lie inside or on the boundary. In the case of discontinuous functions, every point of discontinuity is also a potential candidate for the position of an extremum.

3.1 One Dimensional Strategies

The search for a minimum is especially easy if the objective function only depends on one variable.

[Figure 3.1: Special points of a function of one variable.
a: local maximum at the boundary
b: local minimum at a point of discontinuity of F_x(x)
c: saddle point, or point of inflection
d-e: weak local maximum
f: local minimum
g: maximum (may be global) at a point of discontinuity of F(x)
h: global minimum at the boundary]



This problem would be of little practical interest, however, were it not for the fact that many of the multidimensional strategies make use of one dimensional minimizations in selected directions, referred to as line searches. Figure 3.1 shows some possible ways minima and other special points can arise in the one dimensional case.

3.1.1 Simultaneous Methods

One possible way of discovering the minimum of a function with one parameter is to determine the value of the objective function at a number of points and then to declare the point with the smallest value the minimum. Since in principle all trials can be carried out at the same time, this procedure is referred to as simultaneous optimization. How closely the true minimum is approached depends on the choice of the number and location of the trial points. The more trials are made, the more accurate the solution can be. One will be concerned, however, to obtain a result at the lowest cost in time and computation (or material). The two requirements of high accuracy and lowest cost are contradictory; thus an optimum compromise must be sought.

The effectiveness of a search method is judged by the size of the largest remaining interval of uncertainty (in the least favorable case) relative to the position of the minimum for a given number of trials (the so-called minimax concept; see Wilde, 1964; Beamer and Wilde, 1973). Assuming that the points in the series of trials are so densely distributed that several at a time are in the neighborhood of a local minimum, then the length of the interval of uncertainty is the same as the distance between the two points in the neighborhood of the smallest value of F(x). The number of necessary trials can thus get very large unless one has at least some idea of whereabouts the desired minimum is situated. In practice one must limit investigation of the objective function to a finite interval [a, b]. It is obvious, and it can be proved theoretically, that the optimal choice for all simultaneous search methods is the one in which the trial points are evenly distributed over the interval [a, b] (Boas, 1962, 1963a-d).

If N equidistant points are used, the interval of uncertainty is of length
\[ \ell_N = \frac{2\,(b-a)}{N+1} \]
and the effectiveness takes the value \( 2/(N+1) \). Put another way: to be sure of achieving an accuracy of ε > 0, the equidistant search (also called lattice, grid, or tabulation method) requires N trials, where
\[ N \ge \frac{2\,(b-a)}{\varepsilon} - 1 \]



Even more effective search schemes can be devised if the objective function is unimodal in the interval [a, b]. Wilde and Beightler (1967) describe a procedure, using evenly distributed pairs of points, which is also referred to as a simultaneous dichotomous search. The distance δ between the two points of a pair must be chosen sufficiently large that their objective function values are different. As δ → 0 the dichotomous search with an even number of trials (even block search) is the best. The number of trials required is
\[ N \ge \frac{2\,(b-a)}{\varepsilon} - 2 \]



3.1.2 Sequential Methods

Sequential search methods use the results of earlier trials to place the later ones. The choice of favorable conditions for the next trial presupposes a more or less precise internal model of the objective function; the better the model corresponds to reality, the better will be the results of the interpolation and extrapolation processes. The simplest assumption is that the objective function is unimodal, which means that local minima also always represent global minima. On this basis a number of sequential interval-dividing procedures have been constructed (Sect. 3.1.2.2). Iterative interpolation methods demand more "smoothness" of the objective function (Sect. 3.1.2.3). In the former case it is necessary, in the latter useful, to determine at the outset a suitable interval, [a^{(0)}, b^{(0)}], in which the desired extremum lies (Sect. 3.1.2.1).

3.1.2.1 Boxing in the Minimum

If there are no clues as to whereabouts the desired minimum might be situated, one can start with two points x^{(0)} and x^{(1)} = x^{(0)} + s and determine the objective function there. If F(x^{(1)}) ≤ F(x^{(0)}), the chosen direction is retained and one continues stepwise with
\[ x^{(k+1)} = x^{(k)} + s \quad \text{for } k \ge 1 \]
as long as F(x^{(k+1)}) ≤ F(x^{(k)}). If, however, F(x^{(1)}) > F(x^{(0)}), one chooses the opposite direction:
\[ x^{(2)} = x^{(0)} - s \]
and
\[ x^{(k+1)} = x^{(k)} - s \quad \text{for } k \ge 2 \]
similarly, until a step past the minimum is taken; one has thus determined the minimum of the unimodal function to within an uncertainty interval of length 2s (Beveridge and Schechter, 1970).

In numerical optimization problems the values of the variables often run through several powers of 10, or alternatively they must be precisely determined at many points. In this case the boxing-in method with a very small fixed step length is too costly. Box, Davies, and Swann (1969) therefore suggest starting with an initial step length s^{(0)} and doubling it at each successful step. Their recursion formula is as follows:
\[ x^{(k+1)} = x^{(0)} + 2^k\,s^{(0)} \]
It is applied as long as F(x^{(k+1)}) ≤ F(x^{(k)}) holds. As soon as F(x^{(k+1)}) > F(x^{(k)}) is registered, however, b^{(0)} = x^{(k+1)} is set as the upper bound of the interval and the starting point x^{(0)} is returned to. The lower bound a^{(0)} is found by a corresponding process with negative step lengths going in the opposite direction. In this way a starting interval [a^{(0)}, b^{(0)}] is obtained for the one dimensional search procedures to be described below. It can happen, because of the convention for equality of two function values, that the search for a bound of the interval does not end if the objective function reaches a constant horizontal level. It is therefore useful to specify a maximum step length that may not be exceeded.



The boxing-in method has also been proposed occasionally as a one dimensional optimization strategy in its own right (Rosenbrock, 1960; Berman, 1966). In order not to waste too many trials far from the target when the accuracy requirement is very high, it is useful to start with relatively large steps. Each time a loop ends with a failure, the step length is reduced by a factor less than 0.5, e.g., 0.25. If the above rules for increasing and reducing the step lengths are combined, a very flexible procedure is obtained. Dixon (1972a) calls it the success/failure routine. If a starting interval [a^{(0)}, b^{(0)}] is already at hand, however, there are significantly better strategies for successively reducing the size of the interval.
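A minimal sketch of the bracketing scheme with step doubling described above (objective, starting point, and step length are invented for illustration):

```python
# Boxing in the minimum with doubled step lengths, x(k+1) = x(0) + 2^k s(0).
def bracket(F, x0, s0):
    def probe(sign):
        k, F_prev = 0, F(x0)
        while True:
            x_next = x0 + sign * (2**k) * s0   # doubled step from the start point
            F_next = F(x_next)
            if F_next > F_prev:                # first failure: bound found
                return x_next
            F_prev, k = F_next, k + 1
    b = probe(+1.0)                            # upper bound b(0)
    a = probe(-1.0)                            # lower bound a(0), from x(0) again
    return a, b

F = lambda x: (x - 5.3)**2                     # invented unimodal test function
print(bracket(F, 0.0, 0.1))                    # an interval containing 5.3
```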

3.1.2.2 Interval Division Methods

If an equidistant division method is applied repeatedly, the interval of uncertainty is reduced at each step by the same factor, and thus for k steps by the k-th power of this factor. This exponential progression is considerably stronger than the merely linear dependence of the reduction on the number of trials per step. Thus as few simultaneous trials as possible should be used. A comparison of two schemes, with two and three simultaneous trials, shows that, except in the first loop, only two new objective function values must be obtained at a time in both cases, since of three trial points in one step, one coincides with a point of the previous step. The total number of trials required with sequential application of the equidistant three point scheme is
\[ N \simeq 1 + \frac{2 \log \frac{b-a}{\varepsilon}}{\log 2} \]


3.1.2.2.1 Fibonacci Division. This sequential method divides the interval in ratios of successive Fibonacci numbers, which obey the recursion
\[ f_k = f_{k-1} + f_{k-2} \quad \text{for } k \ge 2, \qquad f_0 = f_1 = 1 \]

An initial interval [a^{(0)}, b^{(0)}] is required, containing the extremum, together with a number N, which represents the total number of intended interval divisions. If the general interval is called [a^{(k)}, b^{(k)}], the lengths
\[ s^{(k)} = t^{(k)}\,(b^{(k)} - a^{(k)}) = b^{(k+1)} - a^{(k+1)} \]
are subtracted from its ends, with the reduction factor
\[ t^{(k)} = \frac{f_{N-k-1}}{f_{N-k}} \]
giving
\[ c^{(k)} = a^{(k)} + s^{(k)}, \qquad d^{(k)} = b^{(k)} - s^{(k)} \tag{3.10} \]

The values of the objective function at c^{(k)} and d^{(k)} are compared, and whichever subinterval contains the better (in a minimum search, lower) value is taken as defining the interval for the next step:

If F(d^{(k)}) > F(c^{(k)}), then
\[ a^{(k+1)} = d^{(k)}, \qquad b^{(k+1)} = b^{(k)} \]
otherwise
\[ a^{(k+1)} = a^{(k)}, \qquad b^{(k+1)} = c^{(k)} \]

A consequence of the Fibonacci series is that, except for the first interval division, at all of the following steps one of the two new points c^{(k+1)} and d^{(k+1)} is always already known: in the first case d^{(k+1)} = c^{(k)}, in the second c^{(k+1)} = d^{(k)}, so that each time only one new value of the objective function needs to be obtained.

[Figure 3.2: Interval division in the Fibonacci search]

Figure 3.2 illustrates two steps of the procedure. The process is continued until k = N − 2. At the next division, because f₂ = 2f₁, d^{(k)} and c^{(k)} coincide. A further interval reduction can only be achieved by slightly displacing one of the test points. The displacement δ must be at least big enough for the two objective function values to still be distinguishable. Then the remaining interval after N trials is of length
\[ \ell_N = \frac{1}{f_N}\,(b^{(0)} - a^{(0)}) + \delta \]

As δ → 0 the effectiveness tends to f_N^{-1}. Johnson (1956) and Kiefer (1957) show that this value is optimal in the sense of the minimax concept, according to which the Fibonacci search is the best of all sequential interval division procedures. However, by taking account of the displacement δ, not only at the last but at all the steps, Oliver and Wilde (1964) give a recursion formula that for the same number of trials yields a slightly smaller residual interval. Avriel and Wilde (1966a) provide a proof of optimality. If one has a priori information about the structure of the objective function, it can be exploited to advantage (Gal, 1971) in order to reduce further the number of trials. Overholt (1967a, 1973) suggests that in general there is no a priori information available to fix δ suitably, and it is therefore better to omit the final division using a displacement rule and to choose N one bigger from the start. In order to obtain the minimum with accuracy ε > 0 one



should choose N such that
\[ f_N > \frac{b^{(0)} - a^{(0)}}{\varepsilon} \]
Then the effectiveness of the procedure becomes \( 2/f_{N+1} \), and since (Lucas, 1876)
\[ f_N = \frac{1}{\sqrt{5}} \left[ \left( \frac{1+\sqrt{5}}{2} \right)^{N+1} - \left( \frac{1-\sqrt{5}}{2} \right)^{N+1} \right] \simeq \frac{1}{\sqrt{5}} \left( \frac{1+\sqrt{5}}{2} \right)^{N+1} \]
the number of trials is approximately
\[ N \simeq \frac{ \log \frac{b^{(0)} - a^{(0)}}{\varepsilon} + \log \sqrt{5} }{ \log \frac{1+\sqrt{5}}{2} } - 1 \tag{3.11} \]

Overholt (1965) shows by means of numerical tests that the procedure must often be terminated prematurely, as F(d^{(k+1)}) becomes equal to F(c^{(k+1)}), for example because of computing with a finite number of significant figures. Further divisions of the interval of uncertainty are then pointless.

For the boxing-in method of determining the initial interval one would fix an initial step length of about 10ε and a maximum step length of about 5·10⁹ ε, so that for a 36-bit computer the number range of integers is not exceeded by the largest required Fibonacci number. Finally, two further applications of the Fibonacci procedure may be mentioned. By reversing the scheme, Wilde and Beightler (1967) obtain a method of boxing in the minimum. Kiefer (1957) shows how to proceed if values of the objective function can only be obtained at discrete, not necessarily equidistant, points. More about such lattice search problems can be found in Wilde (1964), and Beveridge and Schechter (1970).
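The complete division scheme can be sketched as follows (test function, interval, and N are invented; the final δ-displacement discussed above is omitted for brevity):

```python
# Minimal Fibonacci search sketch for a unimodal function on [a, b].
def fibonacci_search(F, a, b, N):
    f = [1, 1]                                   # f0 = f1 = 1
    while len(f) <= N:
        f.append(f[-1] + f[-2])
    c = a + (f[N - 1] / f[N]) * (b - a)          # right point, t(0) = f_{N-1}/f_N
    d = b - (f[N - 1] / f[N]) * (b - a)          # left point
    Fc, Fd = F(c), F(d)
    for k in range(1, N - 1):
        t = f[N - k - 1] / f[N - k]              # reduction factor t(k)
        if Fd > Fc:                              # minimum in [d, b]
            a, d, Fd = d, c, Fc                  # old c is reused as new d
            c = a + t * (b - a)
            Fc = F(c)
        else:                                    # minimum in [a, c]
            b, c, Fc = c, d, Fd                  # old d is reused as new c
            d = b - t * (b - a)
            Fd = F(d)
    return (a + b) / 2.0

print(fibonacci_search(lambda x: (x - 1.234)**2, 0.0, 2.0, 20))
```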

3.1.2.2.2 The Golden Section. It can sometimes be inconvenient to have to specify in advance the number of interval divisions. In this case Kiefer (1953) and Johnson (1956) propose, instead of the reduction factor t^{(k)}, which varies with the iteration number in the Fibonacci search, a constant factor
\[ t = \frac{2}{1 + \sqrt{5}} \simeq 0.618 \quad \text{(positive root of } t^2 + t = 1\text{)} \tag{3.12} \]
For large N − k, t^{(k)} reduces to t. In addition, t is identical to the ratio of lengths a to b, which is obtained by dividing a total length of a + b into two pieces such that the smaller, a, has the same ratio to the larger, b, as the larger to the total. This harmonic division (after Euclid) is also known as the golden section, which gave the procedure its name (Wilde, 1964). After N function calls the uncertainty interval is of length
\[ \ell_N = t^{N-1}\,(b^{(0)} - a^{(0)}) \]



For the limiting case N → ∞, since
\[ \lim_{N\to\infty} \left( t^{N-1} f_N \right) = 1.17 \]
the number of trials compared to the Fibonacci procedure is about 17% higher. Compared to the Fibonacci search without displacement, since
\[ \lim_{N\to\infty} \frac{1}{2}\, t^{N-1} f_{N+1} \simeq 0.95 \]
the number of trials is about 5% lower. It should further be noted that, when using the Fibonacci method on digital computers, the Fibonacci numbers must first be generated, or a sufficient number of them must be provided and stored. The number of trials needed for a sequential golden section is
\[ N = \left\lceil \frac{\log \frac{b^{(0)} - a^{(0)}}{\varepsilon}}{\log(1/t)} \right\rceil - 1 \sim \log \frac{b^{(0)} - a^{(0)}}{\varepsilon} \tag{3.13} \]

Other properties of the iteration sequence, including the criterion for termination at equal function values, are the same as those of the method of interval division according to Fibonacci numbers. Further details can be found, for example, in Avriel and Wilde (1968). Complete programs for the interval division procedures have been published by Pike and Pixner (1965), and Overholt (1967b,c) (see also Boothroyd, 1965; Pike, Hill, and James, 1967; Overholt, 1967a).
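A minimal golden-section sketch with the same point-reuse scheme, now using the constant factor t of Equation (3.12); the test function and tolerance are invented:

```python
# Golden-section line search for a unimodal function on [a, b].
def golden_section(F, a, b, eps):
    t = 2.0 / (1.0 + 5.0**0.5)           # constant reduction factor ~0.618
    c, d = a + t * (b - a), b - t * (b - a)
    Fc, Fd = F(c), F(d)
    while b - a > eps:
        if Fd > Fc:                      # minimum in [d, b]
            a, d, Fd = d, c, Fc          # the surviving interior point is reused
            c = a + t * (b - a)
            Fc = F(c)
        else:                            # minimum in [a, c]
            b, c, Fc = c, d, Fd
            d = b - t * (b - a)
            Fd = F(d)
    return (a + b) / 2.0

print(golden_section(lambda x: (x - 0.7)**2 + 0.3, 0.0, 2.0, 1e-8))
```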

3.1.2.3 Interpolation Methods

In many cases one is dealing with a continuous function, the minimum of which is to be determined. If, in addition to the value of the objective function, its slope can be specified everywhere, many methods can be derived that may converge faster than the optimal elimination methods. One of the oldest schemes is based on the procedure named after Bolzano for determining the zeros of a function. Assuming that one has two points at which the slopes of the objective function have opposite signs, one bisects the interval between them and determines the slope at the midpoint. This replaces the interval end point which has a slope of the same sign. The procedure can then be repeated iteratively. At each trial the interval is halved. If the slope has to be calculated from the difference of two objective function values, the bisection or midpoint strategy becomes the sequential dichotomous search. Avriel and Wilde (1966b) propose, as a variant of the Bolzano search, evaluating the slope at two points in the interval so as to increase the reduction factor. They show that their diblock strategy is slightly superior to the dichotomous search.

If derivatives of the objective function are available, or at least if it can be assumed that these exist, i.e., the function F(x) is continuous and differentiable, far better strategies for the minimum search can be devised. They determine analytically the minimum of a trial function that coincides with the objective function, and possibly also its derivatives, at selected argument values. One distinguishes linear, quadratic, and cubic models according to the order of the trial polynomial. Polynomials of higher order are virtually never used. They require too much information about the function F(x). Furthermore, it turns out that, in contrast to all the methods referred to so far, such strategies do not always converge, for reasons other than rounding error.

3.1.2.3.1 Regula Falsi Iteration. Given two points a^{(k)} and b^{(k)}, with their function values F(a^{(k)}) and F(b^{(k)}), the simplest approximation formula for a zero c^{(k)} of F(x) is
\[ c^{(k)} = a^{(k)} - F(a^{(k)})\,\frac{b^{(k)} - a^{(k)}}{F(b^{(k)}) - F(a^{(k)})} \]
This technique, known as regula falsi or regula falsorum, predicts the position of the zero correctly if F(x) depends linearly on x. For one dimensional minimization it can be applied to find a zero of F_x(x) = dF(x)/dx:
\[ c^{(k)} = a^{(k)} - F_x(a^{(k)})\,\frac{b^{(k)} - a^{(k)}}{F_x(b^{(k)}) - F_x(a^{(k)})} \tag{3.14} \]

The underlying model here is a second order polynomial with linear slope. If F_x(a^{(k)}) and F_x(b^{(k)}) have opposite sign, c^{(k)} lies between a^{(k)} and b^{(k)}. If F_x(c^{(k)}) ≠ 0, the procedure can be continued iteratively by using the reduced interval [a^{(k+1)}, b^{(k+1)}] = [a^{(k)}, c^{(k)}] if F_x(c^{(k)}) and F_x(b^{(k)}) have the same sign, or [a^{(k+1)}, b^{(k+1)}] = [c^{(k)}, b^{(k)}] if F_x(c^{(k)}) and F_x(a^{(k)}) have the same sign. If F_x(a^{(k)}) and F_x(b^{(k)}) have the same sign, c^{(k)} must lie outside [a^{(k)}, b^{(k)}]. If F_x(c^{(k)}) has the same sign again, c^{(k)} replaces the argument value at which |F_x| is greatest. This extrapolation is also called the secant method. If F_x(c^{(k)}) has the opposite sign, one can continue using regula falsi to interpolate iteratively. As a termination criterion one can apply F_x(c^{(k)}) = 0 or |F_x(c^{(k)})| ≤ ε, ε > 0. A minimum can only be found reliably in this way if the starting point of the search lies in its neighborhood. Otherwise the iteration sequence can also converge to a maximum, at which, of course, the slope also goes to zero if F_x(x) is continuous.
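A minimal sketch of regula falsi applied to the slope F_x, as in Equation (3.14); the function, its derivative, the bracket, and the tolerance are invented for illustration:

```python
# Regula falsi on the derivative: finds a zero of Fx inside a sign-change bracket.
def regula_falsi_min(Fx, a, b, eps):
    # assumes Fx(a) and Fx(b) have opposite signs (minimum bracketed)
    while True:
        c = a - Fx(a) * (b - a) / (Fx(b) - Fx(a))   # zero of the secant
        if abs(Fx(c)) <= eps:                       # termination on small slope
            return c
        if Fx(c) * Fx(b) > 0.0:                     # same sign as at b: keep [a, c]
            b = c
        else:                                       # same sign as at a: keep [c, b]
            a = c

Fx = lambda x: 4.0 * x**3 - 8.0 * x                 # slope of F(x) = x^4 - 4x^2
print(regula_falsi_min(Fx, 1.0, 3.0, 1e-6))         # converges toward sqrt(2)
```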

Whereas in the Bolzano interval bisection method only the sign of the function whose zero is sought needs to be known at the argument values, the regula falsi method also makes use of the magnitude of the function. This extra information should enable it to converge more rapidly. As Ostrowski (1966) and Jarratt (1967, 1968) show, for example, this is only the case if the function corresponds closely enough to the assumed model. The simpler bisection method is better, even optimal (as a zero method in the minimax sense), if the function has opposite signs at the two starting points, is not linear, and not convex. In this case the linear interpolation sometimes converges very slowly. According to Stanton (1969), a cubic interpolation as a line search in the eccentric quadratic case often yields even worse results. Dixon (1972a) names two variants of the regula falsi recursion formula, but it is not known whether they lead to better convergence. Fox (1971) proposes a combination of the Bolzano method with the linear interpolation. Dekker (1969) (see also Forsythe, 1969) accredits this procedure with better than linear convergence. Even greater reliability and speed is attributed to the algorithm of Brent (1971), which follows Dekker's method by a quadratic interpolation process as soon as the latter promises to be successful.



It is inconvenient when dealing with minimization problems that the derivatives of the function are required. If the slopes are obtained from function values by a difference method, difficulties can arise from the finite accuracy of such a process. For this reason Brent (1973) combines regula falsi iteration with division according to the golden section. Further variations can be found in Schmidt and Trinkaus (1966), Dowell and Jarratt (1972), King (1973), and Anderson and Bjorck (1973).

3.1.2.3.2 Newton-Raphson Iteration. Newton's interpolation formula for improving an approximate solution x^(k) of the equation F(x) = 0 (see for example Madsen, 1973),

$$x^{(k+1)} = x^{(k)} - \frac{F(x^{(k)})}{F_x(x^{(k)})}$$

uses only one argument value, but requires the value of the derivative of the function as well as the function itself. If F(x) is linear in x, the zero is correctly predicted here; otherwise at best an improved approximation is obtained, and the process must be repeated. Like regula falsi, Newton's recursion formula can also be applied to determining F_x(x) = 0, with of course the reservations already stated. The so-called Newton-Raphson rule is then

$$x^{(k+1)} = x^{(k)} - \frac{F_x(x^{(k)})}{F_{xx}(x^{(k)})} \qquad (3.15)$$

If F(x) is not quadratic, as many iterations must be made as are necessary to satisfy a termination criterion. Dixon (1972a), for example, uses the condition |x^(k+1) − x^(k)| < ε.
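A corresponding sketch of rule (3.15), under the assumption that first and second derivatives are available as separate functions; all names are again illustrative.

```python
def newton_raphson_min(dF, ddF, x0, eps=1e-10, max_iter=50):
    """Seek a zero of the slope dF = F' by the Newton-Raphson rule (3.15).
    The iteration converges to the nearest stationary point, which need
    not be a minimum unless ddF = F'' stays positive."""
    x = x0
    for _ in range(max_iter):
        x_new = x - dF(x) / ddF(x)
        if abs(x_new - x) < eps:             # Dixon's termination condition
            return x_new
        x = x_new
    return x

# Example: F(x) = x**4 - 3*x with F'(x) = 4*x**3 - 3, F''(x) = 12*x**2
x_min = newton_raphson_min(lambda x: 4 * x**3 - 3, lambda x: 12 * x**2, 1.0)
```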


3.1.2.3.3 Lagrangian Interpolation. This method takes a parabola as the model function (quadratic interpolation). Assuming that the three points are a^(k) < b^(k) < c^(k), with the objective function values F(a^(k)), F(b^(k)), and F(c^(k)), the trial parabola P(x) has a vanishing first derivative at the point

$$d^{(k)} = \frac{1}{2}\;\frac{\bigl[(c^{(k)})^2 - (b^{(k)})^2\bigr]F(a^{(k)}) + \bigl[(a^{(k)})^2 - (c^{(k)})^2\bigr]F(b^{(k)}) + \bigl[(b^{(k)})^2 - (a^{(k)})^2\bigr]F(c^{(k)})}{\bigl[c^{(k)} - b^{(k)}\bigr]F(a^{(k)}) + \bigl[a^{(k)} - c^{(k)}\bigr]F(b^{(k)}) + \bigl[b^{(k)} - a^{(k)}\bigr]F(c^{(k)})} \qquad (3.16)$$

This point is a minimum only if the denominator is positive. Otherwise d^(k) represents a maximum or a saddle point. In the case of a minimum, d^(k) is introduced as a new argument value and one of the old ones is deleted, so that the minimum remains bracketed:

$$[a^{(k+1)},\,b^{(k+1)},\,c^{(k+1)}] = \begin{cases}
[a^{(k)},\,d^{(k)},\,b^{(k)}] & \text{if } a^{(k)} < d^{(k)} < b^{(k)} \text{ and } F(d^{(k)}) \le F(b^{(k)})\\
[d^{(k)},\,b^{(k)},\,c^{(k)}] & \text{if } a^{(k)} < d^{(k)} < b^{(k)} \text{ and } F(d^{(k)}) > F(b^{(k)})\\
[b^{(k)},\,d^{(k)},\,c^{(k)}] & \text{if } b^{(k)} < d^{(k)} < c^{(k)} \text{ and } F(d^{(k)}) \le F(b^{(k)})\\
[a^{(k)},\,b^{(k)},\,d^{(k)}] & \text{if } b^{(k)} < d^{(k)} < c^{(k)} \text{ and } F(d^{(k)}) > F(b^{(k)})
\end{cases}$$


[Figure 3.3: Lagrangian quadratic interpolation. The trial parabola P(x) is fitted through the points a^(k), b^(k), c^(k) of F(x); its minimum at d^(k) yields the new triple a^(k+1), b^(k+1), c^(k+1).]

The speed of convergence depends on the agreement between the objective function and the trial function. In the most favorable case the objective function is itself quadratic; then one iteration is sufficient. This is why it can be advantageous to use an interpolation method rather than an interval division method such as the optimal Fibonacci search. Dijkhuis (1971) describes a variant of the basic procedure in which four argument values are taken. The two inner ones, and each of the outer ones in turn, are used for two separate quadratic interpolations. The weighted mean of the two results yields a new iteration point. This procedure is claimed to increase the reliability of the minimum search for non-quadratic objective functions.
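The vertex formula can be written as a single step; the following Python sketch returns None where the model predicts a maximum or saddle point, a convention added here for illustration.

```python
def quadratic_interpolation_step(F, a, b, c):
    """One Lagrangian quadratic interpolation step (Equation 3.16):
    the abscissa d of the vertex of the parabola through
    (a, F(a)), (b, F(b)), (c, F(c)) for a < b < c."""
    Fa, Fb, Fc = F(a), F(b), F(c)
    num = (c*c - b*b) * Fa + (a*a - c*c) * Fb + (b*b - a*a) * Fc
    den = (c - b) * Fa + (a - c) * Fb + (b - a) * Fc
    if den <= 0:
        return None          # model predicts a maximum or a saddle point
    return 0.5 * num / den

# Example: F(x) = (x - 0.5)**2 gives d = 0.5 from the triple (0, 1, 2)
d = quadratic_interpolation_step(lambda x: (x - 0.5)**2, 0.0, 1.0, 2.0)
```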

3.1.2.3.4 Hermitian Interpolation. If one chooses, instead of a parabola, a third order polynomial as a test function, more information is needed to make it agree with the objective function. Beveridge and Schechter (1970) describe such a cubic interpolation procedure. In place of four argument values with associated objective function values, two points a^(k) and b^(k) are enough if, in addition to the values of the objective function, the values of its slope, i.e., the first order differentials, are available. This Hermitian interpolation is mainly used in conjunction with gradient or quasi-Newton methods, because these require the partial derivatives of the objective function in any case, or approximate them using finite difference methods.

The interpolation formula is:

$$c^{(k)} = a^{(k)} + \bigl(b^{(k)} - a^{(k)}\bigr)\,\frac{w - F_x(a^{(k)}) - z}{2w + F_x(b^{(k)}) - F_x(a^{(k)})} \qquad (3.17)$$

where

$$z = \frac{3\,\bigl[F(a^{(k)}) - F(b^{(k)})\bigr]}{a^{(k)} - b^{(k)}} - F_x(a^{(k)}) - F_x(b^{(k)})$$

and

$$w = +\sqrt{z^2 - F_x(a^{(k)})\,F_x(b^{(k)})} \qquad (3.18)$$

Recursive exchange of the argument values takes place according to the sign of F_x(c^(k)), in a similar way to the Bolzano method. It should also be verified here that a^(k) and b^(k) always bound the minimum. Fletcher and Reeves (1964) use Hermitian interpolation in their conjugate gradient method as a subroutine to approximate a relative minimum in specified directions. They terminate the iteration as soon as |a^(k) − b^(k)| falls below a prescribed accuracy bound.
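A minimal sketch of one such step; the square root is real only when the two points actually bracket a minimum, which the sketch assumes rather than verifies.

```python
from math import sqrt

def hermitian_interpolation_step(F, dF, a, b):
    """One cubic (Hermitian) interpolation step, Equations (3.17)/(3.18):
    an estimate of the minimum between a and b from the function values
    and slopes at the two end points."""
    z = 3.0 * (F(a) - F(b)) / (a - b) - dF(a) - dF(b)
    w = sqrt(z * z - dF(a) * dF(b))      # real if a minimum is bracketed
    return a + (b - a) * (w - dF(a) - z) / (2.0 * w + dF(b) - dF(a))

# Example: F(x) = (x - 1)**2 yields the exact minimum x = 1 in one step
c = hermitian_interpolation_step(lambda x: (x - 1)**2,
                                 lambda x: 2 * (x - 1), 0.0, 3.0)
```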



the second variable. Both end results are then used to reject one of the values of the first variable that were held constant, and to reduce the size of the interval with respect to this parameter. By analogy, a three dimensional minimization consists of a recursive sequence of two dimensional Fibonacci searches. If the number of function calls needed to reduce the uncertainty interval [a_i, b_i] sufficiently with respect to the variable x_i is N_i, then the total number N also obeys Equation (3.19). The advantage compared to the grid method is simply that N_i depends logarithmically on the ratio of initial interval size to accuracy (see Equation (3.11)). Aside from the fact that each variable must be suitably fixed in advance, and that the unimodality requirement of the objective function only guarantees that local minima are approached, there is furthermore no guarantee that a desired accuracy will be reached within a finite number of objective function calls (Kaupe, 1964).

Other elimination procedures have been extended in a similar way to the multivariable case, such as, for example, the dichotomous search (Wilde, 1965) and a sequential boxing-in method (Berman, 1969). In each case the effort rises exponentially with the number of variables. Another elimination concept for the multidimensional case, the method of contour tangents, is due to Wilde (1963) (see also Beamer and Wilde, 1969). It requires, however, the determination of gradient vectors. Newman (1965) indicates how to proceed in the two dimensional case, and also for discrete values of the variables (lattice search). He requires that F(x) be convex and unimodal. Then the cost should only increase linearly with the number of variables. For n ≥ 3, however, no applications of the contour tangent method are as yet known.

Transferring interpolation methods to the n-dimensional case means transforming the original minimum problem into a series of problems, in the form of a set of equations to be solved. As non-linear equations can only be solved iteratively, this procedure is limited to the special case of linear interpolation with quadratic objective functions. Practical algorithms based on the regula falsi iteration can be found in Schmidt and Schwetlick (1968) and Schwetlick (1970). The procedure is not widely used as a minimization method (Schmidt and Vetters, 1970). The slopes of the objective function that it requires are implicitly calculated from function values. The secant method described by Wolfe (1959b) for solving a system of non-linear equations also works without derivatives of the functions. From n + 1 current argument values, it extracts the required information about the structure of the n equations.

Just as the transition from simultaneous to sequential one dimensional search methods reduces the effort required at the expense of global convergence, so each further acceleration in the multidimensional case is bought by a reduction in reliability. High convergence rates are achieved by gathering more information and interpreting it in the form of a model of the objective function. If assumptions and reality agree, this procedure is successful; if they do not, extrapolations lead to worse predictions and possibly even to abandoning an optimization strategy. Figure 3.4 shows the contour diagram of a smooth two parameter objective function.

All the strategies to be described assume a degree of smoothness in the objective function. They do not converge with certainty to the global minimum, but at best to one of the local minima, or sometimes only to a saddle point.


[Figure 3.4: Contour lines of a two parameter function F(x_1, x_2); a: global minimum, b: local minimum, c: local maxima, d, e: saddle points]

Various methods are distinguished according to the kind of information they need, namely:

Direct search methods, which only need objective function values F(x)

Gradient methods, which also use the first partial derivatives ∇F(x) (first order strategies)

Newton methods, which in addition make use of the second partial derivatives ∇²F(x) (second order strategies)

The emphasis here will be placed on derivative-free strategies, that is, on direct search methods, and on such higher order procedures as glean their required information about derivatives from a sequence of function values. The recursion scheme of most multidimensional strategies is based on the formula:

$$x^{(k+1)} = x^{(k)} + s^{(k)}\,v^{(k)} \qquad (3.20)$$

They differ from each other with regard to the choice of the step length s^(k) and the search direction v^(k), the former being a scalar and the latter a vector of unit length.

3.2.1 Direct Search Strategies

Direct search strategies do without constructing a model of the objective function. Instead, the directions, and to some extent also the step lengths, are fixed heuristically or by a scheme of some sort, not always in an optimal way under the assumption of a specified internal model. Thus the risk is run of not being able to improve the objective function value at each step. Failures must accordingly be planned for, provided something can also be "learned" from them. This trial character of search strategies has earned them the name of trial-and-error methods. The most important of them that are still in current use will be presented in the following chapters. Their attraction lies not in theoretical proofs of convergence and rates of convergence, but in their simplicity and the fact that they have proved themselves in practice. In the case of convex or quadratic unimodal objective functions, however, they are generally inferior to the first and second order strategies to be described later.

3.2.1.1 Coordinate Strategy

The oldest of the multidimensional search procedures trades under a variety of names (e.g., successive variation of the variables, relaxation, parallel axis search, univariate or univariant search, one-variable-at-a-time method, axial iteration technique, cyclic coordinate ascent method, alternating variable search, sectioning method, Gauss-Seidel strategy) and manifests itself in a large number of variations.

The basic idea of the coordinate strategy, as it will be called here, comes from linear algebra and was first put into practice by Gauss and Seidel in the single step relaxation method of solving systems of linear equations (see Ortega and Rockoff, 1966; Ortega and Rheinboldt, 1967; Van Norton, 1967; Schwarz, Rutishauser, and Stiefel, 1968). As an optimization strategy it is attributed to Southwell (1940, 1946) or Friedmann and Savage (1947) (see also D'Esopo, 1959; Zangwill, 1969; Zadeh, 1970; Schechter, 1970).

The parameters in the iteration formula (3.20) are varied in turn individually, i.e., the search directions are fixed by the rule:

$$v^{(k)} = e_\ell \qquad \text{with } \ell = \begin{cases} n & \text{if } k = p\,n,\ p \text{ integer}\\ k \;(\mathrm{mod}\; n) & \text{otherwise} \end{cases}$$

where e_ℓ is the unit vector whose components have the value zero for all i ≠ ℓ and unity for i = ℓ. In its simplest form the coordinate strategy uses a constant step length s^(k). Since, however, the direction to the minimum is unknown, both positive and negative values of s^(k) must be tried. In a first and easy improvement on the basic procedure, a successful step is followed by further steps in the same direction, until a worsening of the objective function is noted. It is clear that the choice of step length strongly influences the number of trials required on the one hand and the accuracy that can be achieved in the approximation on the other.

One can avoid the problem of the choice of step length most effectively by using a line search method each time to locate the relative optimum in the chosen direction. Besides the interval division methods, the Fibonacci search and the golden section, Lagrangian interpolation can also be used, since all these procedures work without knowledge of the partial derivatives of the objective function. A further strategy for boxing in the minimum must be added, in order to establish a suitable starting interval for each one dimensional minimization.


The algorithm can be described as follows:

Step 0: (Initialization)
Establish a starting point x^(0,0) and choose an accuracy bound ε > 0 for the one dimensional search.
Set k = 0 and i = 1.

Step 1: (Boxing in the minimum)
Starting from x^(k,i−1), with an initial step length s = s_min (e.g., s_min = 10 ε), box in the minimum in the direction e_i.
Double the step length at each successful trial, as long as s ...


[Figure 3.5: Coordinate strategy; search path from the starting point x^(0,0) via the relative optima x^(k,i) to the end point x^(3,0), with line searches parallel to the unit vectors e_1 and e_2]

Numbering   Iteration index k   Direction index i   x_1   x_2
(0)         0                   0                   0     9
(1)         0                   1                   3     9
(2)         0                   2                   3     5
carried over
(2)         1                   0                   3     5
(3)         1                   1                   7     5
(4)         1                   2                   7     3
carried over
(4)         2                   0                   7     3
(5)         2                   1                   9     3
(6)         2                   2                   9     2
carried over
(6)         3                   0                   9     2
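Condensed into code, the procedure might look as follows; the golden section stands in for any of the derivative-free line searches named above, and the fixed search span replaces the separate boxing-in strategy. Both of these, like all names here, are simplifying assumptions.

```python
import math

def golden_section(F, a, b, eps=1e-8):
    """Derivative-free line search by golden section over [a, b]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    while b - a > eps:
        c, d = b - g * (b - a), a + g * (b - a)
        if F(c) < F(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

def coordinate_strategy(F, x, span=10.0, eps=1e-8, max_sweeps=100):
    """Successive variation of one variable at a time, each time locating
    the relative optimum along the current coordinate direction."""
    n = len(x)
    for _ in range(max_sweeps):
        x_old = list(x)
        for i in range(n):                    # one sweep along e_1 .. e_n
            line = lambda t: F(x[:i] + [t] + x[i+1:])
            x[i] = golden_section(line, x[i] - span, x[i] + span, eps)
        if max(abs(a - b) for a, b in zip(x, x_old)) < eps:
            break                             # no significant change left
    return x

# Example: minimum of F(x) = (x1 - 3)^2 + 2 (x2 + 1)^2 at (3, -1)
x_min = coordinate_strategy(lambda v: (v[0] - 3)**2 + 2 * (v[1] + 1)**2,
                            [0.0, 9.0])
```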

Eventually the changes near convergence are so small that the number of significant figures to which data are handled by the computer is insufficient for the variables to be significantly altered.

Numerical tests with the coordinate strategy show that an exact determination of the relative minima is unnecessary, at least at distances far from the objective. It can even happen that one inaccurate line search makes the next one particularly effective. This phenomenon is exploited in the procedures known as under- or overrelaxation (Engeli, Ginsburg, Rutishauser, and Stiefel, 1959; Varga, 1962; Schechter, 1962, 1968; Cryer, 1971). Although the relative optimum is determined as before, either an increment is added on in the same direction, or an iteration point is defined on the route between the start and finish of the one dimensional search. The choice of the under- or overrelaxation factor requires assumptions about the structure of the problem. The necessary information is available for the problem of solving systems of linear equations with a positive definite matrix of coefficients, but not for general optimization problems.

Further possible variations of the coordinate strategy are obtained if the sequence of searches parallel to the axes does not follow the cyclic scheme. Southwell (1946), for example, always selects either the direction in which the slope of the objective function

$$F_{x_i}(x) = \frac{\partial F(x)}{\partial x_i}$$

is maximum, or the direction in which the largest step can be taken. To evaluate the choice of direction, Synge (1944) uses the ratio F_{x_i}/F_{x_i x_i} of first to second partial derivatives at the point x^(k). Whether or not the additional effort for this scheme is worthwhile depends on the particular topology of the contour surface. Adding directions other than parallel to the axes is also often found to accelerate the convergence (Pinsker and Tseitlin, 1962; Elkin, 1968).

Its great simplicity has always made the coordinate strategy attractive, despite its sometimes slow convergence. Rules for handling constraints (not counting here penalty function methods) have been devised, for example, by Singer (1962), Murata (1963), and Mugele (1961, 1962, 1966). Singer's maze method departs from the coordinate directions as soon as a constraint is violated and progresses into the feasible region or along the boundary. For this, however, the gradient of the active constraints must be known. Mugele's poor man's optimizer, a discrete coordinate strategy without line searches, not only handles active constraints, but can also cope with narrow valleys that do not run parallel to the coordinate axes. In this case diagonal steps are permitted. Similar to this strategy is the direct search method of Hooke and Jeeves, which, because it has become very widely used, will be treated in detail in the following chapter.

3.2.1.2 Strategy of Hooke and Jeeves: Pattern Search

The direct pattern search of Hooke and Jeeves (1961) was originally devised as an automatic experimental strategy (see Hooke, 1957; Hooke and Van Nice, 1959). It is nowadays much more widely used as a numerical parameter optimization procedure.

The method by which the direct pattern search works is characterized by two types of move. At each iteration there is an exploratory move, which represents a simplified Gauss-Seidel variation with one discrete step per coordinate direction. No line searches are made. On the assumption that the line joining the first and last points of the exploratory move represents an especially favorable direction, an extrapolation is made along it (pattern move) before the variables are varied again individually. The extrapolations do not necessarily lead to an improvement in the objective function value. The success of the iteration is only checked after the following exploratory move. The length of the pattern step is thereby increased each time, while the optimal search direction changes only gradually. This pays off to most advantage where there are narrow valleys. An ALGOL implementation of the strategy is due to Kaupe (1963). It was improved by Bell and Pike (1966), as well as by Smith (1969) (see also DeVogelaere, 1968; Tomlin and Smith, 1969). In the first case, the sequence of plus and minus exploratory steps in the coordinate directions is modified to suit the conditions at any instant. The second improvement aims at permitting a retrospective scaling of the variables, as the step lengths can be chosen individually to be different from each other.

The algorithm runs as follows:

Step 0: (Initialization)
Choose a starting point x^(0,0) = x^(−1,n), an accuracy bound ε > 0, and initial step lengths s_i^(0) ≠ 0 for all i = 1(1)n (e.g., s_i^(0) = 1 if no more plausible values are at hand).
Set k = 0 and i = 1.


Step 1: (Exploratory move)
Construct x′ = x^(k,i−1) + s_i^(k) e_i (discrete step in the positive direction).
If F(x′) ...

Step 10: (Iteration loop)
Increase k ← k + 1, set i = 1, and go to step 1.

Figure 3.6, together with the following table, presents a possible sequence of iteration points. From the starting point (0), a successful step, (1) and (3), is taken in each coordinate direction. Since the end point of this exploratory move is better than the starting point, it serves as a basis for the first extrapolation. This leads to (4). It is not checked here whether or not any improvement over (3) has occurred. At the next exploratory move, from (4) to (5), the objective function value can only be improved in one coordinate direction. It is now checked whether the condition at (5) is better than that of point (3). This is the case. The next extrapolation step, to (8), has a changed direction because of the partial failure of the exploration, but maintains its increased length. Now it will be assumed that, starting from (8) with the hitherto constant exploratory step length, no success will be scored in any coordinate direction compared to (8). The comparison with (5) shows that a reduction in the value of the objective function has nevertheless occurred. Thus the next extrapolation, to (13), remains the same as the previous one with respect to direction and step length. The next exploratory move leads to a point (15) which, although better than (13), is worse than (8). Now there is a return to (8). Only after the exploration again has no success here are the step lengths halved in order to make further progress possible. The fact that at some points in this case the objective function was tested several times is not typical for n > 2.

[Figure 3.6: Strategy of Hooke and Jeeves; iteration points (0) to (25), with the starting point, successes, failures, extrapolations, and the final point marked]


Numbering  k  i   x_1  x_2   Comparison point   s_1  s_2   Remarks
(0)        0  0    0    9    -                   2    2    starting point
(1)        0  1    2    9    (0)                           success
(2)        0  2    2   11    (1)                           failure
(3)        0  2    2    7    (1)                           success
(4)        1  0    4    5    -                   2   -2    extrapolation
(5)        1  1    6    5    (4),(3)                       success, success
(6)        1  2    6    3    (5)                           failure
(7)        1  2    6    7    (5)                           failure
(8)        2  0   10    3    -,(5)               2   -2    extrapolation, success
(9)        2  1   12    3    (8)                           failure
(10)       2  1    8    3    (8)                           failure
(11)       2  2   10    1    (8)                           failure
(12)       2  2   10    5    (8)                           failure
(13)       3  0   14    1    -                   2   -2    extrapolation
(14)       3  1   16    1    (13)                          failure
(15)       3  1   12    1    (13),(8)                      success, failure
(16)       3  2   12   -1    (15)                          failure
(17)       3  2   12    3    (15)                          failure
(8)        4  0   10    3    -                   2   -2    return
(18)       4  1   12    3    (8)                           failure
(19)       4  1    8    3    (8)                           failure
(20)       4  2   10    1    (8)                           failure
(21)       4  2   10    5    (8)                           failure
(8)        5  0   10    3    -                   1   -1    step lengths halved
(22)       5  1   11    3    (8)                           failure
(23)       5  1    9    3    (8)                           success
(24)       5  2    9    2    (23),(8)                      success, success
(25)       6  0    8    1    -                  -1   -1    extrapolation

A proof of convergence of the direct search of Hooke and Jeeves has been derived by Céa (1971); it is valid under the condition that the objective function F(x) is strictly convex and continuously differentiable. The computational operations are very simple and even in unforeseen circumstances cannot lead to invalid arithmetical manipulations such as, for example, division by zero. A further advantage of the strategy is its small storage requirement, which is of order O(n). The selected pattern accelerates the search in valleys, provided they are not sharply bent. The extrapolation steps follow, in an approximate way, the gradient trajectory. However, the limitation of the trial steps to coordinate directions can also lead to a premature termination of the search here, as in the coordinate strategy.
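Since the intermediate steps of the algorithm are missing above, the following Python sketch reconstructs only the two kinds of move in their simplest form; the loop structure and all names are assumptions in the spirit of the description, not Hooke and Jeeves' exact step list.

```python
def hooke_jeeves(F, x0, s=1.0, eps=1e-8):
    """Pattern search: exploratory moves along the coordinate axes,
    then an extrapolation (pattern move) along the overall success
    direction; step lengths are halved after a total failure."""
    def explore(base, step):
        x = list(base)
        for i in range(len(x)):
            for d in (step, -step):          # discrete +/- trial steps
                trial = list(x)
                trial[i] += d
                if F(trial) < F(x):
                    x = trial
                    break
        return x

    x_base = list(x0)
    while s > eps:
        x_new = explore(x_base, s)
        if F(x_new) < F(x_base):
            while True:                      # repeat pattern moves while
                x_pat = [2 * xn - xb         # they keep paying off
                         for xn, xb in zip(x_new, x_base)]
                x_base, x_new = x_new, explore(x_pat, s)
                if F(x_new) >= F(x_base):
                    break
        else:
            s *= 0.5                         # total failure: halve steps
    return x_base

x_min = hooke_jeeves(lambda v: (v[0] - 3)**2 + 2 * (v[1] + 1)**2, [0.0, 9.0])
```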

Further variations on the method, which have not achieved such popularity, are due to, among others, Wood (1960, 1962, 1965; see also Weisman and Wood, 1966; Weisman, Wood, and Rivlin, 1965), Emery and O'Hagan (1966; spider method), Fend and Chandler (1961; moment rosetta search), Bandler and MacDonald (1969; razor search; see also Bandler, 1969a,b), Pierre (1969; bunny-hop search), Erlicki and Appelbaum (1970), and Houston and Huffman (1971). A more detailed enumeration of older methods can be found in Lavi and Vogl (1966). Some of these modifications allow constraints in the form of inequalities to be taken into account directly. Similar to them is a program designed by M. Schneider (see Drenick, 1967). Aside from the fact that in order to use it one must specify which of the variables enter the individual constraints, it does not appear to work very effectively. Excessively long computation times and inaccurate results, especially with many variables, made it seem reasonable to omit M. Schneider's procedure from the strategy comparison (see Chap. 6). The problem of how to take constraints into account in a direct search has also been investigated by Klingman and Himmelblau (1964) and Glass and Cooper (1965). The resulting methods, to a greater or lesser extent, transform the original problem. They have nowadays been superseded by the general penalty function methods. Automatic "optimizators" for on-line optimization of chemical processes, which once were well known under the names Opcon (Bernard and Sonderquist, 1959) and Optimat (Weiss, Archer, and Burt, 1961), also apply modified versions of the direct search method. Another application is described by Sawaragi et al. (1971).

3.2.1.3 Strategy of Rosenbrock: Rotating Coordinates

Rosenbrock's idea (1960) was to remove the limitation on the number of search directions in the coordinate strategy, so that the search steps can move parallel to the axes of a coordinate system that can rotate in the space IR^n. One of the axes is set to point in the direction that appears most favorable. For this purpose the experience of successes and failures gathered in the course of the iterations is used, in the manner of Hooke and Jeeves' direct search. The remaining directions are fixed normal to the first and mutually orthogonal.

To start with, the search directions comprise the unit vectors

v_i^(0) = e_i for all i = 1(1)n

Starting from the point x^(0,0), a trial is made in each direction with the discrete initial step lengths s_i^(0,0) for all i = 1(1)n. When a success is scored (including equality of the objective function values), the changed variable vector is retained and the step length is multiplied by a positive factor α > 1; for a failure, the vector of variables is left unchanged and the step length is multiplied by a negative factor β, −1 < β < 0. This is continued until at least one success followed by a failure has occurred in each direction; then new direction vectors are generated by Gram-Schmidt orthonormalization:

$$v_i^{(k+1)} = \frac{w_i}{\|w_i\|}\,, \qquad
w_i = \begin{cases} a_i & \text{for } i = 1\\[1.5ex] a_i - \sum\limits_{j=1}^{i-1} \left(a_i^{\,T} v_j^{(k+1)}\right) v_j^{(k+1)} & \text{for } i = 2(1)n \end{cases} \qquad (3.21)$$

where

$$a_i = \sum_{j=i}^{n} d_j^{(k)}\, v_j^{(k)} \qquad \text{for all } i = 1(1)n$$

A scalar d_i^(k) represents the distance covered in direction v_i^(k) in the kth iteration. Thus v_1^(k+1) points in the overall successful direction of step k. It is expected that a particularly large search step can be taken in this direction at the next iteration. The requirement of waiting for at least one success in each direction has the effect that no direction is lost, and the v_i^(k) always span the full n-dimensional Euclidean space. The termination rule, or convergence criterion, is determined by the lengths of the vectors a_1^(k) and a_2^(k). Before each orthonormalization there is a test whether ‖a_1^(k)‖ < ε and ‖a_2^(k)‖ > 0.3 ‖a_1^(k)‖. When this condition is satisfied in six consecutive iterations, the search is ended. The second condition is designed to ensure that a premature termination of the search does not occur just because the distances covered have become small. More significantly, the requirement is also that the main success direction changes sufficiently rapidly, something that Rosenbrock regards as a sure sign of the proximity of a minimum. As the strategy comparison will show (see Chap. 6), this requirement is often too strong. It even hinders the ending of the procedure in many cases.
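The direction update (3.21) is easily expressed with NumPy; the function name and the storage of directions as matrix rows are illustrative choices.

```python
import numpy as np

def rosenbrock_directions(V, d):
    """Generate the rotated direction set after one Rosenbrock iteration
    (Equation 3.21): V is an n x n matrix whose rows v_i are the old unit
    directions, d the distances covered along them (not all zero)."""
    n = len(d)
    # a_i = sum_{j=i}^{n} d_j v_j : accumulated success vectors
    A = np.array([sum(d[j] * V[j] for j in range(i, n)) for i in range(n)])
    W = np.empty_like(A)
    for i in range(n):
        w = A[i].copy()
        for j in range(i):                 # subtract the projections onto
            w -= (A[i] @ W[j]) * W[j]      # already orthonormalized rows
        W[i] = w / np.linalg.norm(w)       # normalize to unit length
    return W

# Example for n = 2: distances 3 and 1 along the old unit vectors;
# the first new axis points along the overall success direction (3, 1)
V_new = rosenbrock_directions(np.eye(2), [3.0, 1.0])
```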

In his original publication Rosenbrock already gave detailed rules for the treatment of inequality constraints. His procedure can be viewed as a partial penalty function method, since the objective function is only altered in the neighborhood of the boundaries. Immediately after each variation of the variables, the objective function value is tested. If the comparison is unfavorable, a failure is registered as in the unconstrained case. On equality or an improvement, however, if the iteration point lies near a boundary of the region, the success criterion changes. For example, for constraints of the form G_j(x) ≥ 0 for all j = 1(1)m, the extended objective function F̃(x) takes the form (this is one of several suggestions of Rosenbrock):

$$\tilde F(x) = F(x) + \sum_{j=1}^{m} \varphi_j(x)\,\bigl(f_j - F(x)\bigr)$$

in which

$$\varphi_j(x) = \begin{cases} 0 & \text{if } G_j(x) \ge \delta\\ 3\lambda - 4\lambda^2 + 2\lambda^3 & \text{if } 0 < G_j(x) < \delta\\ 1 & \text{if } G_j(x) \le 0 \end{cases} \qquad (3.22)$$

and

$$\lambda = 1 - \frac{1}{\delta}\, G_j(x)$$


The parameter δ defines the width of the boundary zone (for constraints of the form a_j(x) ≤ G_j(x) ≤ b_j(x) this kind of double sided bounding is not always given, however). The basis of the procedure is fully described in Rosenbrock and Storey (1966).
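A sketch of the extended objective in Python, applying the corrections of Equation (3.22) sequentially as in step 3 of the algorithm below; the list-based bookkeeping of the f_j and all names are assumptions for illustration.

```python
def rosenbrock_penalty(F, G_list, f, x, delta=1e-4):
    """Extended objective F~ near the boundaries (Equation 3.22);
    f[j] holds the objective value last recorded while constraint j
    was comfortably satisfied (G_j(x) >= delta)."""
    F_t = F(x)
    for j, G in enumerate(G_list):
        g = G(x)
        if g >= delta:
            phi = 0.0                       # far inside: no correction
        elif g <= 0.0:
            phi = 1.0                       # violated: full correction
        else:
            lam = 1.0 - g / delta           # boundary zone, 0 < lam < 1
            phi = 3*lam - 4*lam**2 + 2*lam**3
        F_t = F_t + phi * (f[j] - F_t)      # sequential correction
    return F_t
```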

Using the notations

x_i  object variables
s_i  step sizes
v_i  direction components
d_i  distances travelled
λ_i  success/failure indicators

the extended algorithm of the strategy runs as follows:

Step 0: (Initialization)
Choose a starting point x^(0,0) such that G_j(x^(0,0)) > 0 for all j = 1(1)m.
Choose accuracy parameters ε > 0 and δ > 0 (Rosenbrock takes ε = 10^−4, δ = 10^−4).
Set v_i^(0) = e_i for all i = 1(1)n.
Set k = 0 (outer loop counter) and ℓ = 0 (inner loop counter).
If there are constraints (m > 0), set f_j = F(x^(0,0)) for all j = 1(1)m.

Step 1: (Initialization of step sizes, distances travelled, and indicators)
Set s_i^(k,0) = 0.1, d_i^(k) = 0, and λ_i^(k) = −1 for all i = 1(1)n.
Set ℓ = 0 and i = 1.

Step 2: (Trial step)
Construct x′ = x^(k,nℓ+i−1) + s_i^(k,ℓ) v_i^(k).
If F(x′) > F(x^(k,nℓ+i−1)), go to step 6;
otherwise, if m = 0, go to step 5;
if m ≠ 0, set F̃ = F(x′) and j = 1.

Step 3: (Test of feasibility)
If G_j(x′) ≤ 0, go to step 7;
if G_j(x′) ≥ δ, set f_j = F(x′) and go to step 4;
otherwise replace F̃ ← F̃ + φ_j(x′)(f_j − F̃) according to Equation (3.22).
If F̃ > F(x^(k,nℓ+i−1)), go to step 6.

Step 4: (Constraints loop)
If j < m, increase j ← j + 1 and go to step 3.


Step 5: (Store the success and update the internal memory)
Set x^(k,nℓ+i) = x′ and s_i^(k,ℓ+1) = 3 s_i^(k,ℓ), and replace d_i^(k) ← d_i^(k) + s_i^(k,ℓ).
If λ_i^(k) = −1, set λ_i^(k) = 0.
Go to step 7.

Step 6: (Internal memory update in case of failure)
Set x^(k,nℓ+i) = x^(k,nℓ+i−1) and s_i^(k,ℓ+1) = −(1/2) s_i^(k,ℓ).
If λ_i^(k) = 0, set λ_i^(k) = 1.

Step 7: (Main loop)
If λ_j^(k) = 1 for all j = 1(1)n, go to step 8;
otherwise, if i < n increase i ← i + 1, else set i = 1 and increase ℓ ← ℓ + 1, and go to step 2.


[Figure 3.7: Strategy of Rosenbrock; search path with starting point, successes, failures, and overall successes, together with the direction vectors v_1^(k) and v_2^(k) before and after each orthogonalization at the points (0), (4), and (9)]


Numbering  k  nℓ+i   x_1    x_2    s_1  s_2   Remarks
(0)        0  0       0      9      2    2    starting point
(1)        0  1       2      9      2    -    success
(2)        0  2       2     11      -    2    failure
(3)        0  3       8      9      6    -    failure
(4)        0  4       2      8      -   -1    success
(5)        0  5      -1      8     -3    -    failure
(6)        0  6       2      5      -   -3    failure
(4)        1  0       2      8      2    2    transformation and orthogonalization
(7)        1  1       3.8    7.1    2    -    success
(8)        1  2       2.9    5.3    -    2    success
(9)        1  3       8.3    2.6    6    -    success
(10)       1  4       5.6   -2.7    -    6    failure
(11)       1  5      24.4   -5.4   18    -    failure
(9)        2  0       8.3    2.6    2    2    transformation and orthogonalization

In Figure 3.7, including the accompanying table, a few iterations of the Rosenbrock strategy for n = 2 are represented geometrically. At the starting point x^(0,0) the search directions are the same as the unit vectors. After three runs through (6 trials), the trial steps in each direction have led to a success followed by a failure. At the best condition thus attained, (4) at x^(0,4) = x^(1,0), new direction vectors v_1^(1) and v_2^(1) are generated. Five further trials lead to the best point, (9) at x^(1,3) = x^(2,0), of the second iteration, at which a new choice of directions is again made. The complete sequence of steps can be followed, if desired, with the help of the accompanying table.

Numerical experiments show that within a few iterations the rotating coordinates become oriented such that one of the axes points along the gradient direction. The strategy thus allows sharp valleys in the topology of the objective function to be followed. Like the method of Hooke and Jeeves, Rosenbrock's procedure needs no information about partial derivatives and uses no line search method for the exact location of relative minima. This makes it very robust. It has, however, one disadvantage compared to the direct pattern search: the orthogonalization procedure of Gram and Schmidt is very costly. It requires storage space of order O(n²) for the matrices A = {a_ij} and V = {v_ij}, and the number of computational operations even increases with O(n³). At least in cases where the objective function call costs relatively little, the computation time for the orthogonalization becomes highly significant when there are many variables. Besides this, the number of parameters is in any case limited by the high storage space requirement.

If there are constraints, care must be taken to ensure that the starting point is inside the allowed region and sufficiently far from the boundaries. Examples of the application of


Rosenbrock's strategy can be found in Storey (1962) and in Storey and Rosenbrock (1964). Among them is also a discretized functional optimization problem. For unconstrained problems there exists the code of Machura and Mulawa (1973). The Gram-Schmidt orthogonalization has been programmed, for example, by Clayton (1971).

Lange-Nielsen and Lance (1972) have proposed, on the basis of numerical experiments, two improvements in the Rosenbrock strategy. The first involves not setting constant step lengths at the beginning of a cycle or after each orthogonalization, but rather modifying them and simultaneously scaling them according to the successes and failures during the preceding cycle. The second improvement concerns the termination criterion. Rosenbrock's original version is replaced by the simpler condition that, according to the achievable computational accuracy, several consecutive trials yield the same value of the objective function.

3.2.1.4 Strategy of Davies, Swann, and Campey (DSC)

A combination of the Rosenbrock idea of rotating coordinates with one dimensional search methods is due to Swann (1964). It has become known under the name Davies-Swann-Campey (abbreviated DSC) strategy. The description of the procedure given by Box, Davies, and Swann (1969) differs somewhat from that in Swann, and so several versions of the strategy have arisen in the subsequent literature. Preference is given here to the original concept of Swann, which exhibits some features in common with the method of conjugate directions of Smith (1962) (see also Sect. 3.2.2). Starting from x^(0,0), a line search is made in each of the unit directions v_i^(0) = e_i for all i = 1(1)n. This process is followed by a one dimensional minimization in the direction of the overall success so far achieved,

$$v_{n+1}^{(0)} = \frac{x^{(0,n)} - x^{(0,0)}}{\left\|x^{(0,n)} - x^{(0,0)}\right\|}$$

with the result x^(0,n+1).

The orthogonalization follows this, e.g., by the Gram-Schmidt method. If one of the line searches was unsuccessful, the new set of directions would no longer span the complete parameter space. Therefore only those old direction vectors along which a prescribed minimum distance has been moved are included in the orthogonalization process. The other directions remain unchanged. The DSC method, however, places a further hurdle before the coordinate rotation. If the distance covered in one iteration is smaller than the step length used in the line search, the latter is reduced by a factor of 10, and the next iteration is carried out with the old set of directions.

After an orthogonalization, one of the new directions (the first) coincides with that of the (n+1)-th line search of the previous step. This can therefore also be interpreted as the first minimization in the new coordinate system. Only n more one dimensional searches need be made to finish the iteration. As a termination criterion the DSC strategy uses the length of the total vector between the starting point and end point of an iteration. The search is ended when it is less than a prescribed accuracy bound.


The algorithm runs as follows:

Step 0: (Initialization)
Specify a starting point x^(0,0) and an initial step length s^(0) (the same for all directions).
Define an accuracy requirement ε > 0.
Choose as a first set of directions v_i^(0) = e_i for all i = 1(1)n.
Set k = 0 and i = 1.

Step 1: (Line search)
Starting from x^(k,i−1), seek the relative minimum x^(k,i) in the direction v_i^(k) such that

$$F\bigl(x^{(k,i)}\bigr) = F\bigl(x^{(k,i-1)} + d_i^{(k)} v_i^{(k)}\bigr) = \min_d \left\{ F\bigl(x^{(k,i-1)} + d\, v_i^{(k)}\bigr) \right\}$$

Step 2: (Main loop)
If i < n, increase i ← i + 1 and go to step 1;
if i = n, go to step 3;
if i = n + 1, go to step 4.

Step 3: (Eventually one more line search)
Construct z = x^(k,n) − x^(k,0).
If ‖z‖ > 0, set v_{n+1}^(k) = z/‖z‖ and i = n + 1, and go to step 1;
otherwise set x^(k,n+1) = x^(k,n) and d_{n+1}^(k) = 0, and go to step 5.

Step 4: (Check appropriateness of step length)
If ‖x^(k,n+1) − x^(k,0)‖ ≥ s^(k), go to step 6.

Step 5: (Termination criterion)
Set s^(k+1) = 0.1 s^(k).
If s^(k+1) ≤ ε, end the search;
otherwise set x^(k+1,0) = x^(k,n+1), increase k ← k + 1, set i = 1, and go to step 1.

Step 6: (Check appropriateness of orthogonalization)
Reorder the directions v_i^(k) and the associated distances d_i^(k) such that
|d_i^(k)| > ε for all i = 1(1)p and |d_i^(k)| ≤ ε for all i = p + 1(1)n.
If p ...


No geometric representation has been attempted here, since the fine deviations from the Rosenbrock method would hardly be apparent on a simple diagram.

The line search procedure of the DSC method has been described in detail by Box, Davies, and Swann (1969). It boxes in the minimum in the chosen direction using three equidistant points and then applies a single Lagrangian quadratic interpolation. The authors state that, in their experience, this is more economical with regard to the number of objective function calls than an exact line search with a sequence of interpolations.

The algorithm of the line search is:

Step 0: (Initialization)
Specify a starting point x_0, a step length s, and a direction v (all given from the main program).

Step 1: (Step forward)
Construct x = x_0 + s v.
If F(x) ≤ F(x_0), go to step 3.

Step 2: (Step backward)
Replace x ← x − 2 s v and s ← −s.
If F(x) ≤ F(x_0), go to step 3;
otherwise (both first trials without success) go to step 5.

Step 3: (Further steps)
Replace s ← 2 s and set x_0 = x.
Construct x = x_0 + s v.
If F(x) ≤ F(x_0), repeat step 3.

Step 4: (Prepare interpolation)
Replace s ← 0.5 s.
Construct x = x_0 + s v.
Of the four points just generated, x_0 − s, x_0, x_0 + s, and x_0 + 2s, reject the one which is furthest from the point that has the smallest value of the objective function.

Step 5: (Interpolation)
Define the three available equidistant points x_1 ...
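In condensed form the line search might be sketched as follows; the four-point bookkeeping of step 4 is simplified here to the three equidistant points around the last success, and all names are illustrative.

```python
def dsc_line_search(F, x0, s, v):
    """Sketch of the DSC line search along direction v from x0: box in
    the minimum with doubling steps, then one quadratic interpolation
    over three equidistant points (Equation 3.16)."""
    f = lambda t: F([xi + t * vi for xi, vi in zip(x0, v)])
    if f(s) > f(0.0):
        s = -s                        # forward trial failed: step backward
    t0 = 0.0
    while f(t0 + s) <= f(t0):         # double the step while successful
        t0 += s
        s *= 2.0
    h = 0.5 * abs(s)                  # halve: three equidistant points
    a, b, c = t0 - h, t0, t0 + h
    fa, fb, fc = f(a), f(b), f(c)
    den = (c - b) * fa + (a - c) * fb + (b - a) * fc
    if den > 0:                       # parabola opens upward: interpolate
        t_min = 0.5 * ((c*c - b*b) * fa + (a*a - c*c) * fb
                       + (b*b - a*a) * fc) / den
    else:
        t_min = b                     # fall back to the best point found
    return [xi + t_min * vi for xi, vi in zip(x0, v)]
```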


A numerical strategy comparison by M. J. Box (1966) shows the method to be a very effective optimization procedure, in general superior both to the Hooke and Jeeves and to the Rosenbrock methods. However, the tests only refer to smooth objective functions with few variables. If the number of parameters is large, the costly orthogonalization process makes its inconvenient presence felt in the DSC strategy as well.

Several suggestions have been made to date as to how to simplify the Gram-Schmidt procedure and to reduce its susceptibility to numerical rounding error (Rice, 1966; Powell, 1968a; Palmer, 1969; Golub and Saunders, 1970, Householder method).

Palmer replaces the conditions of Equation (3.21) by:

$$v_i^{(k+1)} =
\begin{cases}
\displaystyle \left.\left(\sum_{j=1}^{n} d_j\, v_j^{(k)}\right) \middle/ \sqrt{\sum_{j=1}^{n} d_j^2}\right. & \text{for } i = 1\\[3ex]
\displaystyle \left.\left(d_{i-1} \sum_{j=i}^{n} d_j\, v_j^{(k)} - v_{i-1}^{(k)} \sum_{j=i}^{n} d_j^2\right) \middle/ \sqrt{\sum_{j=i}^{n} d_j^2 \;\sum_{j=i-1}^{n} d_j^2}\right. & \text{for } i = 2(1)n \text{ and } \sum\limits_{j=i}^{n} d_j^2 \neq 0\\[3ex]
v_i^{(k)} & \text{otherwise, i.e., if } \sum\limits_{j=i}^{n} d_j^2 = 0
\end{cases}$$

He shows that even if no success was obtained in one of the directions v_i^(k), that is, d_i = 0, the new vectors v_i^(k+1) for all i = 1(1)n still span the complete parameter space, because v_{i+1}^(k+1) is then set equal to −v_i^(k). Thus the algorithm does not need to be restricted to directions for which d_i > ε, as happens in the algorithm with Gram-Schmidt orthogonalization.

The significant advantage of the revised procedure lies in the fact that the number of computational operations remains only of the order O(n²). The storage requirement is also somewhat less, since one n × n matrix as an intermediate storage area is omitted. For problems with linear constraints (equalities and inequalities), Box, Davies, and Swann (1969) recommend a modification of the orthogonalization procedure that works in a similar way to the method of projected gradients of Rosen (1960, 1961) (see also Davies, 1968). Non-linear constraints (inequalities) can be handled with the created response surface technique devised by Carroll (1961), which is one of the penalty function methods.

Further publications on the DSC strategy, also with comparison tests, are those of Swann (1969), Davies and Swann (1969), Davies (1970), and Swann (1972). Hoshino (1971) observes that in a narrow valley the search causes zigzag movements. His remedy for this is to add a further search, again in direction v_1^(k), after each set of n line searches. With the help of two examples, for n = 2 and n = 3, he shows the accelerating effect of this measure.

3.2.1.5 Simplex Strategy of Nelder and Mead

There is a group of methods called simplex strategies that work quite differently from the direct search methods described so far. In spite of their common name, they have nothing to do with the simplex method of linear programming of Dantzig (1966). The idea (Spendley, Hext, and Himsworth, 1962) originates in an attempt to reduce, as much as possible, the number of simultaneous trials in the experimental identification procedure of factorial design (see for example Davies, 1954). The minimum number according to Brooks and Mickey (1961) is n + 1. Thus instead of a single starting point, n + 1 vertices are used. They are arranged so as to be equidistant from each other: for n = 2 in an equilateral triangle; for n = 3 a tetrahedron; and in general a polyhedron, also referred to as a simplex. The objective function is evaluated at all the vertices. The iteration rule is: Replace the vertex with the largest objective function value by a new one situated at its reflection in the midpoint of the other n vertices. This rule aims to locate the new point at an especially promising place. If one lands near a minimum, the newest vertex can also be the worst. In this case the second worst vertex should be reflected. If the edge length of the polyhedron is not changed, the search eventually stagnates. The polyhedra rotate about the vertex with the best objective function value. A closer approximation to the optimum can only be achieved by halving the edge lengths of the simplex. Spendley, Hext, and Himsworth suggest doing this whenever a vertex is common to more than 1.65 n + 0.05 n² consecutive polyhedra. Himsworth (1962) holds that this strategy is especially advantageous when the number of variables is large and the determination of the objective function prone to error.

To this basic procedure, various modifications have been proposed by, among others, Nelder and Mead (1965), Box (1965), Ward, Nag, and Dixon (1969), and Dambrauskas (1970, 1972). Richardson and Kuester (1973) have provided a complete program. The most common version is that of Nelder and Mead, in which the main difference from the basic procedure is that the size and shape of the simplex are modified during the run to suit the conditions at each stage.

The algorithm, with an extension by O'Neill (1971), runs as follows:

Step 0: (Initialization)
Choose a starting point x^(0,0), initial step lengths s_i^(0) for all i = 1(1)n (if no better scaling is known, s_i^(0) = 1), and an accuracy parameter ε > 0 (e.g., ε = 10^−8). Set c = 1 and k = 0.

Step 1: (Establish the initial simplex)
Set x^(k,ν) = x^(k,0) + c s_ν^(0) e_ν for all ν = 1(1)n.

Step 2: (Determine worst and best points for the normal reflection)
Determine the indices w (worst point) and b (best point) such that
F(x^(k,w)) = max{F(x^(k,ν)), ν = 0(1)n},
F(x^(k,b)) = min{F(x^(k,ν)), ν = 0(1)n}.
Construct the centroid of all vertices except the worst, x̄ = (1/n) Σ_{ν=0, ν≠w} x^(k,ν), and the reflected point x′ = 2 x̄ − x^(k,w).

Step 3: (Check success of the reflection)
If F(x′) < F(x^(k,b)), go to step 4;
otherwise, if the number of vertices ν with F(x′) < F(x^(k,ν)) is
> 1, set x^(k+1,w) = x′ and go to step 8;
= 1, go to step 5;
= 0, go to step 6.

Step 4: (Expansion)
Construct x″ = 2 x′ − x̄.
If F(x″) ...


[Figure 3.8: Simplex strategy of Nelder and Mead; sequence of simplices from the first to the last, with the starting point and the vertex points (1) to (17)]


Iteration   Simplex vertices
index       (worst ... best)     Remarks
0           1  2  3              start simplex
            2  3  4              reflection
            2  3  5              expansion (successful)
1           2  3  5
            3  6  5              reflection
2           3  6  5
            6  5  7              reflection
            6  8  5              expansion (unsuccessful)
3           6  5  7
            5  9  7              reflection
4           5  9  7
            10 9  7              reflection
            9  11 7              partial outside contraction
5           9  11 7
            11 7  12             reflection
            11 13 7              expansion (unsuccessful)
6           11 7  12
            14 7  12             reflection
            15 7  12             partial inside contraction
            17 16 12             total contraction
7           17 16 12

The main difference between this program and the original strategy of Nelder and Mead is that after a normal ending of the minimization there is an attempt to construct a new starting simplex. To this end, small trial steps are taken in each coordinate direction. If just one of these tests is successful, the search is started again, but with a simplex of considerably reduced edge lengths. This restart procedure recommends itself because, especially for a large number of variables, the simplex tends to collapse, i.e., to no longer span the complete parameter space, without having reached the minimum.

For few variables the simplex method is known to be robust and reliable, but also relatively costly. There are n + 1 parameter vectors to be stored, and the reflection requires a number of computational operations of order O(n²). According to Nelder and Mead, the number of function calls increases approximately as O(n^2.11); however, this empirical value is based only on test results with up to 10 variables. Parkinson and Hutchinson (1972a,b) describe a variant of the strategy in which the real storage requirement can be reduced by about half (see also Spendley, 1969). Masters and Drucker (1971) recommend altering the expansion or contraction factor after consecutive successes or failures respectively.
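The following Python sketch condenses the polyhedron iteration to its usual textbook form: reflection, expansion, contraction, and total contraction towards the best vertex. The acceptance tests are simplified relative to the step list above, and O'Neill's restart extension is omitted; all names are illustrative.

```python
def nelder_mead(F, x0, s=1.0, eps=1e-8, max_iter=500):
    """Simplified simplex strategy with a variable-shape polyhedron."""
    n = len(x0)
    simplex = [list(x0)] + [
        [x0[j] + (s if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    for _ in range(max_iter):
        simplex.sort(key=F)                  # best first, worst last
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        refl = [2 * centroid[j] - worst[j] for j in range(n)]  # reflection
        if F(refl) < F(best):
            expa = [2 * refl[j] - centroid[j] for j in range(n)]
            simplex[-1] = expa if F(expa) < F(refl) else refl  # expansion
        elif F(refl) < F(worst):
            simplex[-1] = refl
        else:
            cont = [0.5 * (worst[j] + centroid[j]) for j in range(n)]
            if F(cont) < F(worst):
                simplex[-1] = cont           # contraction
            else:                            # total contraction to best
                simplex = [[0.5 * (v[j] + best[j]) for j in range(n)]
                           for v in simplex]
        if max(abs(F(v) - F(best)) for v in simplex) < eps:
            break
    return min(simplex, key=F)

x_min = nelder_mead(lambda v: (v[0] - 3)**2 + 2 * (v[1] + 1)**2, [0.0, 9.0])
```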

3.2.1.6 Complex Strategy of Box

M. J. Box (1965) calls his modification of the polyhedron strategy the complex method, an abbreviation for constrained simplex, since he conceived it also for problems with inequality constraints. The starting point of the search does not need to lie in the feasible region. For this case Box suggests locating an allowed point by minimizing the function

$$\tilde F(x) = -\sum_{j=1}^{m} G_j(x)\,\delta_j(x)$$

with

$$\delta_j(x) = \begin{cases} 0 & \text{if } G_j(x) \ge 0\\ 1 & \text{otherwise} \end{cases} \qquad (3.23)$$

until F̃(x) = 0.
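As a small illustration, the substitute objective (3.23) in Python; the helper name and the example constraints are assumptions added here.

```python
def feasibility_objective(G_list):
    """Substitute objective (3.23) for locating a feasible point:
    the negated sum of all violated constraint values, which is
    non-negative and zero exactly on the feasible region."""
    def F_tilde(x):
        return -sum(G(x) for G in G_list if G(x) < 0)
    return F_tilde

# Example: feasible region x1 >= 1 and x2 <= 4
G = [lambda x: x[0] - 1.0, lambda x: 4.0 - x[1]]
F_tilde = feasibility_objective(G)
# Any point driving F_tilde to zero serves as an allowed starting point.
```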

The two most important differences from the Nelder-Mead strategy are the use of more vertices and the expansion of the polyhedron at each normal reflection. Both measures are intended to prevent the complex from eventually spanning only a subspace of reduced dimensionality, especially at active constraints. If an allowed starting point is given or has been found, it defines one of the n + 1 ≤ N ≤ 2n vertices of the polyhedron. The remaining vertex points are fixed by a random process in which each vector inside the closed region defined by the explicit constraints has an equal probability of selection. If an implicit constraint is violated, the new point is displaced stepwise towards the midpoint of the allowed vertices that have already been defined, until it satisfies all the constraints. Implicit constraints G_j(x) ≥ 0 are dealt with similarly during the course of the minimum search. If an explicit boundary is crossed, x_i < a_i say, the offending variable is simply set back in the allowed region to a value near the boundary.

The details of the algorithm are as follows:

Step 0: (Initialization)
Choose a starting point x^(0) and a number of vertices N ≥ n + 1 (e.g., N = 2n). Number the constraints such that the first j ≤ m_1 each depend only on one variable x_{ℓj} (G_j(x_{ℓj}), explicit form).
Test whether x^(0) satisfies all the constraints. If not, construct a substitute objective function according to Equation (3.23).
Set up the initial complex as follows:
x^(0,1) = x^(0) and x^(0,ν) = x^(0) + Σ_{i=1}^{n} z_i e_i for ν = 2(1)N,
where the z_i are uniformly distributed random numbers from the range [a_i, b_i] if constraints are given in the form a_i ≤ x_i ≤ b_i, and otherwise from [x_i^(0) − 0.5 s, x_i^(0) + 0.5 s], where, e.g., s = 1.
If G_j(x^(0,ν)) < 0 for any j ≤ m_1, replace x_{ℓj}^(0,ν) ← 2 x_{ℓj}^(0,1) − x_{ℓj}^(0,ν).
If G_j(x^(0,ν)) < 0 for any j with m_1 < j ≤ m, replace x^(0,ν) ← 0.5 [x^(0,ν) + (1/(ν−1)) Σ_{μ=1}^{ν−1} x^(0,μ)].


Multidimensional Strategies 63<br />

(If necessary repeat this process until Gj(x (0 ) ) 0 for all j =1(1)m.)<br />

Set k =0.<br />

Step 1: (Reflection)
    Determine the index w (worst vertex) such that
        F(x^(k,w)) = max{F(x^(k,ν)), ν = 1(1)N}.
    Construct the midpoint of the remaining vertices,
        x̄ = (1/(N−1)) Σ_{ν=1, ν≠w}^{N} x^(k,ν),
    and x' = x̄ + α (x̄ − x^(k,w)) (over-reflection factor α = 1.3).

Step 2: (Check for constraints)
    If m = 0, go to step 7; otherwise set j = 1.
    If m_1 = 0, go to step 5.

Step 3: (Set vertex back into bounds for explicit constraints)
    Obtain g = G_j(x') = G_j(x'_{ℓj}).
    If g ≥ 0, go to step 4;
    otherwise replace x'_{ℓj} ← x'_{ℓj} + g + ε (backwards length ε = 10^{-6}).
    If G_j(x') < 0, replace x'_{ℓj} ← x'_{ℓj} − 2 (g + ε).

Step 4: (Explicit constraints loop)
    Increase j ← j + 1.
    If j ≤ m_1, go to step 3;
    if j > m_1 = m, go to step 7.

Step 5: (Check implicit constraints)
    If G_j(x') ≥ 0, go to step 6;
    otherwise go to step 8, unless the same constraint caused a failure five times in a row without its function value G_j(x') being changed; in this case go to step 9.

Step 6: (Implicit constraints loop)
    If j < m, increase j ← j + 1 and go to step 5.

Step 7: (Check for improvement)
    If F(x') < F(x^(k,w)), replace the worst vertex x^(k,w) by x', retain all other vertices, increase k ← k + 1, and go to step 1;
    otherwise go to step 8, unless several consecutive values of the objective function agree to within the computational accuracy; in this case go to step 9.

Step 8: (Retraction)
    Replace x' ← 0.5 (x' + x̄), the point halfway towards the midpoint of the remaining vertices, and go to step 2.

Step 9: (Termination)
    Determine the index b (best vertex) such that
        F(x^(k,b)) = min{F(x^(k,ν)), ν = 1(1)N}.
    End the search with the result x^(k,b) and F(x^(k,b)).
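Leaving aside the constraint bookkeeping of steps 2 to 6, the kernel of one iteration, reflection with over-expansion and stepwise retraction towards the midpoint, can be sketched as follows (Python; the row-wise vertex array and the bounded retraction loop are simplifying assumptions):

    import numpy as np

    def complex_step(F, X, alpha=1.3):
        # One reflection of the worst of the N vertices (rows of X) at
        # the centroid of the remaining ones, with over-reflection
        # factor alpha = 1.3; constraint handling is omitted here.
        w = int(np.argmax([F(x) for x in X]))
        centroid = (X.sum(axis=0) - X[w]) / (len(X) - 1)
        x_new = centroid + alpha * (centroid - X[w])
        for _ in range(5):                  # bounded retraction on
            if F(x_new) < F(X[w]):          # failure, cf. step 8
                break
            x_new = 0.5 * (x_new + centroid)
        X[w] = x_new
        return X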

Box himself reports that in numerical tests his complex strategy gives similar results to the simplex method of Nelder and Mead, but both are inferior to the method of Rosenbrock with regard to the number of objective function calls. He actually uses his own modification of the Rosenbrock method. Investigation of the effect of the number of vertices of the complex and the expansion factor (in this case 2n and 1.3 respectively) led him to the conclusion that neither value has a significant effect on the efficiency of the strategy. For n > 5 he considers that a number of vertices N = 2n is unnecessarily high, especially when there are no constraints.

The convergence criterion appears very reliable. While Nelder and Mead require that the standard deviation of all objective function values at the polyhedron vertices, referred to its midpoint, must be less than a prescribed size, the complex search is only ended when several consecutive values of the objective function are the same to computational accuracy.

Because of the larger number of polyhedron vertices the complex method needs even more storage space than the simplex strategy. The order of magnitude, O(n^2), remains the same. No investigations are known of the computational effort in the case of many variables. Modifications of the strategy are due to Guin (1968), Mitchell and Kaplan (1968), and Dambrauskas (1970, 1972). Guin defines a contraction rule with which an allowed point can be generated even if the allowed region is not convex. This is not always the case in the original method, because the midpoint to which the worst vertex is reflected is not tested for feasibility.

Mitchell finds that the initial configuration of the complex influences the results obtained. It is therefore better to place the vertices in a deterministic way rather than to make a random choice. Dambrauskas combines the complex method with the step length rule of the stochastic approximation. He requires that the step lengths or edge lengths of the polyhedron go to zero in the limit of an infinite number of iterations, while their sum tends to infinity. This measure may well increase the reliability of convergence; however, it also increases the cost. Beveridge and Schechter (1970) describe how the iteration rules must be changed if the variables can take only discrete values. A practical application, in which a process has to be optimized dynamically, is described by Tazaki, Shindo, and Umeda (1970); this is the original problem for which Spendley, Hext, and Himsworth (1962) conceived their simplex EVOP (evolutionary operation) procedure.

Compared to other numerical optimization procedures the polyhedra strategies have the disadvantage that in the closing phase, near the optimum, they converge rather slowly and sometimes even stagnate. The direction of progress selected by the reflection then no longer coincides at all with the gradient direction. To remove this difficulty it has been suggested that information about the topology of the objective function, as given by function values at the vertices of the polyhedron, be exploited to carry out a quadratic interpolation. Such surface fitting is familiar from the related methods of test planning and evaluation (lattice search, factorial design), in which the task is to set up mathematical models of physical or other processes. This territory is entered for example by G. E. P. Box (1957), Box and Wilson (1951), Box and Hunter (1957), Box and Behnken (1960), Box and Draper (1969, 1987), Box et al. (1973), and Beveridge and Schechter (1970). It will not be covered in any more detail here.

3.2.2 Gradient Strategies

The Gauss-Seidel strategy very straightforwardly uses only directions parallel to the coordinate axes to successively improve the objective function value. All other direct search methods strive to advance more rapidly by taking steps in other directions. To do so they exploit the knowledge about the topology of the objective function gleaned from the successes and failures of previous iterations. Directions are viewed as most promising in which the objective function decreases rapidly (for minimization) or increases rapidly (for maximization). Southwell (1946), for example, improves the relaxation by choosing the coordinate directions, not cyclically, but in order of the size of the local gradient in them. If the restriction of parallel axes is removed, the local best direction is given by the (negative) gradient vector

    ∇F(x) = (F_{x1}(x), F_{x2}(x), ..., F_{xn}(x))^T

with

    F_{xi}(x) = ∂F/∂x_i (x)   for all i = 1(1)n

at the point x^(0). All hill climbing procedures that orient their choice of search directions v^(0) according to the first partial derivatives of the objective function are called gradient strategies. They can be thought of as analogues of the total step procedure of Jacobi for solving systems of linear equations (see Schwarz, Rutishauser, and Stiefel, 1968).

So great is the number of methods of this type which have been suggested or applied up to the present that merely to list them all would be difficult. The reason lies in the fact that the gradient represents a local property of a function. To follow the path of the gradient exactly would mean determining, in general, a curved trajectory in the n-dimensional space. This problem is only approximately soluble numerically and is more difficult than the original optimization problem. With the help of analogue computers continuous gradient methods have actually been implemented (Bekey and McGhee, 1964; Levine, 1964). They consider the trajectory x(t) as a function of time and obtain it as the solution of a system of first order differential equations.

All the numerical variants of the gradient method differ in the lengths of the discrete steps and thereby also with regard to how exactly they follow the gradient trajectory. The iteration rule is generally

    x^(k+1) = x^(k) − s^(k) ∇F(x^(k)) / ‖∇F(x^(k))‖

It assumes that the partial derivatives everywhere exist and are unique. If F(x) is continuously differentiable, then the partial derivatives exist and F(x) is continuous.
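A single application of this iteration rule takes only a few lines (Python sketch; grad is an assumed callable returning the gradient vector ∇F(x)):

    import numpy as np

    def gradient_step(x, grad, s):
        # One step of length s against the normalized gradient,
        # as in the general iteration rule above.
        g = grad(x)
        return x - s * g / np.linalg.norm(g)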


A distinction is sometimes drawn between short step methods, which evaluate the gradients again after a small step in the direction ∇F(x^(k)) (for maximization) or −∇F(x^(k)) (for minimization), and their equivalent long step methods. Since for finite step lengths s^(k) it is not certain whether the new variable vector is really better than the old, after the step the value of the objective function must be tested again. Working with small steps increases the number of objective function calls and gradient evaluations. Besides F(x), n partial derivatives must be evaluated. Even if the slopes can be obtained analytically and can be specified as functions, there is no reason to suppose that the number of computational operations per function call is much less than for the objective function itself. Except in special cases, the total cost therefore increases roughly as the product of the weighting factor (n + 1) and the number of objective function calls. This also holds if the partial derivatives are approximated by differential quotients obtained by means of trial steps:

    F_{xi}(x) = ∂F(x)/∂x_i = [F(x + δ e_i) − F(x)] / δ + O(δ)   for all i = 1(1)n

Additional difficulties arise here, since for values of δ that are too small the subtraction is subject to rounding error, while for trial steps that are too large the neglect of the O(δ^2) terms of the Taylor series leads to incorrect values. The choice of suitable deviations δ requires special care in all cases (Hildebrand, 1956; Curtis and Reid, 1974).
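The two opposing error sources can be demonstrated with a one dimensional experiment (a sketch; the test function and the step sizes are arbitrary choices):

    # Forward-difference slope of F(x) = x**3 at x = 1; exact value 3.
    F = lambda x: x**3
    for delta in (1e-1, 1e-5, 1e-13):
        approx = (F(1.0 + delta) - F(1.0)) / delta
        print(delta, abs(approx - 3.0))
    # Large delta: truncation error from the neglected Taylor terms
    # dominates; tiny delta: rounding error in the subtraction wins.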

Cauchy (1847), Kantorovich (1940, 1945), Levenberg (1944), and Curry (1944) are the originators of the gradient strategy, which started life as a method of solving equations and systems of equations. It is first referred to as an aid to solving variational problems by Hadamard (1908) and Courant (1943). Whereas Cauchy works with fixed step lengths s^(k), Curry tries to determine the distance covered in the (not normalized) direction v^(k) = −∇F(x^(k)) so as to reach a relative minimum (see also Brown, 1959). In principle, any one of the one dimensional search methods of Section 3.1 can be called upon to find the optimal value for s^(k):

    F(x^(k) + s^(k) v^(k)) = min_s {F(x^(k) + s v^(k))}

This variant of the basic strategy could thus be called a longest step procedure. It is better known, however, under the name optimum gradient method, or method of steepest descent (for maximization, ascent). Theoretical investigations of convergence and rate of convergence of the method can be found, e.g., in Akaike (1960), Goldstein (1962), Ostrowski (1967), Forsythe (1968), Elkin (1968), Zangwill (1969), and Wolfe (1969, 1970, 1971). Zangwill proves convergence based on the assumptions that the line searches are exact and the objective function is continuously twice differentiable. Exactness of the one dimensional minimization is not, however, a necessary assumption (Wolfe, 1969). It is significant that one can only establish theoretically that a stationary point will be reached (∇F(x) = 0) or approached (‖∇F(x)‖ < ε, ε > 0). The stationary point is a minimum only if F(x) is convex and three times differentiable (Akaike, 1960). Zellnik, Sondak, and Davis (1962), however, show that saddle points are in practice an obstacle only if the search is started at one, or on a straight gradient trajectory passing through one. In other cases numerical rounding errors ensure that the path to a saddle point is unstable.


The gradient strategy, however, cannot distinguish global from local minima. The optimum at which it aims depends only on the choice of the starting point for the search. The only chance of finding absolute extrema is to start sufficiently often from various initial values of the variables and to iterate each time until the convergence criterion is satisfied (Jacoby, Kowalik, and Pizzo, 1972). The termination rules usually recommended for gradient methods are that the absolute value of the gradient vector,

    ‖∇F(x^(k))‖ < ε

or the difference between the objective function values of successive iterations,

    F(x^(k−1)) − F(x^(k)) < ε

becomes smaller than a prescribed small positive quantity ε. A forerunner of later developments is the Partan (parallel tangents) method, in which line searches are carried out alternately along the simple gradient directions

    v^(k) = −∇F(x^(k))   for k = 0 and k odd

and those derived from previous iteration points

    v^(k) = x^(k) − x^(k−3)   for k ≥ 2 even (with x^(−1) = x^(0))

For quadratic functions the minimum is reached after at most 2n − 1 line searches (Shah, Buehler, and Kempthorne, 1964). This desirable property of converging after a finite number of iterations, which is also called quadratic convergence, is only shared by strategies that apply conjugate gradients, of which the Partan methods can be regarded as forerunners (Pierre, 1969; Sorenson, 1969).

In the fifties, simple gradient strategies were very popular, especially the method of steepest descent. Today they are usually only to be found as components of program packages together with other hill climbing methods, e.g., in GROPE of Flood and Leon (1966), in AID of Casey and Rustay (1966), in AESOP of Hague and Glatt (1968), and in GOSPEL of Huelsman (1968). McGhee (1967) presents a detailed flow diagram. Wasscher (1963a,b) has published two ALGOL codings (see also Haubrich, 1963; Wallack, 1964; Varah, 1965; Wasscher, 1965). The partial derivatives are obtained numerically. A comprehensive bibliography by Leon (1966b) names most of the older versions of strategies and gives many examples of their application. Numerical comparison tests have been carried out by Fletcher (1965), Box (1966), Leon (1966a), Colville (1968, 1970), and Kowalik and Osborne (1968). They show the superiority of first (and second) order methods over direct search strategies for objective functions with smooth topology. Gradient methods for solving systems of differential equations are described for example by Talkin (1964). For such problems, as well as for functional optimization problems, analogue and hybrid computers have often been applied (Rybashov, 1965a,b, 1969; Sydow, 1968; Fogarty and Howe, 1968, 1970). A literature survey on this subject has been compiled by Gilbert (1967). For the treatment of variational problems see Kelley (1962), Altman (1966), Miele (1969), Bryson and Ho (1969), Cea (1971), Daniel (1971), and Tolle (1971).

In the experimental field, there are considerable difficulties in determining the partial derivatives. Errors in the values of the objective function can cause the predicted direction of steepest descent to lie almost perpendicular to the true gradient vector (Kowalik and Osborne, 1968). Box and Wilson (1951) attempt to compensate for the perturbations by repeating the trial steps or increasing their number above the necessary minimum of (n + 1). With 2^n trials, for example, a complete factorial design can be constructed (e.g., Davies, 1954). The slope in one direction is obtained by averaging the function value differences over 2^{n−1} pairs of points (Lapidus et al., 1961). Another possibility is to determine the coefficients of a linear polynomial such that the sum of the squares of the errors between measured and model function values at N ≥ n + 1 points is a minimum. The linear function then represents the tangent plane of the objective function at the point under consideration. The cost of obtaining the gradients when there are many variables is too great for practical application, and only justified if the aim is rather to set up a mathematical model of the system than simply to perform the optimization.

In the EVOP (acronym for evolutionary operation) scheme, G. E. P. Box (1957) has presented a practical simplification of this gradient method. It actually counts as a direct search strategy, because it does not obtain the direction of the gradient but only one of a finite number of especially good directions. Spendley, Hext, and Himsworth (1962) have devised a variant of the procedure (see also Sections 3.2.1.5 and 3.2.1.6). Lowe (1964) has gathered together the various schemes of trial steps for the EVOP strategy. The philosophy of the EVOP strategy is treated in detail by Box and Draper (1969). Some examples of applications are given by Kenworthy (1967). The efficiency of methods of determining the gradient in the case of stochastic perturbations is dealt with by Mlynski (1964a,b, 1966a,b), Sergiyevskiy and Ter-Saakov (1970), and others.

3.2.2.1 Strategy of Powell: Conjugate Directions

The most important idea for overcoming the convergence difficulties of the gradient strategy is due to Hestenes and Stiefel (1952), and again comes from the field of linear algebra (see also Ginsburg, 1963; Beckman, 1967). It trades under the names conjugate directions or conjugate gradients. The directions {v_i, i = 1(1)n} are said to be conjugate with respect to a positive definite n × n matrix A if (Hestenes, 1956)

    v_i^T A v_j = 0   for all i, j = 1(1)n, i ≠ j

A further property of conjugate directions is their linear independence, i.e.,

    Σ_{i=1}^{n} α_i v_i = 0

only holds if all the constants {α_i, i = 1(1)n} are zero. If A is replaced by the unit matrix, A = I, then the v_i are mutually orthogonal. With A = ∇²F(x) (Hessian matrix) the minimum of a quadratic function is obtained exactly in n line searches in the directions v_i. This is a factor two better than the gradient Partan method. For general non-linear problems the convergence rate cannot be specified. As it is frequently assumed, however, that many problems behave roughly quadratically near the optimum, it seems worthwhile to use conjugate directions. The quadratic convergence of the search with conjugate directions comes about because second order properties of the objective function are taken into account. In this respect it is not, in fact, a first order gradient method, but a second order procedure. If all the n first and n(n+1)/2 second partial derivatives are available, the conjugate directions can be generated in one process corresponding to the Gram-Schmidt orthogonalization (Kowalik and Osborne, 1968). It calls for expensive matrix operations. Conjugate directions can, however, be constructed without knowledge of the second derivatives: for example, from the changes in the gradient vector in the course of the iterations (Fletcher and Reeves, 1964). Because of this implicit exploitation of second order properties, the method of conjugate directions has been classified as a gradient method.

The conjugate gradients method of Fletcher and Reeves consists of a sequence of line searches with Hermitian interpolation (see Sect. 3.1.2.3.4). As a first search direction v^(0) at the starting point x^(0), the simple gradient direction

    v^(0) = −∇F(x^(0))

is used. The recursion formula for the subsequent iterations is

    v^(k) = β^(k) v^(k−1) − ∇F(x^(k))   for all k = 1(1)n        (3.25)

with the correction factor

    β^(k) = [∇F(x^(k))^T ∇F(x^(k))] / [∇F(x^(k−1))^T ∇F(x^(k−1))]

For a quadratic objective function with a positive definite Hessian matrix, conjugate directions are generated in this way and the minimum is found with n line searches. Since at any time only the last direction needs to be stored, the storage requirement increases linearly with the number of variables. This often signifies a great advantage over other strategies. In the general, non-linear, non-quadratic case more than n iterations must be carried out, for which the method of Fletcher and Reeves must be modified. Continued application of the recursion formula (Equation (3.25)) can lead to linear dependence of the search directions. For this reason it seems necessary to forget from time to time the accumulated information and to start afresh with the simple gradient direction (Crowder and Wolfe, 1972). Various suggestions have been made for the frequency of this restart rule (Fletcher, 1972a). Absolute reliability of convergence in the general case is still not guaranteed by this approach. If the Hessian matrix of second partial derivatives has points of singularity, then the conjugate gradient strategy can fail. The exactness of the line searches also has an important effect on the convergence rate (Kawamura and Volz, 1973). Polak (1971) defines conditions under which the method of Fletcher and Reeves achieves greater than linear convergence. Fletcher (1972c) himself has written a FORTRAN program.
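Equation (3.25) together with a restart rule can be sketched as follows (Python; line_search is a hypothetical helper returning the relative minimum along the given direction, and restarting after every n steps is only one of the suggested frequencies):

    import numpy as np

    def fletcher_reeves(F, grad, x, line_search, iters=100):
        g = grad(x)
        v = -g                                 # simple gradient start
        for k in range(1, iters + 1):
            x = line_search(F, x, v)           # one dimensional search
            g_new = grad(x)
            if k % len(x) == 0:
                v = -g_new                     # restart: forget history
            else:
                beta = (g_new @ g_new) / (g @ g)
                v = beta * v - g_new           # Equation (3.25)
            g = g_new
        return x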

Other conjugate gradient methods have been proposed by Powell (1962), Polak and Ribiere (1969) (see also Klessig and Polak, 1972), Hestenes (1969), and Zoutendijk (1970). Schley (1968) has published a complete FORTRAN program. Conjugate directions are also produced by the projected gradient methods (Myers, 1968; Pearson, 1969; Sorenson, 1969; Cornick and Michel, 1972) and the memory gradient methods (Miele and Cantrell, 1969, 1970; see also Cantrell, 1969; Cragg and Levy, 1969; Miele, 1969; Miele, Huang, and Heidemann, 1969; Miele, Levy, and Cragg, 1971; Miele, Tietze, and Levy, 1972; Miele et al., 1974). Relevant theoretical investigations have been made by, among others, Greenstadt (1967a), Daniel (1967a, 1970, 1973), Huang (1970), Beale (1972), and Cohen (1972).

Conjugate gradient methods are encountered especially frequently in the fields of functional optimization and optimal control problems (Daniel, 1967b, 1971; Pagurek and Woodside, 1968; Nenonen and Pagurek, 1969; Roberts and Davis, 1969; Polyak, 1969; Lasdon, 1970; Kelley and Speyer, 1970; Kelley and Myers, 1971; Speyer et al., 1971; Kammerer and Nashed, 1972; Szego and Treccani, 1972; Polak, 1972; McCormick and Ritter, 1974). Variable metric strategies are also sometimes classified as conjugate gradient procedures, but more usually as quasi-Newton methods. For quadratic objective functions they generate the same sequence of points as the Fletcher-Reeves strategy and its modifications (Myers, 1968; Huang, 1970). In the non-quadratic case, however, the search directions are different. With the variable metric, but not with conjugate directions, Newton directions are approximated.

For many practical problems it is very difficult, if not impossible, to specify the partial derivatives as functions. The sensitivity of most conjugate gradient methods to imprecise specification of the gradient directions makes it seem inadvisable to apply finite difference methods to approximate the slopes of the objective function. This is taken into account by some procedures that attempt to construct conjugate directions without knowledge of the derivatives. The oldest of these was devised by Smith (1962). On the basis of numerical tests by Fletcher (1965), however, the version of Powell (1964) has proved to be better. It will be briefly presented here. It is arguable whether it should be counted as a gradient strategy. Its intermediate position between direct search methods that only use function values, and Newton methods that make use of second order properties of the objective function (if only implicitly), nevertheless makes it come close to this category.

The strategy of conjugate directions is based on the observation that a line through the minimum of a quadratic objective function cuts all contours at the same angle. Powell's idea is then to construct such special directions by a sequence of line searches. The unit vectors are taken as initial directions for the first n line searches. After these, a minimization is carried out in the direction of the overall result. Then the first of the old direction vectors is eliminated, the indices of the remainder are reduced by one, and the direction that was generated and used last is put in the place freed by the nth vector. As shown by Powell, after n cycles, each of n + 1 line searches, a set of conjugate directions is obtained, provided the objective function is quadratic and the line searches are carried out exactly.

Zangwill (1967) indicates how this scheme might fail. If no success is obtained in one of the search directions, i.e., the distance covered becomes zero, then the direction vectors are linearly dependent and no longer span the complete parameter space. The same phenomenon can be provoked by computational inaccuracy. To prevent this, Powell has modified the basic algorithm. First of all, he designs the scheme of exchanging directions to be more flexible, actually by maximizing the determinant of the normalized direction vectors. It can be shown that, assuming a quadratic objective function, it is most favorable to eliminate the direction in which the largest distance was covered (see Dixon, 1972a). Powell would also sometimes leave the set of directions unchanged. This depends on how the value of the determinant would change under exchange of the search directions. The objective function is here tested at the position given by doubling the distance covered in the cycle just completed. Powell makes the termination of the search depend on all variables having changed by less than 0.1 ε within an iteration, where ε represents the required accuracy. Besides this first convergence criterion, he offers a second, stricter one, according to which the state reached at the end of the normal procedure is slightly displaced and the minimization repeated until the termination conditions are again fulfilled. This is followed by a line search in the direction of the difference vector between the last two endpoints. The optimization is only finally ended when the result agrees with those previously obtained to within the allowed deviation of 0.1 ε for each component.

The algorithm of Powell runs as follows:

Step 0: (Initialization)
    Specify a starting point x^(0) and accuracy requirements ε_i > 0 for all i = 1(1)n.

Step 1: (Specify first set of directions)
    Set v_i^(0) = e_i for all i = 1(1)n and set k = 0.

Step 2: (Start outer loop)
    Set x^(k,0) = x^(k) and i = 1.

Step 3: (Line search)
    Determine x^(k,i) such that
        F(x^(k,i)) = min_s {F(x^(k,i−1) + s v_i^(k))}.

Step 4: (Inner loop)
    If i < n, increase i ← i + 1 and go to step 3.

Step 5: (First convergence criterion; trial step)
    If no variable has changed by more than 0.1 ε_i during the loop, go to step 8;
    otherwise test the objective function at the point 2 x^(k,n) − x^(k,0), given by doubling the distance covered in the cycle just completed.

Step 6: (Exchange of search directions)
    According to how the value of the determinant of the normalized direction vectors would change under exchange of the search directions, either leave the set of directions unchanged, v_i^(k+1) = v_i^(k) for all i = 1(1)n, or eliminate the direction in which the largest distance was covered, reduce the indices of the remainder by one, and put the overall direction of the cycle, x^(k,n) − x^(k,0), in the place freed by the nth vector; in the latter case carry out a further line search along the new direction.

Step 7: (Outer loop)
    Set x^(k+1) to the best position found so far, increase k ← k + 1, and go to step 2.

Step 8: (Second, stricter convergence criterion)
    Call the point reached y^(1). Displace it slightly, repeat the minimization (steps 2 to 7) until the first criterion is again fulfilled, and call the result y^(2). Carry out a line search in the direction y^(2) − y^(1), which yields y^(3).

Step 9: (Termination or restart)
    If y^(3) agrees with y^(1) and y^(2) to within 0.1 ε_i for each component, end the search with the result y^(3);
    otherwise set x^(0) = y^(3),
        v_1^(0) = y^(3) − y^(1),
        v_i^(0) = v_i^(k) for i = 2(1)n,
    set k = 0, and go to step 2.
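One cycle of the unmodified basic scheme, without the determinant test and the convergence refinements of steps 5 to 9, might be sketched as follows (Python; line_search is again a hypothetical helper, and the unconditional exchange corresponds to Powell's basic algorithm, not to the safeguarded version above):

    import numpy as np

    def powell_cycle(F, x0, V, line_search):
        # V holds the n search directions row-wise: n line searches,
        # one extra search along the overall result, then exchange.
        x = x0.copy()
        for v in V:
            x = line_search(F, x, v)
        overall = x - x0                 # direction of overall result
        x = line_search(F, x, overall)
        V = np.vstack([V[1:], overall])  # drop the oldest direction
        return x, V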

Figure 3.9 illustrates a few iterations for a hypothetical two parameter function. Each of the first loops consists of n + 1 = 3 line searches and leads to the adoption of a new search direction. If the objective function had been of second order, the minimum would certainly have been found by the last line search of the second loop. In the third and fourth loops it has been assumed that the trial steps have led to a decision not to exchange directions; thus the old direction vectors, numbered v_3 and v_4, are retained. Further loops, e.g., according to step 9, are omitted.
e.g., according to step 9, are omitted.<br />

The quality of the line searches has a strong in uence on the construction of the<br />

conjugate directions. Powell uses a sequence of Lagrangian quadratic interpolations. It is<br />

terminated as soon as the required accuracy is reached. For the rst minimization within<br />

an iteration three points <strong>and</strong> Equation (3.16) are used. The argument values taken in<br />

direction vi are: x (the starting point), x + si vi, <strong>and</strong> either x +2si vi or x ; si vi,<br />

according to whether F (x + si vi)


74 Hill climbing Strategies<br />

i = @2<br />

@s 2(P (x + svi)) = ;2 (b ; c) Fa +(c ; a) Fb +(a ; b) Fc<br />

(b ; c)(c ; a)(a ; b)<br />

Powell uses this quantity γ_i for all subsequent interpolations in the direction v_i as a scale for the second partial derivative of the objective function. He scales the directions v_i, which in his case are not normalized, by 1/√γ_i. This allows the possibility of subsequently carrying out a simplified interpolation with only two argument values, x and x + s_i v_i. It is a worthwhile procedure, since each direction is used several times. The predicted minimum, assuming that the second partial derivatives have value unity, is then

    x' = x + (0.5 s_i − (1/s_i) [F(x + s_i v_i) − F(x)]) v_i

For the trial step lengths s_i, Powell uses the empirical recursion formula

    s_i^(k) = 0.4 √(F(x^(k−1)) − F(x^(k)))

Because of the scaling, all the step lengths actually become the same. A more detailed justification can be found in Hoffmann and Hofmann (1970).
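Numerically, the curvature estimate and the subsequent two-point prediction read as follows (a sketch; x and v are NumPy arrays, and the function names are ad hoc):

    def curvature(a, b, c, Fa, Fb, Fc):
        # Second derivative of the interpolation parabola through
        # (a, Fa), (b, Fb), (c, Fc), as in the formula above.
        return (-2.0 * ((b - c) * Fa + (c - a) * Fb + (a - b) * Fc)
                / ((b - c) * (c - a) * (a - b)))

    def predicted_minimum(F, x, v, s):
        # Two-point prediction along a direction v already scaled so
        # that the second derivative along it has the value unity.
        return x + (0.5 * s - (F(x + s * v) - F(x)) / s) * v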

Contrary to most other optimization procedures, Powell's strategy is available as a precise algorithm in a tested code (Powell, 1970f). As Fletcher (1965) reports, this method of conjugate directions is superior, for the case of a few variables, both to the DSC method and to a strategy of Smith, especially in the neighborhood of minima. For many variables, however, the strict criterion for adopting a new direction more frequently causes the old set of directions to be retained, and the procedure then converges slowly. A problem which had a singular Hessian matrix at the minimum made the DSC strategy look better. In a later article, Fletcher (1972a) defines a limit of n = 10 to 20, above which the Powell strategy should no longer be applied. This is confirmed by the test results presented in Chapter 6. Zangwill (1967) combines the basic idea of Powell with relaxation steps in order to avoid linear dependence of the search directions. Some results of Rhead (1971) lead to the conclusion that Powell's improved concept is superior to Zangwill's. Brent (1973) also presents a variant of the strategy without derivatives, derived from Powell's basic idea, which is designed to prevent the occurrence of linear dependence of the search directions without endangering the quadratic convergence. After every n + 1 iterations the set of directions is replaced by an orthogonal set of vectors. So as not to lose all the information, however, the unit vectors are not chosen. For quadratic objective functions the new directions remain conjugate to each other. This procedure requires O(n^3) computational operations to determine orthogonal eigenvectors. As, however, they are only performed every O(n^2) line searches, the extra cost is O(n) per function call and is thus of the same order as the cost of evaluating the objective function itself. Results of tests by Brent confirm the usefulness of the strategy.

3.2.3 Newton Strategies

Newton strategies exploit the fact that, if a function can be differentiated any number of times, its value at the point x^(k+1) can be represented by a series of terms constructed at another point x^(k):

    F(x^(k+1)) = F(x^(k)) + h^T ∇F(x^(k)) + (1/2) h^T ∇²F(x^(k)) h + ...        (3.26)

where

    h = x^(k+1) − x^(k)

In this Taylor series, as it is called, all the terms of higher than second order are zero if F(x) is quadratic. Differentiating Equation (3.26) with respect to h and setting the derivative equal to zero, one obtains a condition for the stationary points of a second order function:

    ∇F(x^(k+1)) = ∇F(x^(k)) + ∇²F(x^(k)) (x^(k+1) − x^(k)) = 0

or

    x^(k+1) = x^(k) − [∇²F(x^(k))]^{−1} ∇F(x^(k))        (3.27)

If F(x) is quadratic and ∇²F(x^(0)) is positive-definite, Equation (3.27) yields the solution x^(1) in a single step from any starting point x^(0) without needing a line search. If Equation (3.27) is taken as the iteration rule in the general case, it represents the extension of the Newton-Raphson method to functions of several variables (Householder, 1953). It is also sometimes called a second order gradient method with the choice of direction and step length (Crockett and Chernoff, 1955)

    v^(k) = −[∇²F(x^(k))]^{−1} ∇F(x^(k))
    s^(k) = 1        (3.28)

The real length of the iteration step is hidden in the non-normalized Newton direction v^(k). Since no explicit value of the objective function is required, but only its derivatives, the Newton-Raphson strategy is classified as an indirect or analytic optimization method. Its ability to predict the minimum of a quadratic function in a single calculation at first sight looks very attractive. This single step, however, requires a considerable effort. Apart from the necessity of evaluating n first and n(n+1)/2 second partial derivatives, the Hessian matrix ∇²F(x^(k)) must be inverted. This corresponds to the problem of solving a system of linear equations

    ∇²F(x^(k)) Δx^(k) = −∇F(x^(k))        (3.29)

for the unknown quantities Δx^(k). All the standard methods of linear algebra, e.g., Gaussian elimination (Brown and Dennis, 1968; Brown, 1969) and the matrix decomposition method of Cholesky (Wilkinson, 1965), need O(n^3) computational operations for this (see Schwarz, Rutishauser, and Stiefel, 1968). For the same cost, the strategies of conjugate directions and conjugate gradients can execute O(n) steps. Thus, in principle, the Newton-Raphson iteration offers no advantage in the quadratic case.
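In floating point practice one therefore solves the linear system (3.29) rather than inverting the Hessian explicitly (Python/NumPy sketch; grad and hess are assumed callables):

    import numpy as np

    def newton_step(x, grad, hess):
        # One step of Equation (3.27): solve  H dx = -grad  instead of
        # forming the inverse of the Hessian matrix H.
        dx = np.linalg.solve(hess(x), -grad(x))
        return x + dx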

If the objective function is not quadratic, then

    - v^(0) does not in general point towards a minimum. The iteration rule (Equation (3.27)) must be applied repeatedly.
    - s^(k) = 1 may lead to a point with a worse value of the objective function. The search diverges, e.g., when ∇²F(x^(k)) is not positive-definite.
    - It can happen that ∇²F(x^(k)) is singular or almost singular. The Hessian matrix cannot then be inverted.

Furthermore, it depends on the starting point x^(0) whether a minimum, a maximum, or a saddle point is approached, or the whole iteration diverges. The strategy itself does not distinguish the stationary points with regard to type.

If the method does converge, then the convergence is better than of linear order (Goldstein, 1965). Under certain, very strict conditions on the structure of the objective function and its derivatives even second order convergence can be achieved (e.g., Polak, 1971); that is, the number of exact significant figures in the approximation to the minimum solution doubles from iteration to iteration. This phenomenon is exhibited in the solution of some test problems, particularly in the neighborhood of the desired extremum.

All the variations of the basic procedure to be described are aimed at increasing the reliability of the Newton iteration, without sacrificing the high convergence rate. A distinction is made here between quasi-Newton strategies, which do not evaluate the Hessian matrix explicitly, and modified Newton methods, for which first and second derivatives must be provided at each point. The only strategy presently known which makes use of higher than second order properties of the objective function is due to Biggs (1971, 1973).

The simplest modification of the Newton-Raphson scheme consists of determining the step length s^(k) by a line search in the Newton direction v^(k) (Equation (3.28)) until the relative optimum is reached (e.g., Dixon, 1972a):

    F(x^(k) + s^(k) v^(k)) = min_s {F(x^(k) + s v^(k))}        (3.30)

To save computational operations, the second partial derivatives can be redetermined less frequently and used for several iterations. Care must always be taken, however, that v^(k) always points "downhill," i.e., that the angle between v^(k) and −∇F(x^(k)) is less than 90°. The Hessian matrix must also be positive-definite. If the eigenvalues of the matrix are calculated when it is inverted, their signs show whether this condition is fulfilled. If a negative eigenvalue appears, Pearson (1969) suggests proceeding in the direction of the associated eigenvector until a point is reached with positive-definite ∇²F(x). Greenstadt (1967a) simply replaces negative eigenvalues by their absolute value and vanishing eigenvalues by unity. Other proposals have been made to keep the Hessian matrix positive-definite by addition of a correction matrix (Goldfeld, Quandt, and Trotter, 1966, 1968; Shanno, 1970a) or to include simple gradient steps in the iteration scheme (Dixon and Biggs, 1972). Further modifications, which operate on the matrix inversion procedure itself, have been suggested by Goldstein and Price (1967), Fiacco and McCormick (1968), and Matthews and Davies (1971). A good survey has been given by Murray (1972b).

Very few algorithms exist that determine the first and second partial derivatives numerically from trial step operations (Whitley, 1962; see also Wasscher, 1963c; Wegge, 1966). The inevitable approximation errors too easily cancel out the advantages of the Newton directions.


3.2.3.1 DFP: Davidon-Fletcher-Powell Method
(Quasi-Newton Strategy, Variable Metric Strategy)

Much greater interest has been shown for a group of second order gradient methods that attempt to approximate the Hessian matrix and its inverse during the iterations only from first order data. This now extensive class of quasi-Newton strategies has grown out of the work of Davidon (1959). Fletcher and Powell (1963) improved and translated it into a practical procedure. The Davidon-Fletcher-Powell or DFP method and some variants of it are also known as variable metric strategies. They are sometimes also regarded as conjugate gradient methods, because in the quadratic case they generate conjugate directions. For higher order objective functions this is no longer so. Whereas the variable metric concept is to approximate Newton directions, this is not the case for conjugate gradient methods. The basic recursion formula for the DFP method is

    x^(k+1) = x^(k) + s^(k) v^(k)

with

    v^(k) = −H^(k)T ∇F(x^(k))

and

    H^(0) = I
    H^(k+1) = H^(k) + A^(k)

The correction A^(k) to the approximation for the inverse Hessian matrix, H^(k), is derived from information collected during the last iteration, namely from the change in the variable vector,

    y^(k) = x^(k+1) − x^(k) = s^(k) v^(k)

and the change in the gradient vector,

    z^(k) = ∇F(x^(k+1)) − ∇F(x^(k))

It is given by

    A^(k) = [y^(k) y^(k)T] / [y^(k)T z^(k)] − [H^(k) z^(k) (H^(k) z^(k))^T] / [z^(k)T H^(k) z^(k)]        (3.31)
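The rank two correction (3.31) translates directly into matrix arithmetic (a sketch; np.outer forms the dyadic products y y^T and (Hz)(Hz)^T):

    import numpy as np

    def dfp_update(H, y, z):
        # Equation (3.31): correct the approximation H of the inverse
        # Hessian from the last variable change y and gradient change z.
        Hz = H @ z
        return H + np.outer(y, y) / (y @ z) - np.outer(Hz, Hz) / (z @ Hz)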

The step length s^(k) is obtained by a line search along v^(k) (Equation (3.30)). Since the first partial derivatives are needed in any case, they can be made use of in the one dimensional minimization. Fletcher and Powell do so in the context of a cubic Hermitian interpolation (see Sect. 3.1.2.3.4). A corresponding ALGOL program has been published by Wells (1965) (for corrections see Fletcher, 1966; Hamilton and Boothroyd, 1969; House, 1971). The first derivatives must be specified as functions, which is usually inconvenient and often impossible. The convergence properties of the DFP method have been thoroughly investigated, e.g., by Broyden (1970b,c), Adachi (1971), Polak (1971), and Powell (1971, 1972a,b,c). Numerous suggestions have thereby been made for improvements. Convergence is achieved if F(x) is convex. Under stricter conditions it can be proved that the convergence rate is greater than linear and the sequence of iterations converges quadratically, i.e., after a finite number (maximum n) of steps the minimum of a quadratic function is located. Myers (1968) and Huang (1970) show that, if the same starting point is chosen and the objective function is of second order, the DFP algorithm generates the same iteration points as the conjugate gradient method of Fletcher and Reeves.

All these observations are based on the assumption that the computational operations, including the line searches, are carried out exactly. Then H^(k) always remains positive-definite if H^(0) was positive-definite, and the minimum search is stable, i.e., the objective function is improved at each iteration. Numerical tests (e.g., Pearson, 1969; Tabak, 1969; Huang and Levy, 1970; Murtagh and Sargent, 1970; Himmelblau, 1972a,b) and theoretical considerations (Bard, 1968; Dixon, 1972b) show that rounding errors and especially inaccuracies in the one dimensional minimization frequently cause stability problems; the matrix H^(k) can easily lose its positive-definiteness without this being due to a singularity in the inverse Hessian matrix. The simplest remedy for a singular matrix H^(k), or one of reduced rank, is to forget from time to time all the experience stored within H^(k) and to begin again with the unit matrix and simple gradient directions (Bard, 1968; McCormick and Pearson, 1969). To do so certainly increases the number of necessary iterations, but in optimization as in other activities it is wise to put safety before speed. Stewart (1967) makes use of this procedure. His algorithm is of very great practical interest, since he obtains the required information about the first partial derivatives from function values alone by means of a cleverly constructed difference scheme.

3.2.3.2 Strategy of Stewart:
Derivative-free Variable Metric Method

Stewart (1967) focuses his attention on choosing the length of the trial step d_i^(k) for the approximation

    g_i^(k) ≃ F_{xi}(x^(k)) = ∂F(x)/∂x_i at x^(k)

to the first partial derivatives in such a way as to minimize the influence of rounding errors on the actual iteration process. Two difference schemes are available:

    g_i^(k) = (1/d_i^(k)) [F(x^(k) + d_i^(k) e_i) − F(x^(k))]   (forward difference)        (3.32)

and

    g_i^(k) = (1/(2 d_i^(k))) [F(x^(k) + d_i^(k) e_i) − F(x^(k) − d_i^(k) e_i)]   (central difference)        (3.33)

Application of the one sided (forward) difference (Equation (3.32)) is preferred, since it only involves one extra function evaluation. To simplify the computation, Stewart introduces the vector h^(k), which contains the diagonal elements of the matrix (H^(k))^{−1}, representing information about the curvature of the objective function in the coordinate directions.
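The intent of the two schemes can be illustrated as follows (a rough Python sketch; the Boolean vector use_central is an assumed input here, whereas Stewart derives the choice from the error analysis outlined below):

    import numpy as np

    def gradient_estimate(F, x, d, use_central):
        # Forward difference (3.32): one extra function call per
        # component; central difference (3.33): two calls, but less
        # cancellation error where a gradient component is small.
        g = np.zeros(len(x))
        for i in range(len(x)):
            e = np.zeros(len(x)); e[i] = d[i]
            if use_central[i]:
                g[i] = (F(x + e) - F(x - e)) / (2.0 * d[i])
            else:
                g[i] = (F(x + e) - F(x)) / d[i]
        return g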

The algorithm for determining the g_i^(k), i = 1(1)n, can only be outlined here, since it rests on a series of detailed error estimates. From ε_b, an estimate of the absolute error in the computation of F(x) (Stewart sets ε_b = 10^{-10}), and a machine accuracy constant ε_c = 5 · 10^{-13}, a bound on the relative error of the function values at x^(k) is first derived. With the help of the stored curvature values h_i^(k−1) and the last gradient approximations g_i^(k−1), the trial step length d_i^(k) is then chosen for each coordinate direction so as to balance the cancellation error of the difference quotient, which grows as the step length decreases, against the truncation error, which grows as the step length increases; if this rule yields a degenerate value d_i'^(k) = 0, the previous step length d_i^(k−1) is retained. Finally, as long as the approximated gradient component is not too small in relation to the curvature term, the cheap forward difference (Equation (3.32)) is used; otherwise the trial step is suitably enlarged and the central difference (Equation (3.33)) is applied. (For the threshold parameter of this switching rule Stewart chooses the value 2.)

Stewart's main algorithm takes the following form:

Step 0: (Initialization)
    Choose an initial value x^(0), accuracy requirements ε_a,i > 0, i = 1(1)n, and initial step lengths d_i^(0) for the gradient determination, e.g.,
        d_i^(0) = 0.05 x_i^(0), if x_i^(0) ≠ 0,
        d_i^(0) = 0.05, if x_i^(0) = 0.
    Calculate the vector g^(0) from Equation (3.32) using the step lengths d_i^(0).
    Set H^(0) = I, h_i^(0) = 1 for all i = 1(1)n, and k = 0.

Step 1: (Prepare for line search)
    Determine v^(k) = −H^(k) g^(k).
    If k = 0, go to step 3.
    If g^(k)T v^(k) < 0, go to step 3.
    If h_i^(k) > 0 for all i = 1(1)n, go to step 3.


Step 2: (Forget second order information)
    Replace H^(k) ← H^(0) = I,
    h_i^(k) ← h_i^(0) = 1 for all i = 1(1)n,
    and v^(k) ← −g^(k).

Step 3: (Line search and eventual break-off)
    Determine x^(k+1) such that
        F(x^(k+1)) = min_s {F(x^(k) + s v^(k))}.
    If F(x^(k+1)) ≥ F(x^(k)), end the search with result x^(k) and F(x^(k)).

Step 4: (Prepare for inverse Hessian update)
    Determine g^(k+1) by the above difference scheme.
    Construct y^(k) = x^(k+1) − x^(k) and z^(k) = g^(k+1) − g^(k).
    If k > n and |v_i^(k)| ≤ ε_a,i for all i = 1(1)n, end the search with result x^(k+1) and F(x^(k+1));
    otherwise update H^(k+1) together with the curvature estimates h^(k+1) according to Equation (3.31), increase k ← k + 1, and go to step 1.


Brown and Dennis (1972) and Gill and Murray (1972) have suggested other schemes for obtaining the partial derivatives numerically from values of the objective function. Stewart himself reports tests that show the usefulness of his rules, insofar as the results are completely comparable to others obtained with the help of analytically specified derivatives. This may be simply because rounding errors are in any case more significant here, due to the matrix operations, than for example in conjugate gradient methods. Kelley and Myers (1971), therefore, recommend carrying out the matrix operations with double precision.

3.2.3.3 Further Extensions

The ability of the quasi-Newton strategy of Davidon, Fletcher, and Powell (DFP) to construct Newton directions without needing explicit second partial derivatives makes it very attractive from a computational point of view. All efforts in the further rapid and intensive development of the concept have been directed to modifying the correction Equation (3.31) so as to reduce the tendency to instability because of rounding errors and inexact line searches, while retaining as far as possible the quadratic convergence. There has been a spate of corresponding proposals and both theoretical and experimental investigations on the subject up to about 1973, for example:

Adachi (1973a,b)
Bass (1972)
Broyden (1967, 1970a,b,c, 1972)
Broyden, Dennis, and More (1973)
Broyden and Johnson (1972)
Davidon (1968, 1969)
Dennis (1970)
Dixon (1972a,b,c, 1973)
Fiacco and McCormick (1968)
Fletcher (1969a,b, 1970b, 1972b,d)
Gill and Murray (1972)
Goldfarb (1969, 1970)
Goldstein and Price (1967)
Greenstadt (1970)
Hestenes (1969)
Himmelblau (1972a,b)
Hoshino (1971)
Huang (1970, 1974)
Huang and Chambliss (1973, 1974)
Huang and Levy (1970)
Jones (1973)
Lootsma (1972a,b)
Mamen and Mayne (1972)
Matthews and Davies (1971)
McCormick and Pearson (1969)
McCormick and Ritter (1972, 1974)
Murray (1972a,b)
Murtagh (1970)
Murtagh and Sargent (1970)
Oi, Sayama, and Takamatsu (1973)
Oren (1973)
Ortega and Rheinboldt (1972)
Pierson and Rajtora (1970)
Powell (1969, 1970a,b,c,g, 1971, 1972a,b,c,d)
Rauch (1973)
Ribiere (1970)
Sargent and Sebastian (1972, 1973)
Shanno and Kettler (1970a,b)
Spedicato (1973)
Tabak (1969)
Tokumaru, Adachi, and Goto (1970)
Werner (1974)
Wolfe (1967, 1969, 1971)
Many of the di erently sophisticated strategies, e.g., the classes or families of similar<br />

methods de ned by Broyden (1970b,c) <strong>and</strong> Huang (1970), are theoretically equivalent.<br />

They generate the same conjugate directions v (k) <strong>and</strong>, with an exact line search, the same<br />

sequence x (k) of iteration points if F (x) is quadratic. Dixon (1972c) even proves this<br />

identity for more general objective functions under the condition that no term of the<br />

sequence H (k) is singular.<br />

The important nding that under certain assumptions convergence can also be achieved<br />

without line searches is attributed to Wolfe (1967). A recursion formula satisfying these<br />

conditions is as follows:<br />

H (k+1) = H (k) + B (k)<br />

where<br />

B (k) = (y(k) ; H (k) z (k) )(y (k) ; H (k) z (k) ) T<br />

(y (k) ; H (k) z (k) ) z (k)T<br />

(3.33)<br />
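In code this rank one correction is even simpler than the DFP formula (sketch):

    import numpy as np

    def rank_one_update(H, y, z):
        # Equation (3.34): symmetric rank one correction; note that
        # positive-definiteness of H is not guaranteed and that the
        # denominator may approach zero.
        r = y - H @ z
        return H + np.outer(r, r) / (r @ z)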

The formula was proposed independently by Broyden (1967), Davidon (1968, 1969), Pearson (1969), and Murtagh and Sargent (1970) (see Powell, 1970a). The correction matrix B^(k) has rank one, while A^(k) in Equation (3.31) is of rank two. Rank one methods, also called variance methods by Davidon, cannot guarantee that H^(k) remains positive-definite. It can also happen, even in the quadratic case, that H^(k) becomes singular or B^(k) increases without bound. Hence, in order to make methods of this type useful in practice, a number of additional precautions must be taken (Powell, 1970a; Murray, 1972c). The following compromise proposal,

    H^(k+1) = H^(k) + A^(k) + θ^(k) B^(k)        (3.35)

in which the scalar parameter θ^(k) > 0 can be freely chosen, is intended to exploit the advantages of both concepts while avoiding their disadvantages (e.g., Fletcher, 1970b).


Multidimensional Strategies 83<br />

Broyden (1970b,c), Shanno (1970a,b), and Shanno and Kettler (1970) give criteria for choosing a suitable $\vartheta^{(k)}$. However, the mixed correction, also known as the BFS or Broyden-Fletcher-Shanno formula, cannot guarantee quadratic convergence either, unless line searches are carried out. It can be proved that there will merely be a monotonic decrease in the eigenvalues of the matrix $H^{(k)}$. From numerical tests, however, it turns out that the increased number of iterations is usually more than compensated for by the saving in function calls made by dropping the one-dimensional optimizations (Fletcher, 1970a). Fielding (1970) has designed an ALGOL program following Broyden's work with line searches (Broyden, 1965). With regard to the number of function calls it is usually inferior to the DFP method, but it sometimes also converges where the variable metric method fails. Dixon (1973) defines a correction to the chosen directions,

$$v^{(k)} = -H^{(k)} \nabla F(x^{(k)}) + w^{(k)}$$

where $w^{(0)} = 0$ and

$$w^{(k+1)} = w^{(k)} + \frac{\left(x^{(k+1)} - x^{(k)}\right)^T \nabla F(x^{(k+1)})}{\left(x^{(k+1)} - x^{(k)}\right)^T z^{(k)}} \left(x^{(k+1)} - x^{(k)}\right)$$

by which, together with a matrix correction as given by Equation (3.35), quadratic convergence can be achieved without line searches. He shows that at most $n + 2$ function calls and gradient calculations are required each time if, after arriving at $v^{(k)} = 0$, an iteration

$$x^{(k+1)} = x^{(k)} - H^{(k)} \nabla F(x^{(k)})$$

is included. Nearly all the procedures described assume that at least the first partial derivatives are specified as functions of the variables and are therefore exact to the significant figure accuracy of the computer used. The more costly matrix computations should wherever possible be executed in double precision, in order to keep down the effect of rounding errors.
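To make the mechanics of Equation (3.33) concrete, the following sketch applies the rank one correction to an inverse Hessian approximation H (a minimal illustration in Python with NumPy; the function name and the safeguard threshold eps are assumptions of this sketch, not part of the original formula):

    import numpy as np

    def rank_one_update(H, y, z, eps=1e-8):
        # Equation (3.33): H <- H + B with
        # B = (y - H z)(y - H z)^T / ((y - H z)^T z),
        # where y is the last change of x and z the last change of the gradient.
        u = y - H @ z                     # residual of the secant condition H z = y
        denom = u @ z                     # scalar (y - H z)^T z
        if abs(denom) < eps * np.linalg.norm(u) * np.linalg.norm(z):
            return H                      # precaution: skip a nearly singular update
        return H + np.outer(u, u) / denom

The guard against a vanishing denominator corresponds to one of the "additional precautions" mentioned above; without it, $B^{(k)}$ can increase without bound.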

Just two more suggestions for derivative-free quasi-Newton methods will be mentioned here: those of Greenstadt (1972) and of Cullum (1972). While Cullum's algorithm, like Stewart's, approximates the gradient vector by function value differences, Greenstadt attempts to get away from this. Analogously to Davidon's idea of approximating the Hessian matrix during the course of the iterations from knowledge of the gradients, Greenstadt proposes approximating the gradients by using information from objective function values over several subiterations. Only at the starting point must a difference scheme for the first partial derivatives be applied. Another interesting variable metric technique, described by Elliott and Sworder (1969a,b, 1970), combines the concept of stochastic approximation for the sequence of step lengths with the direction algorithms of the quasi-Newton strategy.

Quasi-Newton strategies of degree one are especially suitable if the objective function is a sum of squares (Bard, 1970). Problems of minimizing a sum of squares arise, for example, from the problem of solving systems of simultaneous non-linear equations, or from determining the parameters of a mathematical model from experimental data (non-linear regression and curve fitting). Such objective functions are easier to handle because Newton directions can be constructed straight away without second partial derivatives, as long as the Jacobian matrix of first derivatives of each term of the objective function is given. The oldest iteration procedure constructed on this basis is variously known as the Gauss-Newton (Gauss, 1809) method, generalized least squares method, or Taylor series method. It has all the advantages and disadvantages of the Newton-Raphson strategy.
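For clarity, the construction can be stated explicitly (a standard formulation, not spelled out in the text): with residuals $f_i$ and their Jacobian $J$, a Newton-like direction $v$ follows from first derivatives alone,

$$F(x) = \sum_{i=1}^{m} f_i(x)^2, \qquad J_{ij} = \frac{\partial f_i}{\partial x_j}, \qquad J^T J \, v = -J^T f$$

Replacing $J^T J$ by $J^T J + \lambda I$ with $\lambda > 0$ gives the stabilized variants of Levenberg and Marquardt cited below.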

Improvements on the basic procedure are given by Levenberg (1944) and Marquardt (1963). Wolfe's secant method (Wolfe, 1959b; see also Jeeves, 1958) is the forerunner of many variants which do not require the Jacobian matrix to be specified at the start but construct it in the course of the iterations. Further details will not be described here; the reader is referred to the specialist literature, again up to 1973:

Barnes, J.G.P. (1965)
Bauer, F.L. (1965)
Beale (1970)
Brown and Dennis (1972)
Broyden (1965, 1969, 1971)
Davies and Whitting (1972)
Dennis (1971, 1972)
Fletcher (1968, 1971)
Golub (1965)
Jarratt (1970)
Jones (1970)
Kowalik and Osborne (1968)
Morrison (1968)
Ortega and Rheinboldt (1970)
Osborne (1972)
Peckham (1970)
Powell (1965, 1966, 1968b, 1970d,e, 1972a)
Powell and MacDonald (1972)
Rabinowitz (1970)
Ross (1971)
Smith and Shanno (1971)
Spath (1967) (see also Silverman, 1969)
Stewart (1973)
Vitale and Taylor (1968)
Zeleznik (1968)

Brent (1973) gives further references. Peckham's strategy is perhaps of particular interest. It represents a modification of the simplex method of Nelder and Mead (1965) and Spendley (1969), and in tests it proves superior to Powell's strategy (1965) with regard to the number of function calls. It should at least be mentioned here that non-linear regression, in which parameters that enter a model in a non-linear way (e.g., as exponents) have to be estimated, in general requires a global optimization method, because the squared sum of residuals defines a multimodal function.

Reference has been made to a number of publications in this and the preceding chapter in which strategies are described that can hardly be called genuine hill climbing methods; they would fall more naturally under the headings of mathematical programming or functional optimization. It was not, however, the intention to give an introduction to the basic principles of these two very wide subjects. The interested reader will easily find out that, although a nearly exponentially increasing number of new books and journals has become available during the last three decades, she or he will look in vain for new direct search strategies in that realm. Such methods form the core of this book.




Chapter 4

Random Strategies

One group of optimization methods has been completely ignored in Chapter 3: methods in which the parameters are varied according to probabilistic instead of deterministic rules; even the methods of stochastic approximation are deterministic. As indicated by the title, there is not one random strategy but many, some of which differ considerably from each other.

It is common to resort to random decisions in optimization whenever deterministic rules do not have the desired success or lead to a dead end; on the other hand, random strategies are often supposed to be essentially more costly. The opinion is widely held that, with careful thought leading to cleverly constructed deterministic rules, better results can always be achieved than with decisions that are in some way made randomly. The strategies that follow should show that randomness is not, however, the same as arbitrariness, but can also be made to obey very refined rules. Sometimes only this kind of method solves a problem effectively.

Profound considerations do not underlie all the procedures used in hill climbing strategies. The cyclic choice of coordinate directions in the Gauss-Seidel strategy could just as well be replaced by a random sequence. One can also consider increasing the number of directions used. Since there is no good reason for preferring to search for the optimum along directions parallel to the axes, one could also use, instead of only n different unit vectors, any number of randomly chosen direction vectors. In fact, suggestions along these lines have been made (Brooks, 1958) in order to avoid a premature termination of the minimum search in narrow oblique valleys (compare Chap. 3, Sect. 3.2.1.1). Similar concepts have been developed, for example, by O'Hagan and Moler (after Wilde and Beightler, 1967), Emery and O'Hagan (1966), Lawrence and Steiglitz (1972), and Beltrami and Indusi (1972), to improve the pattern search of Hooke and Jeeves (1961; see Chap. 3, Sect. 3.2.1.2). The limitation to a finite number of search directions is not only a disadvantage in narrow oblique valleys but also at the border of the feasible region as determined by inequality constraints. All the deterministic remedies against prematurely ending the iteration sequence assume that more information can be gathered, for example in the form of partial derivatives of the constraint functions (see Klingman and Himmelblau, 1964; Glass and Cooper, 1965; Paviani and Himmelblau, 1969). Providing this information usually means a high extra cost and is sometimes not possible at all.



Random directions that are not oriented with respect to the structure of the objective function and the allowed region also imply a higher cost, because they do not take optimal single steps. They can, however, be applied in every case.

Many deterministic optimization methods, especially those which are guided by the gradient of the objective function, have convergence difficulties at points where the partial derivatives are discontinuous. On the contour diagram of a two parameter objective function of which the maximum is sought, such positions correspond to sharp ridges leading to the summit (e.g., Zwart, 1970). A narrow valley (the geometric picture in the case of minimization) leads to the same problem if the finite step lengths are greater than its width. Then all attempts fail to make improvements in the coordinate directions or, from trial steps in these directions, to predict a locally best direction in which to continue (gradient direction). The same phenomenon can also occur when the partial derivatives are specified analytically, because of the rounding errors involved in computing with a finite number of significant figures. To avoid premature termination of a search in such cases, Norkin (1961) has suggested the following procedure: when the optimization according to the conventional scheme has ended, a step is taken away from the supposed optimum in an arbitrary coordinate direction. The extremum is sought again, excluding this one variable, and the search is only finally ended when deviations in all directions have led back to the same point. This rule should also prevent stagnation at saddle points.

Even the simplex method of linear programming makes random decisions if the search for the extremum threatens to be endless because the problem is degenerate. Then, following Dantzig's suggestion (1966), the iteration scheme should be interrupted in favor of a random exchange step. A problem is only degenerate, however, because the general rules do not cover the special case (see also Chap. 6, Sect. 6.2). A further example of resorting to chance when a dead end has been reached is Brent's modification of the strategy with conjugate directions (Brent, 1973). Powell's algorithm (Powell, 1964), when applied to problems in many dimensions, tends to generate linearly dependent directions and then to proceed within a subspace of $\mathbb{R}^n$. For this reason Brent now and then interrupts the line searches with steps in randomly chosen directions (see also Chap. 3, Sect. 3.2.2.1).

One very frequently comes across proposals to let chance take control when the problem is to find global minima of multimodal objective functions. Such problems frequently crop up in process design (Motskus, 1967; Mockus, 1971) but can also be the result of recasting discrete problems into continuous form (Katkovnik and Shimelevich, 1972). Practically all sequential search procedures can only lead to a local optimum, as a rule the one nearest to the starting point. There are a few proposals for ensuring global convergence of sequential optimization methods (e.g., Motskus and Feldbaum, 1963; Chichinadze, 1967, 1969; Goldstein and Price, 1971; Ueing, 1971, 1972; Branin and Hoo, 1972; McCormick, 1972; Sutti, Trabattoni, and Brughiera, 1972; Treccani, Trabattoni, and Szego, 1972; Brent, 1973; Hesse, 1973; Opacic, 1973; Ritter and Tui, as mentioned by Zwart, 1973). They are often in the form of additional, heuristic rules. Gran (1973), for example, considers gradient methods that are supposed to achieve global convergence by the addition of a random process to the deterministic changes. Hill (1964; see also Hill and Gibson, 1965) suggests subdividing the interval to be explored and gathering sufficient information in each section to carry out a cubic interpolation. The best of the results for the
in each section to carry out a cubic interpolation. The best of the results for the


parts is taken as an approximation to the global optimum. However, for n-dimensional interpolations the cost increases rapidly with n; this scheme thus looks impractical for more than two variables. To work with several randomly chosen starting points and to compare each of the local minima (or maxima) obtained is usually regarded as the only course of action for determining the global optimum with at least a certain probability (so-called multistart techniques). Proposals along these lines have been made by, among others, Gelfand and Tsetlin (1961), Bromberg (1962), Bocharov and Feldbaum (1962), Zellnik, Sondak, and Davis (1962), Krasovskii (1962), Gurin and Lobac (1963), Flood and Leon (1964, 1966), Kwakernaak (1965), Casey and Rustay (1966), Weisman and Wood (1966), Pugh (1966), McGhee (1967), Crippen and Scheraga (1971), and Brent (1973).
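As a concrete illustration of the multistart idea, the following sketch restarts a local minimizer from random points; it assumes SciPy's general-purpose minimize as a stand-in for any of the local search strategies of Chapter 3, and the bimodal test function is an arbitrary example, not one from the text:

    import numpy as np
    from scipy.optimize import minimize

    def multistart(f, lower, upper, n_starts=20, seed=1):
        # Run a local search from several random starting points and
        # keep the best local minimum found.
        rng = np.random.default_rng(seed)
        best = None
        for _ in range(n_starts):
            x0 = rng.uniform(lower, upper)          # random starting point
            result = minimize(f, x0)                # any local strategy
            if best is None or result.fun < best.fun:
                best = result
        return best

    f = lambda x: (x[0]**2 - 4)**2 + x[1]**2        # two local minima, x = (+-2, 0)
    print(multistart(f, lower=[-5, -5], upper=[5, 5]).x)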

A further problem faces deterministic strategies if the calculated or measured values of the objective function are subject to stochastic perturbations. In the experimental field, for example in the on-line optimum search, or for control of the optimal conditions in processes, perturbations must be taken into account from the start (e.g., Tovstucha, 1960; Feldbaum, 1960, 1962; Krasovskii, 1963; Medvedev, 1963, 1968; Kwakernaak, 1966; Zypkin, 1967). However, in computational optimization too, where the objective function is analytically specified, a similar effect arises because of rounding errors (Brent, 1973), especially if one uses hybrid analogue computers for solving functional optimization problems (e.g., Gilbert, 1967; Korn and Korn, 1964; Bekey and Karplus, 1971). A simple, if expensive (in the sense of cost in computations or trials) way of dealing with this is the repetition of measurements until a definite conclusion is possible. This is the procedure adopted by Box and Wilson (1951) in the experimental gradient method, and by Box (1957) in his EVOP strategy. Instead of a fixed number of repetitions, which while on the safe side may be unnecessarily high, one can follow the concept of the sequential analysis of statistical data (Wald, 1966; see also Zigangirov, 1965; Schumer, 1969; Kivelidi and Khurgin, 1970; Langguth, 1972), which is to make only as many trials as the trial results seem to make absolutely necessary. More detailed investigations on this subject have been made, for example, by Mlynski (1964a,b, 1966a,b).

As opposed to attempting to improve the decisive data, Brooks and Mickey (1961) have found that one should work with the minimum number of n + 1 comparison points in order to determine a gradient direction, even if this is a perturbed one. One must, however, depart from the requirement that each step should yield a success, or even the locally greatest success. The motto that following locally the best possible route seldom leads to the best overall result is true not only for first order gradient strategies but also for Newton and quasi-Newton methods. Harkins (1964), for example, maintains that inexact line searches not only do not worsen the convergence of a minimization procedure but in some cases actually improve it. Similar experiences led Davies, Swann, and Campey in their strategy (see Chap. 3, Sect. 3.2.1.4) to make only one quadratic interpolation in each direction. Also Spendley, Hext, and Himsworth (1962), in the formulation of their simplex method, which generates only near-optimal directions, work on the assumption that random decisions are not necessarily a total disadvantage (see also Himsworth, 1962). Based on similar arguments, the modification of this strategy by M. J. Box (1965) sets up the initial simplex or complex by means of random numbers. Imamura et al. (1970) even go so far as to superimpose artificial stochastic variations on an objective function in order to prevent convergence to inferior local optima.

The rigidity of an algorithm based on a fixed internal model of the objective function, with which the information gathered during the iterations is interpreted, is advantageous if the objective function corresponds closely enough to the model. If this is not the case, the advantage disappears and may even turn into a disadvantage. Second order methods with quadratic models seem more sensitive in this respect than first order methods with only linear models. Even more robust are the direct search strategies that work without an explicit model, such as the strategy of Hooke and Jeeves (1961): it makes no use of the sizes of the changes in the objective function values, but only of their signs.

A method that uses a kind of minimal model of the objective function is the stochastic approximation (Schmetterer, 1961; see also Chap. 2, Sect. 2.3). This purely deterministic method assumes that the measured or calculated function values are samples of a normally distributed random quantity, of which the expectation value is to be minimized or maximized. The method feels its way to the optimum with alternating exploratory and work steps, whose lengths form convergent series with prescribed bounds and sums. In the multidimensional case this standard concept can be the basis of various strategies for choosing the directions of the work steps (Fabian, 1968). Usually gradient methods show themselves to best advantage here. The stochastic approximation itself is very versatile. Constraints can be taken into account (Kaplinskii and Propoi, 1970), and problems of functional optimization can be treated (Gersht and Kaplinskii, 1971), as well as dynamic problems of maintaining or seeking optima (Chang, 1968). Tsypkin (1968a,b,c, 1970a,b; see also Zypkin, 1966, 1967, 1970) discusses these topics very thoroughly. There are also, however, arguments against the reliability of convergence for certain types of objective function (Aizerman, Braverman, and Rozonoer, 1965). The usefulness of the strategy in the multidimensional case is limited by its high cost. Hence there has been no shortage of attempts to accelerate the convergence (Fabian, 1967; Berlin, 1969; Saridis, 1968, 1970; Saridis and Gilbert, 1970; Janac, 1971; Kwatny, 1972; see also Chap. 2, Sect. 2.3). Ideas for using random directions look especially promising; some of the many investigations of this topic which have been published are Loginov (1966), Stratonovich (1968, 1970), Schmitt (1969), Ermoliev (1970), Svechinskii (1971), Tsypkin (1971), Antonov and Katkovnik (1972), Berlin (1972), Katkovnik and Kulchitskii (1972), Kulchitskii (1972), Poznyak (1972), and Tsypkin and Poznyak (1972).

The original method is not able to determine global extrema reliably. Extensions of the strategy in this direction are due to Kushner (1963, 1972) and Vaysbord and Yudin (1968). The sequence of work steps is so designed that the probability of the following state being the global optimum is maximized. In contrast to the gradient concept, the information gathered is not interpreted in terms of local but of global properties of the objective function. In the case of two local minima, the effort of the search is gradually concentrated in their neighborhood, and only when one of them is significantly better is the other abandoned in favor of the one that is also a global minimum. In terms of the cost of the strategy, the acceleration of the local search and the reliability of the global search are diametrically opposed. Hill and Gibson (1965) show that their global strategy is superior to Kushner's, as well as to one of Bocharov and Feldbaum. However, they only treat cases with $n \le 2$ parameters. More recent research results have been presented by Pardalos and Rosen (1987), Torn and Zilinskas (1989), Floudas and Pardalos (1990), Zhigljavsky (1991), and Rudolph (1991, 1992b). Now there are even specialized journals established in the field; see Horst (1991).

All the strategies mentioned so far are fundamentally deterministic. They only resort to chance in dead-end situations, or they operate on the assumption that the objective function is stochastically perturbed. Jarvis (1968), who compares deterministic and probabilistic optimization methods, finds that random methods that do not stick to any particular model are most suitable when an optimum must be located under particularly difficult conditions, such as a perturbed objective function or a "pathological" problem structure with several extrema, discontinuities, plateaus, forbidden regions, etc. The homeostat of Ashby (1960) is probably the oldest example of the application of a random strategy. Its objective is to maintain a condition of equilibrium against stochastic disturbances. It may happen that no optimum is sought, but only a point in an allowed region (today one calls such a task a constraint satisfaction problem, or CSP). Nevertheless, corresponding solution methods are closely tied to optimization, and a series of various heuristic planning methods is available (e.g., Weinberg and Zehnder, 1969). Ashby's strategy, which he calls a blind homeostatic process, becomes active whenever the apparatus strays from equilibrium. Then the controllable parameters are randomly varied until the desired condition is restored. The finite number (in this case) of discrete settings of the variables all enter the search process with equal probability. Chichinadze (1960) later constructed an electronic model on the same principle and used it for synthesizing simple optimal control systems.

Brooks (1958), probably stimulated by R. L. Anderson (1953), is generally regarded as the initiator of the use of random strategies for optimization problems. He describes the simple, later also called blind or pure random search for finding a minimum or maximum in the experimental field. In a closed interval $a \le x \le b$ several points are chosen at random. The probability density $w(x)$ is constant everywhere within the region and zero outside:

$$w(x) = \begin{cases} 1/V & \text{for all } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$

$V$, the volume of the cube with corners $a_i$ and $b_i$ for $i = 1(1)n$, is given by

$$V = \prod_{i=1}^{n} (b_i - a_i)$$

The value of the objective function must be determined at all selected points. The point that has the lowest or highest function value is taken as optimum. How well the true extremum is approximated depends on the number of trials as well as on the actual random results. Thus one can only give a probability $p$ that the optimum will be found within a given number $N$ of trials with a prescribed accuracy:

$$p = 1 - \left(1 - \frac{v}{V}\right)^N \tag{4.1}$$

The volume $v < V$ contains all points that satisfy the accuracy requirement.

By rearranging Equation (4.1), one obtains the number of trials

$$N = \frac{\ln (1-p)}{\ln \left(1 - \frac{v}{V}\right)} \tag{4.2}$$

that is required in order to place, with probability $p$, at least one trial in the volume $v$. Brooks concludes from this that the cost is independent of the number of variables. In their criticism, Hooke and Jeeves (1958) point out that it is not feasible to consider the accuracy in terms of the volume ratio for problems with many variables. For $n = 100$ parameters, a volume ratio of $v/V = 0.1$ corresponds to a ratio of the side lengths $D$ of $V$ and $d$ of $v$ of

$$\frac{d}{D} = \sqrt[n]{\frac{v}{V}} \simeq 0.98$$

This means that the uncertainty in the variables $x_i$ is 98% of the original interval $[a_i, b_i]$, although the volume containing the optimum has been reduced to one tenth of the original. Shimizu (1969) makes the same mistake as Brooks and attempts to implement the strategy for problems with more general constraints.
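The blind random search and the counting argument of Equations (4.1) and (4.2) fit in a few lines of Python (an illustrative sketch; the function names and the example numbers are assumptions, not taken from the text):

    import math
    import numpy as np

    def pure_random_search(f, a, b, N, seed=0):
        # N uniform samples in the box [a, b]; w(x) = 1/V inside, 0 outside.
        rng = np.random.default_rng(seed)
        X = rng.uniform(a, b, size=(N, len(a)))
        values = np.array([f(x) for x in X])
        k = values.argmin()                 # best point found
        return X[k], values[k]

    def trials_needed(p, v_over_V):
        # Equation (4.2): trials needed to hit the target volume v
        # at least once with probability p.
        return math.log(1 - p) / math.log(1 - v_over_V)

    print(trials_needed(0.9, 0.1))          # about 22 trials in the volume sense
    print(0.1 ** (1 / 100))                 # but d/D is still about 0.98 for n = 100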

A comparison of the pure random search with the deterministic search methods known at the time for experimental optimization problems (Brooks, 1959) also shows no advantage of the stochastic strategy. The test covers only four different objective functions, each with two variables. Brooks then recommends applying his random method if the number of parameters is large or if the determination of objective function values is subject to large perturbations. McArthur (1961) concludes on the basis of numerical experiments that the random strategy is also preferable for complicated problem structures. Just this circumstance has led to the use, even today, of the pure random search, often called the Monte-Carlo method, for example in the computer optimization of building construction (Golinski and Lesniak, 1966; Lesniak, 1970; Hupfer, 1970).

In principle, all the trials of the simple random strategy can be made simultaneously. It is thus numbered among the simultaneous optimization methods. The decision to choose a particular state vector of variables does not depend on the results of preceding trials, since the probability of scoring according to the uniform distribution is the same at all times. However, in applications on traditional, serially operating computers, the trials must be made sequentially. This can be used to advantage by storing the current best value of the objective function and its associated variable values. In Chapter 3, Sections 3.1.1 and 3.2, the grid or tabulation method was referred to as optimal in the minimax sense. The blind random strategy should thus not be any better. Defining the interval length $D_i = b_i - a_i$ for the variable $x_i$, with required accuracy $d_i$, and assuming that all the $D_i = D$ and $d_i = d$ for $i = 1(1)n$, one obtains for the volume ratio in Equations (4.1) and (4.2)

$$\frac{v}{V} = \left(\frac{d}{D}\right)^n$$

If $v/V$ is small, which must be the case when there are many variables, one can use the approximation $\ln (1 + y) \simeq y$ for $|y| \ll 1$


to write the number of required trials as

$$N \simeq -\ln (1-p) \left(\frac{D}{d}\right)^n$$

Assuming that $D/d$ is an integer, the grid method requires

$$N = \left(\frac{D}{d}\right)^n$$

trials (compare Chap. 3, Sect. 3.2, Equation (3.19)). The value is the same for both procedures if $p \simeq 0.63$; indeed, with $N = (D/d)^n$ and $v/V = (d/D)^n$, Equation (4.1) gives $p = 1 - (1 - 1/N)^N \simeq 1 - e^{-1} \simeq 0.63$. Supposing that the probability of at least one score of the required accuracy is $p = 0.90$, the random strategy results in

$$N \simeq 2.3 \left(\frac{D}{d}\right)^n$$

which is clearly worse than the grid strategy (Spang, 1962). The reason for the extra cost, however, should not be attributed to the randomness of decisions itself, but to the fact that for an equiprobable, continuous selection of variables the trials can lie very close together or, in the discrete case, can repeat themselves. If one could avoid that, the disadvantage would no longer exist. A randomized sequence of trials might even hit upon the optimal result earlier than an ordered one. Nevertheless, Spang's proof has for some time brought all random methods, not only the simple Monte-Carlo strategy, into disrepute.

Nowadays the term Monte-Carlo methods is understood to cover, in general, simulation methods that have to do with stochastic events. They are applied effectively to solving difficult differential equations (Little, 1966) or to evaluating integrals (Cowdrey and Reeves, 1963; McGhee and Walford, 1968). Besides the simple hit-or-miss scheme, however, greatly improved variants have been developed (e.g., W. F. Bauer, 1958; Hammersley and Handscomb, 1964; Korn, 1966, 1968; Hull, 1967; Brandl, 1969). Amann (1968a,b) reports a Monte-Carlo method with information storage and a sequential extension for the solution of a linear boundary value problem, and Curtiss (1956) describes a Monte-Carlo procedure for solving systems of linear equations. Both are supposed to be less costly than comparable deterministic strategies. Pinkham (1964) and Pincus (1970) describe modifications for the problems of finding zeros of a non-linear function and of constrained optimization. Since only relatively few publications treat random optimization methods in any depth (Karnopp, 1961, 1963; Idelsohn, 1964; Dickinson, 1964; Rastrigin, 1963, 1965a,b, 1966, 1967, 1968, 1969, 1972; Lavi and Vogl, 1966; Schumer, 1967; Jarvis, 1968; Heydt, 1970; Cockrell, 1970; White, 1970, 1971; Aoki, 1971; Kregting and White, 1971), the improved strategies will be briefly presented here. They all operate with sequential, and sometimes both simultaneous and sequential, random trials, and in one way or another exploit the information from preceding trials to accelerate the convergence. Brooks himself already suggests several improvements. Thus, to exclude repetitions or closely situated trials, the volume to be investigated can be subdivided into, for example, cubic subspaces, into each of which only one random trial is placed. According to one's


knowledge of the approximate position of the optimum, the subspaces are assigned different sizes (Idelsohn, 1964). The original uniform distribution is thereby replaced by one with a greater density in the neighborhood of the expected optimum. Karnopp (1961, 1963, 1966) has treated this problem in detail without, however, giving any practical procedure. Mathematically based investigations of the same topic are due to Motskus (1965), Hupfer (1970), Pluznikov, Andreyev, and Klimenko (1971), Yudin (1965, 1966, 1972), Vaysbord (1967, 1968, 1969), Taran (1968a,b), Karumidze (1969), and Meerkov (1972). If after several (simultaneous) samples the search is continued in an especially promising looking subregion, the procedure becomes sequential in character. Suggestions of this kind have been made, for example, by McArthur (1961), Motskus (1965), and Hupfer (1970) (shrinkage random search). Zakharov (1969, 1970) applies the stochastic approximation for the successive shrinkage of the region in which Monte-Carlo samples are placed. The most thoroughly worked out strategy is that of McMurtry and Fu (1966, probabilistic automaton; see also McMurtry, 1965). The problem considered is to adjust the variable parameters of a control system for a dynamic process in such a way that the optimum of the system is found and maintained despite perturbations and (slow) drift (Hill, McMurtry, and Fu, 1964; Hill and Fu, 1965). Initially the probabilities are equal for all subregions, at the centers of which the (stochastically perturbed) function values are measured. In the course of the iterations the probability matrix is altered so that regions with better objective function values are tested more often than others. The search ends when only one subregion remains: the one with the highest probability of containing the global optimum. McMurtry and Fu use a so-called linear intensification to adjust the probability matrix. Suggestions for further improving the convergence rate have been made by Nikolic and Fu (1966), Fu and Nikolic (1966), Shapiro and Narendra (1969), Asai and Kitajima (1972), Viswanathan and Narendra (1972), and Witten (1972). Strongin (1970, 1971) treats the same problem from the point of view of decision theory.
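The general reinforcement scheme described above can be sketched as follows (a loose illustration of the subregion idea with a linear reward rule; the update constant beta, the noise model, and the termination by step count are assumptions of the sketch, not details taken from McMurtry and Fu):

    import numpy as np

    def subregion_search(f, centers, noise=0.1, beta=0.02, steps=500, seed=0):
        # Sample one of m subregion centers according to the current
        # probability vector, measure the perturbed objective there, and
        # shift probability mass linearly toward the region with the best
        # average measurement so far.
        rng = np.random.default_rng(seed)
        m = len(centers)
        prob = np.full(m, 1.0 / m)          # equal probabilities at the start
        sums, counts = np.zeros(m), np.zeros(m)
        for _ in range(steps):
            i = rng.choice(m, p=prob)
            sums[i] += f(centers[i]) + noise * rng.standard_normal()
            counts[i] += 1
            means = np.where(counts > 0, sums / np.maximum(counts, 1), np.inf)
            best = means.argmin()           # region with best average value
            prob = (1 - beta) * prob        # linear intensification step
            prob[best] += beta              # toward the currently best region
        return centers[prob.argmax()]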

All these methods lay great emphasis on the reliability of global convergence. The quality of the approximation depends to a large extent on the number of subdivisions of the n-dimensional region under investigation. High accuracy requirements cannot be met for many variables since, at least initially, the number of subregions to investigate rises exponentially with the number of parameters. To improve the local convergence properties, there are suggestions for replacing the midpoint tests in a subvolume by the result of an extreme value search. This could be done with one of the familiar search strategies, such as a gradient method (Hill, 1969), or with any other purely sequential random search method (Jarvis, 1968, 1970) of high convergence rate, even if it were only guaranteed to converge locally. Application, however, is reported to be limited to problems with at most seven or eight variables.

Another possibility for giving a sequential character to random methods consists of gradually shifting the expectation value of a random variable with a restricted probability density distribution. Brooks (1958) calls his proposal of this type the creeping random search. Suitable random numbers are provided, for example, by a Gaussian distribution with expectation value $\xi$ and standard deviation $\sigma$. Starting from a chosen initial condition $x^{(0)}$, several simultaneous trials are made, which most likely fall in the neighborhood of the starting point ($\xi = x^{(0)}$). The coordinates of the point with the best function value form


the expectation value for the next set of random trials. In contrast to other procedures, the data from the other trials are not exploited to construct a linear or even quadratic model from which to calculate a best possible step (e.g., Brooks and Mickey, 1961; Aleksandrov, Sysoyev, and Shemeneva, 1968; Pugachev, 1970). For small $\sigma$ and a large number of samples, the best value will in any case fall in the locally most favorable direction. In order to approach the solution with high accuracy, the variance $\sigma^2$ must be successively reduced. Brooks, however, gives no practical rule for this adjustment. Many algorithms have since been published that are extensions of Brooks' basic concept of the creeping random search. Most of them no longer choose the best of several trials; they accept each improvement and reject each worsening (Favreau and Franks, 1958; Munson and Rubin, 1959; Wheeling, 1960).

The iteration rule of a creeping random search is, for the minimum search:

$$x^{(k+1)} = \begin{cases} x^{(k)} + z^{(k)} & \text{if } F(x^{(k)} + z^{(k)}) \le F(x^{(k)}) \text{ (success)} \\ x^{(k)} & \text{otherwise (failure)} \end{cases}$$

The random vector $z^{(k)}$, which in this notation effects the change in the state vector $x$, belongs to an $n$-dimensional $(0, \sigma^2)$ normal distribution with expectation value $\xi = 0$ and variance $\sigma^2$, which in the simplest case is the same for all components. One can thus regard $\sigma$, or better $\sigma \sqrt{n}$, as a kind of average step length. The direction of $z^{(k)}$ is uniformly distributed in $\mathbb{R}^n$, i.e., purely random.
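In code the rule reads as follows (a minimal sketch, assuming minimization of a given function f with a fixed standard deviation sigma; the parameter names are illustrative):

    import numpy as np

    def creeping_random_search(f, x0, sigma=0.1, steps=10000, seed=0):
        # (1+1)-style creeping random search: add a normally distributed
        # random vector z and accept the new point only in case of success.
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float)
        fx = f(x)
        for _ in range(steps):
            z = sigma * rng.standard_normal(x.size)   # z ~ N(0, sigma^2 I)
            trial = x + z
            f_trial = f(trial)
            if f_trial <= fx:                         # success: accept
                x, fx = trial, f_trial                # failure: keep x as is
        return x, fx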

Gaussian distributions for the increments are also used by Bekey et al. (1966), Stewart, Kavanaugh, and Brocker (1967), and De Graag (1970). Gonzalez (1970) and White (1970) use, instead of a normal distribution, a uniform distribution that covers a small region in the form of an n-dimensional cube centered on the starting point. This clearly favors the diagonal directions, in which the total step lengths are on average a factor $\sqrt{n}$ greater than in the coordinate directions. Pierre (1969) therefore restricts the uniformly distributed random probe to an n-dimensional

hypersphere of fixed radius. Rastrigin (1960-1972) gives the total step length

$$s = \sqrt{\sum_{i=1}^{n} z_i^2}$$

a fixed value. Instead of the normal distribution he thus obtains a circumferential or hypersphere-surface distribution. In addition, he repeats the evaluation of the objective function when there is a failure, in order to reduce the effect of stochastic perturbations. Taking two model functions

$$F_1(x) = F_1(x_1, \ldots, x_n) = \sum_{i=1}^{n} x_i \qquad \text{(inclined plane)}$$

$$F_2(x) = F_2(x_1, \ldots, x_n) = \sqrt{\sum_{i=1}^{n} x_i^2} \qquad \text{(hypercone)}$$

he investigates the average convergence rate of his strategy and compares it with that of an experimental gradient method in which the partial derivatives are approximated by quotients of differences obtained from exploratory steps. He shows that for a linear


problem structure like $F_1$ the random strategy needs only $O(\sqrt{n})$ trials, whereas the gradient strategy needs $O(n)$ trials to cover a prescribed distance. For $n > 3$, the random strategy is always superior to the deterministic method. Whereas Rastrigin shows that the random search always does better than the gradient search in the spherically symmetric field $F_2$, Movshovich (1966) maintains the opposite. The discrepancy can be traced to differing assumptions about the choice of step length (see also Yvon, 1972; Gaviano and Fagiuoli, 1972).
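A circumferential (hypersphere-surface) mutation of fixed total step length s is easy to generate by normalizing a Gaussian vector (a standard construction, shown here as an illustrative sketch rather than as Rastrigin's own recipe):

    import numpy as np

    def surface_step(n, s, rng):
        # Random vector of fixed total length s whose direction is
        # uniformly distributed on the n-dimensional hypersphere surface.
        z = rng.standard_normal(n)
        return s * z / np.linalg.norm(z)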

To choose suitable step lengths or variances poses the same problems for sequential random searches as are familiar from deterministic strategies. Here too, a closely related problem is to achieve global convergence, with reference to a suitable termination rule (the convergence criterion) and with a degree of reliability. Khovanov (1967) has conceived an individual manner of controlling the random step lengths. He accepts every random change, irrespective of success or failure, increases the variance at each failure, and reduces it otherwise. The objective is to increase the probability of lingering in the more promising regions and to abandon states that are irrelevant to the optimum search. No applications of the strategy are known to the author. Favreau and Franks (1958), Bekey et al. (1966), and Adams and Lew (1966) use a constant ratio between $\sigma_i$ and $x_i$ for $i = 1(1)n$. This measure does have the effect of continuously altering the "step lengths," but its merit is not obvious. Just because a variable value $x_i$ is small in no way indicates that it is near to the extreme position being sought. Karnopp (1961) was the first to propose a step length rule based on the number of successes or failures, according to which the $\sigma_i$ or $s$ are all uniformly reduced or enlarged such that a success always occurs after two or three trials. Schumer (1967), and Schumer and Steiglitz (1968), submit Rastrigin's circumferential random direction method to a thorough examination by probability theory. For the model

$$F_3(x) = \sum_{i=1}^{n} x_i^2 = r^2$$

with the condition $n \gg 1$ and the continuously optimal step length

$$s \simeq 1.225\, \frac{r}{\sqrt{n}}$$

they obtain a rate of progress $\varphi$, which is the average distance covered in the direction of the objective (minimum) per random step,

$$\varphi \simeq 0.203\, \frac{r}{n}$$

and a success rate $w_s$, which is the average number of successes per trial,

$$w_s \simeq 0.270$$
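These constants are easy to check empirically. The sketch below estimates the success rate on the hypersphere model $F_3$ for large n by sampling fixed-length steps of size $s = 1.225\, r/\sqrt{n}$ from the surface distribution (purely an illustrative experiment; nothing in it comes from the original derivation):

    import numpy as np

    rng = np.random.default_rng(0)
    n, r, trials = 1000, 1.0, 20000
    s = 1.225 * r / np.sqrt(n)            # continuously optimal step length
    x = np.zeros(n)
    x[0] = r                              # current point at distance r
    hits = 0
    for _ in range(trials):
        z = rng.standard_normal(n)
        z *= s / np.linalg.norm(z)        # fixed total step length s
        hits += np.linalg.norm(x + z) < r # success: closer to the minimum
    print(hits / trials)                  # close to w_s = 0.27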

They are only able to treat the general quadratic case theoretically for $n = 2$. Their result can be interpreted in the sense that $\varphi$ depends on the smallest radius of curvature of the elliptic contour passing through $r$. Since neither $r$ nor $s$ can be assumed to be known in advance, it is not clear how to keep to the optimal step length. Schumer and Steiglitz (1968) give an adaptive method with which the correct size of $s$ can be maintained at least approximately during the course of the iterations. At the starting point $x^{(0)}$ two random changes are made, with step lengths $s^{(0)}$ and $s^{(0)}(1 + a)$, where $0 < a \ll 1$. If both samples are successful, $s^{(1)} = s^{(0)}(1 + a)$, i.e., the greater value, is taken for the next iteration. If only one sample yields an improvement in the objective function, its step length is taken; finally, if no success is scored, $s^{(1)}$ remains equal to $s^{(0)}$. A reduction in $s$ is only made if several consecutive trials are unsuccessful. This is also the procedure of Maybach (1966). This adjustment to the local conditions assists the strategy in achieving high convergence rates, but reduces the chances of locating global optima among several local ones. For this reason a sample with a significantly larger step length ($a > 1$) should be included from time to time. Numerical tests show that the computation cost, or number of trials, actually increases only linearly with the number of variables. Schumer and Steiglitz have tested this using the model functions $F_3$ and

$$F_4(x) = \sum_{i=1}^{n} x_i^4$$

A comparison with a Newton-Raphson strategy, in which the first and second partial derivatives are determined numerically and the cost increases as $O(n^2)$, favors the random method when $n > 78$ for $F_3$ and when $n > 2$ for $F_4$. For the second, biquadratic model function, Nelder and Mead (1965) state that the number of trials or function evaluations in their simplex strategy grows as $O(n^{2.11})$, so that the sequential random method is superior from $n > 10$. White and Day (1971) report numerical tests in which the cost in iterations with Schumer's strategy increases more sharply than linearly with $n$, whereas a modification by White (1970) shows an exactly linear dependence. A comparison with the strategy of Fletcher and Powell (1963) favors the latter, especially for truly quadratic functions.
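A sketch of the adaptive step length rule just described (the failure threshold for shrinking s and the shrink factor are assumptions of this sketch; the text only says that s is reduced after several consecutive failures):

    import numpy as np

    def adaptive_random_search(f, x0, s0=1.0, a=0.1, max_fail=10,
                               shrink=0.5, steps=5000, seed=0):
        # Creeping random search with the step length adaptation of
        # Schumer and Steiglitz: probe with lengths s and s*(1+a), keep
        # the length of the successful probe, shrink s after failures.
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float)
        fx, s, fails = f(x), s0, 0
        for _ in range(steps):
            best = None
            for length in (s, s * (1 + a)):
                u = rng.standard_normal(x.size)
                y = x + length * u / np.linalg.norm(u)   # fixed-length probe
                fy = f(y)
                if fy <= fx and (best is None or fy < best[0]):
                    best = (fy, y, length)
            if best is not None:                 # keep the better probe
                fx, x, s = best[0], best[1], best[2]
                fails = 0
            else:
                fails += 1
                if fails >= max_fail:            # several consecutive failures:
                    s *= shrink                  # reduce the step length
                    fails = 0
        return x, fx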

Rechenberg (1973), with an n-dimensional normal distribution (see Chap. 5, Sect. 5.1), reaches almost the same theoretical results as Schumer for the circumferential distribution, if one notes that the overall step length

$$\sigma_{\mathrm{tot}} = \sqrt{\sum_{i=1}^{n} \sigma_i^2} = \sigma \sqrt{n}$$

for equal variances $\sigma_i^2 = \sigma^2$ in each random component $z_i$ is proportional to the square

root of the number of variables. The reason for this lies in the property of Euclidean space that, as the number of dimensions increases, the volume of a hypersphere becomes concentrated more and more in the boundary region near the surface. Rechenberg's adaptation rule is founded on the relation between optimal variance and probability of success, derived from two essentially different models of the objective function. The adaptation rule which is thereby formulated makes the frequency and size of the increments respectively dependent on the number of variables and independent of the structure of the objective function. This will be discussed in more detail in Chapter 5, Section 5.1.

Convergence proofs for the sequential random strategy have been given by Matyas (1965, 1967) and Rechenberg (1973), though only for the case of constant variance $\sigma^2$. Gurin (1966) has proved convergence also for stochastically perturbed objective functions. The convergence rate is still reduced by perturbations (Gurin and Rastrigin, 1965), but not as much as in gradient methods. Global convergence can be achieved if the reference value of the objective function is measured more than once at the comparison point (Saridis and Gilbert, 1970). As soon as any attempt is made to achieve higher rates of convergence by adjusting the variances or step lengths, the chance of finding a global optimum diminishes. The random strategy itself then becomes a path-oriented instead of a volume-oriented strategy. The probability of global convergence still always remains finite; it may simply become very small, especially in the case of many dimensions.

Apart from adjusting the step lengths, one can consider modifying the directions. Several proposals of this kind have been published: Satterthwaite (1959a; following McArthur, 1961), Wheeling (1960), Smith and Rudd (1964; following Dickinson, 1964), Matyas (1965, 1967), Bekey et al. (1966), Stewart, Kavanaugh, and Brocker (1967), De Graag (1970), and Lawrence and Emad (1973). They are all heuristic in nature. In the simplest case of a directed random search, a successful random direction is maintained until a failure occurs (Satterthwaite). Bekey, Lawrence, and Rastrigin actually make use of each random direction. If the first step leads to a failure, they use the opposite direction (positive and negative absolute biasing). Smith and Rudd store the two currently best points from a larger series of samples and obtain from their separation a step length for continuing the optimization. Wheeling's history vector method adds to each random increment a deterministic portion derived from experience. This additional vector is initially zero. It is increased at each success by a fraction of the increment vector, and correspondingly decreased at each failure. Such a learning and forgetting process also forms the basis of the algorithms of De Graag and Matyas. The latter has received the most attention, in spite of the fact that it gives no precise guidance on how to choose the variances. Schrack and Borowski (1972), who apply their own step length rule in Matyas' strategy, were able to show by numerical tests that the simple algorithm of Schumer and Steiglitz, without direction orientation, is at least as good as Matyas' for unperturbed as well as perturbed measurements of the objective function. A quite different kind of method, due to Kjellstrom (1965), in which the random search takes place in varying three-dimensional subspaces of the space $\mathbb{R}^n$, here shows itself to be very much worse.

Another method that sets out to accept only especially favorable directions is the threshold strategy of Stewart, Kavanaugh, and Brocker (1967), in which only those random changes are accepted that result in a specified minimum improvement in the objective function value. A more recent version of the same idea has been given by Dueck and Scheuer (1990). The simultaneous adjustment of step lengths and directions has seldom been attempted. The suggestions of Favreau and Franks (1958) and Matyas (1965, 1967) remain too imprecise to be practicable. Gaidukov (1966; see also Hupfer, 1970) and Furst, Muller, and Nollau (1968) provide more exact information for this purpose, based on either the concepts of Rastrigin or Matyas. Modification of the expectation values and variances of the random vectors is made according to the success or failure of iterations. No applications of the strategy are known, however, so that for the time being the observation of Schrack and Borowski (1972) still stands, namely that a careful choice of the step lengths is the most important prerequisite for the rapid convergence of a random method.
is the most important prerequisite for the rapid convergence of a r<strong>and</strong>om method.<br />

A method devised by Rastrigin (1965a,b, 1968) <strong>and</strong> developed further by Heydt (1970)


R<strong>and</strong>om Strategies 99<br />

works entirely with a restricted choice of directions. With a xed step length, a direction<br />

can be r<strong>and</strong>omly selected only from within an n-dimensional hypercone. The angle subtended<br />

by the cone <strong>and</strong> its height (<strong>and</strong>thus the overall step length) are controlled in an<br />

adaptiveway. For a spherical objective function, e.g., the model functions F2 (hypercone),<br />

F3 (hypersphere), or F4 (something intermediate between hypersphere <strong>and</strong> hypercube),<br />

there is no improvement in the convergence behavior. Advantages can only be gained<br />

if the search has to follow a particular direction for a long time along a narrow valley.<br />

Sudden changes in direction present a problem, however, which leads Heydt to consider<br />

substituting for the cone con guration a hyper-parabolic or hyper-hyperbolic distribution,<br />

with which at least small step lengths would retain su cient freedom of direction.<br />

In every case the striving for rapid convergence is directly opposed to the reliability of global convergence. This has led Jarvis (1968, 1970) to investigate a combination of the method of Matyas (1965, 1967) with that of McMurtry and Fu (1966). Numerical tests by Cockrell (1969, 1970; see also Fu and Cockrell, 1970) show that even here the basic strategy of Matyas (1965) or Schumer and Steiglitz (1967) is clearly the better alternative. It offers high convergence rates besides a fair chance of locating global optima, at least for a small number of variables. In the case of many dimensions, every attempt to reach global reliability is thwarted by the excessive cost. This leaves the globally convergent stochastic approximation method of Vaysbord and Yudin (1968) far behind the rest of the field. Furthermore, the sequential or creeping random search is the least susceptible if perturbations act on the objective function.

Users of random strategies always draw attention to their simplicity, flexibility, and resistance to perturbations. These properties are especially important if one wishes to construct automatic optimalizers (e.g., Feldbaum, 1958; Herschel, 1961; Medvedev and Ruban, 1967; Krasnushkin, 1970). Rastrigin actually built the first optimalizer with a random search strategy, which was designed for automatic frequency control of an electric motor. Mitchell (1964) describes an extreme value controller that consists of an analogue computer with a permanently wired-in digital part. The digital part serves for storage and flow control, while the analogue part evaluates the objective function. The development of hybrid analogue computers, in which the computational inaccuracy is determined by the system, has helped to bring random methods, especially of the sequential type, into more general use. For examples of applications besides those of the authors mentioned above, the following publications can be referred to: Meissinger (1964), Meissinger and Bekey (1966), Kavanaugh, Stewart, and Brocker (1968), Korn and Kosako (1970), Johannsen (1970, 1973), and Chatterji and Chatterjee (1971). Hybrid computers can be applied to best advantage for problems of optimal control and parameter identification, because they are able to carry out integrations and differentiations more rapidly than digital computers. Mutseniyeks and Rastrigin (1964) have devised a special algorithm for the dynamic control problem of keeping an optimum. Instead of the variable position vector $x$, a velocity vector with components $\partial x_i / \partial t$ is varied. A randomly chosen combination is retained as long as the objective function is decreasing in value (for minimization, $\partial F / \partial t < 0$). As soon as it begins to increase again, a new velocity vector is chosen at random.
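This optimum-holding scheme is easy to sketch in code. The following Python fragment is only an illustration of the principle, not Mutseniyeks and Rastrigin's original algorithm; the time step dt, the fixed speed, and all identifiers are assumptions made here for the example.

    import random

    def velocity_random_search(f, x, speed=0.1, dt=0.01, steps=10000):
        # Vary a velocity vector instead of the position vector: keep the
        # current velocity while F decreases, re-randomize it otherwise.
        n = len(x)

        def random_velocity():
            v = [random.gauss(0.0, 1.0) for _ in range(n)]
            norm = sum(vi * vi for vi in v) ** 0.5
            return [speed * vi / norm for vi in v]  # random direction, fixed speed

        v = random_velocity()
        f_old = f(x)
        for _ in range(steps):
            x = [xi + vi * dt for xi, vi in zip(x, v)]  # integrate dx_i/dt = v_i
            f_new = f(x)
            if f_new > f_old:              # dF/dt > 0: F has started to increase,
                v = random_velocity()      # so choose a new velocity at random
            f_old = f_new
        return x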

It is always striking, if one observes living beings, how well adapted they are in shape, function, and lifestyle. In many cases, biological structures, processes, and systems even surpass the capabilities of highly developed technical systems. Recognition of this has for years led many authors to suspect that nature is in possession of optimal solutions to her problems. In some cases the optimality of biological subsystems can even be demonstrated mathematically, for example for the ratios of diameters in branching arteries (Cohn, 1954), for the hematocrit value (the volume fraction of solid particles in the blood; Lew, 1972), and for the position of branch points in a level system of blood vessels (Kamiya and Togawa, 1972; see also Grassmann, 1967, 1968; Rosen, 1967; Rein and Schneider, 1971).

According to the theory of the descent of the species, all organisms that exist today are the (intermediate) result of a long process of development: evolution. Based on the multitude of finds of transitional species that have since become extinct, paleontology is providing a gradually more complete picture of this development. Leaving aside supernatural explanations, one must assume that the development of optimal or at least very good structures is a property of evolution, i.e., evolution is, or possesses, an optimization (or better, meliorization) strategy.

In evolution, the mechanism of variation is the occurrence of random exchanges, even "errors," in the transfer of genetic information from one generation to the next. The selection criterion favors the better suited individuals in the so-called struggle for existence. The similarity of variation and selection to the iteration rules of direct optimization methods is, in fact, striking. This analogy is most often drawn for random strategies, since mutations can best be interpreted as random changes. Thus Ashby (1960) regards as mutations the stochastic parameter variations in his blind homeostatic process. For many variables, however, the pure or blind random search requires so many trials that it offers no acceptable explanation of the capabilities of natural structures, processes, and systems. With the highest possible physical rate of transfer of information, as given by Bremermann (1962; see also Ashby, 1965, 1968), of $10^{47}$ bits per second and gram of computer mass, the mass of the earth and the extent of its lifetime up to now would not be sufficient to solve even simple combinatorial problems by complete enumeration or a blind random search, never mind to determine the optimal configuration of the $10^4$ to $10^5$ genes with their information content of around $10^{10}$ bits (Bremermann, 1963). Evolution must rather be considered as a sequential process that exploits the information from preceding successes and failures in order to follow a trajectory, although not a completely deterministic one, in the n-dimensional parameter space. Brooks (1958) and Favreau and Franks (1958) are therefore right to compare their creeping random search with biological evolution. Yet it is also certainly a very much simplified imitation of the natural process of development. In the 1960s, two proposals that consciously take higher evolution principles as optimization rules to be simulated are due to Rechenberg (1964, 1973) and Bremermann (1962, 1963, 1967, 1968a,b,c, 1970, 1971, 1973a,b; see also Bremermann, Rogson, and Salaff, 1965, 1966; Bremermann and Lam, 1970). Bremermann reasons from the (nowadays!) low mutation rates observed in nature that only one component of the variable vector should be varied at a time. With this scheme he then encounters the same difficulties as arise in the coordinate method. On the basis of his failure with the mutation-selection scheme, for example on linear programming problems, he comes to the conclusion that ecological niches are actually only stagnation points in development, and do not represent optimal states of adaptation.

None of his many attempts to invoke the principles of population, sexual inheritance, recombination, dominance, and recessiveness to improve the convergence behavior yields the hoped-for breakthrough. He thus eventually resigns himself to a largely deterministic strategy. In the linear programming problem, he chooses from the starting point several random directions and follows these in turn up to the boundary of the feasible region. The best states on the individual bounding hyperplanes are used to determine a new starting point by taking the arithmetic mean of the component parameters. Because of the convexity of the feasible region, the new starting point is always within it. The simultaneous choice of several search directions is supposed to be the analogue of the population principle, and the construction of the average the analogue of recombination in sexual propagation. To tackle the problem of finding the minimum or maximum of an unconstrained, non-linear function, Bremermann even applies a five point Lagrangian interpolation to determine relative extrema in the random directions.

Rechenberg's evolution strategy changes all the components of the variable vector at each mutation. In his case, the low mutation rate for many dimensions is expressed by choosing small values for the step lengths, or the spread in the random changes. On the basis of theoretical work with two model functions he finds that the standard deviations of the random components are set optimally when they are inversely proportional to the number of parameters. His two membered evolution strategy resembles the scheme of Schumer and Steiglitz (1968), which is acknowledged to be particularly good, except that a $(0, \sigma^2)$ normally distributed random quantity replaces the fixed step length $s$. He has also added to it a step length modification rule, again derived from theory, which makes this look a very promising search method. It is refined in Chapter 5, Section 5.1 to meet the requirements of numerical optimization with digital computers. A multimembered strategy is treated in Section 5.2, which follows the same basic concept; however, by imitating the principles of population and recombination, it can operate without external control of the step lengths. Incorporating more than one descendant at a time and forgetting "parental wisdom" at the end of each iteration loop has provoked fierce objections against a more natural evolution strategy.

Box (1957) also considers that his EVOP (evolutionary operation) strategy resembles the biological mutation-selection process. He regards the vertices of his pattern of trial points, of which the best becomes the center of the next pattern, as individuals of a population, of which only the best "survives." The "offspring" are, however, generated by purely deterministic rules. Random decisions, as used by Satterthwaite (1959a; after Lowe, 1964) in his REVOP (random evolutionary operation) variant, are actually explicitly rejected by Box (see Youden et al., 1959; Satterthwaite, 1959b; Budne, 1959; Anscombe, 1959).

From a biological or cybernetic point of view, Pask (1962, 1971), Schmalhausen (1964), Berg and Timofejew-Ressowski (1964), Dobzhansky (1965), Moran (1967), and Kussul and Luk (1971), among others, have examined the analogy between optimization and evolution. The fact that no practical algorithms have come out of this is no doubt because the evolutionary processes are described only verbally. Although they sometimes even include their more subtle effects, they have so far not produced a really quantitative, predictive theory. Exceptions, such as the work of Eigen (1971; see also Schuster, 1972), Merzenich (1972), and Papentin (1972), are so different in emphasis that they are not applicable to the kind of problems considered here. The ways in which a process of mathematization can be implemented in theoretical biology are documented, for example, in the books by Waddington (1968) and Locker (1973), which contain a number of contributions of interest from the optimization point of view, as well as in many articles in the journal Mathematical Biosciences, published by R. W. Bellman since 1967, and in some papers from two Berkeley symposia (LeCam and Neyman, 1967; LeCam, Neyman, and Scott, 1972). Whereas many modern books on biology, such as Riedl (1976) and Roughgarden (1979), still give mainly verbal explanations of organic evolution, in general this is no longer the case. Physicists like Ebeling and Feistel (see Feistel and Ebeling, 1989) and biologists like Maynard Smith (1982, 1989) have meanwhile contributed mathematical models. The following two paragraphs thus no longer represent the actual situation, but before we add some new aspects they will be presented, nevertheless, to characterize the situation as perceived by the author in the early 1970s (Schwefel, 1975a):

Relationships have been seen between random strategies and biological evolution on the one hand and the psychology of recognition processes on the other, for example, by Campbell (1960) and Khovanov (1967). The imitation of such processes (the catch phrase is artificial intelligence) always leads to the problem of choosing or designing a suitable search algorithm, which should be heuristic rather than strictly deterministic. Their simplicity, reliability (even in extreme, unfamiliar situations), and flexibility give the random strategies a special role in this field. The topic will not be discussed more fully here, except to mention some publications that explicitly deal with the relationship to optimization strategies: Friedberg (1958), Friedberg, Dunham, and North (1959), Minsky (1961), Samuel (1963), J. L. Barnes (1965), Vagin and Rudelson (1968), Thom (1969), Minot (1969), Ivakhnenko (1970), Michie (1971), and Slagle (1972). A particularly impressive example is given by the work of Fogel, Owens, and Walsh (1965, 1966a,b), in which imitation of the biological evolutionary principles of mutation and selection gives a (mathematical) automaton the ability to recognize prescribed sequences of numbers.

It may be that in order to match the capabilities of the human brain (and to understand them) there must be a move away from the digital methods of present serial computers to quite different kinds of switching elements and coupling principles. Such concepts, as pursued in neurocybernetics and neurobionics, are described, for example, by Brajnes and Svecinskij (1971). The development of the perceptron by Rosenblatt (1958) can be seen as a first step in this direction.

Two research teams that have emphasized the adaptive capacity of evolutionary procedures, and who have shown interesting computer simulations, are Allen and McGlade (1986) and Galar, Kwasnicka, and Kwasnicki (see Galar, Kwasnicka, and Kwasnicki, 1980; Galar, 1994). In terms of the optimization tasks looked at throughout this book, one might call their point of view dynamic or on-line optimization, including optimum holding against environmental changes. As Schwefel and Kursawe (1992) have pointed out, a limited life span of all individuals is an important ingredient in such cases (the principle of forgetting).

Two others who have tried to explain brain processes on an evolutionary, at least selectionist, basis are Edelman (1987) and Conrad (1988). Though their approach has not yet been embraced by the mainstream of neural network research, this might happen in the near future (e.g., Banzhaf and Schmutz, 1992).

An even more general paradigm shift in the field of artificial intelligence (AI) has emerged under the label artificial life (AL; see Langton, 1989, 1994a,b; Langton et al., 1992; Varela and Bourgine, 1992). Whereas Lindenmayer (see Prusinkiewicz and Lindenmayer, 1990) demonstrates the possibility of (re-)creating plant forms by means of rather simple computer algorithms, the AL community tries to imitate animal behavior computationally. In most cases the goal is to design "intelligent" robots, sometimes called knowbots or animats (Meyer and Wilson, 1991; Meyer, 1992; Meyer, Roitblat, and Wilson, 1993).

The attraction of even simple evolutionary models (re-)producing fairly complex behavior of multi-individual systems simulated on computers is already spreading beyond the narrow bounds of computer science as such. New ideas are emerging from evolutionary computation, reaching not only into the organization of software development (Huberman, 1988), but also into the field of economics (e.g., Witt, 1992; Nissen, 1993, 1994) and even beyond (Schwefel, 1988; Haefner, 1992). It may be questionable whether worthwhile conclusions from the new findings can reach as far as that, but ecology at least should be a field that could benefit from a consistent use of evolutionary thinking (see Wolff, Soeder, and Drepper, 1988).

Computers have opened a third way of systems analysis aside from the classical mathematical/analytical and experimental/empirical main roads: numerical and/or symbolical simulation experiments. There is some hope that we may learn this way quickly enough so that we can maintain life on earth before we more or less unconsciously destroy it. Real evolution always had to deal with unpredictable environmental changes, not only those resulting from exogenous influences, but also self-induced endogenous ones. The landscape is some kind of n-dimensional trampoline, and every good imitation of organic evolution, whether it be called adaptive or meliorizing, must be able to work properly under such hard conditions. The multimembered evolution strategy (see Chap. 5, Sect. 5.2) with limited life span of the individuals fulfills that requirement to some extent.




Chapter 5

Evolution Strategies for Numerical Optimization

The task of mimicking biological structures and processes with the object of solving technical problems is as old as engineering itself. Mimicry itself, as a natural "strategy," is even older than mankind. The legend of Daedalus and Icarus bears early witness to such human endeavor. A sign of its scientific coming of age is the formation of the distinct branch of science known as bionics (e.g., Hertel, 1963; Gerardin, 1968; Beier and Glaß, 1968; Nachtigall, 1971; Heynert, 1972; Zerbst, 1987), which is concerned with the recognition of existing biological solutions to problems that also happen to arise in engineering, and with the adequate emulation of these examples. It is always thereby supposed that evolution has found particularly good, perhaps even optimal solutions. This assumption has often proved to be correct, or at any rate useful. Only a few attempts to imitate the actual methods of natural development are known (Ashby, 1960; Bremermann, 1962-1973; Rechenberg, 1964, 1973; Fogel, Owens, and Walsh, 1965, 1966a,b; Holland, 1975; see also Chap. 4), since they are curiously regarded a priori as being especially bad, meaning costly.

Rechenberg proposed the hypothesis "that the method of organic evolution represents an optimal strategy for the adaptation of living things to their environment," and he concludes that "it should therefore be worthwhile to take over the principles of biological evolution for the optimization of technical systems."

5.1 The Two Membered Evolution Strategy

Rechenberg's two membered evolution scheme, suggested in similar form by other authors as a random strategy (see Chap. 4), will be expressed in this chapter as an algorithm for solving non-discrete, non-stochastic, parameter optimization problems. As in Chapter 3, the problem is

$$F(x) \to \min, \qquad x \in \mathbb{R}^n$$

In the constrained case $x$ has to lie in a feasible region $G \subseteq \mathbb{R}^n$, where

$$G = \{\, x \in \mathbb{R}^n \mid G_j(x) \ge 0,\; j = 1(1)m,\; G_j \text{ restriction functions} \,\}$$

In this, as in all direct search methods, it is not possible to deal with constraints in the form of equalities.

5.1.1 The Basic Algorithm

The two membered scheme is the minimal concept for an imitation of organic evolution. The two principles of mutation and selection, which Darwin (1859) recognized to be most important, are taken as rules for variation of the parameters and for filtering during the iteration sequence respectively.

In the language of biology, the rules are as follows:

Step 0: (Initialization)
A given population consists of two individuals, one parent and one descendant. They are each identified by their genotype according to a set of $n$ genes. Only the parental genotype has to be specified as a starting point.

Step 1: (Mutation)
The parent $E^{(g)}$ of generation $g$ produces a descendant $N^{(g)}$, whose genotype is slightly different from that of the parent. The deviations refer to the individual genes and are random and independent of each other.

Step 2: (Selection)
Because of their different genotypes, the two individuals have a different capacity for survival (in the same environment). Only one of them can produce further descendants in the next generation, namely the one which represents the higher survival value. It becomes the parent $E^{(g+1)}$ of generation $g + 1$.

Thus the simplest possible assumptions are made:

- The population size remains constant.
- An individual has in principle an infinitely long life span and capacity for producing descendants (asexually).
- No difference exists between genotype (encoding) and phenotype (appearance), or rather the one is unambiguously and reproducibly associated with the other.
- Only point mutations occur, independently of each other at all single parameter locations.
- The environment, and thus the criterion of survival, is constant over time.

This minimal concept takes no account of the evolutionary factors familiar to the modern synthetic evolution theory (e.g., Stebbins, 1968; Cizek and Hodanova, 1971; Osche, 1972), such as chromosome mutations, bisexuality, recombination, diploidy, dominance and recessiveness, population size, niching, isolation, migration, etc. Even the concepts of mutation and selection are not applied here with their full biological meaning. Natural selection does not simply mean the struggle between just two individuals in which the better survives, but far more accurately that an individual with more favorable properties produces on average more descendants than one less well adapted to the environment. Neither does the present work go more deeply into the connections between cause and effect in the transmission of inherited information, of which so much has been revealed by molecular biology. Mutation is used in the widest biological sense as a synonym for all types of alteration of the substance of inheritance. In his book Evolutionsstrategie, Rechenberg (1973) examines in more detail the analogy between natural evolution and technical optimization. He compares in particular the biological with the technical parameter space, and interprets mutations as steps in the nucleotide space.

Expressed in mathematical language, the rules are as follows:

Step 0: (Initialization)
There should be storage allocated in a (digital) computer for two points of an $n$-dimensional Euclidean space. Each point is characterized by a position vector consisting of a set of $n$ components.

Step 1: (Variation)
Starting from point $E^{(g)}$, with position vector $x_E^{(g)}$, in iteration $g$, a second point $N^{(g)}$, with position vector $x_N^{(g)}$, is generated, the components $x_{N_i}^{(g)}$ of which differ only slightly from the $x_{E_i}^{(g)}$. The differences come about by the addition of (pseudo)random numbers $z_i^{(g)}$, which are mutually independent.

Step 2: (Filtering)
The two points or vectors are associated with different values of an objective function $F(x)$. Only one of them serves as a starting point for the new variation in the next iteration $g + 1$: namely, the one with the better (for minimization, smaller) value of the objective function.

Taking account of constraints in the form of a barrier penalty function, this algorithm can be formalized as follows:

Step 0: (Initialization)
Define $x_E^{(0)} = \{x_{E_i}^{(0)},\; i = 1(1)n\}^T$ such that $G_j(x_E^{(0)}) \ge 0$ for all $j = 1(1)m$. Set $g = 0$.

Step 1: (Mutation)
Construct $x_N^{(g)} = x_E^{(g)} + z^{(g)}$ with components $x_{N_i}^{(g)} = x_{E_i}^{(g)} + z_i^{(g)}$ for all $i = 1(1)n$.

Step 2: (Selection)
Decide

$$x_E^{(g+1)} = \begin{cases} x_N^{(g)} & \text{if } F(x_N^{(g)}) \le F(x_E^{(g)}) \text{ and } G_j(x_N^{(g)}) \ge 0 \text{ for all } j = 1(1)m \\ x_E^{(g)} & \text{otherwise} \end{cases}$$

Increase $g \leftarrow g + 1$ and go to step 1 as long as the termination criterion does not hold.
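To make the iteration rules concrete, here is a minimal sketch of this two membered scheme in Python. It is only an illustration under stated assumptions (a common standard deviation sigma for all components, constraints given as functions with non-negative values when satisfied); the book's reference implementation is the FORTRAN subroutine EVOL of Appendix B, and all identifiers below are ours.

    import random

    def two_membered_es(f, x_start, sigma, constraints=(), generations=10000):
        # Step 0 (initialization): the feasible parental genotype
        parent = list(x_start)
        f_parent = f(parent)
        for _ in range(generations):
            # Step 1 (mutation): add independent N(0, sigma^2) random
            # numbers to all components of the parent
            descendant = [x + random.gauss(0.0, sigma) for x in parent]
            # Step 2 (selection): accept the descendant only if it is
            # feasible and not worse (minimization)
            if all(G(descendant) >= 0 for G in constraints):
                f_descendant = f(descendant)
                if f_descendant <= f_parent:
                    parent, f_parent = descendant, f_descendant
        return parent, f_parent

    # Example: the sphere function F(x) = x_1^2 + x_2^2
    x_best, f_best = two_membered_es(lambda x: x[0]**2 + x[1]**2,
                                     x_start=[5.0, -3.0], sigma=0.5)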



The question remains of how to choose the random vectors $z^{(g)}$. This choice has the role of mutation. Mutations are understood nowadays to be random, purposeless events, which furthermore occur only very rarely. If one interprets them, as is done here, as the sum of many individual events, it is natural to choose a probability distribution according to which small changes occur frequently, but large ones only rarely (the central limit theorem of statistics). For discrete variations one can use, for example, a binomial distribution; for continuous variations, a Gaussian or normal distribution.

Two requirements then arise together by analogy with natural evolution:

- That the expectation value $\xi_i$ for a component $z_i$ has the value zero
- That the variance $\sigma_i^2$, the average squared deviation from the mean, is small

The probability density function for normally distributed random events is (e.g., Heinhold and Gaede, 1972):

$$w(z_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left( -\frac{(z_i - \xi_i)^2}{2\,\sigma_i^2} \right) \qquad (5.1)$$

If $\xi_i = 0$, one obtains a so-called $(0, \sigma_i^2)$ normal distribution. There are still, however, a total of $n$ free parameters $\{\sigma_i,\; i = 1(1)n\}$ with which to specify the standard deviations of the individual random components. By analogy with other, deterministic search strategies, the $\sigma_i$ can be called step lengths, in the sense that they represent average values of the lengths of the random steps.

For the occurrence of a particular random vector $z = \{z_i,\; i = 1(1)n\}$, with independent, $(0, \sigma_i^2)$ distributed components $z_i$, the probability density function is

$$w(z_1, z_2, \ldots, z_n) = \prod_{i=1}^{n} w(z_i) = \frac{1}{(2\pi)^{n/2} \prod_{i=1}^{n} \sigma_i} \exp\left( -\frac{1}{2} \sum_{i=1}^{n} \left( \frac{z_i}{\sigma_i} \right)^2 \right) \qquad (5.2)$$

or, more compactly, if $\sigma_i = \sigma$ for all $i = 1(1)n$,

$$w(z) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^n \exp\left( -\frac{z\,z^T}{2\,\sigma^2} \right) \qquad (5.3)$$

For the length of the overall random vector, $S = \sqrt{\sum_{i=1}^{n} z_i^2}$, a $\chi$ distribution is obtained. The $\chi$ distribution with $n$ degrees of freedom approximates, for large $n$, to a $\left( \sigma \sqrt{n - \frac{1}{2}},\; \frac{\sigma}{\sqrt{2}} \right)$ normal distribution. Thus the expectation value for the total length of the random vector for many variables is $E(S) = \sigma \sqrt{n}$, the variance is $D^2(S) = E\left( (S - E(S))^2 \right) = \frac{\sigma^2}{2}$, and the coefficient of variation is

$$\frac{D(S)}{E(S)} = \frac{1}{\sqrt{2n}}$$

This means that the most probable value for the length of the random vector at constant $\sigma$ increases as the square root of the number of variables, while the relative width of the variation decreases with the reciprocal square root of the number of parameters.
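These statements about the length $S$ are easy to check empirically. The short Monte-Carlo experiment below is only an illustration, with freely chosen values of n, sigma, and the sample size; it reproduces $E(S) \approx \sigma \sqrt{n}$ and $D^2(S) \approx \sigma^2 / 2$.

    import random

    n, sigma, trials = 100, 0.1, 10000
    lengths = []
    for _ in range(trials):
        z = [random.gauss(0.0, sigma) for _ in range(n)]
        lengths.append(sum(zi * zi for zi in z) ** 0.5)  # S = sqrt(sum z_i^2)

    mean_S = sum(lengths) / trials
    var_S = sum((s - mean_S) ** 2 for s in lengths) / trials
    print(mean_S, sigma * n ** 0.5)   # both close to 1.0 = sigma * sqrt(n)
    print(var_S, sigma ** 2 / 2)      # both close to 0.005 = sigma^2 / 2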



[Figure 5.1: Two membered evolution strategy. Contour diagram with contours $F(x) = \mathrm{const.}$ over $x_1, x_2$ and lines of equal probability density around the parents. E: parent; N: descendant; (g): generation index. Two iteration steps are shown: the mutation $z^{(g)}$ fails, so that $E^{(g+1)} = E^{(g)}$, while the mutation $z^{(g+1)}$ succeeds, so that $E^{(g+2)} = N^{(g+1)}$, moving towards the optimum.]

The geometric locus of equally likely changes in the variation of the variables can be derived immediately from the probability density function, Equation (5.2). It is an $n$-dimensional hyperellipsoid ($n$-fold variance ellipse) with the equation

$$\sum_{i=1}^{n} \left( \frac{z_i}{\sigma_i} \right)^2 = \mathrm{const.}$$

referred to its center, which is the starting point $x_E^{(g)}$. In the multidimensional case, the random changes can be regarded as a vector ending on the surface of a hyperellipsoid with the semi-axes $\sigma_i$, or, if $\sigma_i = \sigma$ for all $i = 1(1)n$, in the language of two dimensions, distributed circumferentially. Figure 5.1 serves to illustrate two iteration steps of the evolution strategy on a two dimensional contour diagram. Whereas in other, fully deterministic search strategies both the direction and the length of the search step are determined in a fixed manner, or on the basis of previously gathered information and plausible assumptions about the topology of the objective function, in the evolution strategy the direction is purely random and the step length (except for a small number of variables) is practically fixed. This should be emphasized again to distinguish this random method from Monte-Carlo procedures, in which the selected trial point is always fully independent of the previous choice and its outcome. Darwin (1874) himself emphasized that the evolution of living things is not a purely random process. Yet against his theory of descendancy a polemic is still waged in which the impossibility is demonstrated that life could arise by a purely random process (e.g., Jordan, 1970). Even at the level of the simplest imitation of organic evolution, a suitable choice of the step lengths or variances turns out to be of fundamental significance.
lengths or variances turns out to be of fundamental signi cance.



5.1.2 The Step Length Control

In experimental optimization, the appropriate step lengths can frequently be predicted. The values of the variables usually have to be determined exactly at only a few points. Thus constant values of the variances are often all that is required to complete an extreme value search. It is a matter of fact that in most experimental applications of the simple evolution strategy, fixed (and discrete) distributions of mutations have been used.

By contrast, in mathematically formulated problems that are to be solved on a digital computer, the variables often run over much of the number range of the computer, which corresponds to many powers of 10. In a numerical optimum search the step lengths must be continuously modified if the algorithm is to be efficient: a problem reminiscent of steering safely between Scylla and Charybdis, for if the step length is too small the search takes an unnecessarily large number of iterations; if it is too large, on the other hand, the optimum can only be crudely approached and the search can even get stuck far from the optimum, for example, if the route to the minimum passes along a narrow valley. Thus in all optimization strategies the step length control is the most important part of the algorithm after the recursion formula, and it is furthermore closely linked to the convergence behavior.

The corresponding remarks hold for the evolution strategy, with the following difference: in place of a predetermined step length for a parameter of the objective function there is the variance of the random changes in this parameter, and instead of the statement that an improvement will or will not be made in a given direction with a specified step length, there can only be a statement of the probability of success or failure for a chosen variance.

In his theoretical investigations of the two membered evolution strategy, Rechenberg discovered, using two basically different model objective functions (sphere model = Problem 1.1, corridor model = Problem 3.8 of the problem catalogue; see Appendix A), that the maximal rate of convergence corresponds to a particular value of the probability of a success, i.e., of an improvement in the objective function value. He was thus led to formulate the following rule for controlling the size of the random changes:

The 1/5 success rule:
From time to time during the optimum search, obtain the frequency of successes, i.e., the ratio of the number of successes to the total number of trials (mutations). If the ratio is greater than 1/5, increase the variance; if it is less than 1/5, decrease the variance.

In many problems this rule proves to be extremely effective in maintaining approximately the highest possible rate of progress towards the optimum. While in the right-angled corridor model the variances are adjusted once and for all in accordance with this rule and subsequently remain constant, in the sphere model they must steadily become smaller. The question then arises as to how often the success criterion should be tested and by what factor the variances are most effectively reduced or increased.

This question will be answered with reference to the sphere model introduced by Rechenberg, since this is the simplest non-linear model objective function and requires the greatest and most frequent changes in the step lengths. The following results of Rechenberg's theory can be used here. The maximum rate of progress is

$$\varphi_{\max} = k_1\, \frac{r}{n}\,, \qquad k_1 \simeq 0.2025 \qquad (5.4)$$

with a common variance $\sigma^2$, which is always optimally given by

$$\sigma_{\mathrm{opt}} = k_2\, \frac{r}{n}\,, \qquad k_2 \simeq 1.224 \qquad (5.5)$$

for all components $z_i$ of the random vector $z$. In these expressions $r$ is the current distance from the goal (optimum) and $n$ is the number of variables. The rate of progress is defined as the expectation value of the radial difference covered per trial (mutation), as illustrated in Figure 5.2:

$$\varphi^{(g)} = r^{(g)} - r^{(g+1)} \qquad (5.6)$$

[Figure 5.2: The rate of progress for the sphere model. Contours $F(x) = x_1^2 + x_2^2 = \mathrm{const.}$; the rate of progress $\varphi^{(g)}$ is the difference between the distances $r^{(g)}$ and $r^{(g+1)}$ of $E^{(g)}$ and $E^{(g+1)}$ from the optimum; a line of constant probability density surrounds $E^{(g)}$.]

From Equations (5.4) to (5.6) one obtains a relation for the change in the variance after a generation (iteration, or mutation) under the condition of maximum convergence rate:



$$\frac{\sigma_{\mathrm{opt}}^{(g+1)}}{\sigma_{\mathrm{opt}}^{(g)}} = \frac{r^{(g+1)}}{r^{(g)}} = 1 - \frac{k_1}{n}$$

or, after $n$ generations,

$$\frac{\sigma_{\mathrm{opt}}^{(g+n)}}{\sigma_{\mathrm{opt}}^{(g)}} = \left( 1 - \frac{k_1}{n} \right)^n$$

If $n$ is large compared to one, and the formulae derived by Rechenberg are only valid under this assumption, the step length factor tends to a constant:

$$\lim_{n \to \infty} \left( 1 - \frac{k_1}{n} \right)^n = e^{-k_1} \simeq 0.817 \simeq \frac{1}{1.224}$$

The same result is obtained by considering the rate of progress as a differential quotient $\varphi = -dr/dg$, in which $g$ represents the iteration number.

This matches the limiting case of very many variables because, according to Equation (5.4), the rate of progress is inversely proportional to the number of variables. The fact that the rate of progress $\varphi$ near its maximum is quite insensitive to small changes in the variances, together with the fact that the probability of success can only be determined from an average over several mutations, leads to the following more precise formulation of the 1/5 success rule for numerical optimization:

After every $n$ mutations, check how many successes have occurred over the preceding $10\,n$ mutations. If this number is less than $2\,n$, multiply the step lengths by the factor 0.85; divide them by 0.85 if more than $2\,n$ successes occurred.
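In code, this rule amounts to a small extension of the basic loop sketched in Section 5.1.1. The following Python fragment is again only an illustrative sketch (unconstrained, common step length, names chosen freely); the authoritative formulation is the subroutine EVOL in Appendix B.

    import random

    def es_one_fifth_rule(f, x_start, sigma, generations=20000):
        n = len(x_start)
        parent = list(x_start)
        f_parent = f(parent)
        outcomes = []                       # rolling success/failure record
        for g in range(1, generations + 1):
            child = [x + random.gauss(0.0, sigma) for x in parent]
            f_child = f(child)
            success = f_child <= f_parent
            if success:
                parent, f_parent = child, f_child
            outcomes.append(success)
            outcomes = outcomes[-10 * n:]   # keep only the last 10 n outcomes
            if g % n == 0 and len(outcomes) == 10 * n:
                hits = sum(outcomes)
                if hits < 2 * n:            # success rate below 1/5
                    sigma *= 0.85
                elif hits > 2 * n:          # success rate above 1/5
                    sigma /= 0.85
        return parent, f_parent, sigma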

The 1/5 success rule enables the step lengths or variances of the random variations to be controlled. One might do even better by looking for a control mechanism with additional differential and integral coefficients, to avoid the oscillatory behavior of a mere proportional feedback. However, the probability of success unfortunately gives no indication of how appropriate the ratios of the variances $\sigma_i^2$ are to each other. The step lengths can only be all reduced together, or all increased. One would sometimes rather like to build in a scaling of the variables, i.e., to determine the ratios of the step lengths to each other. This can be achieved by a suitable formulation of the objective function, in which new parameters are introduced in place of the original variables. The functional dependence can be freely chosen, and in the simplest case it is given by multiplicative factors. In the formulation of the numerical procedure for the two membered evolution strategy (Appendix B, Sect. B.1), the possibility is therefore included of specifying an initial step length for each individual variable. The ratios of the variances to each other remain constant during the optimum search, unless specified lower bounds on the step lengths come into operation at the same time.

All digital computers handle data only in the form of a finite number of units of information (bits). The number of significant figures and the range of numbers are thereby limited. If a quantity is repeatedly divided by a factor greater than one, the stored value of the quantity eventually becomes zero after a finite number of divisions. Every subsequent multiplication leaves the value at zero. If this happens to one of the standard deviations $\sigma_i$, the affected variable $x_i$ remains constant thereafter. The optimization continues only in a subspace of $\mathbb{R}^n$. To guard against this it must be required that $\sigma_i > 0$ for all $i = 1(1)n$. The random changes should furthermore be sufficiently large that at least the last stored place of a variable is altered. There are therefore two requirements, lower limits for the "step lengths":

$$\sigma_i^{(g)} \ge \varepsilon_a \quad \text{for all } i = 1(1)n$$

and

$$\sigma_i^{(g)} \ge \varepsilon_b\, |x_i^{(g)}| \quad \text{for all } i = 1(1)n$$

where $\varepsilon_a > 0$ and $1 + \varepsilon_b > 1$ according to the computational accuracy. It is thereby ensured that the random variations are always active and the region of the search stays spanned in all dimensions.

5.1.3 The Convergence Criterion

In experimental optimization it is usually decided heuristically when to terminate the series of trials: for example, when the trial results indicate that no further significant improvement can be gained. One always has an overall view of how the experiment is running. In numerical optimization, if the calculations are made by computer, one must build into the program a rule saying when the iteration sequence is to be terminated. For this purpose objective, quantitative criteria are needed that refer to the data available at any time. Sometimes, although not always, one will be concerned to obtain a solution as exactly as possible, i.e., accurate to the last stored digit. This requirement can relate to the variables or to the objective function. Remember that the optimum may be a weak one.

Towards the minimum, the step lengths and the distances covered normally become smaller and smaller. A frequently used convergence criterion consists of ending the search when the changes in the variables become zero (in which case no further improvement in the objective function is made), or when the step lengths have become zero. As a rule one sets the lower bound not to zero but to a sufficiently small, finite value. This procedure has, however, one disadvantage that can be serious. Small step lengths occur not only if the minimum is nearby, but also if the search is moving through a narrow valley. The optimization may then be practically halted long before the extreme value being sought is reached. In Equations (5.4) and (5.5), $r$ can equally well be thought of as the local radius of curvature. Neither $\varphi$, the distance covered, nor $\sigma$, the step length, is a measure of the closeness to the optimum. Rather, they convey information about the complexity of the minimum problem: the number of variables and the narrowness of the valleys encountered. The requirement $\sigma > \varepsilon$ or $\|x^{(g)} - x^{(g-1)}\| > \varepsilon$ for the continuation of the search is thus no guarantee of sufficient convergence.



Gradient methods, which seek a point with vanishing first derivatives, frequently also apply this necessary condition for the existence of an extremum as a termination criterion. Alternatively, the search can be continued until $\Delta F = F(x^{(k-1)}) - F(x^{(k)})$, the change in the objective function value in one iteration, goes to zero or falls below a prescribed limit. But this requirement can also be fulfilled far from the minimum if the valley in which the deepest point is sought happens to be very flat in shape. In this case the step length control of the two membered evolution strategy ensures that the variances become larger, and thus the function value differences between two successful trials also on average become larger. This is guaranteed even if the function values are equal (within computational accuracy), since a change in the variables is then always registered as a success. One thus has only to take care that $\Delta F$ is summed over a number of results in order to derive a termination criterion. Just as lower bounds are defined for the step lengths, an absolute and a relative bound can be specified here:

Termination rule:
End the search if

$$F\left( x_E^{(g - \Delta g)} \right) - F\left( x_E^{(g)} \right) \le \varepsilon_c$$

or

$$F\left( x_E^{(g - \Delta g)} \right) - F\left( x_E^{(g)} \right) \le \varepsilon_d \left| F\left( x_E^{(g)} \right) \right|$$

where $\Delta g \ge 20\,n$, and $\varepsilon_c > 0$, $1 + \varepsilon_d > 1$ according to the computational accuracy.

The condition $\Delta g \ge 20\,n$ is designed to ensure that in the extreme case the standard deviations are reduced or increased within the test period by at least the factor $(0.85)^{20} \simeq \frac{1}{25}$, in accordance with the 1/5 success rule. This will prevent the search being terminated only because the variances are forced to change suddenly. It is clear from Equation (5.4) that the more variables are involved in the problem, the slower is the rate of progress. Hence it does not make sense to test the convergence criterion very frequently. A recommended procedure is to make a test every $20\,n$ mutations. Only one additional function value then needs to be stored.
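As a sketch, the test itself reduces to a few lines of Python; f_old is meant to be the parent's objective function value from 20 n generations ago, f_new the current one, and the epsilon parameters correspond to those defined above (the names are ours).

    def should_terminate(f_old, f_new, eps_c, eps_d):
        # Absolute and relative versions of the termination rule
        return (f_old - f_new) <= eps_c or \
               (f_old - f_new) <= eps_d * abs(f_new)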

Another reason can be adduced for linking the termination of the search to the function value changes. While every success in an optimum search means, in the end, an economic profit, every iteration costs computer time and thus money. If the costs exceed the profit, the optimization may well provide useful information, but it is certainly not on the whole of any economic value. Thus someone who only wishes to optimize from an economic point of view can, by a suitable choice of values for the accuracy parameters, halt the search process as soon as it starts running into a loss.



5.1.4 The Treatment of Constraints

Inequality constraints $G_j(x) \ge 0$ for all $j = 1(1)m$ are quite acceptable. Sign conditions may be formulated in the same manner and do not receive any special treatment. In contrast to linear and non-linear programming, no sign conditions need to be set in order to keep within a bounded region. If a mutation falls in the forbidden region, it is assessed as a worsening (in the sense of a lethal mutation) and the variation of the variables is not accepted.

No particular penalty function, such as Rosenbrock chooses for his method of rotating coordinates, has been developed for the evolution strategy. The user is free to use the techniques, for example, of Carroll (1961), Fiacco and McCormick (1968), or Bandler and Charalambous (1974), to construct a suitable sequence of substitute objective functions and to solve the original constrained problem as a sequence of unconstrained problems. This, however, can be done outside the procedure.

It is sometimes difficult to specify an allowed initial vector of the variables. If one were to wait until by chance a mutation satisfied all the constraints, it could take a very long time. Besides, during this search period the success checks could not be carried out as described above. It would nevertheless be desirable to apply the normal search algorithm effectively to find an allowed state. Box (1965) has given, in the description of his complex method, a simple way of proceeding from a forbidden starting point. He constructs an auxiliary objective function from the sum of the constraint function values of the violated constraints:

$$\tilde F(x) = \sum_{j=1}^{m} G_j(x)\, \delta_j(x), \qquad \delta_j(x) = \begin{cases} -1 & \text{if } G_j(x) < 0 \\ 0 & \text{otherwise} \end{cases} \qquad (5.7)$$

Each decrease in the value of $\tilde F(x)$ represents an approach to the feasible region. When eventually $\tilde F(x) = 0$, then $x$ satisfies all the constraints and can serve as a starting vector for the optimization proper. This procedure can be taken over without modification for the evolution strategy.
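A sketch of this feasibility search in Python: the function below implements Equation (5.7), and any of the minimizers sketched in this section, applied without constraints, can drive it to zero. The constraint functions and all names in the example are illustrative assumptions.

    def aux_objective(x, constraints):
        # Equation (5.7): sum of the violated constraint values, negated,
        # so that F~(x) = 0 exactly when x satisfies all constraints
        total = 0.0
        for G in constraints:
            g = G(x)
            if g < 0:        # a violated constraint contributes |g|
                total -= g
        return total

    # Example with G_1(x) = x_1 >= 0 and G_2(x) = 1 - x_1 - x_2 >= 0:
    Gs = [lambda x: x[0], lambda x: 1.0 - x[0] - x[1]]
    print(aux_objective([-2.0, 5.0], Gs))   # 4.0: still infeasible
    print(aux_objective([0.2, 0.3], Gs))    # 0.0: feasible starting point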

5.1.5 Further Details of the Subroutine EVOL

In Appendix B, Section B.1 a complete FORTRAN listing is given of a subroutine corresponding to the two membered evolution scheme that has been described. Thus no detailed algorithm will be formulated here, but a few further programming details will be mentioned.

In nearly all digital computers there are library subroutines for generating uniformly distributed pseudorandom numbers. They work, as a rule, according to the multiplicative or additive congruence method (see Jöhnk, 1969; Niederreiter, 1992; Press et al., 1992). From any two numbers taken at random from a uniform distribution in the range $[0, 1]$, by using the transformation rules of Box and Muller (1958) one can generate two independent, normally distributed random numbers with expectation values zero and variances unity. The formulae are

$$Z_1' = \sqrt{-2 \ln Y_1}\, \sin\,(2\pi\, Y_2)$$
$$Z_2' = \sqrt{-2 \ln Y_1}\, \cos\,(2\pi\, Y_2) \qquad (5.8)$$

where the $Y_i$ are the uniformly distributed and the $Z_i'$ the $(0, 1)$ normally distributed random numbers respectively. To obtain a distribution with a variance different from unity, the $Z_i'$ must simply be multiplied by the desired standard deviation $\sigma_i$ (the "step length"):

$$Z_i = \sigma_i\, Z_i'$$

The transformation rules are contained in a function procedure separate from the actual subroutine. To make use of both Equations (5.8), a switch with two settings is defined, the condition of which must be preset in the subroutine once and for all. In spite of Neave's (1973) objection to the use of these rules with uniformly distributed random numbers that have been generated by a multiplicative congruence method, no significant differences could be observed in the behavior of the evolution strategy when other random generators were used. On the other hand, the trapezium method of Ahrens and Dieter (1972) is considerably faster.
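For illustration, a direct transcription of Equations (5.8) into Python; the guard against a zero argument of the logarithm is our addition, and a modern program would of course just call a library routine such as random.gauss.

    import math
    import random

    def box_muller():
        # Two uniform numbers from [0, 1) yield two independent
        # (0, 1) normally distributed numbers, Equations (5.8)
        y1 = 1.0 - random.random()          # shift into (0, 1] to avoid log(0)
        y2 = random.random()
        r = math.sqrt(-2.0 * math.log(y1))
        return (r * math.sin(2.0 * math.pi * y2),
                r * math.cos(2.0 * math.pi * y2))

    z1, z2 = box_muller()
    mutation_step = 0.5 * z1    # scale by the standard deviation sigma = 0.5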

Most algorithms for parameter optimization include a second termination rule, independent of the actual convergence criterion. They end the search after no more than a specified number of iterations, in order to avoid an infinite series of iterations in case the convergence criterion should fail. Such a rule is effectively a bound on the computation time. The program libraries of computers usually contain a procedure with which the CPU time used by the program can be determined. Thus instead of giving a maximum number of iterations one could specify a maximum computation time as a termination criterion. In the present program the latter option is adopted. After every $n$ iterations the elapsed CPU time is checked. As soon as the limit is reached, the search ends and output of the results can be initiated from the main program.

The 1/5 success rule assumes that there is always some combination of variances $\sigma_i > 0$ with which, on average, at least one improvement can be expected within five mutations. In Figure 5.3 two contour diagrams are shown for which this condition cannot always be met. At some points the probability of a success cannot exceed 1/5: for example, at points where the objective function has discontinuous first partial derivatives, or at the edge of the allowed region. Especially in the latter case, the selection principle progressively forces the sequence of iteration points closer up to the boundary, and the step lengths are continuously reduced in size, without the optimum being approached with comparable accuracy.

[Figure 5.3: Failure of the 1/5 success rule. Two contour diagrams over $x_1, x_2$; the circle is a line of equal probability density, the bold segment the fraction of it on which a success can be scored. In one diagram the contours form a sharp ridge on the way to the optimum; in the other the optimum lies on the boundary of a forbidden region.]

Even in the corridor model (Problem 3.8 of Appendix A, Sect. A.3) difficulties can arise. In this case the rate of progress and the probability of success depend on the current position relative to the edges of the corridor. Whereas the maximum probability of success in the middle of the corridor is 1/2, at the corners it is only $2^{-n}$. If one happens to be in the neighborhood of the edge of the corridor for several mutations, the probability of success
calculated by the above rule will be very different from that associated with the same step length if an average over the corridor cross section were taken. If now, on the basis of this low estimate of the success probability, the step length is further reduced, there is a corresponding decrease in the probability of escaping from the edge of the corridor. It would therefore be desirable in this special case to average the probability of success over a longer time period. Opposed to this, however, is the requirement from the sphere model that the step lengths should be adjusted to the topology as directly as possible. The present subroutine offers several means of dealing with the problem. For example, the lower bounds on the variances (variables EA, EB in the subprogram EVOL) can be chosen to be relatively large, or the number of mutations (the variable LS) after which the convergence criterion is tested can be altered by the user. The user besides has a free choice with regard to the required probability of success (variable LR) and the multiplier of the variance (variable SN). The rate of change of the step lengths, given by the factor 0.85 per $n$ mutations, was fixed on the basis of the sphere model. It is not ideal for all types of problems, but is rather in the nature of a lower bound. If it seems reasonable to operate with constant variances, the parameter in question should be set equal to one.
An indication of a suitable choice for the initial step lengths (variable array SM) can be<br />

obtained from Equation (5.4). Since r increases as the root of the number of parameters,<br />

one is led to set<br />

(0)<br />

i<br />

= 4xi<br />

pn<br />

in which 4xi is a rough measure of the expected distance from the optimum. This does<br />

not actually give the optimal step length because r is a kind of local scale of curvature of<br />

the contours of the objective function. However, it does no harm to start with variances<br />

that are too large they will quickly be reduced to a suitable size by the1=5 success rule.<br />

During this transition phase there is still a chance of escaping from the neighborhood<br />

of a merely local optimum but very little chance afterwards. The global convergence


118 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

property (see Rechenberg, 1973) of the evolution strategy can only be proved under the<br />

condition of constant step lengths. With the introduction of the success rule, it is lost,<br />

or to be more precise: the probability of nding the global minimum among several<br />

local minima decreases continuously as a local minimum is approached with continuous<br />

reduction in the step lengths. Rapid convergence <strong>and</strong> reliable global convergence behavior<br />

are two contradictory requirements. They cannot be reconciled if one has absolutely no<br />

knowledge of the topology of the objective function. The 1=5 success rule is aimed at high<br />

convergence rates. If several local optima are expected, it is thus advisable to keep the<br />

(0)<br />

variances large <strong>and</strong> constant, or at least to start with large i <strong>and</strong> perhaps to require a<br />

lower success probability than 1/5. This measure naturally costs extra computation time.<br />

Once one is sure of having located a point near the global extremum, the accuracy can be<br />

improved subsequently in a follow-up computation. For more sophisticated investigations<br />

of the global convergence see Born (1978), Rappl (1984), Scheel (1985), Back, Rudolph,<br />

<strong>and</strong> Schwefel (1993), <strong>and</strong> Beyer (1993).<br />

5.2 A Multimembered <strong>Evolution</strong> Strategy<br />

While the simple, two membered evolution strategy is successful in application to many<br />

optimization problems, it is not a satisfactory method of solving certain types of problem.<br />

As we have seen,by following the 1=5 success rule, the step lengths can be permanently<br />

reduced in size without thereby improving the rate of progress. This phenomenon occurs<br />

frequently if constraints become active during the search, <strong>and</strong> greatly reduce the size of<br />

the success scoring region. A possible remedy would be to alter the probability distribution<br />

of the r<strong>and</strong>om steps in such away astokeep the success probability su ciently<br />

large. To do so the st<strong>and</strong>ard deviations i would have to be individually adjustable.<br />

The contour surfaces of equal probability could then be stretched or contracted along<br />

the coordinate axes into ellipsoids. Further possibilities for adjustment would arise if the<br />

r<strong>and</strong>om components were allowed to depend on each other. For an arbitrary quadratic<br />

problem the rate of convergence of the sphere model could even be achieved if the r<strong>and</strong>om<br />

changes of the individual variables were correlated so as to make the regression line of<br />

the r<strong>and</strong>om vector run parallel to the concentric ellipsoids F (x) =const:, which now lie<br />

at some angle in the space. To put this into practice, information about the topology<br />

of the objective function would have to be gathered <strong>and</strong> analyzed during the optimum<br />

search. This would start to turn the evolution strategy into something resembling one<br />

of the familiar deterministic optimization methods, as Marti (1980) <strong>and</strong> recently again<br />

Ostermeier (1992) have done this is contrary to the line pursued here, which is to apply<br />

biological evolution principles to the numerical solution of optimization problems. Following<br />

Rechenberg's hypothesis, construction of an improved strategy should therefore be<br />

attempted by taking into account further evolution principles.<br />

5.2.1 The Basic Algorithm<br />

When the ground rules of the two membered evolution strategy were formulated in the<br />

language of biology, reference was to one parent <strong>and</strong> one o spring the basic population


A Multimembered <strong>Evolution</strong> Strategy 119<br />

thus consisted of two individuals. In order to reach a higher level of imitation of the<br />

evolutionary process, the number of individuals must be increased. This is precisely the<br />

concept behind the evolution strategy referred to in the following as multimembered. In<br />

his basic work (Rechenberg, 1973), Rechenberg already presented a scheme for a multimembered<br />

evolution. The one considered here is somewhat di erent. It turns out to<br />

be particularly useful with respect to the individual control of several step lengths to be<br />

described later. As yet, however, no detailed comparison of the two variants has been<br />

undertaken.<br />

It is useful to introduce at this point anomenclature for the di erent evolution strategies.<br />

We shall call the number of parents of a generation ,<strong>and</strong>thenumber of descendants<br />

, so that the selection takes place between + = 1+1 = 2 individuals in the two membered<br />

strategy. Wethus characterize the simplest imitation of evolution in abbreviated<br />

notation as the (1+1) strategy. Since the multimembered evolution scheme described by<br />

Rechenberg allows a selection between > 1 parents <strong>and</strong> = 1 o spring it should be<br />

called the ( +1) strategy. Accordingly a more general form, a ( + )evolution strategy,<br />

should be formulated in such away that a basic population of parents of generation g<br />

produces o spring. The process of selection only allows the best of all + individuals<br />

to proceed as parents of the following generation, be they o spring of generation<br />

g or their parents. In this model it could happen that a parent, because of its vitality,<br />

is far superior to the other parents in the same generation, \lives" for a very long time,<br />

<strong>and</strong> continues to produce further o spring. This is at variance to the biological fact of a<br />

limited lifespan, or more precisely a limited capacity for reproduction. Aging phenomena<br />

do not, as far as is known, a ect biological selection (see Savage, 1966 Osche, 1972). As<br />

a further conceptual model, therefore, let us introduce a population in which parents<br />

produce > o spring but the parents are not included in the selection. Rather<br />

the parents of the following generation should be selected only from the o spring. To<br />

preserve a constant population size, we require that each time the best of the o spring<br />

become parents of the following generation. We will refer to this scheme in what follows<br />

as the ( , ) strategy. As for the (1+1) strategy in Section 5.1.1, the algorithm of the<br />

multimembered ( , ) strategy will rst be formulated in the language of biology.<br />

Step 0: (Initialization)<br />

A given population consists of individuals. Each ischaracterized by its<br />

genotype consisting of n genes, which unambiguously determine the vitality,<br />

or tness for survival.<br />

Step 1: (Variation)<br />

Each individual parent produces = o spring on average, so that a total of<br />

new individuals are available. The genotype of a descendant di ers only<br />

slightly from that of its parents. The number of genes, however, remains to<br />

be n in the following, i.e., neither gene duplication nor gene deletion occurs.<br />

Step 2: (Filtering)<br />

Only the best of the o spring become parents of the following generation.<br />

In mathematical notation, taking constraints into account, the rules are as follows:


120 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

Step 0: (Initialization)<br />

De ne x (0)<br />

k<br />

x (0)<br />

k<br />

= x (0)<br />

Ek =(x(0)<br />

k1:::x (0)<br />

kn) T for all k = 1(1) :<br />

= x(0) Ek is the vector of the kth parent Ek, suchthat Gj(x (0)<br />

k ) 0 for all k =1(1) <strong>and</strong> all j = 1(1)m:<br />

Set the generation counter g =0:<br />

Step 1: (Mutation)<br />

Generate x (g+1)<br />

`<br />

= x (g+1)<br />

k + z (g + `) <br />

such thatGj(x (g+1)<br />

` ) 0 j =1(1)m ` =1(1)<br />

where k 2 [1 ]<br />

e.g., k =<br />

( if ` = p pinteger<br />

`(mod ) otherwise.<br />

x (g+1)<br />

` = x (g+1)<br />

N`<br />

=(x(g+1) `1 :::x (g+1)<br />

`n ) T is the vector of the `th o spring N`<br />

<strong>and</strong> z (g +`) is a normally distributed r<strong>and</strong>om vector with n components:<br />

Step 2: (Selection)<br />

Sort the x (g+1)<br />

` for all ` = 1(1) so that<br />

F (x (g+1)<br />

`1 ) F (x (g+1)<br />

`2 ) for all `1 = 1(1) `2 = + 1(1)<br />

Assign x (g+2)<br />

k<br />

= x (g+1)<br />

`1 for all k`1 = 1(1) :<br />

Increase the generation counter g g +1:<br />

Go to step 1, unless some termination criterion is ful lled.<br />

What happens in one generation for a (2 , 4) evolution strategy is shown schematically on<br />

the two dimensional contour diagram of a non-linear optimization problem in Figure 5.4.<br />

5.2.2 The Rate of Progress of the (1 , )<strong>Evolution</strong> Strategy<br />

In this section we attempt to obtain approximately the rate of progress of the multimembered,<br />

or ( , ) strategy{at least for = 1. For this purpose the n-dimensional<br />

sphere <strong>and</strong> corridor models, as used by Rechenberg (1973), are employed for calculating<br />

the progress for the (1+1) strategy.<br />

In the two membered evolution strategy ' was the expectation value of the useful<br />

distance covered in each mutation. It is convenient here to de ne the rate in terms of the<br />

number of generations.<br />

' = expectation value k^x ; x (g) k;k^x ; x (g;1) k<br />

where ^x is the vector of the optimum <strong>and</strong> x (g) is the average vector of the parents of<br />

generation g.<br />

From the chosen n-dimensional normal distribution of the r<strong>and</strong>om vector, which has<br />

expectation value zero <strong>and</strong> variance 2 for all independent vector components, the probability<br />

density for going from a point E with vector xE = (xE1:::xEn) T to another


A Multimembered <strong>Evolution</strong> Strategy 121<br />

x 2<br />

Circles : lines of<br />

constant probability<br />

density<br />

(g)<br />

E<br />

1<br />

(g) (g+1)<br />

N = E<br />

2 2<br />

(g)<br />

N<br />

1<br />

(g)<br />

N<br />

4<br />

Opt.<br />

(g) (g+1)<br />

N = E<br />

3 1<br />

(g)<br />

E<br />

2<br />

x<br />

1<br />

E : Parents<br />

k<br />

N : Offspring<br />

Figure 5.4: Multimembered (2 , 4) evolution strategy<br />

point N with vector xN =(xN1:::xNn) T is<br />

w(E ! N) =<br />

1<br />

p2<br />

! n<br />

exp<br />

The distance kxE ; xNk between xE <strong>and</strong> xN is<br />

kxE ; xNk =<br />

vu<br />

u<br />

t nX<br />

i=1<br />

; 1<br />

2 2<br />

nX<br />

i=1<br />

(xEi ; xNi) 2<br />

(g) : Generation index<br />

(xEi ; xNi) 2<br />

!<br />

(5.9)<br />

But of this, only a part, s = f(xExN), is useful in the sense of approaching the objective.<br />

To discover the total probability density forcovering a useful distance s, anintegration<br />

must be performed over the locus of points for which the useful distance is s, measured<br />

from the starting point xE. This locus is the surface of a nite region in n-dimensional<br />

space:<br />

Z Z<br />

p(s) =<br />

w(E ! N) dxN1 dxN2 ::: dxNn (5.10)<br />

f(xExN) =s<br />

The result of the integration depends on the weighting function f(xExN) <strong>and</strong>thus on<br />

the topology of the objective function F (x).<br />

So far only one r<strong>and</strong>om change has been considered. In the multimembered evolution<br />

strategy, however, the average over the best of the o spring must be taken, in which<br />

each of the o spring is to be associated with its own distance s`. We rst have to nd the<br />

probability density w (s 0 ) for the th best descendant of a generation to cover the useful<br />

distance s 0 .Itisacombinatorial product of


122 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

The probability density w(s`1 = s 0 ) that a particular descendant N`1 gets exactly<br />

s 0 closer to the objective<br />

The probability p(s`2 >s 0 ) that a descendant N`2 advances further than s 0<br />

The probability p(s`3 s 0 )<br />

Y<br />

` +1 =1<br />

` +162f`1`2:::` g<br />

; X+2<br />

` 2 =1<br />

`26=`1<br />

(<br />

p(s`2 >s 0 )<br />

X<br />

` =` ;1 +1<br />

` 62f`1`2:::` ;2g<br />

p(s` +1 s 0 )<br />

p(s`3 >s 0 )<br />

(5.11)<br />

w(s 0 )= 1 X<br />

w (s 0 ) (5.12)<br />

' =<br />

Z1<br />

s 0 =su<br />

=1<br />

s 0 w(s 0 ) ds 0<br />

(5.13)<br />

The meaning of su will be described later.<br />

To evaluate ', besides <strong>and</strong> , all components of the position vectors of all parents of<br />

the generation must be given, together with the values of for producing each descendant.<br />

If ' is to become independent of a particular initial con guration, it is necessary to<br />

de ne representative oraverage values of the relative positions of the parents, which are<br />

established during the optimization as a function of the topology. Todosowould require<br />

setting up <strong>and</strong> solving an integral equation. This has not yet been achieved.<br />

To be able to say something nevertheless about the rate of convergence some simplifying<br />

assumptions will be made. All parents will be represented by a single position vector<br />

xk, <strong>and</strong> the st<strong>and</strong>ard deviations `i will be assumed equal for all components i = 1(1)n<br />

<strong>and</strong> for the descendants ` = 1(1) . Equation (5.11) thereby simpli es to<br />

Since<br />

w (s 0 )=<br />

; 1<br />

; 1<br />

!<br />

w(s` = s 0 )[p(s` s 0 )] ;1<br />

p(s` >s 0 )+p(s`


A Multimembered <strong>Evolution</strong> Strategy 123<br />

<strong>and</strong> ; 1<br />

; 1<br />

we have<br />

w (s 0 )=<br />

!<br />

=<br />

( ; 1) !<br />

( ; 1) ! ( ; )!<br />

!<br />

( ; 1)!( ; )! w(s` = s 0 )[p(s`


124 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

askew distribution this is not the case. Perhaps, however, the skewness is only slight, so<br />

that one can determine at least approximately the expectation value from the position of<br />

the maximum.<br />

Before treating the sphere <strong>and</strong> corridor models in this way,wewillcheck the usefulness<br />

of the scheme with an even simpler objective function.<br />

5.2.2.1 The Linear Model (Inclined Plane)<br />

The simplest way the objective function can depend on the variables is linearly. Imagining<br />

the function to be a terrain in the (n + 1)-dimensional space, it appears as an inclined<br />

plane. In the two dimensional projection the contours are straight, parallel lines in this<br />

case. Without loss of generality one can orient the coordinate system so that the plane only<br />

slopes in the direction of one axis x1 <strong>and</strong> the starting point or parent under consideration<br />

lies at the origin (Fig. 5.5).<br />

The useful distance s` towards the objective thatiscovered by descendant N` of the<br />

parent E is just the part of the r<strong>and</strong>om vector z lying along the x1 axis. Since the<br />

components zi of z are independent, we have<br />

<strong>and</strong><br />

p(s`


A Multimembered <strong>Evolution</strong> Strategy 125<br />

s 0 =su<br />

x 2<br />

z<br />

N<br />

E x<br />

E : Parent<br />

1<br />

N : th offspring<br />

s = z ,1<br />

Contours<br />

F (x) = const.<br />

Figure 5.5: The inclined plane model function<br />

To the minimum<br />

sublinearly however, probably proportional to the logarithm of .Tocompare the above<br />

approximation ~' with the exact value ' the following integral must be evaluated:<br />

' =<br />

Z1<br />

s0 p<br />

2<br />

exp ; s02<br />

2 2<br />

! "<br />

0<br />

1<br />

s<br />

1 + erf p<br />

2<br />

2<br />

!#! ;1<br />

ds 0<br />

For small values of the integration can be performed by elementary methods, but<br />

not for general values of . The value of ' was therefore obtained by simulation on the<br />

computer rst for the case in which the parent survives if the best of the descendants is<br />

worse than the parent ('sur with su = 0) <strong>and</strong> secondly for the case in which the parent<br />

is no longer considered in the selection ('ext with su = ;1). The two results are shown<br />

in Figure 5.6 for comparison with the approximate solution ~'. Itisimmediately striking<br />

that for only ve o spring the extinction of the parent has hardly any e ect on the rate<br />

of progress, i.e., for 5 it is as good as certain that at least one of the descendants<br />

will be better than the parent. The greatest di erences between 'sur <strong>and</strong> 'ext naturally<br />

appear when = 1. Whereas 'ext goes to zero, 'sur keeps a nite value. This can be<br />

determined exactly. Omitting here the details of the derivation, which is straightforward,<br />

the result is simply<br />

'sur( =1)=p 2<br />

The relationship to the (1+1) evolution scheme is thereby established. The di erences<br />

between the approximate theory ( ~') <strong>and</strong> the simulation ('ext) indicate that the assumption<br />

of the symmetry of w(s 0 ) is not correct. The discrepancy with regard to '= seems<br />

to tend to a constant value as increases. While the approximate theory is shown by this


126 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

2<br />

1<br />

0<br />

ϕ<br />

1<br />

Rate of progress for σ = 1<br />

ϕ<br />

ext<br />

(λ)<br />

ϕ<br />

sur<br />

(λ)<br />

ϕ ( λ) λ<br />

(1+1)<br />

5 10 15 20 25<br />

Simulation with<br />

“extinction”<br />

Simulation with<br />

“survival”<br />

(1, ) approximate theory<br />

Theory<br />

Number of offspring<br />

Figure 5.6: Rate of progress for the inclined plane model<br />

comparison to be poor for making exact quantitative predictions, it nevertheless correctly<br />

reproduces the qualitative relation between the rate of progress <strong>and</strong> the number of descendants<br />

in a generation. The probability distributions w(s 0 ) are illustrated in Figure 5.7<br />

for ve di erent values of 2f1 3 10 30 100g, according to Equation (5.16).<br />

For the inclined plane model the question of an optimal step length does not arise. The<br />

rate of progress increases linearly with the step length. Another question that does arise,<br />

however, is how tochoose the optimal number of o spring per parent in a generation.<br />

The immediate answer is: the bigger is, the faster the evolution advances. But in<br />

nature, since resources are limited (territory, food, etc.) it is not possible to increase<br />

the number of descendants arbitrarily. Likewise in applications of the strategy to solving<br />

problems on the digital computer, the requirements for computation time impose limits.<br />

The computers in common use today can only work in a serial rather than parallel way.<br />

Thus all the mutations must be produced one after the other, <strong>and</strong> the more descendants<br />

the longer the computation time. We should therefore turn our attention instead to nding<br />

the optimum value of '= . In the case where the parent survives if it is not bettered by<br />

any descendant, we have the trivial solution<br />

opt =1<br />

The corresponding value for the (1 , ) strategy is, however, larger. With Equation (5.17)<br />

λ


A Multimembered <strong>Evolution</strong> Strategy 127<br />

1.0<br />

0.9<br />

0.8<br />

0.7<br />

0.6<br />

0.5<br />

0.4<br />

0.3<br />

0.2<br />

0.1<br />

0<br />

Probability density<br />

w(s’) for = 1<br />

Parameter = number<br />

of offspring<br />

one obtains from the requirement<br />

the relation<br />

= 1<br />

3<br />

−4 −2 0 2 4<br />

Useful distance s’<br />

10<br />

30<br />

100<br />

Figure 5.7: Probability distribution w(s 0 )<br />

@<br />

@<br />

~'<br />

= opt<br />

opt =~' @<br />

@ ~' = opt<br />

!<br />

=0<br />

<strong>and</strong>, by substituting it back in Equation (5.17), the result<br />

opt =1+<br />

s<br />

The value obtained iteratively is<br />

5.2.2.2 The Sphere Model<br />

2 opt<br />

exp<br />

1<br />

2 opt<br />

! 2<br />

=<br />

2<br />

~' 2<br />

0<br />

41+ erf@<br />

1<br />

q<br />

2 opt<br />

opt ' 2:5 (as an integer: opt =2or3)<br />

We willnowtry to calculate the rate of progress for the simple spherically symmetrical<br />

model, which is of importance for considering the convergence rate properties of the<br />

strategy. The contours of the objective function F (x) are concentric hypersphere surfaces,<br />

given for example by<br />

F (x) =<br />

nX<br />

i=1<br />

x 2<br />

i = const:<br />

13<br />

A5


128 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

x 2<br />

r<br />

2<br />

a<br />

a<br />

r 1<br />

2<br />

1<br />

r E<br />

N 2<br />

s >0<br />

2<br />

E<br />

N<br />

Contours<br />

2<br />

F(x) = x + x = const<br />

1 2<br />

1<br />

s


A Multimembered <strong>Evolution</strong> Strategy 129<br />

For the distance covered towards the objective, s`, the portion is now calculated that<br />

contributes to an improvement of the objective function, i.e., in this case the radial difference<br />

s` = rE ; r` (see Fig. 5.8). The locus of all points N` for which s` is the same is<br />

the surface of the n-dimensional hypersphere about the origin with radius r` = rE ; s`.<br />

Accordingly the total probability density that a mutation (index `) starting from point<br />

E will cover the distance s` is the n-fold line integral:<br />

w(s`) =<br />

Z Z<br />

rE ; r` = s`<br />

1<br />

p2<br />

! n<br />

exp ; 1<br />

r2<br />

2 2 ` + r 2<br />

E ; 2 rE x`1 dx`1 :::dx`n<br />

By transforming to spherical coordinates one obtains a simple integral<br />

w(s`) =<br />

1<br />

p2<br />

! n<br />

n;1<br />

2<br />

; n;1<br />

2<br />

exp<br />

; r2 E + r2 `<br />

2 2<br />

!<br />

r n;1<br />

`<br />

2Z<br />

=0<br />

exp rE r` cos<br />

2<br />

The remaining integral can be expressed as a modi ed Bessel function:<br />

w(s`) =<br />

r n<br />

2<br />

`<br />

r1; n<br />

2<br />

E<br />

2<br />

exp<br />

; r2<br />

E<br />

+ r2<br />

`<br />

2 2<br />

!<br />

I n 2 ;1<br />

rE r`<br />

To simplify the notation we nowintroduce the following de nitions:<br />

We thereby obtain<br />

w(s`) = a<br />

rE<br />

= n<br />

2 a= r2 E<br />

2<br />

v= r`<br />

a<br />

av2<br />

; ;<br />

e 2 v e 2 I ;1(av) with s` = rE (1 ; v)<br />

rE<br />

2<br />

sin n;2<br />

In order to use Equation (5.15) to calculate the total probability that the best of<br />

descendants will cover the distance<br />

the following quantities are still required:<br />

with<br />

<strong>and</strong><br />

s 0 =max<br />

` fs` j ` = 1(1) g = rE ; r 0<br />

w(s` = s 0 )= a<br />

r 0<br />

rE<br />

rE<br />

e ; a au2 ; 2 u e 2 I ;1(au)<br />

= u <strong>and</strong> s 0 = rE (1 ; u)<br />

p(s` s 0 )<br />

=1; s0<br />

=1; uR<br />

R<br />

s`=rE<br />

v=0<br />

w(s`) ds`<br />

a av2<br />

; ; ae 2 v e 2 I ;1(av) dv<br />

d


130 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

This nally gives the probability function for the useful distance s0 covered in one generation,<br />

expressed in units of u:<br />

w(s 0 )= a<br />

rE<br />

e ; a au2 ; 2 u e<br />

0<br />

2 I ;1(au) @1 ; ae ; a 2<br />

Zu<br />

v=0<br />

av2 ;<br />

v e 2 I ;1(av) dv<br />

Since the expectation value of this distribution is not readily obtainable, we shall determine<br />

its maximum to give an approximation ~'. From the necessary condition<br />

with the more concise notation<br />

we obtain the relation<br />

=1+ @D(u)<br />

@u u=1; ~'=rE<br />

@w(s 0 )<br />

@s 0<br />

s 0 =~'<br />

!<br />

=0<br />

D(y) =ae ; a ay2<br />

; 2 y e 2 I ;1(ay)<br />

0<br />

B<br />

[D(1 ; ~'=rE)] ;2<br />

@1 ;<br />

Z<br />

1; ~'=rE<br />

v=0<br />

1<br />

C<br />

1<br />

A<br />

;1<br />

D(v) dvA<br />

(5.18)<br />

Except for the upper limit of integration, this is the same integral that made it so di cult<br />

to obtain the exact solution for the rate of progress in the (1+1) evolution strategy (see<br />

Rechenberg, 1973). Under the condition 1<strong>and</strong> =a 1, which means for many<br />

variables <strong>and</strong> at a large enough distance from the optimum, Rechenberg obtained an<br />

estimate by exp<strong>and</strong>ing Debye's asymptotic series representation of the Bessel function<br />

(e.g., Jahnke-Emde-Losch, 1966) in powers of =a. Without giving here the individual<br />

steps in the derivation, the result is<br />

Z1<br />

D(v) dv ' 1<br />

"<br />

1 ; erf<br />

2<br />

!#<br />

p +<br />

2 a 2<br />

p<br />

a<br />

p 2<br />

v=0<br />

"<br />

exp<br />

; ( ; 1)2<br />

!<br />

; exp<br />

2 a<br />

2<br />

;<br />

2 a<br />

!#<br />

(5.19)<br />

It is clear from Equation (5.4) that the rate of progress of the (1+1) strategy for the<br />

two membered evolution varies inversely as the number of variables. Even if a higher<br />

convergence rate is expected from the multimembered scheme, with many descendants<br />

per parent, there will be no change in the relation to n, thenumber of parameters. In<br />

addition to the assumptions already made regarding <strong>and</strong> =a, without further risk to<br />

the validity of the approximate theory we can assume that 1 ; ~'=rE ' 1. Equation (5.19)<br />

can now also be applied here.<br />

For the partial di erential<br />

@D(u)<br />

@u u=1; ~'=rE<br />

we obtain with the use of the Debye series again:<br />

@D(u)<br />

@u u=1; ~'=rE<br />

= D(1 ; ~'=rE)<br />

"<br />

a exp<br />

a (1 ; ~'=rE)<br />

!<br />

1<br />

+<br />

1 ; ~'=rE<br />

#<br />

; a (1 ; ~'=rE)


A Multimembered <strong>Evolution</strong> Strategy 131<br />

Figure 5.9: Rate of progress for the sphere model


132 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

If the result is substituted into Equation (5.18) a longer expression is obtained, of the<br />

form:<br />

= (~' rEn)<br />

In the expectation of an end result similar to Equation (5.4) <strong>and</strong> since a particular starting<br />

point rE is of no interest, we will introduce new variables:<br />

' = ~'n<br />

rE<br />

<strong>and</strong> = n<br />

If ~' <strong>and</strong> are now replaced by ' <strong>and</strong> , taking the limit<br />

lim<br />

n!1<br />

(' rEn)<br />

we nd that the quantities n <strong>and</strong> rE disappear from the parameter list of . ' <strong>and</strong><br />

can therefore be regarded as \universal" variables. We obtain<br />

= (' )=1+ p ! 2<br />

! 3 2 "<br />

'<br />

p2 + p exp 4 '<br />

p2 + p 5 1 + erf<br />

8<br />

8<br />

!#<br />

p<br />

8<br />

(5.20)<br />

As in the case of the inclined plane considered previously, this equation cannot be<br />

simply solved for ' . Figure 5.9 shows the family of curves ' = ' ( ).<br />

For ! 0, as expected, ' ! 0. For = 1, the rate of progress is always negative.<br />

Since the parent in the (1 , ) strategy is not included in the selection after it has served<br />

to produce a descendant, = 1 means that every mutation is adopted, whether better<br />

or worse. For the sphere model, except for = 0, the region of success is always smaller<br />

than half of the variable space. With increasing , the ratio becomes even worse ' is<br />

thus always 0, <strong>and</strong> more strongly negative the greater is .<br />

For 2 the rate of progress increases at rst as a function of the variance, reaches a<br />

maximum, <strong>and</strong> then decreases continuously until it becomes negative. From this behavior<br />

one can see even more clearly than in the (1+1) strategy how important the correct choice<br />

of variance is for the optimization.<br />

In the (1 , ) strategy, the progress can turn retrograde if all the o spring are worse<br />

than the parent that produced them. Only with an immortal parent having an in nite<br />

capacity for reproduction would progress be guaranteed or, at least, would retrogression<br />

be ruled out. We shall see later why the model with \extinction" is nevertheless advantageous.<br />

Except for small values of , the maximum rate of progress is almost the same<br />

in the \survival" <strong>and</strong> \extinction" cases. So if the optimal variance can be maintained<br />

throughout, leaving the parents out of the selection is not a disadvantage.<br />

The position of the maxima of ' with respect to at a constant is obtained by<br />

simple di erentiation <strong>and</strong> equating the partial derivative to zero. De ning<br />

the equation is<br />

opt<br />

p =<br />

8 + <strong>and</strong> 'max p<br />

2 opt<br />

rE<br />

= ' +<br />

+ (' + + + ) exp(; +2 )+ p ( + ; ' + ) 1<br />

2 +('+ + + ) 2<br />

1 + erf( + )<br />

!<br />

=0 (5.21)


A Multimembered <strong>Evolution</strong> Strategy 133<br />

2.0<br />

1.0<br />

1<br />

Maximal universal rate<br />

of progress<br />

ϕ ( σ )<br />

max opt<br />

(1+1) - theory<br />

Numer of offspring<br />

5 10 15 20 25 30<br />

Figure 5.10: Maximal rate of progress for the sphere model<br />

Points on the curve ' max = ' ( = opt) can only be obtained iteratively. To<br />

express = (' max), the non-linear system of equations consisting of Equations (5.20)<br />

<strong>and</strong> (5.21) must be solved. The results as obtained with the multimembered evolution<br />

strategy are shown in Figure 5.10. A convenient formula can only be obtained by assuming<br />

' + ' + i.e., 2 ' max ' 2<br />

opt<br />

This estimate is actually not far wrong, since the second term in Equation (5.21) goes to<br />

zero. We thus nd<br />

' 1+ q 'max exp('max ) 1 + erf 1 q<br />

'max (5.22)<br />

2<br />

a relation with comparable structure to the result for the inclined plane.<br />

Finally we ask whether ' max= has a maximum, as in the inclined plane case. If the<br />

parent can survive the o spring, opt = 1 here too if not the condition<br />

p 1<br />

opt =2<br />

2 +('+ + + ) 2 exp[(' + + + ) 2 ][1 + erf( + )] ' +<br />

must be added to Equations (5.20) <strong>and</strong> (5.21). The solution, obtained iteratively, is:<br />

opt ' 4:7 (as an integer: opt =5)<br />

λ<br />

(5.23)


134 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

Both the (1 , ) <strong>and</strong> (1+ )schemes were run on the computer for the sphere model,<br />

with n = 100rE = 100, <strong>and</strong> variable . In each case' was evaluated over 10 000<br />

generations. The resulting data are shown in terms of ' <strong>and</strong> in Figure 5.9. In<br />

comparison with the approximate theory, deviations are apparent mainly for > opt .<br />

The skewness of the probability distribution w(s 0 ) <strong>and</strong> the error in the estimate of the<br />

integral R D(y) dy have only a weak e ect in the region of greatest interest, where the<br />

rate of progress is maximum. Furthermore, the results of the simulation fall closer to<br />

the approximate theory if n is taken to be greater than 100 however, the computation<br />

time then becomes excessive. For large values of the possible survival of the parent<br />

only becomes noticeable when the variance is too large to allow rapid convergence. The<br />

greatest di erences, as expected, appear for =1.<br />

On the whole we see that the theory worked out here gives at least a qualitative<br />

account ofthebehavior of the (1 , ) strategy. Amuch more elegant method yielding an<br />

even better approximation may be found in Back, Rudolph, <strong>and</strong> Schwefel (1993), or Beyer<br />

(1993, 1994a,b).<br />

5.2.2.3 The Corridor Model<br />

As a third <strong>and</strong> last model objective function, we will now consider the right-angled corridor.<br />

The contours of F (x) in the two dimensional picture (Fig. 5.11) are straight <strong>and</strong><br />

parallel, but not necessarily equidistant.<br />

F (x) =c0 +<br />

For the sake of simplifying the calculation we will again give the coordinate system a<br />

particular position <strong>and</strong> orientation with c1 = ;1 ci = 0 for all i = 2 3:::n: The<br />

right-angled corridor (Problem 2.37, see Appendix A, Sect. A.2){we are using here three<br />

dimensional concepts for the essentially n-dimensional case{is de ned by constraints of<br />

the form<br />

Gj(x) =jxjj b for j = 2(1)n<br />

It has the width 2 b for all coordinate directions xi i = 2(1)n hence the cross section<br />

(2 b) n;1 . As a starting point, the position xE of the parent E, wechoose the origin with<br />

respect to x1 = 0. The useful part of a r<strong>and</strong>om step is just its component z1 in the<br />

x1 direction, which is the negative gradient direction. The formulae for w(s` = s 0 ) <strong>and</strong><br />

p(s`


A Multimembered <strong>Evolution</strong> Strategy 135<br />

2b<br />

N 2<br />

N 3<br />

s 2 < 0<br />

0<br />

x 2<br />

E<br />

s 1 > 0<br />

N 1<br />

Line of equal probability density<br />

Figure 5.11: Corridor model function<br />

= 1<br />

"<br />

erf<br />

2<br />

!<br />

b ; xEi<br />

p + erf<br />

2<br />

!#<br />

b + xEi<br />

p<br />

2<br />

Contours F(x) = const.<br />

Downwards<br />

x 1<br />

Allowed region<br />

Forbidden region<br />

That is, the probability depends on the current position xEi of the starting point E. We<br />

can only construct an average value for all possible situations if we know the probability<br />

pa of certain situations occurring. It could well be that, during the minimum search,<br />

positions near the border are occupied less often than others. The same problem of<br />

nding the occupation probability pa has arisen already in the theoretical treatment of<br />

the (1+1) strategy. Rechenberg (1973) discovered that<br />

pa = 1<br />

2 b (with respect to one of the variables xi i= 2(1)n)<br />

which is a constant independent of the current values of the variables. We will assume<br />

that this also holds here. Thus the average probability that one of the n ; 1 constrained<br />

variables will remain within the corridor can be given as:<br />

= 1<br />

4 b<br />

~p(jx`ij b) =<br />

Zb<br />

xEi=;b<br />

"<br />

erf<br />

Zb<br />

xEi=;b<br />

!<br />

b ; xEi<br />

p + erf<br />

2<br />

pa p(jx`ij b) dxEi<br />

!#<br />

b + xEi<br />

p dxEi<br />

2


136 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

Making use of the relation (see Ryshik <strong>and</strong> Gradstein, 1963)<br />

one nally obtains<br />

Zp<br />

y=0<br />

erf( y) dy = p erf( p)+ exp(; 2 p2 ) ; 1<br />

p<br />

~p(jx`ij b) = erf<br />

p !<br />

2 b<br />

+ 1 p<br />

2 b<br />

In the following we refer to this expression as item v.<br />

v =~p(jx`ij b)<br />

"<br />

exp<br />

; 2 b2<br />

2<br />

!<br />

#<br />

; 1<br />

(5.24)<br />

With the above de nition of v, the total probability that a descendant N` is feasible, i.e.,<br />

that it satis es all the constraints, is<br />

pf eas =<br />

<strong>and</strong> the probability that N` is lethal is<br />

nY<br />

i=2<br />

= v n;1<br />

~p(jx`ij b)<br />

pleth =1; pf eas =1; v n;1<br />

Only non-lethal mutants come into consideration as parents of the next generation. Hence,<br />

instead of w(s` = s0 )wemust insert into Equation (5.15) the expression<br />

w(s` = s 0 ) pf eas = 1<br />

p 2<br />

<strong>and</strong> instead of p(s`


A Multimembered <strong>Evolution</strong> Strategy 137<br />

outcome would be extinction of the population <strong>and</strong> the rate of progress would no longer<br />

be de ned. The probability of extinction of the population is given by the product of the<br />

lethal probabilities:<br />

pstop =(1; v n;1 )<br />

To be able to optimize further in such situations let us adopt the following procedure: If<br />

all the mutations lead to forbidden points, the parent willsurvive <strong>and</strong> produce another<br />

generation of descendants. Thus for this generation the rate of progress takes the value<br />

zero. Equation (5.25) then only holds for s 0 6= 0 <strong>and</strong> we must reformulate the probability<br />

of advancing by s 0 in one generation as follows:<br />

where<br />

w(s 0 )= ~w(s 0 )+ pstop<br />

=<br />

( 0 if s 0 6=0<br />

1 if s 0 =0<br />

The distribution w(s 0 ) is no longer continuous, <strong>and</strong> even if w 0 (s 0 ) is symmetricwe cannot<br />

assume that the maximum of the distribution is a useful approximation to the average<br />

rate of progress (Fig. 5.12). The following condition must be satis ed:<br />

Z1<br />

s 0 =;1<br />

w(s 0 ) ds 0 =<br />

Z1<br />

s 0 =;1<br />

~w(s 0 ) ds 0 + wstop =1 (5.26)<br />

We can think of w(s 0 ) as a superposition of two density distributions, with conditional<br />

mathematical expectation values<br />

<strong>and</strong><br />

<strong>and</strong> with associated frequencies<br />

<strong>and</strong><br />

p1 =<br />

'1 =<br />

Z1<br />

s 0 =;1<br />

Z1<br />

s 0 =;1<br />

'2 =0<br />

s 0 ~w(s 0 ) ds 0<br />

~w(s 0 ) ds 0 =1; pstop<br />

p2 = pstop<br />

The events belonging to the two density distributions are mutually exclusive<strong>and</strong>by virtue<br />

of Equation (5.26) they make up together a complete set of events. The expectation value<br />

is then given by (e.g., Gnedenko, 1970 Sweschnikow ,1970).<br />

' =<br />

Z1<br />

s 0 =;1<br />

s 0 w(s 0 ) ds 0 = '1 p1 + '2 p2 = '1 (1 ; pstop)


138 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

w(s’)<br />

Probability density<br />

p<br />

stop<br />

s’ = 0<br />

w(s’= ~ 0)<br />

w(s’)<br />

s’<br />

Useful distance covered<br />

Figure 5.12: Estimation of the rate of progress from the probability density for<br />

the corridor model<br />

Since we are unable to calculate '1 directly, we make an approximation:<br />

taking for ^' the position of the maximum of ~w(s 0 ).<br />

We require<br />

~' =^' (1 ; pstop) = ^'[1 ; (1 ; v n;1 ) ] (5.27)<br />

@ ~w(s 0 )<br />

@s 0<br />

s 0 =^'<br />

!<br />

=0<br />

By di erentiating Equation (5.25) <strong>and</strong> setting the rst derivative to zero:<br />

=1+ p ^'<br />

p2<br />

exp<br />

^' 2<br />

2 2<br />

!"<br />

1+ erf<br />

^'<br />

p2<br />

!<br />

+2(v 1;n ; 1)<br />

#<br />

(5.28)<br />

Apart from an extra term, this formula is similar to the relation = (~' ) found for<br />

the inclined plane (Equation (5.17)). The main di erence here, however, is that in place<br />

of ~' ^' appears, as de ned by Equation (5.27).<br />

As in the case of the sphere model, we willintroduce here \universal parameters"<br />

' = ~'n<br />

b<br />

<strong>and</strong> = n<br />

b<br />

<strong>and</strong> take the limit n !1in order to arrive at a practically useful relation = (' ).


A Multimembered <strong>Evolution</strong> Strategy 139<br />

With the new quantities ' <strong>and</strong> , Equation (5.24) for v becomes<br />

p !<br />

2 n<br />

v = erf ; p<br />

2<br />

"<br />

1 ; exp<br />

n<br />

; 2 n2<br />

!#<br />

2<br />

Since the argument of the error function increases as n, thenumber of variables, the<br />

approximation<br />

erf(y) ' 1 ; 1<br />

p exp (;y<br />

y 2 )<br />

can be used to give<br />

<strong>and</strong> with<br />

nally<br />

v =1; n p 2<br />

lim<br />

n!1<br />

1+ 1<br />

n<br />

v 1;n =exp<br />

The desired relation = (' )isthus<br />

=1+<br />

p ~'<br />

p 2<br />

exp<br />

2<br />

4<br />

~'<br />

p2<br />

in which, from Equation (5.27),<br />

~' =<br />

! 2 3<br />

5<br />

"<br />

erf<br />

n<br />

p 2<br />

~'<br />

p2<br />

'<br />

for n 1<br />

= e<br />

!<br />

!<br />

1 ; h 1 ; exp ;p 2<br />

+ 2 exp<br />

i<br />

p 2<br />

!<br />

#<br />

; 1<br />

(5.29)<br />

Pairs of values obtained iteratively are shown in Figure 5.13 together with simulation<br />

results for the cases of \survival" <strong>and</strong> \extinction" of the parent (n = 100 b = 100,<br />

average over 10 000 successful generations).<br />

As in the case of the sphere model, the deviations can be attributed to the simplifying<br />

assumptions made in deriving the approximate theory. For = 1' is always zero if<br />

the parent is not included in the selection. The transition to the inclined plane model is<br />

correctly reproduced in this respect. Negative rates of progress cannot occur.<br />

The position of the maxima ' max = ' ( = opt) at constant are obtained in the<br />

same way as for the sphere model. The condition to be added to Equation (5.29) is<br />

c exp (; + )[1; exp (; + )] ;1 ; 1<br />

h erf(' + )+2exp( + ) ; 1 ih 1+2' +2 i + 2<br />

p ' + exp (;' +2 )<br />

in which the following new quantities are introduced again for compactness:<br />

+<br />

!<br />

!<br />

+2exp( + ) ! =0<br />

(5.30)


140 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

Figure 5.13: Rate of progress for the corridor model<br />

j<br />

j


A Multimembered <strong>Evolution</strong> Strategy 141<br />

7<br />

6<br />

5<br />

4<br />

3<br />

2<br />

1<br />

0<br />

1<br />

Maximal universal rate<br />

of progress<br />

ϕ * ( σ * )<br />

max opt<br />

(1+1) - theory<br />

Number of descendants<br />

10 20 30 40 50 λ 60<br />

Figure 5.14: Maximal rate of progress for the corridor model<br />

+<br />

=<br />

opt<br />

p<br />

2<br />

' + =<br />

'max p<br />

2 opt c<br />

"<br />

c = 1 ; 1 ; exp<br />

; opt<br />

p<br />

2<br />

Pairs of values found by iteration are shown in Figure 5.13. Figure 5.14 shows ' max<br />

versus .To determine opt for the (1 , ) strategy, i.e., the value of for which ' max= is<br />

a maximum, it is necessary to solve the system of three non-linear equations, comprising<br />

Equation (5.29), Equation (5.30), <strong>and</strong><br />

The result is<br />

opt = ' + np exp (' +2 )[ erf (' + ) + 2 exp ( + ) ; 1][1+2' +2 ]+2' + o<br />

!#<br />

( opt<br />

c [1 ; exp (; + )] ln[1 ; exp(; + )]+ 1<br />

opt ' 6:0 (as an integer: opt =6)<br />

)<br />

(5.31)


142 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

5.2.3 The Step Length Control<br />

How should one proceed in order to still achieve the maximum rate of progress, i.e.,<br />

2<br />

to maintain the optimum variances i i = 1(1)n, for the case of the multimembered<br />

evolution scheme? For the (1+1) strategy this aim was met by the1=5 success rule,<br />

which was based on the probability of success at maximum convergence rate of the sphere<br />

<strong>and</strong> corridor model functions. Such control from outside the actual mutation-selection<br />

game does not correspond to the biological paradigm. It should rather be assumed that<br />

the step lengths, or more precisely the variances, have adapted <strong>and</strong> are still adapting to<br />

circumstances arising in the course of natural evolution. Although the environmentally<br />

induced rate of mutation cannot be interfered with directly, the existence of mutator<br />

genes <strong>and</strong> repair enzymes strongly suggests that the consequences of such environmental<br />

in uences are always reduced to the appropriate level. In the multimembered evolution<br />

strategy the fact that the observed rates of mutation are also small, indeed that they<br />

must be small to be optimal, comes out of the universal rate of progress <strong>and</strong> st<strong>and</strong>ard<br />

deviation introduced above, which require to be inversely proportional to the number<br />

of variables, as in the (1+1) strategy.<br />

If we wish to imitate organic evolution, we can proceed as follows. Besides the variables<br />

xEi i= 1(1)n, a set of parameters Ei i = 1(1)n, is assigned to a parent E. These<br />

describe the variances of the r<strong>and</strong>om changes. Each descendant N` of the parent E should<br />

di er from it both in x`i <strong>and</strong> `i. The changes in the variances should also be r<strong>and</strong>om<br />

<strong>and</strong> small, <strong>and</strong> the most probable case should be that there is no change at all. Whether<br />

a descendant can become a parent of the next generation depends on its vitality, thus<br />

only on its x`i. Whichvalues of the variables it represents depends, however, not only<br />

on the xEi of the parent, but also on the st<strong>and</strong>ard deviations `i, whichaect the size of<br />

the changes zi = x`i ; xEi. In this way the \step lengths" also play an indirect r^ole in<br />

the selection mechanism.<br />

The highest possible probability that a descendant is better than the parent is normally<br />

wemax =0:5<br />

It is attained in the inclined plane case, for example, <strong>and</strong> for other model functions in the<br />

limit of in nitely small step lengths. In order to prevent that a reduction of the i always<br />

gives rise to a selection advantage, must be at least 2. But the optimal step lengths<br />

can only take e ect if<br />

> 1<br />

weopt<br />

This means that on average at least one descendant represents an improvement of the<br />

value of the objective function. The number of descendants per parent thus plays a<br />

decisive r^ole in the multimembered scheme, just as does the check on the success ratio in<br />

the two membered evolution scheme. For comparison let us tabulate here the opt of the<br />

(1 , ) strategy <strong>and</strong> weopt of the (1+1) strategy for the three model functions considered.<br />

The values of weopt are taken from the work of Rechenberg (1973).


A Multimembered <strong>Evolution</strong> Strategy 143<br />

Model function<br />

Inclined plane<br />

Sphere<br />

weopt<br />

1<br />

2<br />

0:27<br />

1<br />

weopt 2<br />

3.7<br />

opt<br />

2.5<br />

4.7<br />

Corridor<br />

1<br />

2e 5.4 6.0<br />

How should the step lengths now be altered? We shall rst consider only a single<br />

variance 2 for changes in all the variables. In the production of the r<strong>and</strong>om changes,<br />

the st<strong>and</strong>ard deviation is always a positive factor. It is therefore reasonable to generate<br />

new step lengths from the old by amultiplicative rather than additive process, according<br />

to the scheme<br />

(g)<br />

N = (g)<br />

E Z (g)<br />

(5.32)<br />

The median of the r<strong>and</strong>om distribution for the quantity Z must equal one to satisfy the<br />

condition that there is no deterministic drift without selection. Furthermore an increase<br />

of the step length should occur with the same frequency as a decrease more precisely,the<br />

probability of occurrence of a particular r<strong>and</strong>om value must be the same as that of its reciprocal.<br />

The third requirement is that small changes should occur more often than large<br />

ones. All three requirements are satis ed by the log-normal distribution. R<strong>and</strong>om quantities<br />

obeying this distribution are obtained from (0 2 ) normally distributed numbers Y<br />

by the process<br />

Z = e Y (5.33)<br />

The probability distribution for Z is then<br />

w(z) = 1<br />

p 2<br />

1<br />

z exp<br />

; (ln z)2<br />

2 2<br />

!<br />

The next question concerns the choice of ,<strong>and</strong>we shall answer it, in the same way as<br />

for the (1+1) strategy, with reference to the rate of change of step lengths that maintains<br />

the maximum rate of progress in the sphere model. Regarding ' as a di erential quotient<br />

;dr=dg leads to the relation (see Sect. 5.1.2)<br />

(g+1)<br />

opt<br />

(g)<br />

opt<br />

=exp ; ' max<br />

n<br />

(5.34)<br />

for the optimal step lengths of two consecutive generations, where ' max now has a different,<br />

larger value that depends on <strong>and</strong> . The actual size of the average changes in<br />

the variances, using the proposed mutation scheme based on Equations (5.32) <strong>and</strong> (5.33),<br />

depends on the topology of the objective function <strong>and</strong> the number of parents <strong>and</strong> descendants.<br />

If n, thenumber of variables, is large, the optimal variance will only change<br />

slightly from generation to generation. We will therefore assume that the selection in<br />

any generation is more or less indi erent to reductions <strong>and</strong> increases in the step length.<br />

We thereby obtain the multiplicative change in the r<strong>and</strong>om quantity X, averaged over n<br />

generations:<br />

X =<br />

0<br />

@ nY<br />

g=1<br />

Z (g)<br />

1<br />

A<br />

1<br />

n<br />

= exp<br />

0<br />

@ 1<br />

n<br />

nX<br />

g=1<br />

Y (g)<br />

1<br />

A


144 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

Since the Y (g) are all (0 2 ) normally distributed, it follows from the addition theorem<br />

of the normal distribution (Heinhold <strong>and</strong> Gaede, 1972) that<br />

1<br />

n<br />

nX<br />

g=1<br />

isa(0 2 =n) normally distributed r<strong>and</strong>om quantity. Accordingly, the two quantities<br />

exp( = p n)arecharacteristic of the average changes (minus sign for reduction) in the<br />

step lengths per generation. The median of w(z) isofcoursejuste 0 =1.Together with<br />

Y (g)<br />

Equation (5.34), our observation leads us to the requirement<br />

or<br />

exp ' max<br />

n<br />

' exp<br />

' 'max p<br />

n<br />

p n<br />

!<br />

(5.35)<br />

The variance 2 of the normally distributed r<strong>and</strong>om numbers Y , from which the lognormally<br />

distributed r<strong>and</strong>om multipliers for the st<strong>and</strong>ard deviations (\step sizes") of the<br />

changes in the object variables are produced, thus must vary inversely as the number of<br />

variables. Its actual value should depend on the expected rate of convergence ' <strong>and</strong><br />

hence on the choice of the number of descendants .<br />

Instead of only one common strategy parameter ,each individual can now have a<br />

complete set of n di erent i i = 1(1)n, for every alteration in the corresponding n<br />

object variables xi i=1(1)n. The two following schemes can be envisioned:<br />

or<br />

(g) (g) (g)<br />

Ni = Ei Z i<br />

(g)<br />

Ni = (g)<br />

Ei Z (g)<br />

i<br />

Z (g)<br />

0<br />

(5.36)<br />

(5.37)<br />

But only the second one should be taken into further consideration, because otherwise in<br />

the case of n 1theaverage overall step size of the o spring<br />

sN =<br />

vu<br />

u<br />

t nX<br />

i=1<br />

2<br />

Ni<br />

could not be substantially di erent from that of its parent<br />

sE =<br />

vu<br />

ut nX<br />

due to the levelling e ect of the many r<strong>and</strong>ommultiplication events (law of large number<br />

of events). In order to split the mutation e ects to the overall step size <strong>and</strong> the individual<br />

step sizes one could choose<br />

0 '<br />

'<br />

i=1<br />

2<br />

Ei<br />

p<br />

'<br />

for Z0 (5.38)<br />

2 n<br />

' p p for all Zi i= 1(1)n (5.39)<br />

2 n


A Multimembered <strong>Evolution</strong> Strategy 145<br />

We shall not go into further details since another kind of individual step length control<br />

will o er itself later, i.e., recombination.<br />

At this point afurtherword should be said about the alternative (1+ )or(1, )<br />

strategies. Let us assume that by virtue of a jump l<strong>and</strong>ing far from the expectation value,<br />

a descendant has made a very large <strong>and</strong> useful step towards the optimum, thus becoming<br />

a parent of the next generation. While the variance allocated to it was eminently suitable<br />

for the preceding situation, it is not suited to the new one, being in general much too<br />

big. The probability that one of the new descendants will be successful is thereby low.<br />

Because the (1+ ) strategy permits no worsening of the objective function value, the<br />

parent survives{<strong>and</strong> may do so for many generations. This increases the probability ofa<br />

successful mutation still having a poorly adapted step length. In the (1 , ) strategy such<br />

a stray member will indeed also turn up in a generation, but it will be in e ect revoked in<br />

the following generation. The descendant that regresses the least survives <strong>and</strong> is therefore<br />

probably the one that most reduces the variance. The scheme thus has better adaptation<br />

properties with regard to the step length. In fact this phenomenon can be observed in the<br />

simulation. Since we have seen that for 5 the maximum rate of progress is practically<br />

independent of whether or not the parent survives, we should favor a ( , ) strategy, at<br />

least when = is not chosen to be very small, e.g., less than 5 or 6.<br />

5.2.4 The Convergence Criterion for > 1 Parents<br />

In Section 5.2.2 wewere really looking for the rate of progress of a ( , )evolution method.<br />

Because of the analytical di culties, however, we had to fall back onthe = 1 case,<br />

with only one parent. We shall now proceed again on the assumption that > 1. In<br />

each generation state vectors xE <strong>and</strong> associated step lengths are stored, which should<br />

always be the best of the mutants of the previous generation. We naturally require<br />

more storage space for doing this on the computer, but on the other h<strong>and</strong> we havemore suitable values at our disposal for each variable. Supposing that the topology of the<br />

objective function is complicated or even \pathological," <strong>and</strong> an individual reaches a point<br />

that is unfavorable to further progress, we still have su cient alternative starting points,<br />

which mayeven be much more favorable. According to the usefulness of their parameter<br />

sets, some parents place more mutants in the prime group of descendants than others.<br />

In general the best individuals of a generation will di er with respect to their variable<br />

vectors <strong>and</strong> objective function values as long as the optimum has not been reached. This<br />

provides us with a simple convergence criterion.<br />

From the population of<br />

function value:<br />

parents Ek k = 1(1) ,welet Fb be the best objective<br />

Fb = min fF (x<br />

k (g)<br />

k )k= 1(1) g<br />

<strong>and</strong> Fw the worst<br />

Fw = max<br />

k<br />

Then for ending the search we require that either<br />

fF (x(g)<br />

k )k= 1(1) g<br />

Fw ; Fb "c


146 <strong>Evolution</strong> Strategies for Numerical Optimization<br />

or<br />

"d<br />

(Fw ; Fb)<br />

where "c <strong>and</strong> "d are to be de ned such that<br />

"c > 0<br />

1+"d > 1<br />

)<br />

X<br />

k=1<br />

F (x (g)<br />

k )<br />

according to the computational accuracy<br />

Either absolutely or relatively, the objective function values of the parents in a generation<br />

must fall closely together before convergence is accepted. The reason for basing the<br />

criterion on function values, rather than variable values or step lengths, has already been<br />

discussed in connection with the (1+1) strategy (see Sect. 5.1.3).<br />

5.2.5 Scaling of the Variables by Recombination<br />

The ( , ) method opens up the possibility of imitating a further principle of organic<br />

evolution, which is of particular interest from the point ofviewofnumerical optimization<br />

problems, namely sexual propagation. By combining the genes of twoparents a new source<br />

of variation is added to point mutation. The fact that only a few primitive organisms do<br />

without this mechanism of recombination leads us to expect that it is very favorable for<br />

evolution. Instead of one vector x (g)<br />

E now there are distinct vectors x (g)<br />

k for k = 1(1)<br />

in a population. In biology, the totality of all genes in a generation is known as a gene<br />

pool. Among the concerns of population genetics (e.g., Wilson <strong>and</strong> Bossert, 1973) is the<br />

frequency distribution of certain alleles in a population, the so-called gene frequencies.<br />

Until now, we did not argue on that level of detail, nor did we godown to the oor of only<br />

four nucleic acids in order to model, for example, the mutation process within evolution<br />

strategies. This might beworthwhile for quaternary optimization, but not in our case of<br />

continuous parameters. It would be a tedious task to model all the intermediate processes<br />

from nucleic acids to proteins, cell, organs, etc., taking into account the genetic code <strong>and</strong><br />

the whole epigenetic apparatus. We shall now apply the principle of recombination to<br />

numerical optimization with continuous parameters, once again in a simpli ed fashion.<br />

In our population of μ parents we have stored μ different values of each component x_i, i = 1(1)n. From this gene pool we now draw one of the values of x_i for each i = 1(1)n. The draw should be random, so that the probability that an x_i comes from any particular parent k of the μ is just 1/μ for all k = 1(1)μ. The variable vector constructed in this way forms the starting point for the subsequent variation of the components. Figure 5.15 should help to clarify this kind of global recombination.
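A small Python sketch of this columnwise draw from the gene pool may make it concrete; the function name and the NumPy usage are illustrative assumptions, not the book's FORTRAN code.

```python
import numpy as np

rng = np.random.default_rng()

def global_discrete_recombination(parents):
    """Draw each component x_i columnwise at random from the gene pool:
    any of the mu parents donates with equal probability 1/mu."""
    mu, n = parents.shape                  # mu parents, n object variables
    donors = rng.integers(0, mu, size=n)   # one donor index per component
    return parents[donors, np.arange(n)]   # columnwise pick per variable
```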

By imitating recombination in this way we have, so to speak, replaced bisexuality by multisexuality. This was less for reasons of principle than a result of practical considerations of programming. A crude test yielded only a slight further increase in the rate of progress in changing from the bisexual to the multisexual scheme, whereas appreciable acceleration was achieved by introducing the bisexual in place of the asexual scheme, which allowed no recombination. A more detailed and exact comparison has yet to be carried out. Without some guidance from theory it is hard to choose the correct initial step lengths and rates of change of the step lengths for each of the different algorithms.


Figure 5.15: Scheme of global uniform (discrete global) recombination: components chosen columnwise and at random from the parents of generation g to form a descendant.

This is, however, the only way to arrive at quantitative statements, free from confusing side effects.

It is thus hard to explain the origin of the accelerating effect of recombination. It may, for example, lie in the fact that instead of μ different starting points, the bisexual scheme offers

    μ² + (μ − 1) Σ_{i=1}^{n−2} 2^i

possible combinations in the case of n variables. With multirecombination, as chosen here, there are as many as μ^n, which is far more than could be put into effect. A more detailed investigation may be found in Bäck (1994a).

So far we have only considered recombination of the object variables, but the strategy variables, the step lengths, can be recombined in just the same way. Even if all the parents start with equal σ_i = σ for all i = 1(1)n, and if all the step length components are varied by a common random factor in the production of descendants, the standard deviations σ_i of the individuals differ from each other for each i = 1(1)n in the subsequent generations. Thus by recombination it is possible for the step lengths to adapt individually in this way to the circumstances. A better combination affords a higher chance of survival to its bearer. It can therefore be expected that in the course of the optimum search, the currently best combination of the {σ_i, i = 1(1)n} prevails, the one that is associated with the fastest rate of progress. In attempting to verify this in a practical test, an unpleasant phenomenon occurs. It can happen that one of the standard deviations σ_i is suddenly


(e.g., by a random value very far from the expectation value) so much reduced in size that the associated variable x_i can now hardly be changed. The total change in the vector x is then, roughly speaking, confined to an (n − 1)-dimensional subspace of IR^n. Contrary to what one might hope, namely that such a descendant would have less chance of surviving than others, it turns out that the survival of such a descendant is actually favored. The reason is that the rate of progress with an optimal step length is proportional to 1/n. If the number of variables n decreases, the rate of convergence, together with the optimal step length, increases. The optimum search therefore only proceeds in a subspace of IR^n. Not until the only improvement in the objective function entails changing the variable that has hitherto been omitted from the variation will the mutation-selection mechanism operate to increase its associated variance and so restore it to the range for which noticeable changes are possible.

The minimum search proceeds by jumps in the value of the objective function and with rates of progress that vary alternately above and below what would otherwise be a smooth convergence. Such unstable behavior is most pronounced when μ, the number of parents, is small. With sufficiently large μ the reserve of step length combinations in the gene pool is always big enough to avoid overadaptation, or to compensate for it quickly. From an experimental study (Schwefel, 1987) the conclusion could be drawn that punctuated equilibrium evolution (Gould and Eldredge, 1977, 1993) can be avoided by using a sufficiently large population (μ > 1) and a sufficiently low selection pressure (λ/μ ≈ 7). A further improvement can be made by using as the starting point in the variation of the step lengths the current average of two parents' variances, rather than the value from only one or the other parent. This measure too has its biological justification: it represents an imitation of what is called intermediary recombination (instead of discrete recombination).
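The following minimal Python sketch contrasts the two variants for the step lengths; the function name and the 50/50 coin toss for the discrete case are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng()

def recombine_step_lengths(sigma_a, sigma_b, intermediary=True):
    """Recombine the strategy parameters (step lengths) of two parents:
    intermediary averaging, or a discrete per-component choice."""
    if intermediary:
        return 0.5 * (sigma_a + sigma_b)       # average of the two parents
    take_a = rng.random(sigma_a.size) < 0.5    # coin toss per component
    return np.where(take_a, sigma_a, sigma_b)
```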

In this context chromosome mutations should be very effective, those in which, for example, the positions of two individual step lengths are exchanged. As well as the haploid scheme of inheritance on which the present work is based, some forms of life also exhibit the diploid scheme. In this case each individual stores two sets of variable values. Whilst the formation of the phenotype only makes use of one allele, the production of offspring brings both alleles into the gene pool. If both alleles are the same one speaks of homozygosity, otherwise of heterozygosity. Heterozygote alleles enlarge the set of variants in the gene pool and thus the range of possible combinations. With regard to the stability of the evolutionary process this also appears to be advantageous. The true gain made by diploidy only becomes apparent, however, when the additional evolutionary factors of recessiveness and dominance are included. For multiple criteria optimization, the usefulness of this concept has been demonstrated by Kursawe (1991, 1992). Many possible extensions of the multimembered scheme have yet to be put into practice. To find their theoretical effect on the rate of progress, one would first have to construct a theory of the (μ , λ) strategy for μ > 1. If one goes beyond the μ = 1 scheme followed here, significant differences between approximate theory and simulation results arise for μ > 1 because of the greater asymmetry of the probability distribution w(s′).



5.2.6 Global Convergence

In our discussion of deterministic optimization methods (Chap. 3) we have established that only simultaneous strategies are capable of locating with certainty the global minima of arbitrary objective functions. The computational cost of their application increases with the volume of the space under consideration and thus with the power of n. The dynamic programming technique of Bellman allows the reliability of global convergence to be maintained at less cost, but only if the objective function has a rather special structure, such that only a part of the space IR^n needs to be investigated. Of the stochastic search procedures, the Monte-Carlo method has the best chance of global convergence; it offers a high probability rather than certainty of finding the global optimum. If one requires a 90% probability, its cost is greater than that of the equidistant grid search. However, the (1+1) evolution strategy can also be credited with a finite probability of global convergence if the step lengths (variances) of the random changes are held constant (see Rechenberg, 1973; Born, 1978; Beyer, 1989, 1990). How great the chance is of finding an absolute minimum among several local minima depends on the topology, in particular on the disposition and "width" of the minima.

If the user wishes to realize the possibility of a jump from a local to a global extremum, a trial of patience is required. The requirement of approaching an optimum as quickly and as accurately as possible is always diametrically opposed to maintaining the reliability of global convergence. In the formulation of the algorithms of the evolution strategies we have mainly strived to satisfy the first requirement of rapid convergence, by adaptation of the step lengths. Thus for both strategies no claims can be made for good global convergence properties.

With μ > 1 in the multimembered evolution scheme, several state vectors x_k^(g) ∈ IR^n, k = 1(1)μ, are stored in each generation g. If the x_k^(g) are very different, the probability is greater that at least one point is situated near the global optimum and that the others will approach it in the process of generation. The likelihood of this is less if the x_k^(g) fall close together, with the associated reduction in the step lengths. It always remains finite, however, and increases with μ, the number of parents. This advantage over the (1+1) strategy is best exploited if one starts the search with initial vectors x_k^(0) roughly evenly distributed over the whole region of interest, and chooses fairly large initial values of the standard deviations σ_k^(0) ∈ IR^n, k = 1(1)μ. Here too the (μ , λ) scheme is preferable to the (μ + λ) because concentration at a locally very favorable position is at least delayed.

5.2.7 Program Details of the (μ + λ) ES Subroutines

Appendix A, Section A.2 contains FORTRAN listings of the multimembered (μ + λ) evolution strategy developed here, with the alternatives

GRUP: without recombination
REKO: with recombination (intermediary recombination for the step lengths)
KORR: the so far most general form, with correlated mutations as well as five different recombination types (see Chap. 7)



In the choice of μ (number of parents) and λ (number of descendants) there is no need to ensure that λ is exactly divisible by μ. The association of descendants to parents is made by a random selection of uniformly distributed random integers from the range [1, μ]. It is only necessary that λ exceeds μ by a sufficient margin that on average at least one descendant can be better than its parent. From the results of Section 5.2.3 a suitable choice would be, for example, λ ≥ 6 μ.
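A one-line Python sketch of this random association (the function name is hypothetical; indices run from 0 rather than 1, as is idiomatic there):

```python
import numpy as np

rng = np.random.default_rng()

def assign_descendants(mu, lam):
    """Associate each of the lam descendants with a parent drawn uniformly
    from the mu parents; lam need not be an exact multiple of mu."""
    return rng.integers(0, mu, size=lam)

parent_of = assign_descendants(10, 100)   # e.g., a (10 , 100) strategy
```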

The transformation from [0, 1] evenly distributed random numbers to (0, σ²) normally distributed pseudorandom numbers is carried out in the same way as in subroutine EVOL of the (1+1) strategy (see Sect. 5.1.5). The log-normally distributed variance multipliers are produced by the exponential function. The step lengths (standard deviations of the individual random components) can initially be specified individually. During the subsequent process of generation they satisfy the constraints

    σ_i^(g) ≥ ε_a  and  σ_i^(g) ≥ ε_b |x_i^(g)|    for all i = 1(1)n

where

    ε_a > 0  and  1 + ε_b > 1

according to the computational accuracy, can be specified in advance.

The parameter that influences the average rate of change of the step lengths should be given a value roughly proportional to 1/√n; in the case of two factors (the case to be preferred), a global and an individual one, the values given in Section 5.2.3 are recommended. The constant of proportionality depends mainly on another adjustable feature, λ/μ, which may be called the selection pressure. For a (10 , 100) strategy it should be set at about unity to allow the fastest convergence on simple optimization problems like the hypersphere. With increasing λ/μ this value can be changed sublinearly (compare Equation (5.22)).
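A minimal Python sketch of such a log-normal step-length variation with a global and an individual factor follows; the default factors shown are common choices proportional to 1/√n from the later self-adaptation literature, offered here as assumptions rather than the exact constants of Section 5.2.3.

```python
import numpy as np

rng = np.random.default_rng()

def mutate_step_lengths(sigma, tau_global=None, tau_local=None):
    """Log-normal variation of the step lengths via the exponential
    function, with one common and one individual multiplier each."""
    n = sigma.size
    tau_global = tau_global or 1.0 / np.sqrt(2.0 * n)
    tau_local = tau_local or 1.0 / np.sqrt(2.0 * np.sqrt(n))
    z = rng.standard_normal()        # common factor for all sigma_i
    z_i = rng.standard_normal(n)     # individual factor per sigma_i
    return sigma * np.exp(tau_global * z + tau_local * z_i)
```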

If the initial step lengths σ_i^(0) are chosen to be too large, what may have been an especially well situated starting point x^(0) can be thrown away. Nevertheless, this step backwards in the first generation works in favor of reaching a global minimum among several local minima. In principle, for μ > 1 each of the different starting vectors x_k^(0) ∈ IR^n and σ_k^(0) ∈ IR^n, k = 1(1)μ, can be specified. In the present program this differentiation of the parent generation is carried out automatically: the x_k^(0) are produced from x^(0) by addition of (0, (σ^(0))²) normally distributed random vectors. The σ_k^(0) = σ^(0) are initially equal for all parents.

The convergence criterion is described in Section 5.2.4. It is based on the difference in objective function values between the current best and worst parents of a generation. As accuracy parameters, an absolute and a relative quantity (ε_c and ε_d) must be specified (compare Sect. 5.1.3). Furthermore, an upper bound on the computation time for the search can be given, so that whatever the outcome, results can be output by the main program (see also Sect. 5.1.5).

Inequality constraints are treated as described for subroutine EVOL (Sect. 5.1.4); so too is the case of the starting point x^(0) lying outside the feasible region.

Whereas the subroutine GRUP with option REKO has been taken into account in the test series of Chapter 6, this is not so for the third version KORR, which was created later (Schwefel, 1974). Still, more often than any multimembered version, the (1+1) strategy has been used in practice. Nonetheless it has proved its usefulness in several applications: for example, in conjunction with a linearization method for minimizing quadratic functions in surface fitting problems (Plaschko and Wagner, 1973). In this case the evolution process provides useful approximate values that enable the deterministic method to converge. It should also serve to locate the global minimum of the multimodal objective function. Another practically oriented multiparameter case was to find the optimum weight disposition of lightweight rigidly jointed frameworks (Höfler, Leyßner, and Wiedemann, 1973; Leyßner, 1974). Here again the evolution strategy is combined with another method, this time the simplex method of linear programming. Each strategy is applied in turn until the possible improvements remaining at a step are very small. The usefulness of this procedure is demonstrated by checking against known solutions. A third example is provided by Hartmann (1974), who seeks the optimal geometry of a statically loaded shell support. He parameterizes the functional optimization problem by assuming that the shape of the cross section of the cylindrical shell is described by a suitable polynomial. Its coefficients are to be determined such that the largest absolute value of the transverse moment is as small as possible. For various cases of loading, Hartmann finds optimal shell geometries differing considerably from the shape of circular cylinders, with sometimes almost vanishingly small transverse moments. More examples are mentioned in Chapter 7.

5.3 Genetic Algorithms

At almost the same time that evolution strategies (ESs) were developed and used at the Technical University of Berlin, two other lines of evolutionary algorithms (EAs) emerged in the U.S.A., all independently of each other. One of them, evolutionary programming (EP), was mentioned at the end of Chapter 4 and goes back to the work of L. J. Fogel (1962; see also Fogel, Owens, and Walsh, 1965, 1966a,b). For a long time, activity on this front seemed to have become quiet. However, in 1992 a series of yearly conferences was started by D. B. Fogel and others (Fogel and Atmar, 1992, 1993; Sebald and Fogel, 1994) to disseminate recent results on the theory and applications of EP. Since EP uses concepts that are rather similar to either ESs or genetic algorithms (GAs) (Fogel, 1991, 1992), it will not be described in detail here, nor will it be compared to ESs on the basis of test results. This was done in a paper presented at the second EP conference (Bäck, Rudolph, and Schwefel, 1993). Similarly, contributions to comparing ESs and GAs in detail may be found in Hoffmeister and Bäck (1990, 1991, 1992; see also Bäck, Hoffmeister, and Schwefel, 1991; Bäck and Schwefel, 1993).

The third line of EAs mentioned above, genetic algorithms, has become rather popular today and differs from the others in several aspects. This approach will be explained in the following according to its classical (also called canonical) form.

Even to attentive scientists, GAs did not become apparent before 1975, when the first book of Holland (1975) and the dissertation of De Jong (1975) were published. Thus this work was unknown in Europe at the time when Rechenberg's and the author's dissertations were completed and, later on, published as books. Only 10 years later, however, in 1985, a series of biennial conferences (ICGA, International Conferences on Genetic Algorithms) was started (Grefenstette, 1985, 1987; Schaffer, 1989; Belew and Booker, 1991; Forrest, 1993) to bring together those who are interested in the theory or application of GAs. On the Eastern side of the Atlantic, a similar revival of the field began in 1990 with the first conference on parallel problem solving from nature (PPSN) (Schwefel and Männer, 1991; Männer and Manderick, 1992; Davidor, Schwefel, and Männer, 1994). During the PPSN 90 and the ICGA 91 events, proponents of GAs and ESs agreed upon the common denominator evolutionary algorithms (EAs) for both approaches, as well as evolutionary computation (EC) for a new international journal (see De Jong, 1993). The latter term has been adopted, among others, by the Institute of Electrical and Electronics Engineers (IEEE) for an international conference during the 1994 World Congress on Computational Intelligence (WCCI). Surveys of the history have been attempted by De Jong and Spears (1993) and Spears et al. (1993). As forerunners of the genetic simulation, Fraser (1957), Friedberg (1958), and Hollstien (1971) should at least be mentioned here.

5.3.1 The Canonical Genetic Algorithm for Parameter Optimization

Even if the originators of the GA approach emphasized that GAs were designed for general adaptation processes, most applications reported up to now concern numerical optimization by means of digital computers, including discrete as well as combinatorial optimization. Books by Ackley (1987), Goldberg (1989), Davis (1987, 1991), Davidor (1990), Rawlins (1991), Michalewicz (1992, 1994), Stender (1993), and Whitley (1993) may serve as sources for more details in this field. As for so-called classifier systems (CS; see Holland et al., 1986) and genetic programming (GP; see Koza, 1992), two very interesting special areas of evolutionary computation, in which GAs play an important rôle in searching for production rules in so-called knowledge-based systems and for correct expressions in computer programs, respectively, the reader must be referred to the relevant and vast literature (Alander, 1994, compiled more than 3,000 references).

The GA for parameter optimization usually has been presented in the following general form:

Step 0: (Initialization)
A given population consists of μ individuals. Each is characterized by its genotype, consisting of n genes, which determine the vitality, or fitness for survival. Each individual's genotype is represented by a (binary) bit string, representing the object parameter values either directly or by means of an encoding scheme.

Step 1: (Selection)
Two parents are chosen with probabilities proportional to their relative position in the current population, either measured by their contribution to the mean objective function value of the generation (proportional selection) or by their rank (e.g., linear ranking selection).

Step 2: (Recombination)
Two different preliminary offspring are produced by recombination of the two parental genotypes by means of crossover, at a given recombination probability pc; only one of those offspring (chosen at random) is actually taken into further consideration. Steps 1 and 2 are repeated until λ individuals represent the (next) generation.

Step 3: (Mutation)
The offspring eventually (with a given fixed and small probability pm) undergo further modification by means of point mutations working on individual bits, either by reversing a one to a zero or vice versa, or by throwing a dice for choosing a zero or a one, independent of the original value.

At first glance, this scheme looks very similar to that of a multimembered ES with discrete recombination. To reveal the differences one has to take a closer look at the so-called operators, "selection (S)," "mutation (M)," and "recombination (R)." The GA sequence of events, i.e., S - R - M, as opposed to M - R - S within ESs, should not matter significantly, since the whole process is a circular one, and whether one likes to reverse the order of mutation and recombination is merely a matter of avoiding unnecessary operations. In applications, the evaluation of the individuals with respect to their corresponding objective function values normally dominates all other operations. Canonical values for the recombination probability are pc = 0.6, for the number of crossover points nc = 2, and for the mutation probability pm = 0.001.
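A minimal Python sketch of one such S - R - M cycle may make the scheme concrete; the function name is hypothetical, fitness values are assumed to be strictly positive (maximization), and NumPy is used for brevity.

```python
import numpy as np

rng = np.random.default_rng()

def ga_generation(pop, fitness, pc=0.6, pm=0.001):
    """One canonical GA generation: proportional selection, two-point
    crossover (nc = 2) with probability pc, bit-flip mutation with pm.
    pop is a (mu, L) array of 0/1 integers; fitness must be positive."""
    mu, L = pop.shape
    probs = fitness / fitness.sum()            # proportional selection
    offspring = np.empty_like(pop)
    for k in range(mu):
        p1, p2 = rng.choice(mu, size=2, p=probs)
        child = pop[p1].copy()
        if rng.random() < pc:                  # two crossover points
            a, b = np.sort(rng.choice(np.arange(1, L), size=2, replace=False))
            child[a:b] = pop[p2, a:b]          # keep only one offspring
        flips = rng.random(L) < pm             # point mutations on bits
        child[flips] ^= 1
        offspring[k] = child
    return offspring
```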

5.3.2 Representation of Individuals

One of the most apparent differences between GAs and ESs is the fact that completely different representations of the object variables are used. Organic evolution uses four different nucleotides to encode the genotype in pairs of triplets. By means of the genetic code these are translated to 20 different amino acids. Since there are 4³ = 64 different triplets, the genetic code is largely redundant. A closer look reveals its property of maintaining similarity on the amino acid level despite most of the small variations on the level of single nucleotides. Similar transmission laws between chains of amino acids and proteins, proteins and higher aggregates like cells and organs, up to the overall phenotype, are called the epigenetic apparatus (Riedl, 1976). As a matter of fact, biologists as well as behaviorists report that differences among several children of the same parents, as well as differences between two consecutive generations, can well be described by normal distributions with zero mean and characteristic, probably genetically coded, variances. That is why ESs, when used for seeking optimal values of continuous variables, use the more aggregate model of normal distributions for mutations and discrete or intermediary recombination, as described in Sections 5.1 and 5.2.

GAs, however, rely on binary representations of the object variables. One might call this genotypic modelling of the variation process, instead of the phenotypic modelling that is practiced in ESs and EP. An important link between both levels, i.e., the genetic code as well as the so-called epigenetic apparatus, is neglected, at least in the canonical GA. For dealing with integer or real values on the level of the object variables, GAs make use of a normal Boolean representation or of the so-called Gray code. Both, however, present the difficulty of so-called Hamming cliffs. Depending on its position, a single bit reversal thus can lead to small or very large changes on the phenotypic level. This important fact has advantages and disadvantages. The advantage lies in the broad range of different phenotypes available in a GA population at the same time, a matter affecting its global convergence reliability (for a thorough convergence analysis of the canonical GA see Rudolph, 1994a). The corresponding disadvantage stems from the other side of the same coin, i.e., the inability to focus the search effort in a close enough vicinity of the current positions of the individuals in one generation.
current positions of individuals in one generation.<br />

There is a second reason to cling to binary representations of object variables within<br />

GAs, i.e., Holl<strong>and</strong>'s schema theorem (Holl<strong>and</strong>, 1975, 1992). This theorem tries to assure<br />

exponential penetration of the population by individuals with above average tness under<br />

proportional selection, with su ciently higher reproduction rates for better individuals,<br />

one point crossover with xed crossover probability, <strong>and</strong> small, xed mutation rates.<br />

If, at some time, especially when starting the search, the population contains the<br />

globally optimal solution, this will persist in the case where there are zero probabilities<br />

for mutation <strong>and</strong> recombination. Mutation, according to the theorem, is an always destructive<br />

force <strong>and</strong> thus called a subordinate operator. It only serves to introduce missing<br />

or reintroduce lost correct bits into nite populations. Recombination (here, one point<br />

crossover) mayormay not be destructive, depending on whether the crossover point happens<br />

to lie within a so-called building block, i.e., a short substring of the bit string that<br />

contributes to above-average tness of one of the mating individuals, or not. Building<br />

blocks are especially important in case of decomposable objective functions (for a more<br />

detailed description see Goldberg, 1989).<br />

GAs in their original form do not permit the h<strong>and</strong>ling of implicit inequality or equality<br />

constraints. On the other h<strong>and</strong>, explicit upper <strong>and</strong> lower bounds have tobeprovided for<br />

the range of the object variables:<br />

ui xi vi for all i = 1(1)n<br />

in order to have a basis for the binary decoding <strong>and</strong> encoding process, e.g.,<br />

xi = ui + vi ; ui<br />

2 l ; 1<br />

lX<br />

j=1<br />

aij 2 j;1<br />

where aij for j = 1(1)l represents the bit string segment of length l for encoding the ith<br />

element of the object variable vector x.<br />

Instead of this Boolean mapping one may also choose the Gray code, which has the property that neighboring values of the x_i differ in one bit position only. Looking at the probability distribution p(Δx_i) of phenotypic changes Δx_i from one generation to the next, at a given position x_i^(0) and a given mutation probability pm, shows that changing the code from Boolean to Gray only shifts, but never avoids, the so-called Hamming cliffs.


Figure 5.16: Probability distributions p(Δx) for GA mutations (left: normal binary code; right: Gray code)

As Figure 5.16 clearly shows for a one-dimensional case with x^(0) = 5, l = 4, and pm = 0.001, the expectation values for the changes Δx are different from zero in both cases, and the distribution is in no case unimodal.
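For readers who want to experiment with the two codings, here is a minimal sketch of the standard conversion between Boolean and Gray representations of an integer (function names are illustrative):

```python
def binary_to_gray(b):
    """Gray encode an integer: neighboring integers differ in one bit."""
    return b ^ (b >> 1)

def gray_to_binary(g):
    """Invert the Gray code by cumulative XOR over shifted copies."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

assert all(gray_to_binary(binary_to_gray(i)) == i for i in range(16))
```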

5.3.3 Recombination and Mutation

Innovation during evolutionary processes occurs in two different ways, at least for so-called higher organisms. Only the earliest and most primitive species operate asexually. People have often said that GAs can do their work without mutations, which, according to the schema theorem, always hamper the adaptation or optimization process, and that, on the other hand, ESs can do their work without recombination. The latter is not true if self-adaptation of the individual mutation variances and covariances is to work properly (see Schwefel, 1987), whereas the former conjecture has been disproved by Bäck (1993, 1994a,b). For a GA the probability of containing the correct bits for the global solution, dispersed over its random start population, is 1 − L 2^(−μ), which may be close enough to 1 for μ = 50 as population size and L = 1000 as length of the bit string (actually it is 0.999999999999); however, it cannot be guaranteed that those bits will not get lost in the course of the generations. Whether this happens or not largely depends on the problem structure, the phenomenon being called deception (e.g., Whitley, 1991; Page and Richardson, 1992).

If one looks for recombination effects within GAs on the level of phenotypes, one stumbles over the fact that a recombined offspring of two parents that are close together in the phenotype space may deviate greatly from both parental positions there.


Table 5.1: Two point crossover within a GA and its effect on the phenotypes

               Bit strings    Phenotype
  Parent 1     0111 1100        7 12
  Parent 2     1000 1011        8 11
  Two point crossover:
  Offspring 1  0000 1000        0  8
  Offspring 2  1111 1111       15 15

This completely contradicts the proverbial saying that the apple never falls far from the tree. Table 5.1 shows a simple situation with two parents producing two offspring by means of two point crossover on a bit string of length 8, encoding two phenotypic variables in the range [0, 15] in the standard Boolean form. Neither discrete nor intermediary recombination within ESs can be that disruptive; intermediary recombination always delivers phenotypic values for the offspring between those of their parents. The assumption that mutations are not necessary for the GA process may even stem from that disruptive character of recombination, which permits crossover points not only at the boundaries of meaningful parental information but also within the genes themselves.
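The situation of Table 5.1 can be reproduced with a few lines of Python; the particular cut positions (after bits 1 and 6) are a hypothetical choice that happens to yield the offspring shown there.

```python
def two_point_crossover(p1, p2, a, b):
    """Exchange the segment between crossover points a and b."""
    return (p1[:a] + p2[a:b] + p1[b:],
            p2[:a] + p1[a:b] + p2[b:])

o1, o2 = two_point_crossover("01111100", "10001011", 1, 6)
print(o1, int(o1[:4], 2), int(o1[4:], 2))   # 00001000 -> phenotype 0, 8
print(o2, int(o2[:4], 2), int(o2[4:], 2))   # 11111111 -> phenotype 15, 15
```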

ESs obey the general rule that mutations are undirected by using normally distributed changes with zero mean, even in the case of correlated mutations. That this is not so for GAs can easily be seen from Figure 5.16. Without selection, the GA process thus produces a biased genetic drift, depending on the actual situation.

Table 5.2 presents the probability transition matrix for one phenotypic integer variable x_i in the range [0, 3], encoded by means of two bits only. Let

    p = pm        the single bit inversion probability, and
    q = 1 − pm    the probability of not inverting the bit.

Table 5.2: Transition probabilities for mutations within a GA

                            x_i new
  Genotype        00    01    10    11
  Phenotype        0     1     2     3
  x_i old  00  0   q²    pq    pq    p²
           01  1   pq    q²    p²    pq
           10  2   pq    p²    q²    pq
           11  3   p²    pq    pq    q²

From Table 5.2 it is obvious that among all possible transitions (except for those without any change) between the four different genetic states 00, 01, 10, 11 (i.e., phenotypes 0, 1, 2, 3), those from 01 to 10 and from 10 to 01 are the most improbable ones, despite their phenotypic vicinity. Let pm = 10⁻³; then q² = 0.998001, pq = 0.000999, and p² = 0.000001.
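The matrix and the quoted numbers are easy to verify in a few lines of Python (the variable names are illustrative):

```python
import numpy as np

pm = 1e-3
p, q = pm, 1.0 - pm                    # flip / keep probability per bit
states = ["00", "01", "10", "11"]      # phenotypes 0, 1, 2, 3
T = np.array([[p ** sum(x != y for x, y in zip(a, b))
               * q ** sum(x == y for x, y in zip(a, b))
               for b in states] for a in states])
print(T[1][2])   # 01 -> 10 needs two flips: p*p = 1e-6, most improbable
```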

5.3.4 Reproduction and Selection

Whether selection is the first or the last operator in the generation loop of EAs should not matter, except for the first iteration. The difference in this respect between ESs and GAs, however, is that both mingle several aspects of the generation transition. Let us look first, therefore, at the biological facts to be modelled by a selection operator.

An offspring may or may not be able to survive the time span between birth and reproduction. If it is vital up to its reproductive age, it may have varying numbers of offspring with one or more partners of its own generation. Thus, the term "selection" in EAs comprises at least three different aspects:

Survival to adult state (ontogeny)
Mating behavior (perhaps including promiscuity)
Reproductive activity

Both ESs and GAs select parents for each offspring anew, thus modelling maximal promiscuity. GAs assign higher mating and reproductive activities to individuals with better objective function values (both for proportional as well as linear or other ranking selection). But even the worst offspring of generation g may become parents for generation g + 1. The probability, however, may be very low. If this is the case, most offspring are descendants of a few best parents only. The corresponding loss of diversity in the population may lead to premature stagnation (not convergence!) of the evolutionary seeking process. Reducing the proportionality factor in the selection function, on the other hand, ultimately leads to random walk behavior. This enhances the reliability in multimodal situations, but reduces the convergence velocity and the precision of locating the optimum.

For proportional selection after Holland, derived from an analogy to the game-theoretic multiarmed bandit problem, the average number of offspring for an individual with genotype a_k, phenotype x_k, and vitality f(x_k) is

    η(a_k) = μ p_s(a_k) = μ Φ(f(x_k)) / Σ_{i=1}^{μ} Φ(f(x_i))

The transformation Φ(f) is necessary for introducing the proportionality factor mentioned above as well as for dealing with negative values of the objective function. p_s often is called the survival probability, which is misleading. No parent really survives its generation, except in an elitist GA version; then the best parent is put into the next generation without applying the selection operator. Otherwise it may happen simply by chance that one or the other descendant is not different from one of its parents.
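A minimal Python sketch of this selection rule; the function name, the default identity scaling, and the draw of two mating parents are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng()

def proportional_selection(f_values, phi=lambda f: f):
    """Draw two parents with probabilities p_s(a_k) proportional to
    phi(f(x_k)); phi must map objective values to positive numbers."""
    scaled = phi(np.asarray(f_values, dtype=float))
    probs = scaled / scaled.sum()
    return rng.choice(len(probs), size=2, p=probs)
```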

In contrast to ESs, the number of offspring is always equal to the number of parents (λ = μ). There is no surplus of descendants to cope with lethal mutations and recombinations. ESs need that kind of surplus for handling constraints, at least. In the non-preserving case of its comma-version, a multimembered ES also needs a surplus (λ > μ) for the selection process. The λ − μ worst offspring are handled as if they do not survive to the adult reproductive state; the μ best, however, have the same reproduction probability p_s = 1/μ, which does not depend on their individual phenotypes or corresponding objective function values. Thus, on average, every parent has λ/μ descendants. This is depicted on the left-hand side of Figure 5.17, where the average number of descendants of the two best of λ = 10 descendants (evenly distributed on the fitness scale just for simplification purposes) is just λ/μ = 5 for a (2,10) ES, and zero for all others.

Within a GA it largely depends on the scaling function Φ(f) how many offspring are produced on average by their ancestors. The right-hand part of Figure 5.17 presents two possible situations. Crosses (+) belong to a steep, triangles (△) to a flat reproduction probability curve (average number of offspring) over the fitness of the individuals. In the former case it typically happens that, just as in ESs, only the best individuals produce offspring (here the best parent has 6, the second best 3, the third best only 1, and all others zero offspring). One would call this strong selection. Weak selection, on the contrary, characterizes the other case (only the worst parent has no offspring, the best one just 2, and all others 1). It will strongly depend on the actual topology how one should choose the proportionality factor, and it may even be necessary to change it during one optimum seeking process.

Self-adaptation of internal strategy parameters is possible within the framework of GAs, too. Bäck (1992a,b, 1993, 1994a,b) has demonstrated this with respect to the mutation rate. For that purpose he adopts the selection mechanism of the multimembered ES.

Last but not least, the question remains whether a stochastic or a deterministic approach to modelling selection is more appropriate. The argument that a stochastic model is closer to reality is not sufficient for the purpose at hand: optimization and adaptation.

5.3.5 Further Remarks

Of course, one would like to incorporate at least one close-to-canonical GA version into the comparative test series with all the other optimization procedures. But there are problems with that kind of endeavor. First, GAs do not permit general inequality constraints. This does not matter too much, since there are other algorithms that are not directly applicable in such cases either. Next, GAs must be provided with lower and upper bounds for all parameters, which of course have to be chosen to contain the solution, probably in or near the middle of the hypercube defined by the explicit bounds. The GA would thus be provided with information that is not available to the other algorithms.


Figure 5.17: Comparison of selection consequences in EAs (left: ES; right: GA); average number of offspring plotted over the fitness of the individuals.

For all other methods the starting point is of great importance, not only because it defines the initial distance from the optimum, and thus largely determines the number of iterations needed to approximate the solution at the predefined accuracy, but also because it may provide more or less topological difficulties in its vicinity. GAs, however, should be started at random in the whole hypercube defined by the lower and upper bounds of the variables, in order to give them a chance of approaching the global or, at least, a very good local optimum. Reliability tests (see Appendix A, Sect. A.2), especially in cases of multimodal functions, would thus be biased against all other methods if one allows the GA to start from many points at the same time and if one gives the GA the needed extra information about the relevant search region that is not available to the other methods. One might provide special test conditions to compare different EAs with each other without giving one of them an advantage from the very beginning, but no large effort of this kind has been made so far.

Even in cases of special constraints or side conditions one may formulate appropriate instantiations of suitable GA versions. This has been done, for example, for the combinatorial optimization task of solving the travelling salesperson problem (TSP) by Gorges-Schleuter (1991a,b); repair mechanisms were used in cases where unfeasible tours were caused by recombination. Beyer (1992) has investigated ESs for solving TSP-like optimization problems. It is much better to look for data structures fitted to the special task and to redefine the genetic operators to keep to the feasible solution set (see Michalewicz, 1992, 1994). The time for developing such special EAs must be added to the run time on the computer, and one argument in favor of EAs is lost, i.e., their simplicity of use or generality of application.

As the short analysis of GA mutation and recombination operators above has clearly shown, GAs, unlike ESs, favor in-breadth search and thus are especially prepared to solve global and discrete optimization problems, where a volume-oriented approach is more appropriate than a path-oriented one. They have so far done their best in all kinds of combinatorial optimization (e.g., Lawler et al., 1985), a field that has not been pursued in depth throughout this book. One example in the domain of computational intelligence has been the combined topology and parameter optimization of artificial neural networks (e.g., Mandischer, 1993); another is the optimization of membership function parameters within fuzzy controllers (e.g., Meredith, Karr, and Kumar, 1992).

5.4 Simulated Annealing

The simulated annealing approach to solving optimization problems does not really belong to the biologically motivated evolutionary algorithms. However, it belongs to the realm of problem solving methods that make use of other natural paradigms. This is the reason why this section has not been placed elsewhere, among the traditional hill climbing strategies.

In order to harden steel one first heats it up to a high temperature, not far away from the transition to its liquid phase. Subsequently one cools down the steel more or less rapidly. This process is known as annealing. According to the cooling schedule, the atoms or molecules have more or less time to find positions in an ordered pattern (e.g., a crystal structure). The highest order, which corresponds to a global minimum of the free energy, can be achieved only when the cooling proceeds slowly enough. Otherwise the frozen status will be characterized by one or another local energy minimum only. Similar phenomena arise in all kinds of phase transitions from gaseous to liquid and from liquid to solid states.

A descriptive mathematical model abstracts from local particle-to-particle interactions. It describes statistically the correspondences between macro variables like density, temperature, and entropy. It was Boltzmann who first formulated a probability law to link the temperature with the relative frequencies of the very many possible micro states. Metropolis et al. (1953) simulated on that basis the evolution of a solid in a heat bath towards thermal equilibrium. By means of a Monte-Carlo method new particle configurations were generated. Their free energy E_new was compared with that of the former state (E_old). If E_new ≤ E_old, then the new configuration "survives" and forms the basis for the next perturbation. The new state may survive also if E_new > E_old, but only with a certain probability w:

    w = (1/c) exp((E_old − E_new) / (K T))

where K denotes the famous Boltzmann constant and T the current temperature. The constant c serves to normalize the probability distribution. This Metropolis algorithm thus is in line with the probability law of Boltzmann.

Kirkpatrick, Gelatt, and Vecchi (1983) and Cerny (1985) published optimization methods based on Metropolis' simulation algorithm. These methods are used quite frequently nowadays as simulated annealing (SA) procedures. Due to the fact that good intermediate positions may be "forgotten" during the search for a minimum or maximum, the algorithm is able to escape from local extrema and finally might reach the global optimum.
is able to escape from local extrema <strong>and</strong> nally might reach the global optimum.


There are two loops within the SA process:

Lowering the temperature (outer loop):

    T_new = f(T_old)

Step 4: (Termination criterion)
If T^(k) ≤ ε, end the search with result x*.

Step 5: (Cooling, outer loop)
Set x^(k+1,0) = x*, x̃ = x*, and T^(k+1) = γ T^(k) with 0 < γ < 1.


5.5 Tabu Search and Other Hybrid Concepts

Aggressive exploration using a short-term memory forms the core of the TS. From a candidate list of (non-exhaustive) moves the best admissible one is chosen. The decision is based on tabu restrictions on the one hand and on aspiration criteria on the other. Whereas aspiration criteria aim at perpetuating former successful operations, tabu restrictions help to avoid stepping back to inferior solutions and repeating already investigated trial moves. Although the best admissible step does not necessarily lead to an improvement, only better solutions are stored as real moves. Successes and failures are used to update the tabu list and the aspiration memory. If no further improvements can be found, or after a specified number of iterations, one transfers the results to the longer-term memories and switches to either an intensification or a diversification mode. Intensification, combined with the medium-term memory, refers to procedures for reinforcing move combinations historically found good, whereas diversification, combined with the long-term memory, refers to exploring new regions of the search space. The first articles of Glover (1986, 1989) present many ideas to decide upon switching back and forth between the three modes. Many more have been conceived and published together with application results. In some cases complete procedures from other optimization paradigms have been used within the different phases of the TS, e.g., line search or gradient-like techniques during intensification, and GAs during diversification.

Instead of going into further details here, it seems appropriate to give some hints that point to rather similar hybrid methods, more or less centered around either GAs, ESs, or SA as the main strategy.
SA as the main strategy.<br />

One could start again with Powell's rule to look for further restart points in the<br />

vicinity of the nal solutions of his conjugate direction method (Chap. 3, Sect. 3.2.2.1)<br />

or with the restart rule of the simplex method according to Nelder <strong>and</strong> Mead (Chap. 3,<br />

Sect. 3.2.1.5), in order to interpret them in terms of some kind of diversi cation phase. But<br />

in general, both approaches cannot be classi ed as better ideas than starting a speci c<br />

optimum seeking method from di erent initial solutions <strong>and</strong> simply comparing all the<br />

(maybe di erent) outcomes, <strong>and</strong> choosing the best one as the nal solution. It might<br />

even be more promising to use di erent strategies from the same starting point <strong>and</strong> to<br />

select the overall best outcome again as a new start condition. On MIMD (multiple<br />

instructions, multiple data) parallel computers or nets of workstations the competition of<br />

di erent search methods could even be used to set up a knowledge base that adapts to<br />

a speci c situation (e.g., Peters, 1989, 1991). Only individual conclusions for one or the<br />

other special application can be drawn from this kind of metastrategic approach, however.<br />

At the close of this general survey, only a few further hints will be given regarding the vast number of recent proposals.

Ablay (1987), for example, uses a basic search routine similar to Rechenberg's (1+1) ES and interrupts it more or less frequently by a pure random search in order to avoid premature stagnation as well as convergence to a non-global local optimum.

The replicator algorithm of Voigt (1989) also refers to organic evolution as a metaphor (see also Voigt, Mühlenbein, and Schwefel, 1990). Its modelling technique may be called descriptive, according to earlier work of Feistel and Ebeling (1989). Ebeling (1992) even proposes to incorporate ontogenetic learning features (a so-called Haeckel strategy).

Mühlenbein and Schlierkamp-Voosen (1993a,b) proposed a so-called breeder GA, which combines a greedy algorithm, to locate the nearest local optima very quickly, with a genetic algorithm, to allocate recombined start positions for further local optimum seeking cycles. This has proven to be very successful in special situations where the local optima are situated in a regular pattern in the search space.

Dueck and Scheuer (1990) have devised a so-called threshold accepting strategy, which is rather similar to the simulated annealing approach but pretends to deliver superior results. Later on, Dueck (1993) elaborated his great deluge algorithm, which adds to the threshold accepting method some kind of diversification mode, like the tabu search, in order to avoid premature stagnation at a non-global local optimum.
to avoid premature stagnation at a non-global local optimum.<br />

Lohmann (1992) <strong>and</strong> Herdy (1992) propose a hierarchical ES according to Rechenberg's<br />

extended notation (Rechenberg, 1978, 1989, 1994) of the multimembered scheme<br />

to solve so-called structural optimization problems. Whereas this term normally points<br />

to situations in which a solid structure subject to stresses <strong>and</strong> deformations has to be<br />

designed in order to have least weight or production cost, Lohmann <strong>and</strong> Herdy do not<br />

mean anything else than a mixed-integer optimization problem. The solution is sought<br />

for in an outer ES-loop that varies the integer object variables only <strong>and</strong> an inner ESloop<br />

that varies the real-valued variables. Thus the outer loop compares relative optima<br />

found in the inner loops. This kind of cyclical subspace search, somehow similar to the<br />

Gauss-Seidel approach, must not represent the ultimate solution to mixed-integer problems,<br />

however. It is more or less prone to nding non-global local optima only. A more<br />

general evolutionary algorithm should be able to change{at the same time, by appropriate<br />

mutation <strong>and</strong> recombination operators{both the discrete <strong>and</strong> the real-valued object<br />

variables. But this speculation must be proved in forthcoming further steps towards a<br />

more general evolutionary algorithm, perhaps a hybrid of ES <strong>and</strong> GA ingredients.


Chapter 6

Comparison of Direct Search Strategies for Parameter Optimization

6.1 Difficulties

The vast and steadily increasing number of optimization methods necessarily raises the question of which is the best strategy. There seems to be no unique answer. If indeed there were an optimal optimization method, all the others would be superfluous and would have long ago been forgotten.

Because of the strong competition between already existing strategies, it is necessary nowadays that, whenever any proposal for a new method or variant is made, its advantages and improvements compared to older strategies be displayed. The usual way is to refer to a minimum problem for which the known methods fail to find a solution whereas the new proposal is successful. Or it is shown, with reference to chosen examples, that computation time or iterations can be saved by using the new version. The series of publications along these lines can in principle be continued indefinitely. With sufficient insight into the working of any strategy, a special optimization problem can always be constructed for which the strategy fails. Likewise, for any problem a special method of solution can be devised that is superior to the other procedures. One simply needs to exploit to the full what one knows of the problem structure as contained in its mathematical formulation.

Progress in the field of optimization methods does not, however, consist in developing an individual method of solution for each problem or type of problem. A practitioner would much rather manage with just one strategy, which can solve all the practically occurring problems for as small a total cost as possible. But as yet there is no such universal optimization method, and some authors doubt whether there ever will be (Arrow and Hurwicz, 1957). All the methods presently known can only be used without restriction in particular areas of application. According to the nature of the particular problem, one or another strategy offers a more successful solution. The question of which is the best strategy is itself a kind of optimization problem. To be able to answer it objectively, an objective function would have to be formulated for deciding which of two methods was best from the point of view of its results. So long as no generally recognized quality function of this kind exists, the question of which optimization method is optimal remains unanswered.

6.2 Theoretical Results

Classical optimization theory is concerned with establishing necessary and sufficient existence criteria for maxima and minima. It provides systems of equations but no iterative methods of finding their solutions. Not even Dantzig's simplex method (1966) for solving linear programming problems can be regarded as a direct result of theory; theoretical considerations of the linear problem only show that the extremum sought, except in special cases, must always lie in a corner of the polyhedron defined by the constraints. With $n$ variables and $m$ constraints (together with $n$ non-negativity conditions), the number of corners, or points of intersection of the hypersurfaces formed by the constraints, is limited to a maximum of $\binom{m+n}{n}$. Even the systematic inspection of all the points of intersection would be a finite optimization method. But not all the points of intersection lie within the allowed region (Saaty, 1955, 1963). Müller-Merbach (1971) gives $mn - m + 2$ as an upper bound on the number of feasible corner points. The simplex method, which is a method of steepest ascent along the edges of the polyhedron, only traverses a tiny fraction of all the corners. Dantzig (1966) refers to empirical evidence that the number of necessary iterations increases as $n$, the number of variables, if the number of constraints $m$ is constant, or as $m$ if $(n - m)$ is not too small. Since, in the least favorable case, between $m$ and $2m$ exchange operations must be performed on the tableau of $(m+1)(n+1)$ coefficients, the average computation time increases as $O(m^2 n)$. In so-called degenerate cases, however, the simplex method can also become infinite. The repeated cycling through the same corners must then be broken by a rule for randomly choosing the iteration step (Dantzig). From a theoretical point of view, the ellipsoid method of Khachiyan (1979) and the interior point method of Karmarkar (1984) do have the advantage of polynomial time consumption even in the worst case.

The question of finiteness of iterative methods is also a central theme of non-linear programming. In this case the solution can lie at any point on the boundary or in the interior of the allowed region. For the special case that the objective function and all the constraint functions are convex and multiply differentiable, Kuhn and Tucker (1951) and John (1948) have derived necessary and sufficient conditions for extremal solutions. Most of the iteration methods that have been developed on this basis are designed for problems with a quadratic objective function and linear constraints. Representative of quadratic programming are, for example, the methods of Beale (1956) and Wolfe (1959a). They make extensive use of the algorithm of the simplex method and thus belong, according to Hadley (1969), to the class of neighboring extremal point methods. Other strategies can move into the allowed region in the course of the iterations. As far as the constraints permit, they take the direction of the gradient of the objective function. They are therefore known as gradient methods of non-linear programming (Kappler, 1967). As their name may suggest, however, they are not suitable for all non-linear problems. Their convergence can be proved at best for differentiable quasi-convex programs (Künzi, Krelle, and Oettli, 1962). Even under these conditions, the number of required iterations and the rate of convergence cannot be stated in general. The same is true for the methods of Khachiyan (1979) and Karmarkar (1984). In the following sections a short summary is attempted of the convergence properties of non-linear optimization methods in the unconstrained case (hill climbing methods).

6.2.1 Proofs of Convergence

A proof of convergence of an iterative method will aim to show that a sequence of iteration points $x^{(k)}$ tends monotonically with the index $k$ towards the sought point $x'$:

$$\lim_{k \to \infty} \left\| x^{(k)} - x' \right\| = 0$$

or

$$\left\| x^{(k)} - x' \right\| \le \varepsilon, \quad \varepsilon \ge 0, \quad \text{for all } k \ge K(\varepsilon)$$


In the transition from such an idealized scheme to a practical method, one must usually introduce adaptive rules for the termination of subroutines that would in principle run forever (Nickel, 1967; Nickel and Ritter, 1972).

A further limitation to the predictive power of proofs of convergence arises from the properties of the point $x'$ referred to above. Even if confusion of maxima and minima is eliminated, the approximate solution $x'$ can still be a saddle point. To exclude this possibility, the second and sometimes even higher partial derivatives must be constructed and tested. It always remains uncertain whether the solution that is finally found represents the global minimum or only a local minimum of the objective function. The only way of proving the global convergence of a sequential optimization method seems to be to require unimodality of the objective function, so that only one local optimum exists, which is also the global optimum. Some global convergence properties are possessed only by a few simultaneous methods, such as the systematic grid method or the Monte-Carlo method. They place no continuity requirements on the objective function, but the separation of the trial points must be significantly smaller than the distance between neighboring minima and the required accuracy. The fact that its cost rises exponentially with the number of variables usually precludes the practical application of such a method.

How does the convergence of the evolution strategy compare? For fixed step lengths, or more precisely for fixed variances $\sigma_i^2 > 0$ of the normally distributed mutation steps, there is always a positive probability of going from any starting point (e.g., a local minimum) to any other point with a better objective function value, provided that the separation of the points is finite. For the two membered method, Rechenberg (1973) gives necessary and sufficient conditions for the probability of success to exceed a specified value. Estimates of the computation cost can only be made for special objective functions. In this respect there are problems in determining the rules for controlling the mutation step lengths and deciding when the search is to be terminated. It is hard to reconcile the requirement of rapid convergence in one case with that of a certain minimum probability of global convergence in another.
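The positive-probability claim can be made precise in one line, since the normal density has unbounded support. With fixed variances $\sigma_i^2 > 0$, the mutation density

$$p(x, y) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\!\left( -\frac{(y_i - x_i)^2}{2 \sigma_i^2} \right) > 0 \quad \text{for all finite } x, y \in \mathbb{R}^n,$$

so for any measurable set $B$ of better points with positive volume, the probability $\int_B p(x, y)\, dy$ of reaching $B$ in a single mutation is strictly positive. (This is merely a restatement of the claim above, not a result quoted from Rechenberg.)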

6.2.2 Rates of Convergence

While it may be of importance from a mathematical point of view to show that under certain assumptions a particular method leads with certainty to the objective, it is even more important to know how much computational effort is required, that is, what the rate of convergence is. The question of how fast an optimal solution is approached, or how many iterations are needed to come within a prescribed small distance of the objective, can only be answered for a few abstract methods and under even more restrictive assumptions. One distinguishes between first and second order convergence. Although some authors reserve the term quadratic convergence for the case when the solution of a quadratic problem is found within a finite number of iterations, it will be used here as a synonym for second order convergence. A sequence of iteration points $x^{(k)}$ converges linearly to $x^*$ if it satisfies the condition

$$\left\| x^{(k)} - x^* \right\| \le c\, \beta^k, \qquad c > 0, \quad 0 \le \beta < 1$$

Second order, or quadratic, convergence holds if the error bound decreases quadratically from one step to the next, i.e., if $\| x^{(k+1)} - x^* \| \le c\, \| x^{(k)} - x^* \|^2$.

6.2.3 Q-Properties

Second order methods are commonly classified by their so-called Q-properties. Thus if a strategy takes $p$ iteration steps to locate exactly the optimum of a quadratic objective function, it is said to have the property $Q\,p$.

The Newton-Raphson method, for example, takes only a single step, because for a quadratic function the second partial derivatives are constant over the whole of $\mathbb{R}^n$ and all higher order derivatives vanish. If the iteration rule is followed exactly, it gives the position of the minimum right at the first step, without the necessity of a line search. As no objective function values need to be evaluated explicitly, one also refers to it as an indirect optimization method. It has the property $Q\,1$.
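The $Q\,1$ property can be seen in a worked line or two. For a quadratic objective with a symmetric positive definite matrix $A$ (the notation is chosen here for illustration), the Newton step from an arbitrary $x^{(0)}$ lands exactly on the minimum:

$$F(x) = \tfrac{1}{2}\, x^T A\, x - b^T x, \qquad \nabla F(x) = A x - b, \qquad \nabla^2 F(x) = A$$

$$x^{(1)} = x^{(0)} - \left( \nabla^2 F \right)^{-1} \nabla F\!\left( x^{(0)} \right) = x^{(0)} - A^{-1}\left( A x^{(0)} - b \right) = A^{-1} b = x^*$$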

A conjugate gradient method, e.g., that of Fletcher and Reeves (1964), requires up to $n$ cycles before a complete set of conjugate directions is assembled and a line search leads to the minimum. It therefore has the property $Q\,n$.

Powell's (1964) derivative-free search method of conjugate directions requires $n + 1$ line searches for determining each of the $n$ direction vectors and thus has the property $Q\,n(n+1)$, or $Q\,O(n^2)$, in terms of the number of one dimensional minimizations.

The variable metric strategy of Davidon (1959) in the formulation of Fletcher and Powell (1963) can be interpreted both as a quasi-Newton method and as a method of conjugate directions. If the objective function is quadratic, the iteratively improved approximating matrix agrees with the exact inverse of the Hessian matrix after $n$ iterations. This method has the property $Q\,n$.

Apart from the fact that any practical algorithm can require more than the theoretically predicted number of iterations due to the effect of rounding errors, for peculiar types of coefficient matrix in the quadratic problem an algorithm can fail completely. For example, Zangwill (1967) demonstrates such a source of error in the Powell method when no improvement is achieved in one direction.

6.2.4 Computing Demands

The specification of the Q-properties of individual strategies is only the first step towards estimating the computing demands. In different procedures an iteration or a cycle comprises various different operations. It is useful to distinguish ordinary calculation operations like additions and multiplications from the evaluation of functions such as the objective function and its derivatives. The number of variables is the basic quantity that determines the computation cost. A crude but adequate measure is therefore given by the power $p$ of $n$, the number of parameters, with which the expected computation times increase. For the case of many variables, since the highest powers are dominant, lower order terms can be neglected. In the Newton-Raphson method, at each iteration the gradient vector $\nabla F$ and the Hessian matrix $\nabla^2 F$ must be evaluated, which means $n$ first and $\frac{n}{2}(n+1)$ second partial derivatives. Objective function values are not required. In fact the most costly step is the matrix inversion, which requires on the order of $O(n^3)$ operations.

A cycle of the conjugate gradient method consists of a line search and a gradient determination. The one dimensional minimization requires several calls of the objective function. Their number depends on the choice of method, but it can be regarded as constant, or at least as independent of the number of variables. The remaining steps in the calculation, including vector multiplications, are composed of $O(n)$ elementary arithmetical operations. Similar results apply in the case of the variable metric strategy, except that there are an additional $O(n^2)$ basic operations for matrix additions and multiplications. The direct search method due to Powell evaluates neither first nor second partial derivatives. After every $n + 1$ line searches the direction vectors are redefined, which requires $O(n^2)$ values to be assigned. But since each one dimensional optimization counts as an iteration step, only $O(n)$ direct operations are attributed to each iteration. A convenient summary of the relationships is given in Table 6.1. For simplicity, only the terms of highest order in the number of parameters $n$ are accounted for, without their coefficients of proportionality.

So far we have no scale for comparing the different function evaluations with each other. Fletcher (1972a) and others consider an evaluation of the Hessian matrix to be equivalent to $O(n)$ gradient determinations or $O(n^2)$ objective function calls. This type of scaling is valid whenever the partial derivatives cannot be obtained in analytic form and provided as functions, but are calculated approximately as quotients of differences obtained by trial steps in the coordinate directions. In any case it ought to be about right if the objective function is of higher than second order. Accordingly, the following weighting of the function evaluations can be introduced in the table:

$$F : \nabla F : \nabla^2 F \;\widehat{=}\; n^0 : n^1 : n^2$$
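This scaling can be verified by counting calls in a finite-difference scheme. The sketch below (the helper names are ours, chosen for illustration; this is not one of the original test programs) approximates the gradient and the Hessian purely from objective function calls and counts them:

```python
def fd_gradient(f, x, h=1e-6):
    # Forward differences: n extra objective function calls per gradient.
    f0 = f(x)
    grad = []
    for i in range(len(x)):
        xi = list(x)
        xi[i] += h
        grad.append((f(xi) - f0) / h)
    return grad

def fd_hessian(f, x, h=1e-4):
    # Forward second differences: on the order of n^2 objective function calls.
    n = len(x)
    f0 = f(x)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            xij = list(x); xij[i] += h; xij[j] += h
            xi = list(x);  xi[i] += h
            xj = list(x);  xj[j] += h
            hess[i][j] = (f(xij) - f(xi) - f(xj) + f0) / (h * h)
    return hess

calls = {"count": 0}
def sphere(x):
    calls["count"] += 1
    return sum(v * v for v in x)

x = [1.0] * 10
fd_gradient(sphere, x)
print("calls per gradient:", calls["count"])   # n + 1
calls["count"] = 0
fd_hessian(sphere, x)
print("calls per Hessian:", calls["count"])    # about 3 n^2
```

The counts $n + 1$ and roughly $3 n^2$ reproduce the $n^0 : n^1 : n^2$ weighting up to constant factors.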

Before anything can be said about the overall computation cost, or time, one must know how many operations are required for calculating one value of the objective function. In general, a function of $n$ variables will entail a cost that rises at least linearly with $n$.

Table 6.1: Number of operations required by the most important basic strategies to minimize a quadratic objective function, in terms of the number of variables n (only orders of magnitude)

                                         Number of    Function evaluations      Elementary
Strategy                                 iterations   F      grad F   Hess F    operations
Newton (e.g., Newton-Raphson)            n^0          --     n^0      n^0       n^3
Variable metric (e.g., Davidon)          n^1          n^0    n^0      --        n^2
Conjugate gradients
  (e.g., Fletcher-Reeves)                n^1          n^0    n^0      --        n^1
Conjugate directions (e.g., Powell)      n^2          n^0    --       --        n^1

Weighting factors                                     n^0    n^1      n^2


For a quadratic function with a full matrix of coefficients, just to evaluate the expression $x^T A\, x$ requires $O(n^2)$ basic arithmetical operations. If the order of magnitude of one function evaluation is denoted by $O(n^f)$ then, assuming $f \ge 1$, for all the optimization methods considered so far the computation time is given by

$$T \sim n^{2+f} \ge n^3$$

The advantage of having fewer function-independent operations in the Fletcher-Reeves method therefore only makes itself felt if the number of variables is small and the time for one function evaluation is short.

All the variants of the basic second order strategies mentioned here can be fitted, under similar assumptions, into the above scheme. Among these are (Broyden, 1972):

- Modified and quasi-Newton methods
- Methods of conjugate gradients and conjugate directions
- Variable metric strategies, with their variations using correction matrices of rank one

There is no optimization method whose cost rises with less than the third power of the number of variables. Even the indirect procedure, in which the equations for the necessary conditions for an extremum are set up and solved by conventional methods, does not afford any basic reduction in the computational effort. If the objective function is quadratic, a system of $n$ simultaneous linear equations is obtained. To solve for the $n$ unknowns, the Gaussian elimination method requires $\frac{1}{3} n^3$ basic operations (multiplications and divisions). According to Zurmühl (1965) all the other direct methods, meaning here non-iterative methods, are more costly, except in special cases. Methods involving a stepwise approach to the solution of systems of linear equations (relaxation methods) require an infinite number of iterations to reach an absolutely exact result. They converge linearly and correspond to first order optimization strategies (single step or Gauss-Seidel methods, and total step or gradient methods; see Schwarz, Rutishauser, and Stiefel, 1968). Only the method of Hestenes and Stiefel (1952) converges after a finite number of calculation steps, assuming that the calculations are exact. It is a conjugate gradient method for solving systems of linear equations with a symmetric, positive definite matrix of coefficients.

The main concern here is with direct, i.e., derivative-free, search strategies for optimization. Finiteness of the search in the quadratic case, and greater than linear convergence, can only be proved for the Powell method of conjugate directions and for the Davidon-Fletcher-Powell variable metric method, which Stewart reformulated as a derivative-free quasi-Newton method. Of the coordinate strategy, at best it can be said that it converges linearly. The same holds for the simple gradient methods. There are also versions of them in which the partial derivatives are obtained numerically. Since various comparison tests have shown them to be rather ineffective in highly non-linear situations, none is considered here. No theoretically founded statements about convergence rates and Q-properties are available for the other direct strategies. The rate of progress defined by Rechenberg (1973) for the evolution strategy with adaptive step length control represents an average measure of convergence. It could, however, only be determined theoretically for two selected model objective functions. The one with concentric contour lines, or contour hypersurfaces, can be regarded as a special case of a quadratic objective function. The formula for the local rate of progress in both the two membered and the multimembered strategies has the form

$$\varphi(r) = c\, \frac{r}{n}, \qquad c = \text{const.}$$

where $r = \| x^{(k)} - x^* \|$ is the current distance from the objective and $\varphi$ is the change in $r$ at one iteration or mutation. Rearrangement of the above formulae gives

$$\varphi(r) = \Delta r = \left\| x^{(k)} - x^* \right\| - \left\| x^{(k+1)} - x^* \right\|$$

$$\left\| x^{(k+1)} - x^* \right\| = \left\| x^{(k)} - x^* \right\| \left( 1 - \frac{c}{n} \right)$$

or

$$\left\| x^{(k)} - x^* \right\| = \left\| x^{(0)} - x^* \right\| \left( 1 - \frac{c}{n} \right)^k$$

which, because $0 < 1 - \frac{c}{n} < 1$ whenever $0 < c < n$, corresponds to a geometric decrease of the distance from the objective, i.e., to linear convergence.
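It may help to turn this into a concrete count of mutations. Requiring a reduction of the distance by a factor of 10, and taking a sphere-model value of $c \approx 0.2$ for the optimally adjusted two membered strategy (the numerical constant is an assumption of this illustration), one gets

$$\left( 1 - \frac{c}{n} \right)^k = \frac{1}{10} \quad \Rightarrow \quad k = \frac{\ln 10}{-\ln\!\left( 1 - \frac{c}{n} \right)} \approx \frac{n}{c} \ln 10 \approx 5\, n \ln 10 \approx 11.5\, n$$

which agrees with the theoretical figure cited for the (1+1) strategy in the first comparison test below.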


6.3 Numerical Comparison of Strategies

All theoretical statements about convergence and rates of convergence hold for the idealized concept of an algorithm, not for a particular computer program. The susceptibility of a strategy to rounding errors depends on how it is coded. For this reason too there is a need to check the convergence properties of numerical methods experimentally.

Because of the finite word length of a digital computer, the number range is also limited. If it is exceeded, the program that is running normally terminates. Such fatal execution errors (floating overflow, floating divide check) are usually the consequence of rounding errors in previous steps. If instead a result falls below the smallest representable absolute value (floating underflow), the error is not regarded as fatal. Only a few algorithms, e.g., Brent (1973), take special account of finite machine accuracy.

In spite of the frequent mention of the importance of numerical comparisons of strategies, few publications to date have reported results on several different test problems using a large number of minimization methods. By virtue of its scope, the work of Colville (1968, 1970) stands out among the older studies by Brooks (1959), Spang (1962), Dickinson (1964), Leon (1966a), Box (1966), and Kowalik and Osborne (1968). It included 30 strategies and 8 different problems, but not many direct search methods compared to gradient methods. In some other tests, by Jacoby, Kowalik, and Pizzo (1972), Himmelblau (1972a), Smith (1973), and others in the collection of Lootsma (1972a), derivative-free strategies receive much more attention. The comparisons of Gorvits and Larichev (1971) and Larichev and Gorvits (1974) treat only gradient methods, and that of Tapley and Lewallen (1967) deals with some schemes for the numerical treatment of functional optimization problems. The huge collection of test problems of Hock and Schittkowski (1981) is biased towards standard methods of mathematical programming and their capabilities (Schittkowski, 1980).

6.3.1 Computer Used

The machine on which the numerical experiments were carried out was a PDP 10 from the firm Digital Equipment Corporation, Maynard, Massachusetts. It had the following specifications:

Core storage area: 64K (1K = 1024 words)
Word length: 36 bits
Cycle time: 1.65 or 1.8 microseconds

The time-sharing operating system accounted for about 34K of core, so that only 30K remained available to the user. To be able to tackle some problems with as many variables as possible, the computations were generally carried out only in single precision. The main program, which was the same for all strategies, occupied about $\left( 5 + \frac{2n}{1024} \right)$ Kwords, and the FORTRAN library a further 5K. The consequent maximum number $n_{max}$ of parameters is given for each search method under test in Table 6.2. The finite word length of a digital computer means that its number range is limited. The absolute bounds for floating point arithmetic were given by:

Largest absolute number: $2^{127} \simeq 1.7 \cdot 10^{38}$
Smallest absolute number: $2^{-128} \simeq 2.9 \cdot 10^{-39}$
Smallest absolute number: 2 ;128 ' 2:9 10 ;39


Only a part of the word is available for the mantissa of a number. This imposed the differential accuracy limit, which is much lower and usually more important:

Smallest difference relative to unity: $2^{-27} \simeq 7.5 \cdot 10^{-9}$

Accordingly, the following equalities hold on this computer:

$$\varepsilon = 0 \quad \text{for } |\varepsilon| < 2^{-128}$$
$$1 + \varepsilon = 1 \quad \text{for } |\varepsilon| < 2^{-27}$$

These computer-specific data play a rôle when testing for zero or for the equality of two quantities. The same programs can therefore lead to different results on different computers.
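The relative accuracy limit of any binary floating point arithmetic can be probed with a few lines of code. A sketch in Python (run on a modern machine it reports the IEEE 754 double precision value rather than the PDP 10 figure):

```python
# Find the smallest eps of the form 2^-t for which 1 + eps is still
# distinguishable from 1 in the machine's floating point arithmetic.
eps = 1.0
while 1.0 + eps / 2.0 != 1.0:
    eps /= 2.0

print(eps)  # 2^-52 for IEEE 754 doubles; PDP 10 single precision gave 2^-27
```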

Strategies are often judged by the computation time they require to achieve a result, for example, with a specified accuracy. The basic quantity for this purpose is the occupation time of the central processor unit (CPU), which also depends on the machine. Word lengths and cycle times are not enough to allow comparison between runs that were made on different computers. So-called MIX times, which are average values of the duration of certain operations, also prove to be unsuitable, since the speed of calculation depends so strongly on the frequency of its individual steps. A method proposed by Colville (1968) has received wide recognition; its design was particularly suited to optimization methods. According to this scheme, measured computation times are expressed relative to the time required for 10 consecutive inversions of a $40 \times 40$ matrix, using the FORTRAN program written by Colville. In our case this unit was around 110 seconds. Because of the time-sharing operation, with its rather variable load on the PDP 10, there were deviations of 10% and more in the reported CPU times. This was especially marked for short programs.
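Colville's normalization is easy to reproduce in spirit. The sketch below uses numpy rather than Colville's original FORTRAN routine, so the absolute numbers are not comparable with his unit; the strategy time shown is a hypothetical value, only to illustrate the rescaling:

```python
import time
import numpy as np

def colville_unit(repetitions=10, size=40):
    """Time for 10 consecutive inversions of a 40 x 40 matrix."""
    rng = np.random.default_rng(0)
    # Diagonally dominant matrix: guaranteed invertible, well conditioned.
    a = rng.standard_normal((size, size)) + size * np.eye(size)
    t0 = time.perf_counter()
    for _ in range(repetitions):
        np.linalg.inv(a)
    return time.perf_counter() - t0

unit = colville_unit()
raw_cpu_seconds = 3.5  # hypothetical measured CPU time of some strategy
print("standardized time:", raw_cpu_seconds / unit, "Colville units")
```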

6.3.2 Optimization Methods Tested

One goal of this work is to compare evolution strategies with other derivative-free methods of continuous parameter optimization. To this end we consider not only direct search methods in the narrower sense, but also those methods that glean their required knowledge of partial derivatives by means of trial steps and finite difference methods. Altogether 14 strategies or versions of basic strategies are considered. Their names, and the abbreviations used for them, are listed in Table 6.2. All tests were run on the PDP 10 described in the previous section.

Finite computer accuracy implies that in the case of quadratic objective functions the iteration process cannot, and should not, be continued until the exact solution has been obtained. The decision when to terminate the optimum search is a necessary and often crucial component of any iterative method. Just as the procedures of the individual strategies differ, so too do their termination or convergence criteria. As a rule, the user is given the chance to influence the termination criterion by means of an input parameter defined as the required accuracy. It refers either to the values of the variables (change in $x_i$ within one iteration, or size of the step lengths $s_i$) or to values of the objective function. Both criteria harbor the danger that the search will be terminated prematurely, that is, before coming as close to the objective as required. This is made clear by Figure 6.1.

Neither $\Delta x < \varepsilon_x$ nor $\Delta F < \varepsilon_F$ alone is a sufficient condition for being close to the solution $x^*$. The condition $\| \nabla F \| < \varepsilon$ is no more reliable by itself, quite apart from the fact that it presupposes the availability of derivatives.


Table 6.2: Strategies applied: their abbreviations, maximum number of variables, and accuracy parameters

Strategy                                            Abbreviation   Max. number     Accuracy
                                                                   of variables    parameter
Coordinate strategy with Fibonacci search           FIBO           2900            ε = 7.5·10^-9
Coordinate strategy with golden section             GOLD           2910            ε = 7.5·10^-9
Coordinate strategy with Lagrangian interpolation   LAGR           2540            ε = 7.5·10^-9
Direct search of Hooke and Jeeves                   HOJE           4090            ε = 7.5·10^-9
Davies-Swann-Campey method with
  Gram-Schmidt orthogonalization                    DSCG           75              ε = 7.5·10^-9
Davies-Swann-Campey method with
  Palmer orthogonalization                          DSCP           95              ε = 7.5·10^-9
Powell's method of conjugate directions             POWE           135             ε = 7.5·10^-9
Stewart's modification of the
  Davidon-Fletcher-Powell method                    DFPS           180             ε_a = ε_b = ε_c = 7.5·10^-9 (†)
Simplex method of Nelder and Mead                   SIMP           135             ε = 10^-8 (‡)
Method of Rosenbrock with
  Gram-Schmidt orthogonalization                    ROSE           75              ε = 10^-4 (‡)
Complex method of Box                               COMP           95              ε = 10^-6 (‡)
(1+1) evolution strategy                            EVOL           4000            ε_a = ε_c = 3.0·10^-39 and
(10,100) evolution strategy                         GRUP           435               ε_b = ε_d = 7.5·10^-9
(10,100) evolution strategy with recombination      REKO           435               (jointly for all three)

(‡) Values fixed by the author.
(†) In place of the values set in Lill's program: ε_a = 10^-6, ε_b = 10^-10, ε_c = 5·10^-13.

The maximum number of variables refers to an available core storage area of 30K words, which includes the main program and the FORTRAN library.


Besides their considerable cost in programming and computation time, numerical strategy comparisons entail further difficulties. The effectiveness of a method can be strongly influenced by small programming details. A number of methods were not fully worked out by their originators and require heuristic rules to be introduced before they can be applied. The way in which this degree of freedom is exercised to define the procedure depends on the skill and experience of the programmer, which leads to large discrepancies between the results of investigations and the judgements of different authors on one and the same strategy.

We have therefore, as far as possible, used already published programs (FORTRAN or ALGOL) for the algorithms, or parts of them, under study:

One dimensional search with the Fibonacci method of Kiefer:
  M. C. Pike, J. Pixner (1965): Algorithm 2, Fibonacci search
  J. Boothroyd (1965): Certification of Algorithm 2
  M. C. Pike, I. D. Hill, F. D. James (1967): Note on Algorithm 2

One dimensional search with the golden section method of Kiefer:
  K. J. Overholt (1967): Algorithm 16, Gold

Direct search (pattern search) of Hooke and Jeeves:
  A. F. Kaupe, Jr. (1963): Algorithm 178, Direct search
  M. Bell, M. C. Pike (1966): Remark on Algorithm 178
  R. DeVogelaere (1968): Remark on Algorithm 178
  F. K. Tomlin, L. B. Smith (1969): Remark on Algorithm 178
  L. B. Smith (1969): Remark on Algorithm 178

Orthogonalization method for the strategies of Rosenbrock and of Davies, Swann, and Campey:
  J. R. Palmer (1969): An improved procedure for orthogonalizing the search vectors in Rosenbrock's and Swann's direct search optimization methods

Derivative-free method of conjugate directions of M. J. D. Powell:
  M. J. Hopper (1971): Harwell subroutine library, a catalogue of subroutines; from this, subroutine VA04A, updated May 20, 1970 (received as a card deck)

Variable metric method of Davidon, Fletcher, and Powell as formulated by Stewart:
  S. A. Lill (1970): Algorithm 46, a modified Davidon method for finding the minimum of a function, using difference approximation for the derivatives
  S. A. Lill (1971): Note on Algorithm 46
  Z. Kovacs (1971): Note on Algorithm 46
  Some of the parameters affecting the accuracy were altered, either because the small values defined by the author could not be realized on the available computer, or because the closest possible approach to the objective could not have been achieved with them.

Simplex method of Nelder and Mead:
  R. O'Neill (1971): Algorithm AS 47, function minimization using a simplex procedure

A complete program for the Rosenbrock strategy:
  M. Machura, A. Mulawa (1973): Algorithm 450, Rosenbrock function minimization
  This was not applied because it could treat only the unconstrained case.

The same applies to the code for the complex method of M. J. Box:
  J. A. Richardson, J. L. Kuester (1973): Algorithm 454, the complex method for constrained optimization
  The part of the strategy that seeks a basis in the feasible region when the starting point is not feasible is not considered here.

Whenever the procedures named were published in ALGOL, they have been translated into FORTRAN. All the other optimization strategies not mentioned here have also been programmed in FORTRAN, with close reference to the original publications. If one wanted to repeat the test series today, a much larger number of codes could be made use of, e.g., from the book of Moré and Wright (1993).

6.3.3 Results of the Tests

6.3.3.1 First Test: Convergence Rates for a Quadratic Objective Function

In the first part of the numerical strategy comparison, the theoretical predictions of convergence rates and Q-properties will be tested, or, where these are not available, experimental data will be supplied instead. For this purpose two quadratic objective functions are used (Appendix A, Sect. A.1). In the first (Problem 1.1) the matrix of coefficients is diagonal with unit diagonal elements, i.e., a scalar matrix. This simplest of all quadratic problems is characterized by concentric contour lines or surfaces, which can be represented or imagined as circles in the two parameter case, spheres in the three parameter case, and hypersphere surfaces in the general case. The same pattern of contours, but with arbitrary monotonic variation in the objective function, occurs in the sphere model for which the average rates of progress of the evolution strategies could be determined theoretically (Rechenberg, 1973, and Chap. 5 of this book).

The second objective function (Problem 1.2) has a matrix of coefficients with all elements non-zero. It represents a full quadratic problem (except for the missing linear term) with concentric, oblique ellipses, or ellipsoids, as the contour lines or surfaces. The condition number of the matrix of coefficients increases quadratically with the number of parameters (see Appendix A, Sect. A.1). In general, the time required to calculate one value of the objective function increases as $O(n^2)$ for a quadratic problem because, for a full matrix, $\frac{n}{2}(n+1)$ distinct second order terms $a_{ij} x_i x_j$ must be evaluated. The objective function of Problem 1.2 has been formulated with the intention of reducing the computation time per function call to $O(n)$, without it being such a special quadratic problem that one of the strategies could find it especially advantageous. The strategy comparison for this problem could thereby be made for much larger numbers of variables within the prescribed maximum computation time ($T_{max} = 8$ hours). The storage requirement for the full matrix $A$ would also have been an obstacle to numerical tests with many parameters.

To enable comparison of the experimental and theoretical results, the required number of iterations, line searches, orthogonalizations, objective function calls, and the computation time were measured in going from the initial values

$$x_i^{(0)} = x_i^* + \frac{(-1)^i}{\sqrt{n}} \quad \text{for } i = 1(1)n$$

to an approximation

$$\left| x_i^{(k)} - x_i^* \right| \le \frac{1}{10} \left| x_i^{(0)} - x_i^* \right| \quad \text{for } i = 1(1)n$$

The interval of uncertainty of the variables thus had to be reduced by at least 90%. The distance covered is effectively independent of the number of variables. The above conditions were tested after each iteration, and as soon as they were satisfied the search was terminated. The convergence criteria of the strategies themselves were not suppressed, but they could not generally take effect, being much stricter. If one of them did actually operate, this could be regarded as a failure of the method being applied.
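For concreteness, the following sketch restates Problem 1.1 and the 90% termination condition in executable form (the function and variable names are ours, not those of the original FORTRAN main program):

```python
import math

def problem_1_1(x):
    # Problem 1.1: unit scalar coefficient matrix, i.e. a sum of squares.
    return sum(v * v for v in x)

def start_point(x_star, n):
    # Initial values x_i(0) = x_i* + (-1)^i / sqrt(n), i = 1..n.
    return [x_star[i - 1] + (-1) ** i / math.sqrt(n) for i in range(1, n + 1)]

def target_reached(x, x0, x_star):
    # Interval of uncertainty reduced by at least 90% in every variable.
    return all(abs(xk - xs) <= 0.1 * abs(x0k - xs)
               for xk, x0k, xs in zip(x, x0, x_star))

n = 10
x_star = [0.0] * n
x0 = start_point(x_star, n)
print(problem_1_1(x0))                 # 1.0, independent of n
print(target_reached(x0, x0, x_star))  # False at the start
```

Note that the initial objective function value is always 1.0, which makes the starting distance from the optimum effectively independent of $n$, as stated above.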

The results of the first test are given in Tables 6.3 and 6.4. The number of function calls and the number of iterations or other characteristic processes involved are displayed in Figures 6.2 to 6.13 as functions of the number of parameters $n$ on a log-log scale. As the data show, the computation time and effort of a strategy increase sharply with $n$. The large range in the number of variables, compared to other investigations, allows the trends to be seen clearly. To facilitate an overall view, the computation times of all the strategies are plotted as functions of the number of variables in Figures 6.14 and 6.15.


Table 6.3: Results of all strategies for test Problem 1.1

FIBO -- Coordinate strategy with Fibonacci search (1 cycle = n line searches)

Number of     Number of   Number of objective   Computation time
variables     cycles      function calls        in seconds
3             1           158                   0.13
6             1           278                   0.28
10            1           456                   0.53
20            1           866                   1.66
30            1           1242                  3.07
60            1           2426                  10.7
100           1           3870                  26.5
200           1           7800                  106
300           1           10562                 210
600           1           21921                 826
1000          1           38701                 2500
2000          1           67451                 8270
(max) 2900    1           103846                19300

GOLD -- Coordinate strategy with golden section (1 cycle = n line searches)

Number of     Number of   Number of objective   Computation time
variables     cycles      function calls        in seconds
3             1           158                   0.10
6             1           279                   0.22
10            1           458                   0.51
20            1           866                   1.48
30            1           1242                  3.14
60            1           2426                  11.3
100           1           3870                  27.6
200           1           7802                  114
300           1           10562                 221
600           1           21921                 808
1000          1           38703                 2670
2000          1           67431                 8410
2900          1           103834                18300


Table 6.3 (continued)

LAGR -- Coordinate strategy with Lagrangian interpolation (1 cycle = n line searches)

Number of     Number of   Number of objective   Computation time
variables     cycles      function calls        in seconds
3             1           85                    0.04
6             1           163                   0.12
10            1           271                   0.30
20            1           521                   0.88
30            1           781                   1.80
60            1           1561                  6.68
100           1           2501                  17.3
200           1           5001                  68.6
300           1           7201                  153
600           1           14401                 546
1000          1           24001                 1620
2000          1           46803                 6020
(max) 2540    1           64545                 10300

HOJE -- Direct search of Hooke and Jeeves (1 cycle = n to 2n individual steps)

Number of     Number of   Number of objective   Computation time
variables     cycles      function calls        in seconds
3             4           20                    0.02
6             4           43                    0.04
10            3           48                    0.06
20            7           274                   0.50
30            3           168                   0.43
60            8           874                   3.70
100           2           352                   2.37
200           8           3104                  40.1
300           9           4954                  100
600           7           7503                  286
1000          12          23505                 1460
2000          9           35003                 4270
3000          10          58504                 11200
(max) 4090    13          104300                25600


Table 6.3 (continued)

DSCG -- Davies-Swann-Campey method with Gram-Schmidt orthogonalization

Number of   Number of   Number of line   Number of objective   Computation time
variables   orthog.     searches         function calls        in seconds
3           0           3                20                    0.04
6           0           6                34                    0.10
10          0           10               56                    0.20
20          0           20               111                   0.68
30          0           30               136                   1.18
50          0           50               226                   2.80
(max) 75    0           75               338                   6.10

DSCP -- Davies-Swann-Campey method with Palmer orthogonalization
(results for n <= 75 identical to those of DSCG; in addition:)

Number of   Number of   Number of line   Number of objective   Computation time
variables   orthog.     searches         function calls        in seconds
(max) 95    0           95               428                   9.49

POWE -- Powell's method of conjugate directions
(1 complete iteration = n + 1 line searches; all iterations begun are counted)

Number of   Number of    Number of line   Number of objective   Computation time
variables   iterations   searches         function calls        in seconds
3           1            3                11                    0.02
6           1            6                20                    0.06
10          1            10               32                    0.12
20          1            20               62                    0.32
30          1            30               92                    0.60
60          1            60               182                   1.96
100         1            100              202                   3.72
(max) 135   1            135              407                   8.60


Table 6.3 (continued)

DFPS -- Stewart's modification of the Davidon-Fletcher-Powell method
(1 iteration = 1 gradient evaluation and 1 line search)

Number of     Number of    Number of objective   Computation time
variables     iterations   function calls        in seconds
3             1            10                    0.02
6             1            16                    0.04
10            1            24                    0.06
20            1            44                    0.16
30            1            64                    0.32
60            1            124                   1.14
100           1            204                   3.19
135           1            274                   5.42
(max) 180     1            364                   9.56

SIMP -- Simplex method of Nelder and Mead (with restart)

Number of     Number of   Number of objective   Computation time
variables     restarts    function calls        in seconds
3             0           28                    0.09
6             0           104                   0.64
10            0           138                   1.49
20            0           301                   8.24
30            0           664                   37.4
60            0           1482                  277
100           0           1789                  862
(max) 135     1           5142                  5270

ROSE -- Rosenbrock's method with Gram-Schmidt orthogonalization

Number of     Number of   Number of objective   Computation time
variables     orthog.     function calls        in seconds
3             1           27                    0.08
6             2           60                    0.32
10            2           120                   0.91
20            1           181                   2.56
30            0           121                   1.18
40            1           281                   13.7
50            2           550                   48.4
60            2           600                   78.3
(max) 75      2           899                   145


Table 6.3 (continued)

COMP -- Complex method of Box (2n vertices; all numbers are averages over several attempts)

Number of     Number of objective   Computation time
variables     function calls        in seconds
3             69                    0.22
6             259                   1.62
10            535                   6.72
20            1447                  72.0
30            2621                  211
60            7263                  2240
(max) 95      14902                 11000

EVOL -- (1+1) evolution strategy (average values;
number of objective function calls = 1 + number of mutations)

Number of     Number of   Computation time
variables     mutations   in seconds
3             49          0.17
6             154         0.79
10            224         1.74
20            411         6.47
30            630         14.0
60            1335        60.0
100           2192        149
150           3322        340
200           4232        565
300           6666        1310
600           13819       5440
1000          23607       15600

The maximum number of variables (4,000) was not reached because too much computation time would have been required.


Table 6.3 (continued)

GRUP -- (10,100) evolution strategy (average values;
number of objective function calls = 10 + 100 times the number of generations)

Number of     Number of     Computation time
variables     generations   in seconds
3             4             1.81
6             10            6.75
10            17            16.8
20            37            64.5
30            55            145
60            115           519
100           194           1600
200           377           5720
300           551           13600
(max) 435     854           28300

REKO -- (10,100) evolution strategy with recombination (average values;
number of objective function calls = 10 + 100 times the number of generations)

Number of     Number of     Computation time
variables     generations   in seconds
3             4             2.67
6             6             7.42
10            13            23.3
20            23            82.5
30            34            177
60            53            514
100           84            1420
200           136           4380
300           180           9340
(max) 435     289           21100


Table 6.4: Results of all strategies for test Problem 1.2

FIBO -- Coordinate strategy with Fibonacci search

Number of     Number of   Number of objective   Computation time
variables     cycles      function calls        in seconds
3             8           928                   0.68
6             22          4478                  4.44
10            40          12644                 15.6
20            87          50265                 102
30            132         110423                298
50            227         297609                1290
60            282         422911                2120
100           Search terminates prematurely

GOLD -- Coordinate strategy with golden section

Number of     Number of   Number of objective   Computation time
variables     cycles      function calls        in seconds
3             8           946                   0.61
6             22          4418                  3.96
10            40          12622                 14.5
20            86          50131                 102
30            133         111219                287
50            226         296570                1330
60            279         423471                2040
100           Search terminates prematurely

LAGR -- Coordinate strategy with Lagrangian interpolation

Number of     Number of   Number of objective   Computation time
variables     cycles      function calls        in seconds
3             8           586                   0.39
6             22          2826                  2.48
10            40          8023                  9.55
20            87          32452                 62.8
30            134         70889                 192
60            272         263067                1320
100           519         703130                5770
150           Search terminates prematurely


Table 6.4 (continued)

HOJE -- Direct search of Hooke and Jeeves

Number of     Number of   Number of objective   Computation time
variables     cycles      function calls        in seconds
3             11          65                    0.04
6             30          353                   0.34
10            26          502                   0.62
20            78          3035                  5.70
30            111         6443                  16.3
60            212         24801                 119
100           367         71345                 547
200           727         284060                4270
300           1117        656113                14800

DSCG -- Davies-Swann-Campey method with Gram-Schmidt orthogonalization

Number of   Number of   Number of line   Number of objective   Computation time
variables   orthog.     searches         function calls        in seconds
3           3           16               87                    0.22
6           7           55               195                   0.87
10          8           101              323                   2.70
20          16          361              1209                  29.2
30          21          691              2181                  110
40          42          1802             5883                  484
50          27          1451             4453                  582
60          44          2822             9308                  1540
(max) 75    87          6676             20365                 5790

DSCP -- Davies-Swann-Campey method with Palmer orthogonalization

Number of   Number of   Number of line   Number of objective   Computation time
variables   orthog.     searches         function calls        in seconds
3           3           16               84                    0.22
6           7           55               194                   0.78
10          8           101              324                   1.54
20          16          361              1208                  10.3
30          28          901              2809                  33.8
50          28          1501             4610                  89.7
75          79          6076             18591                 547
(max) 95    100         9691             29415                 1090


Table 6.4 (continued)

POWE -- Powell's method of conjugate directions

Number of   Number of    Number of line   Number of objective   Computation time
variables   iterations   searches         function calls        in seconds
3           3            11               27                    0.08
6           5            35               77                    0.30
10          9            99               215                   0.97
20          17           354              744                   4.82
30          53           1621             3401                  24.1
40          Search becomes infinite -- no convergence
50          175          8864             21532                 235
60          138          8367             19677                 222
70, 80, 90, 100, (max) 135:  search becomes infinite -- no convergence

DFPS -- Stewart's modification of the Davidon-Fletcher-Powell method

Number of   Number of    Number of objective   Computation time   Fatal errors
variables   iterations   function calls        in seconds
3           3            20                    0.04
6           4            41                    0.14
10          5            74                    0.34
20          7            178                   1.36
30          9            333                   3.63
60          13           926                   19.7
100         17           2003                  67.9
135         20           3190                  145
(max) 180   22           4757                  270                2 floating divide checks


Table 6.4 (continued)

SIMP -- Simplex method of Nelder and Mead (with restart)

Number of     Number of   Number of objective   Computation time
variables     restarts    function calls        in seconds
3             0           29                    0.09
6             1           173                   1.06
10            0           304                   3.17
20            0           2415                  77.6
30            0           8972                  579
40            2           28202                 3030
50            1           53577                 8870
60            1           62871                 13700
70            1           86043                 25800

ROSE -- Rosenbrock's method with Gram-Schmidt orthogonalization

Number of     Number of   Number of objective   Computation time
variables     orthog.     function calls        in seconds
3             3           38                    0.12
6             4           182                   0.82
10            8           678                   4.51
20            12          2763                  35.6
30            14          5499                  114
40            19          10891                 329
50            21          15396                 645
60            23          20911                 1130
(max) 75      34          43670                 3020

COMP -- Complex method of Box (2n vertices; all numbers are averages over several attempts)

Number of     Number of objective   Computation time
variables     function calls        in seconds
3             60                    0.21
6             302                   2.06
10            827                   12.0
20            5503                  235
30            24492                 2330   (search sometimes terminates prematurely)
40            Search always terminates prematurely


Table 6.4 (continued)

EVOL -- (1+1) evolution strategy (average values)

Number of     Number of   Computation time
variables     mutations   in seconds
3             85          0.33
6             213         1.18
10            728         6.15
20            2874        44.4
30            5866        136
60            24089       963
100           69852       4690
150           152348      15200

GRUP -- (10,100) evolution strategy (average values)

Number of     Number of     Computation time
variables     generations   in seconds
3             5             2.02
6             14            9.36
10            53            49.4
20            183           326
30            381           955
50            1083          4400
80            2977          18600
100           4464          35100

REKO -- (10,100) evolution strategy with recombination (average values)

Number of     Number of     Computation time
variables     generations   in seconds
3             6             2.44
6             15            18.9
10            42            76.2
20            162           546
30            1322          6920
40            9206          61900

Figures 6.2 to 6.13 translate the numerical data into vivid graphics. The abbreviations used there are:

OFC stands for objective function calls
ORT stands for orthogonalizations

The parameters 1.1 and 1.2 refer to Problems 1.1 and 1.2 as described above.


Figure 6.2: Coordinate strategy with Fibonacci search
Figure 6.3: Coordinate strategy with golden section
Figure 6.4: Coordinate strategy with Lagrangian interpolation
Figure 6.5: Strategy of Hooke and Jeeves
Figure 6.6: Strategy of Davies, Swann, and Campey with Gram-Schmidt orthogonalization
Figure 6.7: Strategy of Davies, Swann, and Campey with Palmer orthogonalization
Figure 6.8: Strategy of Powell with conjugate directions (annotation in the plot: no convergence)
Figure 6.9: Strategy of Davidon, Fletcher, Powell, and Stewart as formulated by Lill (variable metric; annotation in the plot: no convergence)
Figure 6.10: Strategy of Rosenbrock with Gram-Schmidt orthogonalization
Figure 6.11: Left: Simplex strategy of Nelder and Mead; Right: Complex strategy of Box
Figure 6.12: (1+1) evolution strategy (curves labeled Mutations 1.1 and Mutations 1.2)
Figure 6.13: Left: (10,100) evolution strategy without recombination; Right: (10,100) evolution strategy with recombination


Figure 6.14: Result of the first comparison test: computation times for Problem 1.1. Plotted on a log-log scale is the computation time (sec) divided by (number of variables)^2 against the number of variables. Curves shown: (10,100) evolution strategy without recombination (GRUP), (1+1) evolution strategy (EVOL), complex strategy of Box (COMP), strategy of Rosenbrock (ROSE), simplex strategy of Nelder and Mead (SIMP), strategy of Davidon-Fletcher-Powell-Stewart (DFPS), (10,100) evolution strategy in parallel, strategy of Powell (POWE), DSC strategy with Palmer orthogonalization (DSCP), DSC strategy with Gram-Schmidt orthogonalization (DSCG), strategy of Hooke and Jeeves (HOJE), coordinate strategy with Lagrangian interpolation (LAGR), and coordinate strategy with Fibonacci search (FIBO).


Figure 6.15: Result of the first comparison test: computation times for Problem 1.2. Plotted on a log-log scale is the computation time (sec) divided by (number of variables)^3 against the number of variables. Meanings of the symbols as in Figure 6.14; curves shown: COMP, SIMP, DSCG, ROSE, GRUP, FIBO/GOLD, LAGR, EVOL, DSCP, POWE, HOJE, and DFPS.


Points that deviated greatly from the trends have been omitted. To emphasize the differences between the methods, instead of the computation time $T$, the quantities $T/n^2$ for Problem 1.1 and $T/n^3$ for Problem 1.2 have been plotted on a logarithmic scale.

For solving Problem 1.1, nearly all strategies require computation times of the order of $O(n^2)$. This corresponds to $O(n)$ objective function calls, each requiring $O(n)$ computation time. As expected, the most successful methods are the two that theoretically show quadratic convergence, namely the method of conjugate directions (Powell) and the variable metric method (DFPS). They obtain the solution within one iteration and $n$ line searches respectively. For this simple problem, however, the same can be said of the strategies with cyclic variation of the variables, since the search directions are the same. Of the three coordinate methods, the one with quadratic interpolation is a bit faster than the two that use sequential interval division; the latter two are of equal merit. The strategy of Davies, Swann, and Campey (DSC) also performs very well. Since the objective is reached within the first $n$ line searches, no orthogonalizations need to be carried out. For this reason too, both versions yield identical results for $n \le 75$.

The evolution strategies live up to expectations in so far as the number of mutations or generations increases linearly with n. The number of objective function calls and the computation times are, however, considerably higher than those of the previously mentioned methods. For r^(0)/r^(M) = 10, the approximate theory of the two membered evolution strategy with optimal step length control predicts the number of mutations to be

M ~ (5 ln 10) n ~ 11.5 n

In fact nearly twice as many objective function calls (about 22 n) are required. This is partly because of the discrete way in which the variances are adjusted and partly because the chosen reduction factor of 0.85 corresponds to a success rate below the optimal value of 0.27. The ASSRS (adaptive step size random search) method of Schumer and Steiglitz (1968), which resembles the simple evolution strategy, is presently the most effective random method as far as we know. According to the experimental results of Schumer (1967) for Problem 1.2, taking into account the different initial and final conditions, it requires about the same number of steps as the (1+1) evolution strategy.
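To make the mechanics of the scheme just described concrete, the following is a minimal Python sketch of a two membered (1+1) strategy with the 1/5 success rule; the reduction factor 0.85 and the sphere-type objective are taken from the text above, while the function and parameter names are our own illustrative choices, not code from the original tests.

import random

def evolution_1plus1(f, x, sigma=1.0, factor=0.85, max_mutations=100000, f_target=1e-10):
    """Two-membered (1+1) evolution strategy (minimal sketch).

    Every n mutations the observed success rate is compared with 1/5:
    above it the step size is enlarged, below it the step size is
    multiplied by the reduction factor 0.85 used in these tests.
    """
    n = len(x)
    fx = f(x)
    successes = 0
    for m in range(1, max_mutations + 1):
        # mutation: add a normally distributed step to every variable
        child = [xi + random.gauss(0.0, sigma) for xi in x]
        fc = f(child)
        if fc <= fx:                 # selection: keep the better of parent and child
            x, fx = child, fc
            successes += 1
        if m % n == 0:               # 1/5 success rule, checked every n mutations
            if successes > n / 5.0:
                sigma /= factor
            elif successes < n / 5.0:
                sigma *= factor
            successes = 0
        if fx < f_target:
            break
    return x, fx, m

# sphere model, the objective type underlying Problem 1.1
sphere = lambda x: sum(xi * xi for xi in x)
x_best, f_best, mutations = evolution_1plus1(sphere, [1.0] * 10)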

It is noteworthy that the (10, 100) strategy without recombination only takes about 10 times as much time as the (1+1) method, in spite of having to execute 100 mutations per generation. This factor of acceleration is significantly higher than the theory for a (1, 10) version would indicate and is closer to the calculated value for a (1, 30) strategy. In the case of many variables, recombination further reduces the required number of generations by two thirds. This is less apparent in the computation time, which is increased by the extra arithmetic operations, compared to the relatively inexpensive calculation of one objective function value. Thus, in the figures showing computation times only the (10, 100) evolution without recombination has been included.

The strategy of Hooke and Jeeves appears to require computation times rather more than O(n^2) on average for many variables, nearer O(n^2.2). This arises from the slight increase with n of the number of exploratory moves. The likely cause is the fixed initial step length, which for problems with many variables is significantly too big and must first be reduced to the appropriate size. Three search strategies exhibit strikingly different behavior.


The method of Rosenbrock requires computation times of the order of O(n^3). This can be readily understood. Apart from the single exception of n = 30, in each case one or two orthogonalizations are performed. The Gram-Schmidt method employed performs O(n^3) operations. If the number of variables is large, the orthogonalization time is of major significance whenever the time for one function call increases less than quadratically with the number of variables. One can see here that the number of objective function calls is not always sufficient to characterize the cost of a strategy. In this case the DSC method succeeds with no orthogonalizations. The introduction of quadratic interpolation proves to give better results than the single step method of Rosenbrock.

Computation times for the simplex and complex strategies also increase as n^3, or even somewhat more steeply with n for many variables. The determining factor for the cost in this case is calculating the centroid of the simplex (or complex), about which the worst of the (n+1) or 2n vertices is reflected. This process takes O(n^2) additions. Since the number of reflections and objective function calls increases as n, the cost increases, simply on this basis, as O(n^3). Even in this simplest of all quadratic problems the simplex of the Nelder-Mead method collapses if the number of variables is large. To avoid premature termination of the optimum search, in the presently used algorithm for this strategy the simplex is initialized again. The search can thereby be prevented from stagnating in a subspace of IR^n, but the required computation time increases even more rapidly than O(n^3). The situation is even worse for the complex method of Box. The author suggests using 2n vertices for problems with few variables and considers that this number could be reduced for many variables. However, the attempt to solve Problem 1.1 for n = 30 with a complex of 40 vertices fails in one of three cases with differing sequences of random numbers: i.e., the search process ends before achieving the required approximation to the objective. For n = 40 and 50 vertices the complex collapsed prematurely in all three attempts. With 2n vertices the complex strategy is successful up to the maximum possible number of variables, n = 95. Here again, however, for n > 30 the computation time increases faster than O(n^3) with the number of parameters. It is therefore dubious whether the search would have been pursued to the point of reaching the maximum internally specified accuracy.
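The centroid argument is easy to make concrete. The following hypothetical sketch of a single reflection step (not the exact Nelder-Mead or Box routine used in the tests) shows where the O(n^2) additions arise; with O(n) reflections this term alone already accounts for O(n^3) basic operations.

def reflect_worst(simplex, values):
    """One reflection step of a simplex search (illustrative sketch).

    simplex: list of n+1 vertices, each a list of n coordinates;
    values:  objective function value of each vertex.
    """
    n = len(simplex) - 1
    worst = max(range(len(simplex)), key=lambda i: values[i])
    # centroid of all vertices except the worst one: O(n^2) additions
    centroid = [sum(simplex[i][j] for i in range(len(simplex)) if i != worst) / n
                for j in range(n)]
    # reflect the worst vertex through the centroid
    reflected = [2.0 * centroid[j] - simplex[worst][j] for j in range(n)]
    return reflected, worst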

The second order methods only distinguish themselves from the other strategies in solving Problem 1.1 in that their required computation time

T = c n^2 ,  c = const.

is characterized by a small constant of proportionality c. Their full capabilities should become apparent in solving the true quadratic problem (Problem 1.2). The variable metric method lives up to this expectation. According to theory it has the property Qn, which means that after n iterations, i.e., n line searches, and O(n^3) computation time the problem should be solved. It comes as something of a surprise to find that the numerical tests indicate a requirement for only about O(n^0.5) iterations and O(n^2.5) computation time. This apparent discrepancy between theory and experiment is explained if we note that the property Qn signifies absolute accuracy within at most n iterations, while in this example only a finite reduction of the uncertainty interval is required.

More surprising than the good results of the DFPS method is the behavior of the strategy of Powell, which in theory is also quadratically convergent. Not only does it require significantly more computation time, it even fails completely when the number of parameters is large. In the case of n = 40 variables the step length goes to zero along a chosen direction. The convergence criterion is subsequently not satisfied and the search process becomes infinite; it must be interrupted externally. For n = 50 and n = 60 the Powell method does converge, but for n = 70, 80, 90, 100, and 130 it fails again. The origin of this behavior was not investigated further, but it may well have to do with the objection raised by Zangwill (1967) against the completeness of Powell's (1964) proof of convergence. It appears that rounding errors combined with small step lengths in the one dimensional search can cause linearly dependent directions to be generated. However, independence of the n directions is the precondition for them to be conjugate to each other.

The coordinate strategies also fail to converge when the number of variables in Problem 1.2 becomes very large. With the Fibonacci search and golden section as interval division methods they fail for n >= 100, and with quadratic interpolation for n >= 150. For successful line searching the step lengths would have to be smaller than allowed by the finite word length of the computer used. This phenomenon only occurs for many variables because the condition of the matrix of coefficients in Problem 1.2 varies as O(n^2). In this proportion the elliptic contour surfaces F(x) = const. become gradually more extended, and the relative minimizations along the coordinate directions become less and less effective. This failure is typical of methods with variation of individual parameters and demonstrates how important it can be to choose other search directions. This is where random directions can prove advantageous (see Chap. 4).

Computation times for the method of Hooke and Jeeves and the method of Davies-Swann-Campey (DSC) clearly increase as O(n^3) if Palmer orthogonalization is employed for the latter. For the method of Hooke and Jeeves this corresponds to O(n) exploratory moves and O(n^2) function calls; for the DSC method it corresponds to O(n) orthogonalizations and O(n^2) line searches and objective function evaluations. The original Gram-Schmidt procedure for constructing mutually orthogonal directions requires O(n^3) rather than O(n^2) arithmetic operations. Since the type of orthogonalization seems to hardly alter the sequence of iterations, with the Gram-Schmidt subroutine the DSC strategy takes O(n^4) instead of O(n^3) basic operations to solve Problem 1.2. For the same reason the Rosenbrock method requires computation times that increase as O(n^4). It is, however, striking that the single step method (Rosenbrock), in conjunction with the suppression of orthogonalization until at least one successful step has been made in each direction, requires less time than line searching, even if only one quadratic interpolation is performed. In both these methods the number of objective function calls, which is of order O(n^2), plays only a secondary role.

Once again the simplex and complex strategies are the most expensive. From n = 30 on, the method of Nelder and Mead does not come within the required distance of the objective without restarts. Even for just six variables the search simplex has to be re-initialized once. The number of objective function calls increases approximately as O(n^3); hence the computation time increases as O(n^5). The strategy of Box with 2n vertices shows a correspondingly steep increase in the time with the number of variables. For n = 30, Problem 1.2 was actually only solved in one out of three attempts, and for n = 40 not at all. If the number of vertices of the complex is reduced to n + 10, the method fails from n = 20.

As in Problem 1.1, the cost of the evolution strategies increases rather smoothly with the number of parameters, more so than for several of the deterministic search methods. To solve Problem 1.2, O(n^2) objective function calls are required, corresponding to O(n^3) computation time. Since the distance to be covered is no greater than it was in Problem 1.1, the greater cost must have been caused by the locally smaller curvatures. These are related to the lengths of the semi-axes of the contour ellipsoids. Because of the regular structure of the matrix of coefficients A of the quadratic objective function in Problem 1.2, the condition number K, the ratio of the greatest to the least semi-axis (cf. test Problem 1.2 in Appendix A, Sect. A.1)

K = a_max / a_min

can be considered as the only quantity of significance in determining the geometry of the contour pattern. The remaining semi-axes will distribute themselves uniformly between a_min and a_max. The fact that K increases as O(n^2) suggests that the rate of progress phi, the average change in the distance from the objective per mutation or generation, only decreases as the square root of the condition number. There is so far no theory for the general quadratic case. Such a theory will also look more complicated, since apart from the ratio of greatest to smallest semi-axis a further n - 2 parameters that determine the shape of the hyperellipsoid will play a role. The position of the starting point will also have an effect, although in the case of many variables only at the beginning of the search. After a transition phase the starting point of mutations will always lie in the vicinity of a point where the objective function contour surfaces are most curved. In the sphere model theory of Rechenberg, if r is regarded as the average local radius of curvature, the rate of progress at worst should become inversely proportional to the square root of the condition number. The convergence rate of the evolution strategy would then be comparable to that of the strategy of steepest descents, for which the function values of two consecutive iterations in the quadratic case are in the ratio (Akaike, 1960)

( (a_max - a_min) / (a_max + a_min) )^2

Compared to other methods having costs in computation time that increase as O(n^3), the evolution strategies fare better than they did in Problem 1.1. Besides the fact that the coordinate strategies do not converge at all when the number of variables becomes large, they are surpassed in speed by the two membered evolution strategy. The relative performance of the two membered and multimembered evolution strategies without recombination remains about the same.

The behavior of the (10, 100) evolution strategy with recombination deviates from that of the other versions. It requires considerably more computation time to solve Problem 1.2. This can be attributed to the fact that, although the probability distribution for mutation steps alters, it cannot adapt continuously to the local conditions. Whilst the mutation ellipsoid, the locus of all equiprobable mutation steps, can extend and contract
along the coordinate directions, it cannot rotate in the space. To do so, not only the variances but also the orientation or covariances would need to be variable (for such an extension see Chap. 7 and subroutine KORR). As the results show, starting from a spherical shape the mutation ellipsoid adopts a configuration that initially accelerates the search process. As it progresses towards the objective, the ellipsoid must become smaller, but it should also gradually rotate to follow the orientation of the contour lines. That is not possible, because the mechanism adopted here allows no mutual dependence of the components of the random vector. The ellipsoid first has to form itself into a sphere again, or to become generally small, before it extends again with the longer axes in new directions. This awkward process actually occurs, but it causes an appreciable delay in the search.
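The geometric restriction can be illustrated with a small sketch of our own (not code from the book): each mutation here is built from one variance per coordinate, so the ellipsoid of equally probable steps can only stretch or shrink along the axes; making it rotate would require correlated components, i.e., a full covariance matrix, as in the extension of Chap. 7 (subroutine KORR).

import math
import random

def mutate_axis_parallel(x, sigmas):
    """Mutation with one variance per variable, as in the tests here:
    the mutation ellipsoid is locked to the coordinate directions."""
    return [xi + random.gauss(0.0, s) for xi, s in zip(x, sigmas)]

def mutate_rotated_2d(x, s1, s2, angle):
    """Sketch of a rotatable mutation ellipsoid in two dimensions:
    drawing the step in the ellipsoid's own axes and then rotating it
    is equivalent to mutating with a full covariance matrix."""
    z1 = random.gauss(0.0, s1)
    z2 = random.gauss(0.0, s2)
    c, s = math.cos(angle), math.sin(angle)
    return [x[0] + c * z1 - s * z2,
            x[1] + s * z1 + c * z2]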

There is a further undesirable phenomenon. Suppose that a single variance suddenly becomes very much smaller. The associated variation in the variables then takes place in an (n - 1)-dimensional subspace of IR^n (for a more detailed analysis see Schwefel, 1987). Other things being equal, the probability of a success is thereby greater than if all the parameters had varied. Step length alterations of this kind are therefore favored and, together with the resistance to rotation of the mutation ellipsoid, they enhance the unstable behavior of the strategy with recombination. This can be prevented by having a large population, in which there is always a sufficient supply of different kinds of parameter combinations for the variances as well. Another possibility is to allow one individual to execute several consecutive mutations with one setting of the step length parameters. Then the overall success depends rather less on the instantaneous probability of success and more on the size of the partial successes. The quality of the strategy parameters is thereby assessed more objectively. It should be noticed that Problem 1.2 is actually the only one in which recombination appears troublesome. In many other cases it led to a reduction in the computation cost, even in the simple form applied here (see the second and third tests).

6.3.3.2 Second Test: Reliability

Convergence in the quadratic case is a minimum requirement of non-linear optimization methods. The unsatisfactory results of the coordinate strategies and of Powell's method for a large number of variables confirm the necessity of numerical tests even when convergence is assured by theory. Even more important, in fact unavoidable, are experimental tests of the reliability of convergence of optimization methods on non-quadratic, non-linear problems. Some methods with an internal quadratic model of the objective function have to be modified in order to deal with more general problems. Such, for example, is the method of conjugate gradients. The method of Fletcher and Reeves (1964) actually terminates after the relative minimum has been obtained in each of n conjugate directions. However, for higher order objective functions the optimum will not have been reached after n iterations. Even in quadratic problems, if they are ill-conditioned, more iterations may be required. There are two possible ways to proceed: either the iteration process can be formally continued beyond n line searches, or it can be repeated in a cyclic way. Fletcher and Reeves recommend destroying all the accumulated information after each set of n + 1 iterations and beginning again, i.e., with uncorrected gradient directions. This procedure is said to be more effective for non-quadratic objective functions. On the other hand, Fox (1971) suggests that a periodic restart of the search can prevent convergence in the quadratic case, whereas a simple continuation of the sequence of iterations is successful. Further suggestions for the way to restart are made by Fletcher (1972a).

The situation is similar for the quasi-Newton methods, in which the Hessian matrix or its inverse is approximated in discrete steps. Some of the proposed formulae for improving the approximation matrix can lead to division by zero, sometimes due to rounding errors (Broyden, 1972), but in other cases even on theoretical grounds. If the Hessian matrix has singular points, the optimization process stagnates before reaching the optimum. Bard (1968) and others recommend as a remedy replacing the approximation matrix from time to time by the unit matrix. The information gathered over the course of the iterations is destroyed again in this process. Pearson (1969) proposes a restart period of 2n cycles, while Powell (1970b) suggests regularly adding steps different from the predicted ones. It is thus still true to say of the property of quadratic termination that its "relevance for general functions has always been questionable" (Fletcher, 1970b). No guarantee is given that Newtonian directions are better than the (anti-)gradient.

As there is no single objective function that can be taken as representative for determining experimentally the properties of a strategy in the non-quadratic case, as large and as varied a range of problem types as possible must be included in the numerical tests. To a certain extent, it is true to say that the greater their number and the more skillfully they are chosen, the greater the value of strategy comparisons. Some problems have become established as standard examples; others are added to each experimenter's own taste. Thus in the catalogue of problems for the second series of tests in the present strategy comparison, both familiar and new problems can be found; the latter were mainly constructed in order to demonstrate the limits of usefulness of the evolution strategies.

It appears that all the previously published tests use as a basis for judging performance the number of function calls (with objective function, gradient, and Hessian matrix weighted in the ratio 1 : n : n(n+1)/2) and the computation time for achieving a prescribed accuracy. Usually the objective functions considered are several times continuously differentiable and depend on relatively few variables, and the results lack compatibility from problem to problem and from strategy to strategy. With one method, a first minimum may be found very quickly and a second much more slowly; another method may work just the opposite way round. The abundance of individual results actually makes a comprehensive judgement more difficult. Hence average values are frequently calculated for the required computation time and the number of function calls. Such tests then result in establishing that second order methods are faster than first order ones, and these in turn are faster than direct search methods. These conclusions, which are compatible with the test results for quadratic problems, lead one to suspect that the selected objective functions behave quadratically, at least in the neighborhood of the objective. Thus it is also frequently noted that, at the beginning of a search, gradient methods converge faster, whereas towards the end Newton methods are faster. The average values that are measured therefore depend on the chosen starting point and the required closeness of approach to the objective.


The assessment is tricky if a method does not converge for a particular problem but terminates the search following its own criteria without getting anywhere near the solution. Any strategy that fails frequently in this way cannot be recommended for use in practice, even if it is especially fast in other cases. In a practical problem, unlike a test problem, the correct solution is not, of course, known in advance. One therefore has to be able to rely on the results given by a strategy if they cannot be checked by another method. Hence, reliability is just as important a criterion for assessing optimization methods as speed.

The second part of the strategy comparison is therefore designed to test the robustness of the optimization methods. The scale for assessing this is the number of problems that are solved by a given method. Since in this respect it is the complexity rather than the size of the problem that is significant, the number of variables ranges only from one to six.

All numerical iteration methods in practice can only approximate a solution with a finite accuracy. In order to be able either to accept the end result of an optimum search as adequate, or to reject it as inadequate, a border must be defined explicitly, on one side of which the solution is exact enough and on the other side of which it is unsatisfactory. It is the structure of the objective function that is the decisive factor determining the accuracy that can be achieved (Hyslop, 1972). With this in mind, the border values for the purpose of ranking the test results were obtained by the following scheme. Starting from the known exact or best solution

x* = (x_1*, x_2*, ..., x_n*)^T

the variables were individually altered by the amounts

Delta x_i = eps        for x_i* = 0
Delta x_i = eps x_i*   for x_i* != 0

in all combinations. For example, for n = 2 one obtains eight different test values of the objective function (see Fig. 6.16). In the general case there are 3^n - 1 different values. The greatest deviation Delta F(eps) from the optimal value F(x*) defines the border between results that approach the objective sufficiently closely and results that do not. To obtain a number of grades of merit, four different test increments eps_j, j = 1(1)4, were selected:

eps_1 = 10^-38 ,  eps_2 = 10^-8 ,  eps_3 = 10^-4 ,  eps_4 = 10^-2

A problem is deemed to have been solved "exactly" at x~ if

F(x~) <= F(x*) + Delta F(eps_1)

is attained. On the other hand, if at the end of the search

F(x~) > F(x*) + Delta F(eps_4)

the strategy employed has failed. Three intermediate classes of approximation are defined in the obvious way.

Figure 6.16: Eight different test values of the objective function in the case of n = 2. (Axes x_1, x_2; the optimum at the center is surrounded by the eight test positions.)
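In executable form, the grading scheme reads as below; this is our reconstruction of the described procedure, with illustrative names, enumerating all 3^n - 1 perturbed points around x*.

from itertools import product

def accuracy_border(f, x_star, eps):
    """Border value Delta F(eps) for grading an end result.

    Each variable is perturbed by 0 or +/- eps (for x_i* = 0) or by
    0 or +/- eps*x_i* (otherwise), in all combinations except the
    unperturbed point itself: 3**n - 1 test values in total.
    """
    f_star = f(x_star)
    border = 0.0
    for signs in product((-1.0, 0.0, 1.0), repeat=len(x_star)):
        if all(s == 0.0 for s in signs):
            continue                          # skip x* itself
        x = [xi + s * (eps if xi == 0.0 else eps * xi)
             for xi, s in zip(x_star, signs)]
        border = max(border, abs(f(x) - f_star))
    return border

# example: quadratic bowl with minimum x* = (1, 0); a result x~ counts
# as "exact" if f(x~) <= f(x*) + accuracy_border(f, x_star, 1e-38)
f = lambda x: (x[0] - 1.0) ** 2 + x[1] ** 2
borders = {eps: accuracy_border(f, [1.0, 0.0], eps)
           for eps in (1e-38, 1e-8, 1e-4, 1e-2)}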

The maximum possible accuracy was required of all strategies. The corresponding free parameters of the strategies that enter the termination criteria have already been defined in Table 6.2. In contrast to the first test, no additional common termination rule was employed.

A total of 50 problems were to be solved. The mathematical formulations of the problems are given in Appendix A, Section A.2. Some of them are only distinguished by the chosen initial conditions, others by the applied constraints. Nine out of 14 strategies or versions of basic strategies are not suited to solving constrained problems, at least not directly. Methods involving transformations of the variables and penalty function methods were not employed. An exception is the method of Rosenbrock, which only alters the objective function near the boundaries and can be applied in one pass; otherwise penalty functions require a sequence of partial optimizations to be executed. The second series of tests therefore comprises one set of 28 unconstrained problems for all 14 strategies and a second set of 22 constrained problems for 5 of the strategies. The results are displayed together in Tables 6.5 to 6.8. The approximation to the objective that has been achieved in each case is indicated by a corresponding symbol, using the classes of accuracy defined above.

Any interesting features in the solution of individual problems are documented in Appendix A, Section A.2, in some cases together with a brief analysis. Thus at this point it is only necessary to make some general observations about the reliability of the search methods for the totality of problems.

Unconstrained Problems

The results of the three versions of the coordinate strategies are very similar and generally unsatisfactory. A third of all the problems cannot be solved with them at all, or only very inaccurately. Exact solutions (eps = 10^-38) are the exception, and only in less than a third of all cases are the end results good (eps <= 10^-8).


Table 6.5: Results of all strategies in the second comparison test, unconstrained problems

Problem  FIBO GOLD LAGR HOJE DSCG DSCP POWE DFPS SIMP ROSE COMP EVOL GRUP REKO
2.1       3    3    3    1    1    1    2    1e   2n   1    2    1    1    1
2.2       3    3    3    1    1    1    1    2    2a   1a   2    1    1    1
2.3       2    2    3    1    1    1    1    1    1n   1a   4    1    1    1
2.4       3    3    3    1    1    1    2    2e   3    1    3    1    1    1
2.5       2    2    2    2    1    1    2    1e   2a   1    2    1    1    1
2.6       5    5    5    5    2    2    5    3    2a   1    3    1    1    1
2.7       5    5    4    2    5ea  5ea  5e   5    3    5    3    2    1    1
2.8       5    5    4    3    5ea  5ea  5e   5    3    5    2    1    1    1
2.9       3    3    3    2    2    2    2e   2    1a   3    1    1    2    1
2.10      5    5    4    3    2    2    5a   4e   4n   2    2    4    3    1
2.11      5    5    5    3    2    2    2    4    4n   2    2    4    3    1a
2.12      5    5    5    3    2    3    2    4e   2    2    2    4    3    1a
2.13      3    3    3    2    2    1    3    2    3    3    3    2    1    1
2.14      3    3    3    2    2    2    2a   5e   2n   2    2    3r   3r   3r
2.15      3    3    3    2    2    2    2ea  3    2    2    2    3r   3r   3r
2.16      2    1    2    2    1    1    2    2    2n   2    3    3    2    2
2.17      2    2    1    2    2    2    2e   2    1a   2    1    1    1    1
2.18      5    5    5    2    2    2    2e   2    1an  1    1    1    1    1
2.19      5    5    5    5    2    2    5    2e   2    2    3    3    2    3
2.20      2    2    2    2    3    2    2    1e   3n   2    2    1    1    1
2.21      5    5    5    2    4    2    5    2e   5a   2    5    1    1    1
2.22      2    2    2    2    2    2    2    5    5a   2    5    1    1    1
2.23      1    1    1    1    1    1    5a   5e   1a   1a   1    1    1    1
2.24      3    3    3    2    1    1    2    2    2    2    2    2    2    1
2.25      3    3    3    2    1    1    2e   2e   3    1    3    1    1    1
2.26      1    1    2    1    1    1    1e   1    1n   1    4    1    1    1
2.27      1    1    5    2    1    1    1    5    1    1    2    1    1    1
2.28      4    4    4    3    4    3    2    4e   2    3    1    4    4    3
Sum      91   90   93   61   56   52   74   79   65   54   68   51   51   37

Meaning of the number and letter symbols used above:
1  Accuracy achieved better than 10^-38
2  Accuracy achieved better than 10^-8
3  Accuracy achieved better than 10^-4
4  Accuracy achieved better than 10^-2
5  Accuracy achieved worse than 10^-2
e  Fatal execution error (floating overflow, floating divide check)
a  Termination rule ineffective; search infinite with no further convergence
r  Computation time too long or convergence too slow; search terminated
n  Concerns the simplex method of Nelder and Mead: restart(s) required


As already shown by the quadratic objective function models, it appears again that progress along the directions of the unit vectors becomes possible only in very small step lengths. The limit of smallest possible changes in the variables, as defined by the finite word and mantissa lengths of the digital computer, is often reached before the search has come sufficiently close to the objective.

The three methods with rotating axes, namely the strategy of Rosenbrock and the two versions of the strategy of Davies, Swann, and Campey, also behave similarly to one another. Although the choice of orthogonalization method (Gram-Schmidt or Palmer) has a considerable effect on the computation times, it makes little difference to the accuracies achieved. If "exact" solutions are required, all three methods prove useful in about 4 out of 10 cases. This proportion is doubled if the accuracy requirement is lowered by a grade. Two problems (Problems 2.7 and 2.8) are not solved by any of the three variants. In the Rosenbrock method, the search is ended a very long way from the objective, while in the DSC method a line search becomes infinite. To prepare for the single quadratic interpolation, the latter uses a subroutine for bounding the relative minimum in the chosen direction. In this case, however, the relative minimum is situated at infinity; thus, after some time, the range of numbers that can be handled by the computer is exceeded. It eventually makes a fatal execution error with the message "floating overflow." In most computers, a program would terminate at this point, but the PDP 10 continues the calculation using its largest number, 2^127, in place of the value that exceeded the number range. Nevertheless the bounding procedure does not end, because in the DSC method any steps that do not change the value of the objective function are also regarded as successful. The convergence criterion is not tested within this subroutine, so the whole procedure becomes infinite without any further change in value of the objective function. It must be terminated externally. The convergence criterion of the Rosenbrock method fails in three cases, in spite of the fact that the exact solutions have already been found. It is noted in the tables wherever fatal execution errors occur or the optimization does not terminate normally. With 11 or 12 exact results, and altogether 23 good results, these three rotating axes methods rank highly.

Fatal errors occur especially frequently in applying the more "thoroughbred" methods, the method of Powell and the DFPS strategy. They are not always accompanied by termination difficulties or bad final results. The accuracies achieved have therefore been evaluated independently of the execution errors. Good approximations, of which there are 20 (Powell) and 16 (DFPS) out of 28, are also less frequent than with the orthogonalization strategies. In many cases both of these methods, so advantageous in theory, completely fail to approach the desired solution, usually in the same problems that present difficulties for the much simpler coordinate methods.

Apart from failure of a line search because of a relative minimum at infinity, the causes are:

- The confusion of minima and saddle points because of ambiguity in quadratic interpolation (Problem 2.19 for the Powell strategy, Problem 2.27 for the variable metric method)
- Discontinuities in the objective function or its derivatives (Problems 2.6, 2.21, 2.22)
- A singular Hessian matrix (Problem 2.14 in the DFPS method)
A singular Hessian matrix (Problem 2.14 in the DFPS method)


However, even a completely regular, several times differentiable objective function of 10th order (Problem 2.23) is not managed by either of the quadratically convergent strategies. Their concept of using all the data that can be accumulated during the iterations to adjust their internal quadratic model apparently leads to completely wrong predictions of favorable directions and step lengths if the function is of appreciably higher than second order. Not one of the other direct search methods fails on this problem; in fact they all find the exact solution.

With Powell's method one can choose between two different convergence criteria. The difference between the stricter one and the simple one is that the former displaces slightly the best position obtained after the sequence of iterations has ended normally and searches again for the minimum. The search is only finally terminated if both results are the same within the specified accuracy. Otherwise the search is continued after a line search in the direction of the difference vector between the two solutions. Because of the extreme accuracy requirements in the present cases, the search usually ends with the message that rounding errors in the objective function prevent any closer approach to the objective. In such cases no additional variation of the final result is made. Even in other cases, the stricter convergence criterion only makes a very slight improvement to the results; the grades of merit of the results are not changed at all. In four problems the search becomes infinite because the step lengths vanish and the termination criterion is no longer tested; the search has to be terminated externally. Fatal execution errors occur very frequently: in three cases there is a "floating overflow" and in seven cases a "floating divide check." This concerns a total of eight problems. The DFPS strategy is even more susceptible. There are five occurrences of "floating overflow" and eleven of "floating divide check." Twelve problems are involved.

In contrast, the direct search of Hooke and Jeeves works without errors, but even this method fails on two problems: one because of sharp corners in the pattern of contour lines (Problem 2.6), the other in the neighborhood of a stationary point with a very narrow valley leading to the objective (Problem 2.19). Nevertheless it yields 6 exact solutions and 21 good approximations.

The overall behavior of the simplex and complex strategies is similar, but there are differences in detail. There are 17 good solutions, together with 6 exact ones, to set against two failures (Problems 2.21 and 2.22). These are provoked by edges on the contour surfaces in the multidimensional space. The restart rule in the Nelder-Mead method is invoked during 9 of the solutions. The termination criterion, based only on function values at the simplex corners, does not operate in 9 cases; the optimum search becomes infinite with no apparent improvement in the objective function values. The results of the complex strategy depend strongly on the initial configuration, which is determined by random numbers. In this case the evaluation was made for the best of three attempts, each with different sequences of pseudorandom numbers. It is especially worth noting the performance of the complex method in solving Problem 2.28, for which it is better than all the other methods.

All three versions of the evolution strategy are distinguished by the fact that in no case do they completely fail, and they are able to solve far more than half of all the problems exactly (in the sense defined above). Since their behavior, like that of the complex method, is influenced by random numbers, the same rule was followed: namely, out of three tests the one with the best end result was accepted. In contrast to the strategy of Box, however, the evolution methods prove to be less dependent on the actual sequence of random numbers. This is especially true of the multimembered versions. Recombination almost always improves the chance of getting very close to the desired solutions. Fatal errors due to exceeding the maximum number range or dividing by zero do not occur, by virtue of the simple computational operations in these strategies. Discontinuities in the partial derivatives, saddle points, and the like have no obvious adverse effects. The search does, however, become rather time consuming when the minimum is reached via a long, narrow valley. The step lengths or variances that are set in this case are very small and impose slow convergence in comparison to methods that can perform a line search along the valley. The average rate of progress of an evolution strategy is not, however, affected by bends in the valley, which would retard a one dimensional minimization procedure. Line searches only afford a significant advantage to the rate of progress if there are directions in the space along which successful steps can be made of a size that is large compared to the local radius of curvature of the objective function contour surface. Examples are provided by Problems 2.14, 2.15, and 2.28. In these cases, long before reaching the minimum, the optimal variances of the evolution methods have reached the lower limit as determined by the machine accuracy. The desired solution cannot therefore be approximated to the required accuracy. In Problems 2.14 and 2.15 the computation time limit did not allow the convergence criterion to be satisfied; although the search was actually progressing, slowly but surely, it was terminated.

Difficulties with the termination rule based on function values only occurred in the solution of one type of problem (Problems 2.11, 2.12) using the (10, 100) evolution strategy with recombination. The multimembered method selects the 10 best individuals of a generation only from the current 100 descendants. Their 10 parents are not included in the selection process, for reasons associated with the step length adaptation. In general, the objective function value of the best descendant is closer to the solution than that of the best parent. In the case of the two problems referred to above, this is initially the case. As the solution is approached, however, it happens more and more frequently that the best value occurring in a generation is lost again. This is related to the fact that, because of rounding errors in evaluating values near the minimum, the objective function behaves practically stochastically. Thus the population wanders around in the neighborhood of the (quasi-singular) optimal solution without being able to satisfy the convergence criterion. These difficulties do not beset the other search methods, including the multimembered evolution without recombination, because they do not come nearly so close to the optimum. The fact that the third problem of the same type (Problem 2.10) is solved without difficulties in a finite time, even with recombination, can be considered a fluke. Here too the minimum was reached long before the termination criterion was satisfied. On the whole, the multimembered evolution strategy with recombination is the surest and safest of all the search methods tested. In only 5 out of 28 cases is the solution not located exactly, and the greatest deviations of the variables were in the accuracy class eps = 10^-4.


Table 6.6: Summary of the results from Table 6.5

          Total number of problems solved      No solution  Fatal        No normal
Strategy  in the accuracy class                or > 10^-2   computation  termination
          10^-38  10^-8  10^-4  10^-2                       errors
FIBO         3      9     18     19                9            0            0
GOLD         4      9     18     19                9            0            0
LAGR         2      7     17     21                7            0            0
HOJE         6     21     26     26                2            0            0
DSCG        11     23     24     26                2            2            2
DSCP        12     24     26     26                2            2            2
POWE         4     20     21     21                7            8            4
DFPS         5     16     18     22                6           12            0
SIMP         7     18     24     26                2            0            9
ROSE        11     23     26     26                2            0            3
COMP         5     17     24     26                2            0            0
EVOL*       17     20     24     28                0            0            0
GRUP*       18     22     27     28                0            0            0
REKO*       23     24     28     28                0            0            2

* Search terminated twice in each case due to too slow convergence

Table 6.6 presents a summary of the number of unconstrained problems that were solved with given accuracy by the search methods under test, together with the number of unsolved problems, the number of cases of fatal execution errors, and the number of cases in which the termination criteria failed.

Constrained Problems

Tables 6.7 and 6.8 show the results of 5 strategies in the 22 constrained problems. Execution errors such as exceeding the number range or dividing by zero did not occur in any case. Neither were there any difficulties in the termination of the searches.

The method of Rosenbrock can only be applied if the starting point of the search lies within the allowed or feasible region. For this reason the initial values of the variables in seven problems had to be altered. All other methods very quickly found a feasible solution to start with. As in the unconstrained problems, the strategies that depend on random numbers were each run three times with different sequences of random numbers. The best of the three results was accepted for evaluation. The results of the complex method and the two membered evolution turned out to be very variable in quality, whereas the multimembered versions of the strategy, especially with recombination, proved to be less influenced by the particular random numbers. Two problems (Problems 2.40 and 2.41) caused great difficulty to all the search methods. These are simple linear programs that can be solved rapidly and exactly by, for example, the simplex method of Dantzig.

Table 6.7: Results of all strategies in the second comparison test, constrained problems

Problem No.  ROSE  COMP  EVOL  GRUP  REKO
2.29          3     1     4     3     3
2.30          1     5     1     1     1
2.31          3v    3     1     1     1
2.32          3v    3     1     1     1
2.33          3     2     5     4     1
2.34          1     2     3     3     2
2.35          3v    1     4     4     4
2.36          1     1     1     1     1
2.37          3     1     1     1     1
2.38          3v    3     1     1     1
2.39          3     3     4     4     3
2.40          5     5     5     5     5
2.41          5     5     5     5     5
2.42          3     3     2     2     1
2.43          3v    3     2     2     1
2.44          1     5     1     1     1
2.45          3     2     4     2     1
2.46          3     2     3     3     1
2.47          3v    1     1     1     1
2.48          3v    3     1     1     1
2.49          3     2     4     3     1
2.50          3     1     1     1     1
Sum          62    57    55    50    38

The meaning of the symbols is as in Table 6.5; "v" is used in connection with the Rosenbrock method for constrained cases: the starting point had to be displaced, since it was not feasible for this method.

In each case the closest to the objective was again the (10, 100) evolution strategy with recombination, but even that result had to be classified as "no solution."

On the whole the evolution methods cope with constrained problems no worse than the Rosenbrock or complex strategies, but they do reveal inadequacies that are not apparent in unconstrained problems. In particular, the 1/5 success rule for adapting the variances of the mutation step lengths in the (1+1) evolution strategy appears to be unsuitable for attaining an optimal rate of convergence when several constraints become active.

Table 6.8: Summary of the results from Table 6.7

          Total number of problems solved      No solution
Strategy  with accuracy class                  or > 10^-2
          10^-38  10^-8  10^-4  10^-2
ROSE         4      4     20     20                2
COMP         6     11     18     18                4
EVOL        10     12     14     19                3
GRUP        10     13     17     20                2
REKO        16     17     19     20                2

In problems with active constraints, the tendency of the evolution methods to follow the average gradient trajectory causes the search to come quickly up against one or more boundaries of the feasible region. The subsequent migration towards the objective along such edges takes considerable effort and time. In Figure 6.17 the situation is illustrated for the case of two variables and one constraint.

The contours of the objective function run at a narrow angle to the boundary of the region. For a mutation to count as successful it must fall within the feasible region as well as improve the objective function value. For simplicity let us assume that all the mutations fall on the circumference of a circle about the current starting point. In the case of many variables this point of view is very reasonable (see Chap. 5, Sect. 5.1). To start with, the center of the circle (P1) will still lie some way from the boundary. If the angle between the contours of the objective function and the edge of the feasible region is small and the step size, or variance of the mutation step size, is large, then only a small fraction of the mutations will be successful (thickly drawn part of the circle with radius sigma1). The 1/5 success rule ensures that this fraction is raised to 20%, which if the angle is small enough can only be achieved by reducing the variance to sigma2. The search point P is driven closer and closer to the boundary and eventually lies on it (P2). Since there is no longer any finite step size that can provide a sufficiently large success rate, the variance is permanently reduced to the minimum value specified in the program. Depending on the particular problem structure and the chosen values of the parameters in the convergence criteria, the search is either slowly continued or it is terminated before reaching the optimum. The more constraints become active during the search, the smaller is the probability that the objective will be closely approached. In fact, even in problems with only two variables and one constraint (Problem 2.46), the angle between the contours and the edge of the feasible region can become vanishingly small in the neighborhood of the minimum.
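The shrinking success region can also be demonstrated numerically. The following Monte Carlo sketch uses an illustrative model of Figure 6.17 of our own making, not one from the book: feasible region x2 >= 0 and a linear objective whose contours meet the boundary at the narrow angle alpha.

import math
import random

def success_rate(dist, sigma, alpha, trials=200000):
    """Estimate the probability that one mutation is successful, i.e.,
    feasible (x2 >= 0) and improving F(x) = x2*cos(alpha) - x1*sin(alpha),
    whose contours meet the boundary x2 = 0 at the angle alpha.
    The parent sits at height `dist` above the boundary."""
    f0 = dist * math.cos(alpha)          # F at the parent (0, dist)
    hits = 0
    for _ in range(trials):
        y1 = random.gauss(0.0, sigma)
        y2 = dist + random.gauss(0.0, sigma)
        if y2 >= 0.0 and y2 * math.cos(alpha) - y1 * math.sin(alpha) < f0:
            hits += 1
    return hits / trials

# with the parent close to the boundary, large steps succeed only in a
# wedge of angle alpha (rate roughly alpha/(2*pi)); the 1/5 rule can then
# only respond by shrinking sigma, driving the point onto the boundary
for sigma in (1.0, 0.1, 0.01):
    print(sigma, success_rate(dist=0.01, sigma=sigma, alpha=0.1))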

Similar situations to the one depicted in Figure 6.17 can even arise in unconstrained problems if the objective function displays discontinuities in its first partial derivatives. Examples of this kind of behavior are provided by Problems 2.6 and 2.21. If only a few variables are involved there is still a good chance of reaching the objective. Other search methods, especially those which execute line searches, are generally defeated by such points of discontinuity.

Figure 6.17: The situation at active constraints. (Circles of radii sigma1 and sigma2 around the search points P1 and P2 mark lines of equal probability of a step; the lines of constant F(x), leading to the minimum, meet the boundary of the forbidden region at the narrow angle alpha, and the negative gradient direction is also indicated.)

The multimembered evolution strategy, although it works without a rigid step length adaptation, also loses its otherwise reliable convergence characteristics when the region of

success is very much narrowed down by constraints. While the individuals are not yet at the edge of the feasible region, those descendants whose step lengths have become smaller have a higher probability of survival. Thus here too the entire population eventually concentrates itself in a smaller and smaller area at the edge of the feasible region.

The theory of the rate of progress in the corridor model did not foresee this kind of difficulty; indeed it gives an optimal success rate almost the same as in the sphere model, simply because the gradient vector of the objective function always runs parallel to the boundaries. In this case the search weaves backwards and forwards between the center and side of the corridor. The reduced probability of success at multidimensional edges is compensated by the fact that, with a uniform probability of occupation over the cross section of the corridor, the space that counts as near to the edges represents a very small fraction of the total. Provided that the success rate is obtained over long enough periods, the 1/5 success rule does not lead to a permanent reduction of the variances but to a constant, near optimal step size (it really fluctuates) that depends only on the width of the corridor and the number of variables.

The situation is happier than in Figure 6.17 if the constraints are given explicitly as

x_i >= a_i or x_i <= b_i

For any one variable, the region of success at a boundary is reduced by one half. If at some position m variables are each bounded on one side, then on average it costs 2^m mutations before one lands within the feasible region. Here again, the 1/5 success rule for m > 2 will continuously reduce the variances until they reach their minimum value. Depending on the route chosen by the search process, the limiting values of the variances, which are individually adjustable for each variable, will be reached at different times. Their relative values thereby alter, and with the new combination of step lengths the convergence can be faster.

The extra flexibility of the multimembered evolution strategy with recombination, in which the variances of the changes in the variables are individually adaptable during the whole of the optimization process, is a very clear advantage in solving constrained problems. Suitable combinations of variances are set up in this case before the smallest possible step lengths are reached. Thus the total computation time is reduced and the final accuracy is better. The recombination option also appears to have a beneficial effect at boundaries that are not explicit: it clearly improves the chance that descendants, even with a larger step size, will be successful near the boundary. In any case the population clusters together more slowly than when there is no recombination.

Global Convergence Properties

Among the 50 test problems there are 8 having at least a second local minimum besides the global one. In the reliability test, the accuracy achieved was only assessed with respect to the particular optimum that was being approximated. What, then, is the capability of each strategy for locating global minima? Several problems were specifically designed to investigate this question by having very many local optima, namely Problems 2.3, 2.26, 2.30, and 2.44. In Table 6.9 this aspect of the test results is evaluated.
2.30, <strong>and</strong> 2.44. In Table 6.9 this aspect of the test results is evaluated.<br />

Except for one problem (Problem 2.32), whose global minimum was found by all the<br />

strategies under test, the method of Rosenbrock onlyconverged to local optima. The<br />

complex method <strong>and</strong> the (1+1) evolution strategy were only better in one case: namely,<br />

in Problem 2.45 they both approached the global minimum.<br />

Table 6.9: Results of all strategies in the second comparison test: global convergence properties

Problem  FIBO GOLD LAGR HOJE DSCG DSCP POWE DFPS SIMP ROSE COMP EVOL GRUP REKO
2.3       L1   L1   L3   L1   L7   L7   L1   L3   L1   L6   L1   Lm   G    G
2.36      L1   L1   L1   L1   L1   L1   L1   L1   L1   L1   L1   L1   G    G
2.30                                                  L4   L1   Lm   G    G
2.32                                                  G    G    G    G    G
2.44                                                  L1   L1   L1   G    G
2.45                                                  L    G    G    G    G
2.47                                                  L3   L1   L2   G    G
2.48                                                  L2   Lm   Lm   GL   GL

(Blank entries: constrained problems, to which only the last five strategies were applied.)

Meaning of symbols:
L    Search converges to a local minimum.
L3   Search converges to the 3rd local minimum (in order of decreasing objective function values).
Lm   Search converges to various local minima depending on the random numbers.
G    Search converges to the global minimum.
GL   Search converges to the local or global minimum depending on the random numbers.
GL Search converges to local or global minimum depending on the r<strong>and</strong>om numbers.


The multimembered evolution strategy displays much better global convergence properties, with or without recombination. Although its actual path through the space was determined by chance, it always found its way to the absolute minimum. Only in Problem 2.48 was the global optimum not always approached. In this case the feasible region is not simply connected: between the starting point and the global minimum there is no connecting line that does not pass through the region excluded by constraints. The path of the simple evolution strategy and the initial condition of the complex method are also both dependent on the particular sequence of pseudorandom numbers; however, the main difference between the results of three trials in each case was simply that different local minima were approached. In one case the (1+1) evolution rejected 33 local optima, only to converge at the 34th (Problem 2.3).
to converge at the 34th (Problem 2.3).<br />

In spite of the good convergence properties of the multimembered evolution manifested<br />

in the tests, a certain measure of scepticism is called for. If the search is started with only<br />

small step lengths in the neighborhood of a local minimum, while the global minimum<br />

is far removed <strong>and</strong> is surrounded by only a relatively small region with small objective<br />

function values, then the probability of getting there can be very small.<br />

If in addition there are very many variables, so that the step sizes of the mutations are small compared to the Euclidean distance between two points in IR^n, the search for a global optimum among many local optima is like the proverbial search for a needle in a haystack. Locating singular minima, even with only a few variables, is a practically hopeless task. Although the multimembered evolution increases the probability of finding global minima compared to other methods, it cannot guarantee to do so because of its basically sequential character.
basically sequential character.<br />

6.3.3.3 Third Test: Non-Quadratic Problems with Many Variables

In the first series of tests we investigated the rates of convergence for a quadratic objective function, and in the second the reliability of convergence for the general non-linear case. The aim of the third test is now to study the computational effort required for non-quadratic problems. Because of their small number of variables, the problems of the second test series appear unsuitable for this purpose, as rates of convergence and computation times are only of interest in relation to the number of variables. The construction of non-quadratic objective functions of a type that can also be extended to an arbitrary number of variables is not a trivial problem. Another reason, however, for this third strategy comparison being restricted to only 10 different problems is that it required a large amount of computation time. In some cases CPU times of several hours were needed to test just one strategy on one problem with a particular number of variables. Seven of the problems are unconstrained and three have constraints. Appendix A, Section A.3 contains the mathematical formulation of the problems together with their solutions.

The procedure followed was the same as in the first test. Besides the termination criterion specific to each strategy, which demanded maximum accuracy, a further convergence criterion was applied in common to all strategies. According to the latter, the search was to be ended when a specified distance had been covered from the starting point towards the minimum. The number of variables was varied up to the maximum allowed



by the storage capacity, taking the values 3, 10, 30, 100, 300, and 1000. Of course, if a problem with, for example, 30 variables could not be solved by a strategy, or if no result was forthcoming at the end of the maximum computation time of 8 hours, the number of variables was not increased any further.

As in the first test, the initial conditions were specified by

$$x_i^{(0)} = x_i^* + \frac{(-1)^i}{\sqrt{n}}\,, \qquad i = 1(1)n$$

Two exceptions are Problem 3.3 with

$$x_i^{(0)} = x_i^* + \frac{(-1)^i}{10\sqrt{n}}$$

to ensure that the search always converged to the desired minimum and not to one of the many others of equal value, and Problem 3.10 with

$$x_i^{(0)} = x_i^* + \frac{1}{\sqrt{n}}$$

to start the search within the feasible region. Here $x_i^*$ denotes the position of the desired minimum. Problems 3.8 and 3.9, whose minima are at infinity, required special treatment of the starting point and termination conditions (see Appendix A, Sect. A.3).
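Written out as code, the three initialization rules look as follows. This is an illustrative sketch only, not a transcription of the original test programs; `x_star` stands for the known solution point of the respective problem.

```python
import numpy as np

def start_default(x_star):
    """General rule: offset the optimum by (-1)^i / sqrt(n) in coordinate i."""
    n = len(x_star)
    i = np.arange(1, n + 1)
    return x_star + (-1.0) ** i / np.sqrt(n)

def start_problem_3_3(x_star):
    """Problem 3.3: a ten times smaller offset, so the search heads for the
    desired minimum rather than one of the many others of equal value."""
    n = len(x_star)
    i = np.arange(1, n + 1)
    return x_star + (-1.0) ** i / (10.0 * np.sqrt(n))

def start_problem_3_10(x_star):
    """Problem 3.10: a uniform positive offset, to start inside the feasible region."""
    n = len(x_star)
    return x_star + 1.0 / np.sqrt(n)
```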

The results are presented in Table 6.10. For comparison, some of the results of the first test (Problem 1.1) are also displayed. The numbers enable one to assess critically, on the one hand, the reliability of a strategy and, on the other, the computation times it requires.



Table 6.10: Results of all strategies in the third comparison test

The following notation is used in the tables:

n:     Number of variables
Case:  A label for the convergence behavior, taking the values:
       1    Normal end of search; the required approximation to the objective
            was achieved.
       2    The search was ended before reaching the desired accuracy.
       3    The search became unending without converging; it had to be
            terminated externally.
       4    The maximum computation time of 8 hours was insufficient to end
            the search successfully (occasionally more computation time was
            invested in trials with the multimembered evolution strategy that
            promised to be successful).
       -    No trial was undertaken.
       1(2) Depending on the sequence of random numbers, various cases
            occurred; the entries in the table refer to the first case defined.
OFC:   Number of objective function calls.
CFC:   Number of constraint function calls.
Time:  Computation time in seconds (CPU time).

Iterations, cycles, exploratory cycles, line searches, orthogonalizations, restarts, etc., were counted as in the first comparison test.

Fatal execution errors were registered only in the Powell and DFPS methods; it is not further specified here in which problems they occurred. As a rule the same types of problem were involved as in the second test.

In unconstrained problems no numbers are tabulated for the number of objective function calls made by the evolution strategies. This can be calculated from the number of mutations or generations as follows:

EVOL:        1 + number of mutations
GRUP, REKO:  10 + 100 × number of generations
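In code form (a trivial helper encoding exactly these two rules):

```python
def ofc_evol(mutations):
    # (1+1) strategy: one call for the start point plus one per mutation
    return 1 + mutations

def ofc_grup_reko(generations):
    # (10,100) strategies: 10 calls for the start population plus 100 per generation
    return 10 + 100 * generations
```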

(continued)


Table 6.10 continued: Coordinate strategies FIBO, GOLD, LAGR (from top to bottom)

[Recorded for each strategy and each problem (1.1, 3.1 to 3.7) at n = 3, 10, 30, 100, 300, and 1000: case, cycles, OFC, and time. The individual entries are too badly scrambled in this copy to be reproduced.]


Table 6.10 continued: HOJE - Direct search of Hooke and Jeeves

[Recorded for each problem (1.1, 3.1 to 3.7) at n = 3, 10, 30, 100, 300, and 1000: case, cycles, OFC, and time. Entries not reproducible in this copy.]

ROSE - Rosenbrock method with Gram-Schmidt orthogonalization

[Recorded for each problem (1.1, 3.1 to 3.10) at n = 3, 10, 30, and 75 (max): case, orthogonalizations, OFC, CFC, and time. Entries not reproducible in this copy.]


Table 6.10 continued: DSCG - Davies-Swann-Campey method with Gram-Schmidt orthonormalization

[Recorded for each problem (1.1, 3.1 to 3.7) at n = 3, 10, 30, and 75 (max): case, orthogonalizations, line searches, OFC, and time. Entries not reproducible in this copy.]

DSCP - Davies-Swann-Campey method with Palmer orthonormalization

[Recorded for each problem (1.1, 3.1 to 3.7) at n = 3, 10, 30, 75 (max), and 95 (max): case, orthogonalizations, line searches, OFC, and time. Entries not reproducible in this copy.]


Table 6.10 continued: POWE - Powell's method of conjugate directions

[Recorded for each problem (1.1, 3.1 to 3.7) at n = 3, 10, 30, 100, and 135 (max): case, iterations, line searches, OFC, and time. Entries not reproducible in this copy.]

DFPS - Stewart's modification of the Davidon-Fletcher-Powell method

[Recorded for each problem (1.1, 3.1 to 3.7) at n = 3, 10, 30, 100, and 180 (max): case, iterations, OFC, and time. Entries not reproducible in this copy.]


Table 6.10 continued: SIMP - Simplex method of Nelder and Mead (with restart rule)

[Recorded for each problem (1.1, 3.1 to 3.7) at n = 3, 10, 30, 100, and 135 (max): case, restarts, OFC, and time. Entries not reproducible in this copy.]

COMP - Complex method of Box (no. of vertices = 2n)

[Recorded for each problem (1.1, 3.1 to 3.10) at n = 3, 10, 30, and 95 (max): case, OFC, CFC, and time. Entries not reproducible in this copy.]


Table 6.10 continued: EVOL - (1+1) evolution strategy

[Recorded for each problem (1.1, 3.1 to 3.10) at n = 3, 10, 30, 100, 300, and 1000: case, mutations, OFC, CFC, and time. Entries not reproducible in this copy.]


Table 6.10 continued: GRUP - (10,100) evolution strategy

[Recorded for each problem (1.1, 3.1 to 3.10) at n = 3, 10, 30, 100, 300, and 435 (max): case, generations, OFC, CFC, and time. Entries not reproducible in this copy.]


Table 6.10 continued: REKO - (10,100) evolution strategy with recombination

[Recorded for each problem (1.1, 3.1 to 3.10) at n = 3, 10, 30, 100, 300, and 435 (max): case, generations, OFC, CFC, and time. Entries not reproducible in this copy.]



With only three variables, nearly all the problems were solved perfectly by all strategies, i.e., the required approximation to the objective was achieved. The only exception is Problem 3.5, which ended in failure for the coordinate strategies, the method of Hooke and Jeeves, and the methods of Powell and of Davidon-Fletcher-Powell-Stewart. In apparent contradiction to this, the corresponding Problem 2.21 for n = 5 was satisfactorily solved by the Hooke-Jeeves strategy and the DFPS method. The causes are to be found in the different initial values of the variables. With the variable metric method, fatal execution errors occurred in both cases.

If there are 10 or more variables, even the two membered evolution strategy does not find the minimum in Problem 3.5, due to the extremely unfavorable starting point. The probability of making from there a first step with a lower objective function value is 2⁻ⁿ. Thus with many variables, the termination condition is usually met before a single success has been scored. The simplex method of Nelder and Mead with n = 10 took 185 restarts to reach the desired approximation to the objective. For more than 10 parameters the solution can no longer be sufficiently well approximated in spite of an increasing number of restarts. With stricter accuracy requirements the simplex method fails much sooner (Problem 2.21 with n = 5).
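The 2⁻ⁿ figure is easy to verify by simulation. The sketch below uses a model situation (assumed for illustration, not the actual Problem 3.5): for f(x) = max_i x_i at a starting point with all components equal, a mutation lowers f only if every component of the step is negative.

```python
import numpy as np

rng = np.random.default_rng(1)

def first_step_success_rate(n, trials=200_000, sigma=0.1):
    # Start at x = (1, ..., 1) for f(x) = max_i x_i: a mutation improves f only
    # if all n components of the step are negative, which for a componentwise
    # symmetric mutation distribution happens with probability 2^-n.
    steps = rng.normal(0.0, sigma, size=(trials, n))
    return np.all(steps < 0.0, axis=1).mean()

for n in (2, 5, 10):
    print(n, first_step_success_rate(n), 2.0 ** -n)
```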

The complex strategy likewise was no longer able to solve the same problem for n ≥ 30. Depending on the sequence of random numbers it either ended the search before achieving the required accuracy, or it was still far from the minimum when the allowed computation time (8 hours) expired. The multimembered evolution strategy also proved to be dependent, although less strongly, on the particular sequence of random numbers. The version without recombination failed on Problem 3.5 for n ≥ 30; with recombination it failed for n ≥ 100. Without recombination and for n ≥ 100 it ended the minimum search prematurely also in Problems 3.4 and 3.6. The simplex and complex methods had convergence difficulties with both types of objective function, usually even for only a few variables. Several times they had to be interrupted because of exceeding the time limit. Further details can be found in the tables and Appendix A, Section A.3.

The search for the minima in Problems 3.4 and 3.6 presents no difficulties to the coordinate strategies, and the methods of Hooke and Jeeves, Rosenbrock, Davies-Swann-Campey, Powell, and Davidon-Fletcher-Powell-Stewart. The three rotating coordinate strategies are the only ones that manage to solve Problem 3.5 satisfactorily for any number of variables. Nevertheless it would be hasty to conclude that these methods are therefore clearly better than the others; an attempt to analyze the reasons for their success reveals that only slight changes in the objective functions are enough to undermine their apparently advantageous way of working.

The significant difference in this respect between the above group of strategies and the others (complex, simplex, and evolution strategies) is that the former operate with a much more limited set of search directions than the latter. There are usually only n directions, e.g., the n coordinate directions of the axes-parallel search methods, compared to an infinite number (in principle) in the evolution methods. In the case of Problems 3.4 to 3.6 the most favorable search directions are the n directions of the unit vectors. All methods with one dimensional minimizations use precisely these directions in their first iteration cycle, so they do not usually require any further iterations to achieve the required
iteration cycle, so they do not usually require any further iterations to achieve the required



accuracy. By keeping the starting conditions the same but rotating the coordinates with respect to the contours of the objective function (Problem 3.6), or slightly tilting the contours with respect to the coordinate axes (Problem 3.5), or both together (Problem 3.4), one could easily cause all the line searches to fail. On the other hand, the strategies without line searches would not be impaired by these changes. Thus the advantage of selected directions can turn into a disadvantage. These coordinate strategies can never solve the problem referred to, whereas, as we have seen, the strategies that have a large set of search directions at their disposal only fail when a particular number of variables is exceeded. Problems 3.4 and 3.6 are therefore suitable for assessing the reliability of simplex, complex, and evolution strategies, but not for the other methods. Together they belong to the type of problems which Himmelblau designates as "pathological."
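A two-dimensional toy example (an assumed objective, not one of the test problems) makes the failure mode concrete: with contours tilted 45 degrees to the axes and a sharp edge, exact line searches along the coordinate directions stall at a non-optimal point, while random directions still find improvements.

```python
import numpy as np

# Toy objective: contours tilted 45 degrees to the axes, with a sharp edge;
# the minimum is at the origin with f(0, 0) = 0.
def f(x):
    x = np.asarray(x)
    return np.abs(x[..., 0] + x[..., 1]) + 0.1 * np.abs(x[..., 0] - x[..., 1])

x = np.array([1.0, -1.0])          # on the edge x1 + x2 = 0, far from the minimum

# "Exact" line searches along the coordinate axes, done by dense sampling:
ts = np.linspace(-2.0, 2.0, 4001)  # the grid contains t = 0 exactly
for _ in range(5):
    for axis in (0, 1):
        trial = np.tile(x, (ts.size, 1))
        trial[:, axis] += ts
        x = trial[np.argmin(f(trial))]
print("after axis-parallel line searches:", x, f(x))   # stays at (1, -1), f = 0.2

# Random directions (as the evolution strategies use) typically still improve:
rng = np.random.default_rng(0)
neighbors = x + 0.5 * rng.normal(size=(1000, 2))
print("best of 1000 random neighbors:", f(neighbors).min(), "vs f(x) =", f(x))
```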

Leaning more to the conservative side are the several times continuously differentiable objective functions of Problems 3.1, 3.2, 3.3, and 3.7. The first two problems were tackled successfully by all the strategies for any number of variables. The simplex method did, however, need at least one restart for Problem 3.1 with n ≥ 100. For 135 variables it exceeded the time limit before Problems 3.1 and 3.2 were solved to sufficient accuracy.

Problem 3.3 gave trouble to several search procedures when there were 10 or more variables. The coordinate strategies were the first to fail. For only n = 10, the step lengths of the line searches would have had to be smaller than allowed by the numerical precision of the computer used. At n = 30, the DSC strategy with Gram-Schmidt orthogonalization also ends without having located the minimum accurately enough. The simplex method with one restart still found the solution for n = 30, but the complex strategy failed here, either by premature termination of the search or by reaching the maximum permitted computation time. Problem 3.3, because the cost per objective function evaluation increases as O(n²), requires the longest computation times for its solution. Since the objective function also took O(n²) units of storage, this problem could not be used for more than 30 variables.

Problem 3.7, like the analogous Problem 2.31, gave trouble to the two quadratically convergent strategies. The method of Powell was only successful for n = 3. For more variables it became stuck in the search process without the termination rule taking effect. The variable metric strategy behaved in just the same way. For n ≥ 30, it no longer came as near as required to the optimum. Under the stricter conditions of the second set of tests it failed already at n = 5. With both methods fatal execution errors occurred during the search. No other direct search strategies had any difficulty with Problem 3.7, which is a simple 10th order polynomial. Only the simplex method would not have found the solution sufficiently accurately without the restart rule. For n = 100, it reached the time limit before the search simplex had collapsed for the first time.

The advantage shown by the complex strategy was due to the complex's having 2n vertices, which is almost twice as many as the n + 1 of the simplex. An attempt to solve Problems 3.1 to 3.10 for n = 30 with a complex constructed of 40 points failed completely. The search ended, in every case, without having reached the required accuracy.

How do the computation times compare when the problems are no longer only quadratically non-linear? For solving the "pathological" Problems 3.4 to 3.6 all the methods with a line search take about the same times, with the same number of variables, as they do



for solving the simple quadratic Problem 1.1, if indeed they actually can find a solution. With any of the remaining methods the computation times increase somewhat more steeply with the number of variables, up to the limiting number beyond which convergence cannot be guaranteed in every case.

The solution times for Problems 3.1 and 3.2 usually turn out to be several times greater than those for Problem 1.1. The cost of the coordinate strategies is up to 1000% more for a few variables, which reduces to 100% as the number of variables increases. As in the case of Problem 1.1, the solution times for Problems 3.1 and 3.2 using the method of Hooke and Jeeves increase somewhat faster than the square of the number of variables. For very many variables 250% more computation time is required.

For n ≤ 30, the Rosenbrock method requires 70% to 250% more time (depending on the number of orthogonalizations) for the first two problems of the third set of tests than for the simple quadratic problem. The computation time still increases as O(n³) in all cases because of the costly procedure of rotating the coordinates. For example, for n = 75, up to 90% of the total time is taken up by the orthogonalizations. The DSC strategies reached the desired accuracy in Problem 1.1 without orthogonalizations. Since solving Problems 3.1 and 3.2 requires more than n line searches in each case, the computation times differ significantly, depending on the chosen method of orthogonalization. Palmer's program holds down the increase in the computation times to O(n²), whereas the Gram-Schmidt method leads to an O(n³) increase. It therefore is not meaningful to quote the extra cost as a percentage with respect to Problem 1.1. In the extreme case, instead of 6 seconds at n = 75 the procedure took nearly 80 seconds.

The method of Powell requires two to four times as much time, depending on whether one or two extra iterations are needed. However, even for the same number of iterations, i.e., also with the same number of line searches (n = 135), the number of function calls in Problems 3.1 and 3.2 is greater than in Problem 1.1. The reason for this is that in the quadratic reference problem a simplified form of the parabolic interpolation can be used. The variable metric strategy, in order to solve the two non-quadratic problems (Problems 3.1 and 3.2) with n = 180, requires about nine times as much computation time as for Problem 1.1. This factor increases with n, since the number of gradient determinations increases gradually with n.

The pattern of behavior of the simplex method of Nelder and Mead is very irregular. If the number of variables is small, the computation times for all three problems are about equal. However, for n = 100, Problem 3.2 requires about seven times as much time to be solved as Problem 1.1 and, because of a restart, Problem 3.1 requires even thirty times as much. With n = 135, neither of the two non-quadratic problems can be solved within 8 hours, whereas 1.5 hours are sufficient for Problem 1.1. On the other hand the complex strategy requires only slightly more time, about 20%, than in the simple quadratic case, provided 2n vertices are taken. The time taken by this method on the whole for all problems, however, exhibits the strongest rate and range of variation with the number of parameters.

The evolution strategies prove to be completely unaffected by the altered topology of the objective function as compared with the case of spherically symmetrical contour surfaces. Within the expected deviations, due to different sequences of random numbers, the measured computation times for all three problems are equal. The results show that Rechenberg's (1973) theory of the rate of progress, which does not assume a quadratic objective function but merely concentric hypersphere contour surfaces, is valid over a wide range of conditions.
wide range of conditions. Even more surprising, however, is the behavior of the (10<br />

, 100) evolution method with recombination in the solution of Problems 3.4 <strong>and</strong> 3.6,<br />

whose objective functions have discontinuous rst derivatives, i.e., their contour surfaces<br />

display sharp edges <strong>and</strong> corners. The mixing of the components of variables representing<br />

individuals on di erent sides of a discontinuity appears sometimes to have a kind of<br />

smoothing e ect. In any case it can be seen that the strategy with recombination needs<br />

no more computation time or objective function calls for Problems 3.4 <strong>and</strong> 3.6 than for<br />

Problems 1.1, 3.1, <strong>and</strong> 3.2.<br />

With all the methods under test, the computation times for solving Problem 3.7 are about twice as high as those measured in the simple quadratic case. Only the simplex method is significantly more demanding of time. Since the search simplex frequently collapses in on itself, it must repeatedly be reinitialized.

Since Problem 3.3 could only be tackled with 3, 10, and 30 variables, it is not easy to analyze the resulting data. In addition, the dependence of the increase in difficulty on the number of parameters is not so clear-cut in this problem. Nevertheless the results seem to indicate that at least the number of objective function calls, in many strategies, increases with n in a way similar to that in the pure quadratic Problem 1.2. Because an objective function evaluation takes about O(n²) operations in Problem 3.3, the total cost generally increases as one higher power of n than in Problem 1.2. The cost of the variable metric strategy and both versions of the (10, 100) evolution strategy seems to increase even more rapidly. In the latter case there is a suspicion that the chosen initial step lengths are too large for this problem when there are very many variables. Their reduction to a suitable size then takes a few additional generations. The two membered evolution strategy, which is able to adjust unsuitable initial step lengths relatively quickly, needed about the same number of mutations for both Problems 1.2 and 3.3. Since only one experiment per strategy and number of variables was performed, the effect of the particular sequence of random numbers on the recorded computation times is not known. The particularly advantageous behavior of the DFPS method on exactly quadratic objective functions is clearly wasted once the problem deviates from this model structure; in fact it seems that the search process is appreciably held back by an interpretation of the measured data in terms of an inappropriate internal model.

So far we have only discussed the results for the seven unconstrained problems, since they were amenable to solution by all the search strategies. Problem 3.8, with constraints, corresponds to the second model function (corridor model) for which Rechenberg (1973) has obtained theoretically the rate of progress of the two membered evolution strategy with optimal adaptation of variances. According to his analysis, one expects a linear rate of convergence, increasing with the width of the corridor and inversely proportional to the number of variables. The results of the third set of tests confirm that the number of mutations or generations increases linearly with n if the width of the corridor and the reference distance to be covered are held constant.
the reference distance to be covered are held constant. The picture for the Rosenbrock<br />

strategy is as usual: the time consumption increases as O(n 3 ) again. The point atn =75


232 Comparison of Direct Search Strategies for Parameter Optimization<br />

departs from the general trend of the others simply because no orthogonalizations were<br />

performed in this case. But the di erence is not dramatic, because the cost of testing the<br />

constraints is of the same order of magnitude as that of rotating the coordinates. The<br />

complex method takes computation times that initially increase somewhat more rapidly<br />

than O(n 3 ). This corresponds to a greater than linearly increasing number of objective<br />

function evaluations. As we have already seen in other problems, the increase becomes<br />

even steeper as the number of parameters increases. With n =95variables, the required<br />

distance was only partially covered within the maximum computation time.<br />

Problem 3.9 represents a modification of Problem 3.8 with respect to the constraints. In place of the (2n − 2) linear constraints, the corridor is bounded by a single non-linear boundary condition. The cost of testing the feasibility of an iteration point is thereby greatly reduced. The number of mutations or generations of the evolution strategies is higher than in Problem 3.8 but still increases as O(n); the computation times, in contrast to Problem 3.8, only increase as O(n²). The Rosenbrock method also has no difficulty with this problem, although the necessary rotations of the coordinate system make the times of order O(n³). The complex method could only solve Problem 3.9 for n = 3; upwards of n = 10 it no longer converged.

The last problem, Problem 3.10, which also has inequality constraints, turned out to be extremely difficult for all the search methods in the test. The main problem is one of scaling. Convergence in the neighborhood of the minimum can be achieved if, and practically only if, the step lengths in the coordinate directions are individually adjustable. They have to differ from each other by several powers of 10. For n = 30, no strategy managed to solve the problem within the maximum allowed computation time. The complex method sometimes failed to end the search within this time for n = 10. The intermediate results achieved after 8 hours are presented in Appendix A, Section A.3. All of the evolution strategies do better than the methods of Rosenbrock and Box.

The result that the two membered evolution strategy came closer to the objective than the multimembered evolution without recombination was not completely unexpected, because considerably fewer generations than mutations can occur within the allowed time. What is more surprising is that the (10, 100) strategy with recombination does almost as well as the two membered version. Here once again, the degree of freedom gained by the possibilities of recombination shows itself to advantage. The variances of the mutation step lengths do adjust themselves individually, quite differently according to the situation, and thus permit much faster convergence than with equal variances for all variables. The other evolution strategies only come as close as they do to the solution because the variances reach their relative lower bounds at different times, whereby differences in their sizes are introduced. This scaling process is, however, very much slower than the continuous process of adaptation brought about by the recombination mechanism.
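As an illustration of this adaptation mechanism, here is a minimal sketch of a (10, 100) strategy with discrete recombination of both object variables and step sizes. The lognormal step-size mutation and the single learning rate tau are common textbook choices, assumed here; the sketch is not a reproduction of the exact rules used in the tests.

```python
import numpy as np

rng = np.random.default_rng(42)

def es_comma(f, x0, sigma0, mu=10, lam=100, generations=200):
    """Minimal (mu, lam) evolution strategy with discrete recombination and
    individually adapted per-coordinate step sizes. Illustrative sketch only."""
    n = len(x0)
    tau = 1.0 / np.sqrt(2.0 * np.sqrt(n))      # common choice of learning rate
    pop_x = x0 + rng.normal(0.0, sigma0, size=(mu, n))
    pop_s = np.full((mu, n), sigma0)
    for _ in range(generations):
        # discrete recombination: each offspring component comes from one of
        # two randomly chosen parents (object variable and step size together)
        p1, p2 = rng.integers(mu, size=(2, lam))
        mask = rng.random((lam, n)) < 0.5
        x = np.where(mask, pop_x[p1], pop_x[p2])
        s = np.where(mask, pop_s[p1], pop_s[p2])
        # mutate the step sizes first (lognormal), then the object variables
        s = s * np.exp(tau * rng.normal(size=(lam, n)))
        x = x + s * rng.normal(size=(lam, n))
        best = np.argsort([f(xi) for xi in x])[:mu]   # comma selection
        pop_x, pop_s = x[best], s[best]
    return pop_x[0], pop_s[0]

# Badly scaled quadratic standing in for the scaling difficulty of Problem 3.10:
f = lambda x: float(np.sum((np.logspace(0, 3, x.size) * x) ** 2))
x, s = es_comma(f, x0=np.ones(5), sigma0=0.1)
print("best point:", x)
print("adapted step sizes:", s)   # typically differ by orders of magnitude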

6.4 Core storage required

Up to now, only the time has been considered as a measure of the computational cost. There is, however, another important characteristic that affects the applicability of optimization strategies, namely the core storage required. (Today nobody would use this



term "core" here, but at the time these tests were performed, it was so called.) All indirect methods of quadratic optimization, which solve the linear equations for the extremal, require storage of order O(n²) for the matrix of coefficients. The same holds for quasi-Newton methods, except that here the significant rôle is played by the approximation to the inverse Hessian matrices. Most strategies that perform line searches in other than coordinate directions also require O(n²) words for the storage of n vectors, each with n coefficients. An exception to this rule is the conjugate gradient method of Fletcher and Reeves, which at each stage only needs to retain the latest generated direction vector for the subsequent iteration. Of the direct search methods included in the tests, the coordinate methods, the method of Hooke and Jeeves, and the evolution strategies work with only O(n) words of core storage. How important the formal storage requirement of an optimization method can be is shown by the maximum number of variables for the tested strategies in Table 6.2. The limiting values range from 75 to 4,000 under the given conditions. There exist, of course, tricks such as segmentation for enabling larger programs to be run on smaller machines; the cost of the strategy should then take into account, however, the extra cost in preparation time for an optimization. (Here again, modern virtual storage techniques and the relative cheapness of memory chips make the considerations above look rather old-fashioned.)
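A quick back-of-the-envelope comparison of the two storage classes (word counts only, with all constant factors set to 1 for illustration):

```python
def storage_words(n):
    # O(n^2): quasi-Newton and other matrix-based methods (one n-by-n array)
    # O(n):   coordinate methods, Hooke-Jeeves, and the evolution strategies
    return {"matrix-based": n * n, "vector-based": n}

for n in (75, 1000, 4000):
    print(n, storage_words(n))
# At n = 4000 a single n-by-n matrix already needs 16 million words,
# while the vector-based methods get by with a few thousand.
```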

In the following Table 6.11, all the strategies compared are listed again, together with the order of magnitude of their required computation time as obtained from the first set of tests (columns 1 and 2). The third column shows how the computation time would vary if each function call performed $O(n^2)$ rather than $O(n)$ operations, as would occur for the worst case of a general quadratic objective function. The fourth column gives the storage requirement, again only as an order of magnitude, and the fifth displays the product of the time and storage requirements from the two previous columns. Judging by the computation time alone, the variable metric strategy seems the best suited for true quadratic problems. In the least favorable case, however, it is more expensive than an indirect method and only faster in special cases. Problems having a very simple structure (e.g., Problem 1.1) can be solved just as well by direct search methods; the time they take is at worst only a constant factor more than that of a second order method. If the total cost is measured by the product of time and storage requirements, all those strategies that store a two dimensional array of data show up badly, at least for problems with many variables. Since the coordinate methods have shown unreliable convergence, the method of Hooke and Jeeves and the evolution strategies remain as the least costly optimization methods. Their cost does not exceed that of indirect methods. The product of time and storage is not such a bad measure of the total cost; in many computing centers jobs have been, in fact, charged with the product of storage requested in K words and the time in seconds of occupation of the central processing unit (K-core-sec).

A comparison of the two membered and multimembered evolution strategies seems clearly to favor the simpler method. This is not surprising, as several individuals in the multimembered procedure have to find their way towards the optimum. In nature, this process runs in parallel. Already in the early 1970s, first efforts towards constructing multi-processor computers were undertaken (see Barnes et al., 1968; Miranker, 1971).


Table 6.11: The dependence of the total costs of the search methods on the number of variables (n)

Strategy           | Comp. time    | Comp. time    | Comp. time         | Core    | K-core-sec
                   | Problem 1.1   | Problem 1.2   | gen. quadr. probl. | storage |
-------------------|---------------|---------------|--------------------|---------|-----------
FIBO, GOLD, LAGR   | n^2           | n^3 †         | n^4                | n       | n^5
HOJE               | > n^2         | n^3           | n^4                | n       | n^5
DSCG               | n^2           | n^4           | n^4                | n^2     | n^6
DSCP               | n^2           | n^3           | n^4                | n^2     | n^6
POWE               | n^2           | n^3 †         | n^4                | n^2     | n^6
DFPS               | n^2           | n^2.5         | n^3.5              | n^2     | n^5.5
SIMP               | > n^3         | n^5           | n^5                | n^2     | n^7
ROSE               | n^3           | n^4           | n^4                | n^2     | n^6
COMP               | > n^3         | n^5 †         | n^5                | n^2     | n^7
EVOL, GRUP, REKO   | n^2           | n^3           | n^4                | n       | n^5

† Not sure to converge

On such a parallel computer, supposing it had 100 sub-units, one could simultaneously perform all the mutations and objective function evaluations of one generation in the (10 , 100) evolution strategy. The time required for the optimization would be about two orders of magnitude less than it is with a serially operating machine. In Figures 6.14 and 6.15 the dotted lines show the results that would be obtained by the (10 , 100) strategy without recombination in the hypothetical case of parallel operation. No other methods can make use of parallel operations to such an extent. On SIMD (single instructions, multiple data) architectures, the possible speedup is sharply limited by the percentages of a program's scalar and vector operations. Using array arithmetic for all matrix and vector operations, the execution time of a program may be accelerated at most by a factor of five, given that these operations would serially take 80% of the computation time. On MIMD (multiple instructions, multiple data) machines, the speedup is limited by the number of processing units a program can make use of and by the amount of communication needed between the processors and the data store(s). Most classical optimization algorithms cannot economically employ large MIMD computers; the more sophisticated the procedure, the less use it can make of them. Multimembered evolution strategies, however, are easily scalable to any number of processors and communication links between them. For a taxonomy of parallel versions of evolution strategies, see Hoffmeister and Schwefel (1990).
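The SIMD bound just quoted is an instance of Amdahl's law; as a quick check (my arithmetic, not a figure from the original test series), with a fraction $p$ of the serial run time vectorizable at speedup $s$, the overall speedup is

$$ S(s) = \frac{1}{(1-p) + p/s}, \qquad p = 0.8 \;\Rightarrow\; S(s) \le \lim_{s \to \infty} S(s) = \frac{1}{1 - 0.8} = 5 $$

which reproduces the factor of five mentioned above.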



Chapter 7

Summary and Outlook

So, is the evolution strategy the long-sought-after universal method of optimization? Unfortunately, things are not so simple and this question cannot be answered with a clear "yes." In two situations, in particular, the evolution strategies proved to be inferior to other methods: for linear and quadratic programming problems. These cases demonstrate the full effectiveness of methods that are specially designed for them, and that cannot be surpassed by strategies that operate without an adequate internal model. Thus if one knows the topology of the problem to be solved and it falls into one of these categories, one should always make use of such special methods. For this reason there will always rightly exist a number of different optimization methods.

In other cases one would naturally not search for a minimum or maximum iteratively if an analytic approach presented itself, i.e., if the necessary existence conditions lead to an easily and uniquely soluble system of equations. Nearest to this kind of indirect optimization come the hill climbing strategies, which operate with a global internal model. They approximate the relation between independent and dependent variables by a function (e.g., a polynomial of high order) and then follow the analytic route, but within the model rather than reality. Since the approximation will inevitably not be exact, the process of analysis and synthesis must be repeated iteratively in order to locate an extremum exactly. The first part, identification of the parameters or construction of the model, costs a lot in trial steps. The cost increases with n, the number of variables, and p, the order of the fitting polynomial, as $O(n^p)$. For this reason hill climbing methods usually keep to a linear model (first order strategies, gradient methods) or a quadratic model (second order strategies, Newton methods). All the more highly developed methods also try as infrequently as possible to adjust the model to the local topology (e.g., the method of steepest descents) or to advance towards the optimum during the model construction stage (e.g., the quasi-Newton and conjugate gradient strategies). Whether this succeeds, and the information gathered is sufficient, depends entirely on the optimization problem in question. A quadratic model seems obviously more suited to a non-linear problem than a linear model, but both have only a limited, local character. Thus in order to prove that the sequence of iterations converges and to make general statements about the speed of convergence and the Q-properties, very strict conditions must be satisfied by the objective function and, if they exist, also by the constraints, such as unimodality, convexity, continuity, and differentiability. Linear or quadratic convergence properties require not only conditions on the structure of the problem, which frequently cannot be satisfied, but also presuppose that the mathematical operations are in principle carried out with infinite accuracy. Many an attractive strategy thus fails not only because a problem is "pathological," having non-optimal stationary points, an indefinite Hessian matrix, or discontinuous partial derivatives, but simply because of inevitable rounding errors in the calculation, which works with a finite number of significant figures. Theoretical predictions are often irrelevant to practical problems, and the strength of a strategy certainly lies in its capability of dealing with situations that it recognizes as precarious: for example, by cyclically erasing the information that has been gathered or by introducing random steps.

As the test results confirm, the second order methods are particularly susceptible. A questionable feature of their algorithms is, for example, the line search for relative optima in prescribed directions. Contributions to all conferences in the late 1970s clearly showed a leaning towards strategies that do not employ line searches, thereby requiring more iterations but offering greater stability. The simpler the internal model and the less complete the required information, the more robust an optimization strategy can be. The more rigid the representation of the model, the more effect perturbations of the objective function have, even those that merely result from the implementation on digital, analogue, or hybrid computers. Strategies that accept no worsening of the objective function are very easily led astray.

Every attempt to accelerate the convergence is paid for by loss in reliability. The ideal of guaranteed absolute reliability, from which springs the stochastic approximation (in which the measured objective function values are assumed to be samples of a stochastic, e.g., Gaussian, distribution), leads directly to a large reduction in the rates of convergence. The starkest contradiction, however, between the requirements for speed and reliability can be seen in the problem of discovering a global optimum among several local optima. Imagine the situation of a blind person who arrives at New York and wishes, without assistance or local knowledge, to reach the summit of Mt. Whitney. For how long might he seek? The task becomes far more formidable if there are more than two variables (here longitude and latitude) to determine. The most reliable global search method is the volume-oriented grid method, which at the same time is the costliest. In the multidimensional case its information requirement is too huge to be satisfied. There is, therefore, often no alternative but to strike a compromise between reliability and speed. Here we might adopt the sequential random search with normally distributed steps and fixed variances. It has the property of always maintaining a chance of global convergence, and is just as reliable (although slower) in the presence of stochastic perturbations. It also has a path-oriented character: According to the sizes of the selected standard deviations of the random components, it follows more or less exactly the gradient path and thus avoids testing systematically the whole parameter space. A further advantage is that its storage requirement increases only linearly with the number of variables. This can sometimes be a decisive factor in favor of its implementation. Most of the deterministic hill climbing methods require storage space of order $O(n^2)$. The simple operations of the algorithm guarantee the least effect of rounding errors and are safe from forbidden numerical operations (division by zero, square root of a negative number, etc.). No conditions of continuity or differentiability are imposed on the objective function. These advantages accrue from doing without an internal model, not insisting on an improvement at each step, and having an almost unlimited set of search directions and step lengths. It is surely not by chance that this method of zero order corresponds to the simplest rules of organic evolution, which can also cope, and has coped, with difficult situations. Two objections are nevertheless sometimes raised against the analogy of mutations to random steps.

The first is directed against randomness as such. A common point of view, which need not be explicitly countered, is to equate randomness with arbitrariness, even going so far as to suppose that "random" events are the result of a superhuman hand sharing out luck and misfortune; but it is then further asserted that mutations do after all have causes, and it is concluded that they should not be regarded as random. Against this it can be pointed out that randomness and causality are not contradictory concepts. The statistical point of view that is expressed here simply represents an avoidance of statements about individual events and their causes. This is especially useful if the causal relation is very complicated and one is really only interested in the global behavioral laws of a stochastic set of events, as they are expressed by probability density distributions. The treatment of mutations as stochastic events rather than otherwise is purely and simply a reflection of the fact that they represent undirected and on average small deviations from the initial condition. Since one has had to accept that non-linear dynamic systems rather frequently produce behaviors called deterministic chaos (which in turn is used to create pseudorandom numbers on computers), arguments against speaking of random events in nature have diminished considerably.

The second objection concerns the unbelievably small probability, as proved by calculation, that a living thing, or even a mere wristwatch, could arise from a chance step of nature. In this case, biological evolution is implicitly being equated to the simultaneous, pure random methods that resemble the grid search. In fact the achievements of nature are not explicable with this model concept. If mutations were random events evenly distributed in the whole parameter space, it would follow that later events would be completely independent of the previous results; that is to say, descendants of a particular parent would bear no resemblance to it. This overlooks the sequential character of evolution, which is inherent in the consecutive generations. Only the sequential random search can be regarded as an analogue of organic evolution. The changes from one generation to the next, expressed as rates of mutation, are furthermore extremely small. The fact that this must be so for a problem with so many variables is shown by Rechenberg's theory of the two membered evolution strategy: optimal (i.e., fastest possible) progress to the optimum is achieved if, and only if, the standard deviations of the random components of the vector of changes are inversely proportional to the number of variables. The 1/5 success rule for adaptation of the step length parameters does not, incidentally, have a biological basis; rather it is suited to the requirements of numerical optimization. It allows rates of convergence to be achieved that are comparable to those of most other direct search strategies. As the comparison tests show, because of its low computational cost per iteration the evolution strategy is actually far superior to some methods for many variables, for example those that employ costly orthogonalization processes. The external control of step lengths sometimes, however, worsens the reliability of the strategy. In "pathological" cases it leads to premature termination of the search and, besides, reduces the chance of global convergence.
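As a minimal sketch of the scheme just described, the following (1+1) loop applies the 1/5 success rule; the adaptation factor 0.85 and the function names are my assumptions for illustration, not values quoted from this chapter:

```python
import random

def one_plus_one_es(f, x, sigma=1.0, max_evals=10_000):
    """Minimal (1+1) evolution strategy with the 1/5 success rule:
    after every n mutations, enlarge sigma if more than 1/5 of them
    succeeded and shrink it otherwise (0.85 is an assumed factor)."""
    n = len(x)
    fx = f(x)
    successes = 0
    for trial in range(1, max_evals + 1):
        y = [xi + random.gauss(0.0, sigma) for xi in x]   # mutate all variables
        fy = f(y)
        if fy < fx:                    # selection: accept improvements only
            x, fx = y, fy
            successes += 1
        if trial % n == 0:             # adapt the step length once per n trials
            sigma = sigma / 0.85 if successes > n / 5 else sigma * 0.85
            successes = 0
    return x, fx

# usage: 10-dimensional sphere model
best, value = one_plus_one_es(lambda x: sum(xi * xi for xi in x), [5.0] * 10)
```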

Now instead of concluding like Bremermann that organic evolution has only reached stagnation points and not optima, for example, in ecological niches, one should rather ask whether the imitation of the natural process is sufficiently perfect. One can scarcely doubt the capability of evolution to create optimal adaptations and ever higher levels of development; the already familiar examples of the achievements of biological systems are too numerous. Failures with simulated evolution should not be imputed to nature but to the simulation model. The two membered scheme incorporates only the principles of mutation and selection and can only be regarded as a very simple basis for a true evolution strategy. On the other hand one must proceed with care in copying nature, as demonstrated by Lilienthal's abortive attempt, which is ridiculed nowadays, to build a flying machine by imitating the birds. The objective, to produce high lift with low drag, is certainly the same in both cases, but the boundary conditions (the flow regime, as expressed by the Reynolds number) are not. Bionics, the science of evaluating nature's patents, teaches us nowadays to beware of imitating in the sense of slavishly copying all the details, but rather to pay attention to the principle. Thus Bremermann's concept of varying the variables individually instead of all together must also be regarded as an inappropriate way to go about an optimization with continuously variable quantities. In spite of the many, often very detailed investigations made into the phenomenon of evolution, biology has offered no clues as to how an improved imitation should look, perhaps because it has hitherto been a largely descriptive rather than analytic science. The difficulties of the two membered evolution with step length adaptation teach us to look here to the biological example. It also alters the standard deviations through the generations, as proved by the existence of mutator genes and repair enzymes. Whilst nature cannot influence the mutation-provoking conditions of the environment, it can reduce their effects to whatever level is suitable. The step lengths are genetically determined; they can be thought of as strategy parameters of nature that are subject to the mutation-selection process just like the object parameters.

To carry through this principle as the algorithm of an improved evolution strategy one has to go over from the two membered to a multimembered scheme. The (μ , λ) strategy does so by employing the population principle and allowing the μ parents in each generation to produce λ descendants, of which the μ best are selected as parents of the following generation. In this way the sequential as well as the simultaneous character of organic evolution is imitated; the two membered concept only achieves this insofar as a single parent produces descendants until it is surpassed by one of them in vitality, the biological criterion of goodness. According to Rechenberg's hypothesis that the forms and intricacies of the evolutionary process that we observe today are themselves the result of development towards an optimal optimization strategy, our measures should lead to improved results. The test results show that the reliability of the (10 , 100) strategy, taken as an example, is indeed better than that of the (1+1) evolution strategy. In particular, the chances of locating global optima in multimodal problems have become considerably greater. Global convergence can even be achieved in the case of a non-convex and disconnected feasible region. In the rate of convergence test the (10 , 100) strategy does a lot worse, but not by the factor 100 that might be expected. In terms of the number of required generations, rather than the computation time, the multimembered strategy is actually considerably faster. The increase in speed compared to the two membered method comes about because not only the sign of $\Delta F$, the change in the function value, but also its magnitude plays a role in the selection process. Nature possesses a way of exploiting this advantage that is denied to conventional, serially operating computers: It operates in parallel. All descendants of a generation are produced at the same time, and their vitality is tested simultaneously. If nature could be imitated in this way, the (μ , λ) strategy would make both a very reliable and a fast optimization method.
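The generation loop of such a scheme can be sketched as follows; the learning rate tau, the bounds of the start interval, and all names are illustrative assumptions, and one self-adapted step size per individual stands in for the richer parameter sets discussed later:

```python
import math
import random

def comma_es(f, n, mu=10, lam=100, generations=200):
    """Sketch of a (mu, lambda) ES: each of the lam descendants mutates
    its parent's step size first and its object variables afterwards;
    only the mu best descendants survive, never the parents."""
    tau = 1.0 / math.sqrt(2.0 * n)   # learning rate: an assumption, not from the text
    pop = [([random.uniform(-5.0, 5.0) for _ in range(n)], 1.0) for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            x, sigma = random.choice(pop)                    # pick a parent at random
            s = sigma * math.exp(tau * random.gauss(0, 1))   # mutate the step size
            y = [xi + random.gauss(0, s) for xi in x]        # mutate the variables
            offspring.append((y, s))
        pop = sorted(offspring, key=lambda ind: f(ind[0]))[:mu]  # comma selection
    return min(pop, key=lambda ind: f(ind[0]))[0]

# usage: best = comma_es(lambda x: sum(xi * xi for xi in x), n=10)
```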

The following two paragraphs, though completely out of date, have been left in place mainly to demonstrate the considerable shift in the development of computers during the last 20 years (compare with Schwefel, 1975a). Meanwhile parallel computers are beginning to conquer desktops.

Long and complicated iterative processes, such as occur in many other branches of numerical mathematics, led engineers and scientists of the University of Illinois, U.S.A., to consider new ways of reducing the computation times of programs. They built their own computer, Illiac IV, which has especially short data retrieval and transfer times (Barnes et al., 1968). They were unable to approach the $10^{20}$ bits/sec given by Bledsoe (1961) as an upper limit for serial computers, but there will inevitably always be technological barriers to achieving this physical maximum.

A novel organizational principle of Illiac IV is much more significant in this connection. A bank of satellite computers are attached to a central unit, each with its own processor and access to a common memory. The idea is for the sub-units to execute simultaneously various parts of the same program and by this true parallel operation to yield higher effective computation speeds. In fact not every algorithm can take advantage of this capability, for it is impossible to execute two iterations simultaneously if the result of one influences the next. It may sometimes be necessary to reconsider and make appropriate modifications to conventional methods, e.g., of linear algebra, before the advantages of the next generation of computers can be exploited. The potential and the problems of implementing parallel computers are already receiving close attention: Shedler (1967), Karp and Miranker (1968), Miranker (1969, 1971), Chazan and Miranker (1970), Abe and Kimura (1970), Sameh (1971), Patrick (1972), Gilbert and Chandler (1972), Hansen (1972), Eisenberg and McGuire (1972), Casti, Richardson, and Larson (1973), Larson and Tse (1973), Miller (1973), Stone (1973a,b). A version of FORTRAN for parallel computers has already been devised (Millstein, 1973).

Another significant advantage of the multimembered as against the two membered scheme, one that also holds for serial calculations, is that the self-adjustment of step lengths can be made individually for each component. An automatic scaling of the variables results from this, which in certain cases yields a considerable improvement in the rate of progress. It can be achieved either by separate variation of the standard deviations $\sigma_i$ for $i = 1(1)n$, by recombination alone, or, even better, by both measures together. Whereas in the two membered scheme (unless the $\sigma_i^{(0)}$ are initially given different values) the contour lines of equiprobable steps are circles, or hyperspherical surfaces, they are now ellipses or hyperellipsoids that can extend or contract along the coordinate directions, following the n-dimensional normal distribution of the set of n random components $z_i$ for $i = 1(1)n$:

$$ w(z) = \frac{1}{(2\pi)^{n/2} \prod_{i=1}^{n} \sigma_i} \, \exp\left( -\frac{1}{2} \sum_{i=1}^{n} \left( \frac{z_i}{\sigma_i} \right)^2 \right) $$

This is not yet, however, the most general form of a normal distribution, which is rather:

$$ w(z) = \frac{\sqrt{\det A}}{(2\pi)^{n/2}} \, \exp\left( -\frac{1}{2} \, (z - \xi)^T A \, (z - \xi) \right) $$

The expectation value vector $\xi$ can be regarded as a deterministic part of the random step $z$. However, the comparison made by Schrack and Borowski (1972) between the random strategies of Schumer-Steiglitz and Matyas shows that even an ingenious learning scheme for adapting to the local conditions only improves the convergence in special cases. A much more important feature seems to be the step length adaptation. It is now possible for the elements of the matrix $A$ to be chosen so as to give the ellipsoid of variation any desired orientation in the space. Its axes, the regression directions of the random vector, only coincide with the coordinate axes if $A$ is a diagonal matrix. In that case the old scheme is recovered, whereby the variances $\sigma_{ii}$ or the $\sigma_i^2$ reappear as diagonal elements of the inverse matrix $A^{-1}$. If, however, the other elements, the covariances $\sigma_{ij} = \sigma_{ji}$, are non-zero, the ellipsoids are rotated in the space. The random components $z_i$ become mutually dependent, or correlated. The simplest kind of correlation is linear, which is the only case to yield hyperellipsoids as surfaces of constant step probability. Instead of just $n$ strategy parameters $\sigma_i$ one would now have to vary $n(n+1)/2$ different quantities $\sigma_{ij}$. Although in principle the multimembered evolution strategy allows an arbitrary number of strategy variables to be included in the mutation-selection process, in practice the adaptation of so many parameters could take too long and cancel out the advantage of more degrees of freedom. Furthermore, the $\sigma_{ij}$ must satisfy certain compatibility conditions (Sylvester's criteria, see Faddejew and Faddejewa, 1973) to ensure an orthogonal coordinate system or a positive definite matrix $A$. In the simplest case, $n = 2$, with

$$ A^{-1} = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix} $$

there is only one condition:

$$ \sigma_{12}^2 = \sigma_{21}^2 < \sigma_{11}\,\sigma_{22} = \sigma_1^2\,\sigma_2^2 $$

and the quantity defined by

$$ \rho_{12} = \frac{\sigma_{12}}{\sigma_1 \sigma_2}, \qquad -1 < \rho_{12} < 1 $$


is called the correlation coefficient. If the covariances were generated independently by a mutation process in the multimembered evolution scheme, with subsequent application of the rules of Scheuer and Stoller (1962) or Barr and Slezak (1972), there would be no guarantee that the surfaces of equal probability density would actually be hyperellipsoids. It follows that such a linear correlation of the random changes can be constructed more easily by first generating, as before, $(0, \sigma_i^2)$ normally distributed, independent random components and then making a coordinate rotation through prescribed angles. These angles, rather than the covariances $\sigma_{ij}$, represent the additional strategy variables. In the most general case there are a total of $n_p = n(n-1)/2$ such angles, which can take all values between $0^\circ$ and $360^\circ$ (or $-\pi$ and $\pi$). Including the $n_s = n$ "step lengths" $\sigma_i$, the total number of strategy parameters to be specified in the population by mutation and selection is $n(n+1)/2$. It is convenient to generate the angles $\alpha_j$ by an additive mutation process (cf. Equations (5.36) and (5.37))

$$ \alpha_{Nj}^{(g)} = \alpha_{Ej}^{(g)} + \hat{Z}_j^{(g)} \qquad \text{for } j = 1(1)n_p $$

where the $\hat{Z}_j^{(g)}$ can again be normally distributed, for example with a standard deviation that is the same for all angles. Let $\Delta x'_i$ represent the mutations as produced by the old scheme and $\Delta x_i$ the correlated changes in the object variables produced by the rotation; for the two dimensional case ($n = n_s = 2$, $n_p = 1$) the coordinate transformation for the rotation can simply be read off from Figure 7.1:

$$ \Delta x_1 = \Delta x'_1 \cos\alpha - \Delta x'_2 \sin\alpha $$
$$ \Delta x_2 = \Delta x'_1 \sin\alpha + \Delta x'_2 \cos\alpha $$

For $n = n_s = 3$ three consecutive rotations would need to be made:

In the $(\Delta x_1, \Delta x_2)$ plane through an angle $\alpha_1$
In the $(\Delta x'_1, \Delta x'_3)$ plane through an angle $\alpha_2$
In the $(\Delta x''_2, \Delta x''_3)$ plane through an angle $\alpha_3$

Starting from the uncorrelated random changes $\Delta x'''_1, \Delta x'''_2, \Delta x'''_3$ these rotations would have to be made in the reverse order. Thus also in the general case, with $n(n-1)/2$ rotations, each one only involves two coordinates, so that the computational cost increases as $O(n_p)$. The validity of this algorithm has been proved by Rudolph (1992a).
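A minimal sketch of this construction, assuming minimization-agnostic sampling only; the function name and the dictionary representation of the angle set are mine, and the iteration order of the dictionary stands in for the "reverse order" prescription above:

```python
import math
import random

def correlated_mutation(sigmas, angles):
    """Draw independent N(0, sigma_i^2) components, then apply the
    plane rotations one after another; each rotation touches exactly
    two coordinates, so the cost grows only as O(n_p)."""
    z = [random.gauss(0.0, s) for s in sigmas]
    for (i, j), alpha in angles.items():
        c, s = math.cos(alpha), math.sin(alpha)
        z[i], z[j] = c * z[i] - s * z[j], s * z[i] + c * z[j]
    return z

# usage: two variables, one rotation angle of 30 degrees
dz = correlated_mutation([1.0, 0.1], {(0, 1): math.radians(30.0)})
```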

An immediate simplification can be made if not all the $n_s$ step lengths are different, i.e., if the hyperellipsoid of equal probability of a mutation has rotational symmetry about one or more axes. In the extreme case $n_s = 2$ there are $n - n_s$ such axes and only $n_p = n - 1$ relevant angles of rotation. Except for one distinct principal axis, the ellipsoid resembles a sphere. If in the course of the optimization the minimum search leads through a narrow valley (e.g., in Problem 2.37 or 3.8 of the catalogue of test problems), it will often be quite adequate to work with such a greatly reduced variability of the mutation ellipsoid.


[Figure 7.1: Generation of correlated mutations. Sketch of the rotation through the angle α that turns the uncorrelated mutation steps (Δx′₁, Δx′₂), drawn along the rotated axes x′₁, x′₂, into the correlated steps (Δx₁, Δx₂) in the (x₁, x₂) plane.]

Between the two extreme cases $n_s = n$ and $n_s = 2$ ($n_s = 1$ would be the uncorrelated case with hyperspheres as mutation ellipsoids) any choice of variability is possible. In general we have

$$ 2 \le n_s \le n, \qquad n_p = \left( n - \frac{n_s}{2} \right) (n_s - 1) $$

For a given problem the most suitable choice of $n_s$, the number of different step lengths, would have to be obtained by numerical experiment.
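As a quick consistency check (my arithmetic), both extreme cases follow directly from this formula:

$$ n_s = n:\; n_p = \left( n - \frac{n}{2} \right)(n - 1) = \frac{n(n-1)}{2}, \qquad n_s = 2:\; n_p = (n - 1) \cdot 1 = n - 1 $$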

For this purpose the subroutine KORR and its associated subroutines, listed in Appendix B, Section B.3, are flexibly coded to give the user considerable freedom in the choice of quantities that determine the strategy parameters. This variant of the evolution strategy (Schwefel, 1974) could not be fully included in the strategy test (performed in 1973); however, initial results confirmed that, as expected, it is able to construct a kind of variable metric for the changes in the object variables by adapting the angles to the local topology of the objective function.

The slow convergence of the two membered evolution strategy can often be traced to the fact that the problem has long elliptical (or nearly elliptical) contours of constant objective function value. If the function is quadratic, their extension (or eccentricity) can be expressed by the condition number of the matrix of second order coefficients. In the worst case, in which the search is started at the point of greatest curvature of the contour surface $F(x) = \mathrm{const.}$, the rate of progress seems to be inversely proportional to the product of the number of variables and the square root of the condition number. This dependence on the metric would be eliminated if the directions of the axes of the variance ellipsoid corresponded to those of the contour ellipsoid, which is exactly what the introduction of correlated random numbers should achieve. Extended valleys in other than coordinate directions then no longer hinder the search because, after a transition phase, an initially elliptical problem is reduced to a spherical one. In this way the evolution strategy acquires properties similar to those of the variable metric method of Davidon-Fletcher-Powell (DFP). In the test, for just the reason discussed above, the latter proved to be superior to all other methods for quadratic objective functions. For such problems one should not expect it to be surpassed by the evolution strategy, since compared to the $Q_n$ property of the DFP method the evolution strategy has only a $Q_{O(n)}$ property; i.e., it does not find the optimum after exactly n iterations, but rather it reaches a given approximation to the objective after $O(n)$ generations. This disadvantage, only slight in practice, is outweighed by the following advantages:

Greater flexibility, hence reliability, in other than quadratic cases

Simpler computational operations

Storage required increases only as $O(n)$ (unless one chooses $n_s = n$)

While one has great hopes for this extension of the multimembered evolution strategy, one should not be blinded by enthusiasm to limitations in its capability. It would yield computation times no better than $O(n^3)$ if it turns out that a population of $O(n)$ parents is needed for adjusting the strategy parameters and if pure serial rather than parallel computation is necessary.

Does the new scheme still correspond to the biological paradigm? It has been discovered that one gene often influences several phenotypic characteristics of an individual (pleiotropy) and conversely that many characteristics depend on the cooperative effect of several genes (polygeny). These interactions just mean that the characteristics are correlated. A linear correlation as in Figure 7.1 represents only one of the many conceivable types, in which $(x'_1, x'_2)$ is the plane of the primary, independent genetic changes and $(x_1, x_2)$ that of the secondary, mutually correlated changes in the characteristics. Particular kinds of such dependence, for example, allometric growth, have been intensively studied (e.g., Grassé, 1973). There is little doubt that the relationships have also adapted, during the history of development, to the topological requirements of the objective function. The observable differences between life forms are at least suggestive of this. Even non-linear correlations may occur. Evolution has indeed to cope with far greater difficulties, for it has no ordered number system at its disposal. In the first place it had to create a scale of measure, with the genetic code, for example, which has been learned during the early stages of life on earth.

Whether it is ultimately worth proceeding so far or further to mimic evolution is still an open question, but it is surely a path worth exploring, perhaps not for continuous, but for discrete or mixed parameter optimization. Here, in place of the normal distribution of random changes, a discrete distribution must be applied, e.g., a binomial or, better still, a distribution with maximum entropy (see Rudolph, 1994b), so that for small "total step lengths" the probability really is small that two or more variables are altered simultaneously. Occasional stagnation of the search will only be avoided, in this case, if the population allows worsening within a generation. Worsening is not allowed by the two membered strategy, but it is by the multimembered (μ , λ) strategy, in which the parents, after producing descendants, no longer enter the selection process. Perhaps this shows that the limited life span of individuals is no imperfection of nature, no consequence of an inevitable weakness of the system, but rather an intelligent, indeed essential means of survival of the species. This conjecture is again supported by the genetically determined, in effect preprogrammed, ending of the process of cell division during the life of an individual. Sequential improvement and consequent rapid optimization is only made possible by the following of one generation after another. However, one should be extremely wary of applying such concepts directly to mankind. Human evolution long ago left the purely biological domain and is more active nowadays in the social one. One properly refers now to a cultural evolution. There is far too little genetic information to specify human behavior completely.

Little is known of which factors are genetically inherited and which socially acquired, as shown by the continuing discussions over the results of behavioral research and the diametrically opposite points of view of individual scientists in the field. The two most important evolutionary principles, mutation and selection, also belong to social development (Alland, 1970). Actually, even more complicated mechanisms are at work here. Oversimplifications can have quite terrifying consequences, as shown by the example of social Darwinism, to which Koch (1973) attributes responsibility for racist and imperialist thinking and hence for the two World Wars. No such further speculation with the evolution strategy will therefore be embarked upon here. The fact remains that the recognition of evolution as representing a sequential optimization process is too valuable to be dismissed to oblivion as evolutionism (Goll, 1972). Rather, one should consider what further factors are known in organic evolution that might be worth imitating, in order to make of the evolution strategy an even more general optimization method; for up to now several developments have confirmed Rechenberg's hypothesis that the strategy can be improved by taking into account further factors, at least when this is done adequately and the biological and mathematical boundary conditions are compatible with each other. Furthermore, by no means all evolutionary principles have yet been adopted for optimizing technical systems.

The search for global optima remains a particularly difficult problem. In such cases nature seems to hunt for all, or at least a large number of, maxima or minima at the same time by the splitting of a population (the isolation principle). After a transition phase the individuals of both or all the subpopulations can no longer intermix. Thereafter each group only seeks its own specific local optimum, which might perhaps be the global one. This principle could easily be incorporated into the multimembered scheme if a criterion could be defined for performing the splitting process.

Many evolution principles that appeared later on the scene can be explained as affording the greater chance of survival to a population having the better mechanism of inheritance (for these are also variable) compared to another forming a worse "strategy of life." In this way the evolution method could itself be optimized by organizing a competition between several populations that alter the concept of the optimum seeking strategy itself. The simplest possibility, for example, would be to vary the numbers of parents and of descendants: two or more groups would be set up, each with its own values of these parameters; each group would be given a fixed time to seek the optimum; then the group that has advanced the most would be allowed to "survive." In this way these strategy variables would be determined to best suit the particular problem and computer, with the objective of minimizing the required computation time. One might call such an approach a meta- or hierarchical evolution strategy (see Bäck, 1994a,b).

The solution of problems with multiple objectives could also be approached with the multimembered evolution strategy. This is really the most common type of problem in nature. The selection step, the reduction to the μ best of the λ descendants, could be subdivided into several partial steps, in each of which only one of the criteria for selection is applied. In this way no weighting of the partial objectives would be required. First attempts with only two variables and two partial objectives showed that a point on the Pareto line is always approached as the optimum. By unequal distribution of the partial selections the solution point could be displaced towards one of the partial objectives. At this stage subjective information would have to be applied, because all the Pareto-optimal solutions are initially equally good (see Kursawe, 1991, 1992).
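One conceivable reading of such partial selection steps is sketched below; this is my illustration under assumed names and minimization, not a reconstruction of the original experiments. The criteria take turns discarding the currently worst individual, so the partial objectives never have to be weighted against each other:

```python
import random

def partial_selection(pop, objectives, mu):
    """Reduce a population to mu survivors by letting each selection
    criterion, in turn, remove the worst remaining individual."""
    pool = list(pop)
    while len(pool) > mu:
        for f in objectives:
            if len(pool) <= mu:
                break
            pool.remove(max(pool, key=f))  # drop the worst under this criterion
    return pool

# usage: keep 10 of 100 random points under two competing objectives
points = [(random.random(), random.random()) for _ in range(100)]
survivors = partial_selection(
    points, [lambda p: p[0], lambda p: (1 - p[0]) ** 2 + p[1]], 10)
```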

Contrary to many statements or conjectures that organic evolution is a particularly wasteful optimization process, it proves again and again to be precisely suited to advancing with maximum speed, without losing reliability of convergence, even to better and better local optima. This is just what is required in numerical optimization. In both cases the available resources limit what can be achieved. In one case these are the limitations of food and the finite area of the earth for accommodating life; in the other they are the finite number of satellite processors of a parallel-organized mainframe computer and its limited (core) storage space. If the evolution strategy can be considered as the sought-after universal optimization method, then this is not in the sense that it solves a particular problem (e.g., a linear or quadratic function) exactly, with the least iterations or generations, but rather refers to its being the most readily extended concept, able to solve very difficult problems, problems with particularly many variables, under unfavorable conditions such as stochastic perturbations, discrete variables, time-varying optima, and multimodal objective functions (see Hammel and Bäck, 1994). Accordingly, the results and assessments introduced in the present work can at best be considered as a first step in the development towards a universal evolution strategy.

Finally, some early applications of the evolution strategy will be cited. Experimental tasks were the starting point for the realization of the first ideas for an optimization strategy based on the example of biological evolution. It was also first applied here to the solution of practical problems (see Schwefel, 1968; Klockgether and Schwefel, 1970; Rechenberg, 1973). Meanwhile it is being applied just as widely to optimization problems that can be expressed in computational or algorithmic form, e.g., in the form of simulation models. The following is a list of some of the successful applications, with references to the relevant publications.

1. Optimal dimensioning of the core of a fast sodium-type breeder reactor (Heusener, 1970)

2. Optimal allocation of investments to various health-service programs in Colombia (Schwefel, 1972)

3. Solving curve-fitting problems by combining a least-squares method with the evolution strategy (Plaschko and Wagner, 1973)

4. Minimum-weight design of truss constructions, partly in combination with linear programming (Leyßner, 1974; Höfler, 1976)

5. Optimal shaping of vaulted reinforced concrete shells (Hartmann, 1974)

6. Optimal dimensioning of quadruple-joint drives (Anders, 1977)

7. Approximating the solution of a set of non-linear differential equations (Rodloff, 1976)

8. Optimal design of arm prostheses (Brudermann, 1977)

9. Optimization of urban and regional water supply systems (Cembrowicz and Krauter, 1977)

10. Combining the evolution strategy with factorial design techniques (Kobelt and Schneider, 1977)

11. Optimization within a dynamic simulation model of a socioeconomic system (Krallmann, 1978)

12. Optimization of a thermal water jet propulsion system (Markwich, 1978)

13. Optimization of a regional system for the removal of refuse (von Falkenhausen, 1980)

14. Estimation of parameters within a model of floods (North, 1980)

15. Interactive superimposing of different direct search techniques onto dynamic simulation models, especially models of the energy system of the Federal Republic of Germany (Heckler, 1979; Drepper, Heckler, and Schwefel, 1979)

Much longer lists of references concerning applications as well as theoretical work in the field of evolutionary computation have meanwhile been compiled by Alander (1992, 1994) and Bäck, Hoffmeister, and Schwefel (1993).

Among the many different fields of application, only one will be addressed here, i.e., non-linear regression and correlation analysis. In general this leads to a multimodal optimization problem when the parameters searched for enter the hypotheses non-linearly, e.g., as exponents. Very helpful under such circumstances is a tool with which one can switch from one minimization method to another. Beginning with a multimembered evolution strategy and refining the intermediate results by means of a variable metric method has often led to practically useful results (e.g., Frankhauser and Schwefel, 1992).

In some cases of practical applications of evolution strategies it turns out that the number of variables describing the objective function has to vary itself. An example was the experimental optimization of the shape of a supersonic one-component two-phase flow nozzle (Schwefel, 1968). Conically bored rings with fixed lengths could be put side by side, thus forming potentially millions of different inner nozzle contours. But the total length of the nozzle had to be varied itself. So the number of rings, and thus the number of variables (inner diameters of the rings), had to be mutated during the search for an optimum shape as well. By imitating gene duplication and gene deletion at randomly chosen positions, a rather simple technique was found to solve the variable number of variables problem. Such a procedure might be helpful for many structural optimization problems (e.g., Rozvany, 1994) as well.
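A minimal sketch of such a length-changing mutation, in the spirit of the nozzle example; the function name and the probabilities are illustrative assumptions:

```python
import random

def mutate_length(rings, p_dup=0.05, p_del=0.05):
    """Gene duplication and deletion on a variable-length parameter
    vector: occasionally duplicate or delete one entry (here: a ring's
    inner diameter) at a randomly chosen position."""
    rings = list(rings)
    if rings and random.random() < p_dup:             # gene duplication
        k = random.randrange(len(rings))
        rings.insert(k, rings[k])
    if len(rings) > 1 and random.random() < p_del:    # gene deletion
        del rings[random.randrange(len(rings))]
    return rings
```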

If the decision variables are to be taken from a discrete set only (the distinct values may be equidistant or not; integer and binary values just form special subclasses), ESs may sometimes be used without any change. Within the objective function the real values must simply undergo a suitable rounding-off process, as shown at the end of Appendix B, Section B.3. Since all ESs handle unchanged objective function values as improvements, the self-adaptation of the standard deviations on a plateau will always lead to their enlargement, until the plateaus $F(x) = \mathrm{const.}$ built by rounding off can be left. On a plateau, the ES performs a random walk with ever increasing step sizes.
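A sketch of the rounding-off idea (the grid width and names are assumptions; the actual routine is the one in Appendix B, Section B.3): the ES keeps mutating real values, while the model only ever sees grid points, which is what produces the plateaus described above.

```python
def discretized(f, grid=0.1):
    """Wrap a continuous objective so the search effectively runs on a
    grid: round every variable to the nearest grid point before the
    evaluation, creating plateaus of constant objective value."""
    def f_on_grid(x):
        return f([grid * round(xi / grid) for xi in x])
    return f_on_grid

# usage: a discretized sphere model with grid width 0.5
f_discrete = discretized(lambda x: sum(xi * xi for xi in x), grid=0.5)
```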

Towards the end of the search, however, more and more of the individual step sizes have to become very small, whereas others, singly or in some combination, should be increased to allow hopping from one n-cube in the decision-variable space to the next. The chances for that kind of adaptation are good enough as long as sequential improvements are possible; the last few of them will not happen that way, however. A method of escaping from that awkward situation has been shown (Schwefel, 1975b), imitating multicellular individuals and introducing so-called somatic mutations. Even in the case of binary variables an ES can thus reach the optimum. Since no real application has been done this way until now, no further details will be given here.

An interesting question is whether there are intermediate cases between a plus and a comma version of the multimembered ES. The answer must be, "Yes, there are." Instead of neglecting the parents during the selection step (within comma-ESs), or allowing them to live forever in principle (within plus-ESs, only until offspring surpass them, of course), one might implant a generation counter into each individual. As soon as a prefixed limit is reached, they leave the scene automatically. Such a more general ES version could be termed a (μ , κ , λ) strategy, where κ denotes the maximal number of generations (iterations) an individual is allowed to "survive" in the population. For κ = 1 we then get the old comma version, whereas the old plus version is reached if κ goes to infinity. There are some preliminary results now, but as yet they are too unsystematic to be presented here.
presented here.<br />

Is the strict synchronization of the evolutionary process within ESs as well as GAs<br />

the best way to do the job? The answer to this even more interesting question is, \No,"<br />

especially if one makes use of MIMD parallel machines or clusters of workstations. Then<br />

one should switch to imitating life more closely: Birth <strong>and</strong> death events mayhappenatthe<br />

same time. Instead of modelling a central decision maker for the selection process (whichis<br />

an oversimpli cation) one could use a predator-prey model like that of Lotka <strong>and</strong>Volterra.<br />

Adding a neighborhood model (see Gorges-Schleuter, 1991a,b Sprave, 1993, 1994) for the


248 Summary <strong>and</strong> Outlook<br />

recombination process would free the whole strategy from all kinds of synchronization<br />

needs. Initial tests have shown that this is possible. Niching <strong>and</strong> migration as used by<br />

Rudolph (1991) will be the next features to be added to the APES (asynchronous parallel<br />

evolution strategy).<br />

A couple of earlier attempts towards parallelizing ESs will be mentioned at the end<br />

of this chapter. Since all of them are somehow intermediate solutions, however, none of<br />

them will be explained in detail. The reader is referred to the literature.<br />

A taxonomy, more or less complete with respect to possible ways of parallelizing EAs,<br />

may be found in Ho meister <strong>and</strong> Schwefel (1990) or Ho meister (1991). Rudolph (1991)<br />

has realized a coarse-grained parallel ES with subpopulations on each processor <strong>and</strong> more<br />

or less frequent migration events, whereas Sprave (1994) gave preference to a ne-grained<br />

di usion model. Both of these more volume-oriented approaches delivered great advances<br />

in solving multimodal optimization problems as compared with the more greedy <strong>and</strong> pathoriented<br />

\canonical" ( , ) ES. The comma version, by theway, is necessary to follow a<br />

nonstationary optimum (see Schwefel <strong>and</strong> Kursawe, 1992), <strong>and</strong> only such anESisableto<br />

solve on-line optimization problems.<br />

Nevertheless, one should never forget that there are many other specialized optimum<br />

seeking methods. For a practitioner, a tool box withmany di erent algorithms might<br />

always be the \optimum optimorum." Whether he or she chooses a special tool by<br />

h<strong>and</strong>, so to speak (see Heckler <strong>and</strong> Schwefel, 1978 Heckler, 1979 Schwefel, 1980, 1981<br />

Hammel, 1991 Bendin, 1992 Back <strong>and</strong> Hammel, 1993), or relies upon some knowledgebased<br />

selection scheme (see Campos, 1989 Campos <strong>and</strong> Schwefel, 1989 Campos, Peters,<br />

<strong>and</strong> Schwefel, 1989 Peters, 1989, 1991 Lehner, 1991) will largely depend on his or her<br />

experience.


Chapter 8

References

Glossary of abbreviations at the end of this list

Aarts, E., J. Korst (1989), Simulated annealing and Boltzmann machines, Wiley, Chichester
Abadie, J. (Ed.) (1967), Nonlinear programming, North-Holland, Amsterdam
Abadie, J. (Ed.) (1970), Integer and nonlinear programming, North-Holland, Amsterdam
Abadie, J. (1972), Simplex-like methods for non-linear programming, in: Szegö (1972), pp. 41-60
Abe, K., M. Kimura (1970), Parallel algorithm for solving discrete optimization problems, IFAC Kyoto Symposium on Systems Engineering Approach to Computer Control, Kyoto, Japan, Aug. 1970, paper 35.1
Ablay, P. (1987), Optimieren mit Evolutionsstrategien, Spektrum der Wissenschaft (1987, July), 104-115 (see also discussion in (1988, March), 3-4 and (1988, June), 3-4)
Ackley, D.H. (1987), A connectionist machine for genetic hill-climbing, Kluwer Academic, Boston
Adachi, N. (1971), On variable-metric algorithms, JOTA 7, 391-410
Adachi, N. (1973a), On the convergence of variable-metric methods, Computing 11, 111-123
Adachi, N. (1973b), On the uniqueness of search directions in variable-metric algorithms, JOTA 11, 590-604
Adams, R.J., A.Y. Lew (1966), Modified sequential random search using a hybrid computer, University of Southern California, Electrical Engineering Department, report, May 1966
Ahrens, J.H., U. Dieter (1972), Computer methods for sampling from the exponential and normal distributions, CACM 15, 873-882, 1047
Aizerman, M.A., E.M. Braverman, L.I. Rozonoer (1965), The Robbins-Monro process and the method of potential functions, ARC 26, 1882-1885
Akaike, H. (1960), On a successive transformation of probability distribution and its application to the analysis of the optimum gradient method, Ann. Inst. Stat. Math. Tokyo 11, 1-16
Alander, J.T. (Ed.) (1992), Proceedings of the 1st Finnish Workshop on Genetic Algorithms and their Applications, Helsinki, Nov. 4-5, 1992, Bibliography pp. 203-281, Helsinki University of Technology, Department of Computer Science, Helsinki, Finland
Alander, J.T. (1994), An indexed bibliography of genetic algorithms, preliminary edition, Jarmo T. Alander, Espoo, Finland
Albrecht, R.F., C.R. Reeves, N.C. Steele (Eds.) (1993), Artificial neural nets and genetic algorithms, Proceedings of an International Conference, Innsbruck, Austria, Springer, Vienna
Aleksandrov, V.M., V.I. Sysoyev, V.V. Shemeneva (1968), Stochastic optimization, Engng. Cybern. 6(5), 11-16
Alland, A., Jr. (1970), Evolution und menschliches Verhalten, S. Fischer, Frankfort/Main
Allen, P., J.M. McGlade (1986), Dynamics of discovery and exploitations - the case of the Scotian shelf groundfish fisheries, Can. J. Fish. Aquat. Sci. 43, 1187-1200
Altman, M. (1966), Generalized gradient methods of minimizing a functional, Bull. Acad. Polon. Sci. 14, 313-318
Amann, H. (1968a), Monte-Carlo Methoden und lineare Randwertprobleme, ZAMM 48, 109-116
Amann, H. (1968b), Der Rechenaufwand bei der Monte-Carlo Methode mit Informationsspeicherung, ZAMM 48, 128-131
Anders, U. (1977), Lösung getriebesynthetischer Probleme mit der Evolutionsstrategie, Feinwerktechnik und Meßtechnik 85(2), 53-57
Anderson, N., A. Björck (1973), A new high order method of Regula Falsi type for computing a root of an equation, BIT 13, 253-264
Anderson, R.L. (1953), Recent advances in finding best operating conditions, J. Amer. Stat. Assoc. 48, 789-798
Andrews, H.C. (1972), Introduction to mathematical techniques in pattern recognition, Wiley-Interscience, New York

Anscombe, F.J. (1959), Quick analysis methods for random balance screening experiments, Technometrics 1, 195-209
Antonov, G.E., V.Ya. Katkovnik (1972), Method of synthesis of a class of random search algorithms, ARC 32, 990-993
Aoki, M. (1971), Introduction to optimization techniques - fundamentals and applications of nonlinear programming, Macmillan, New York
Apostol, T.M. (1957), Mathematical analysis - a modern approach to advanced calculus, Addison-Wesley, Reading MA
Arrow, K.J., L. Hurwicz (1956), Reduction of constrained maxima to saddle-point problems, in: Neyman (1956), vol. 5, pp. 1-20
Arrow, K.J., L. Hurwicz (1957), Gradient methods for constrained maxima, Oper. Res. 5, 258-265
Arrow, K.J., L. Hurwicz, H. Uzawa (Eds.) (1958), Studies in linear and non-linear programming, Stanford University Press, Stanford CA
Asai, K., S. Kitajima (1972), Optimizing control using fuzzy automata, Automatica 8, 101-104
Ashby, W.R. (1960), Design for a brain, 2nd ed., Wiley, New York
Ashby, W.R. (1965), Constraint analysis of many-dimensional relations, in: Wiener and Schade (1965), pp. 10-18
Ashby, W.R. (1968), Some consequences of Bremermann's limit for information-processing systems, in: Oestreicher and Moore (1968), pp. 69-76
Avriel, M., D.J. Wilde (1966a), Optimality proof for the symmetric Fibonacci search technique, Fibonacci Quart. 4, 265-269
Avriel, M., D.J. Wilde (1966b), Optimal search for a maximum with sequences of simultaneous function evaluations, Mgmt. Sci. 12, 722-731
Avriel, M., D.J. Wilde (1968), Golden block search for the maximum of unimodal functions, Mgmt. Sci. 14, 307-319
Axelrod, R. (1984), The evolution of cooperation, Basic Books, New York
Azencott, R. (Ed.) (1992), Simulated annealing - parallelization techniques, Wiley, New York
Bach, H. (1969), On the downhill method, CACM 12, 675-677
Bäck, T. (1992a), Self-adaptation in genetic algorithms, in: Varela and Bourgine (1992), pp. 263-271
Bäck, T. (1992b), The interaction of mutation rate, selection, and self-adaptation within a genetic algorithm, in: Männer and Manderick (1992), pp. 85-94
Bäck, T. (1993), Optimal mutation rates in genetic search, in: Forrest (1993), pp. 2-9
Bäck, T. (1994a), Evolutionary algorithms in theory and practice, Dr. rer. nat. Diss., University of Dortmund, Department of Computer Science, Feb. 1994
Bäck, T. (1994b), Parallel optimization of evolutionary algorithms, in: Davidor, Schwefel, and Männer (1994), pp. 418-427
Bäck, T., U. Hammel (1993), Einsatz evolutionärer Algorithmen zur Optimierung von Simulationsmodellen, in: Szczerbicka and Ziegler (1993), pp. 1-22
Bäck, T., U. Hammel, H.-P. Schwefel (1993), Modelloptimierung mit evolutionären Algorithmen, in: Sydow (1993), pp. 49-57
Bäck, T., F. Hoffmeister, H.-P. Schwefel (1991), A survey of evolution strategies, in: Belew and Booker (1991), pp. 2-9
Bäck, T., F. Hoffmeister, H.-P. Schwefel (1993), Applications of evolutionary algorithms, technical report SYS-2/92, 4th ext. ed., Systems Analysis Research Group, University of Dortmund, Department of Computer Science, July 1993
Bäck, T., G. Rudolph, H.-P. Schwefel (1993), Evolutionary programming and evolution strategies - similarities and differences, in: Fogel and Atmar (1993), pp. 11-22
Bäck, T., H.-P. Schwefel (1993), An overview of evolutionary algorithms for parameter optimization, Evolutionary Computation 1, 1-23
Baer, R.M. (1962), Note on an extremum locating algorithm, Comp. J. 5, 193
Balakrishnan, A.V. (Ed.) (1972), Techniques of optimization, Academic Press, New York
Balakrishnan, A.V., M. Contensou, B.F. DeVeubeke, P. Kree, J.L. Lions, N.N. Moiseev (Eds.) (1970), Symposium on optimization, Springer, Berlin
Balakrishnan, A.V., L.W. Neustadt (Eds.) (1964), Computing methods in optimization problems, Academic Press, New York
Balakrishnan, A.V., L.W. Neustadt (Eds.) (1967), Mathematical theory of control, Academic Press, New York
Balinski, M.L., P. Wolfe (Eds.) (1975), Nondifferentiable optimization, vol. 3 of Mathematical Programming Studies, North-Holland, Amsterdam
Bandler, J.W. (1969a), Optimization methods for computer-aided design, IEEE Trans. MTT-17, 533-552

Bandler, J.W. (1969b), Computer optimization of inhomogeneous waveguide transformers, IEEE Trans. MTT-17, 563-571
Bandler, J.W., C. Charalambous (1974), Nonlinear programming using minimax techniques, JOTA 13, 607-619
Bandler, J.W., P.A. MacDonald (1969), Optimization of microwave networks by razor search, IEEE Trans. MTT-17, 552-562
Banzhaf, W., M. Schmutz (1992), Some notes on competition among cell assemblies, Int'l J. Neural Syst. 2, 303-313
Bard, Y. (1968), On a numerical instability of Davidon-like methods, Math. Comp. 22, 665-666
Bard, Y. (1970), Comparison of gradient methods for the solution of nonlinear parameter estimation problems, SIAM J. Numer. Anal. 7, 157-186
Barnes, G.H., R.M. Brown, M. Kato, D.J. Kuck, D.L. Slotnick, R.A. Stokes (1968), The Illiac IV computer, IEEE Trans. C-17, 746-770
Barnes, J.G.P. (1965), An algorithm for solving non-linear equations based on the secant method, Comp. J. 8, 66-72
Barnes, J.L. (1965), Adaptive control as the basis of life and learning systems, Proceedings of the IFAC Tokyo Symposium on Systems Engineering Control and Systems Design, Tokyo, Japan, Aug. 1965, pp. 187-191
Barr, D.R., N.L. Slezak (1972), A comparison of multivariate normal generators, CACM 15, 1048-1049
Bass, R. (1972), A rank two algorithm for unconstrained minimization, Math. Comp. 26, 129-143
Bauer, F.L. (1965), Elimination with weighted row combinations for solving linear equations and least squares problems, Numer. Math. 7, 338-352
Bauer, W.F. (1958), The Monte Carlo method, SIAM J. 6, 438-451
Beale, E.M.L. (1956), On quadratic programming, Nav. Res. Log. Quart. 6, 227-243
Beale, E.M.L. (1958), On an iterative method for finding a local minimum of a function of more than one variable, Princeton University, Statistical Techniques Research Group, technical report 25, Princeton NJ, Nov. 1958
Beale, E.M.L. (1967), Numerical methods, in: Abadie (1967), pp. 133-205
Beale, E.M.L. (1970), Computational methods for least squares, in: Abadie (1970), pp. 213-228
Beale, E.M.L. (1972), A derivation of conjugate gradients, in: Lootsma (1972a), pp. 39-43
Beamer, J.H., D.J. Wilde (1969), An upper bound on the number of measurements required by the contour tangent optimization technique, IEEE Trans. SSC-5, 27-30
Beamer, J.H., D.J. Wilde (1970), Minimax optimization of unimodal functions by variable block search, Mgmt. Sci. 16, 529-541
Beamer, J.H., D.J. Wilde (1973), A minimax search plan for constrained optimization problems, JOTA 12, 439-446
Beckman, F.S. (1967), Die Lösung linearer Gleichungssysteme nach der Methode der konjugierten Gradienten, in: Ralston and Wilf (1967), pp. 106-126
Beckmann, M. (Ed.) (1971), Unternehmensforschung heute, Springer, Berlin
Beier, W., K. Glaß (1968), Bionik - eine Wissenschaft der Zukunft, Urania, Leipzig, Germany
Bekey, G.A., M.H. Gran, A.E. Sabroff, A. Wong (1966), Parameter optimization by random search using hybrid computer techniques, AFIPS Conf. Proc. 29, 191-200
Bekey, G.A., W.J. Karplus (1971), Hybrid-Systeme, Berliner Union und Kohlhammer, Stuttgart
Bekey, G.A., R.B. McGhee (1964), Gradient methods for the optimization of dynamic system parameters by hybrid computation, in: Balakrishnan and Neustadt (1964), pp. 305-327
Belew, R.K., L.B. Booker (Eds.) (1991), Proceedings of the 4th International Conference on Genetic Algorithms, University of California, San Diego CA, July 13-16, 1991, Morgan Kaufmann, San Mateo CA
Bell, D.E., R.E. Keeney, H. Raiffa (Eds.) (1977), Conflicting objectives in decisions, vol. 1 of Wiley IIASA International Series on Applied Systems Analysis, Wiley, Chichester
Bell, M., M.C. Pike (1966), Remark on algorithm 178 (E4) - direct search, CACM 9, 684-685
Bellman, R.E. (1967), Dynamische Programmierung und selbstanpassende Regelprozesse, Oldenbourg, Munich
Beltrami, E.J., J.P. Indusi (1972), An adaptive random search algorithm for constrained minimization, IEEE Trans. C-21, 1004-1008

Bendin, F. (1992), Ein Praktikum zu Verfahren zur Lösung zentraler und dezentraler Optimierungsprobleme und Untersuchungen hierarchisch zerlegter Optimierungsaufgaben mit Hilfe von Parallelrechnern, Dr.-Ing. Diss., Technical University of Ilmenau, Germany, Faculty of Technical Sciences, Sept. 1992
Berg, R.L., N.W. Timofejew-Ressowski (1964), Über Wege der Evolution des Genotyps, in: Ljapunov, Kämmerer, and Thiele (1964b), pp. 201-221
Bergmann, H.W. (1989), Optimization - methods and applications, possibilities and limitations, vol. 47 of Lecture Notes in Engineering, Springer, Berlin
Berlin, V.G. (1969), Acceleration of stochastic approximations by a mixed search method, ARC 30, 125-129
Berlin, V.G. (1972), Parallel randomized search strategies, ARC 33, 398-403
Berman, G. (1966), Minimization by successive approximation, SIAM J. Numer. Anal. 3, 123-133
Berman, G. (1969), Lattice approximations to the minima of functions of several variables, JACM 16, 286-294
Bernard, J.W., F.J. Sonderquist (1959), Progress report on OPCON - Dow evaluates optimizing control, Contr. Engng. 6(11), 124-128
Bertram, J.E. (1960), Control by stochastic adjustment, AIEE Trans. II Appl. Ind. 78, 485-491
Beveridge, G.S.G., R.S. Schechter (1970), Optimization - theory and practice, McGraw-Hill, New York
Beyer, H.-G. (1989), Ein Evolutionsverfahren zur mathematischen Modellierung stationärer Zustände in dynamischen Systemen, Dr. rer. nat. Diss., University of Architecture and Civil Engineering, Weimar, Germany, June 1989
Beyer, H.-G. (1990), Simulation of steady states in dissipative systems by Darwin's paradigm of evolution, J. of Non-Equilibrium Thermodynamics 15, 45-58
Beyer, H.-G. (1992), Some aspects of the 'evolution strategy' for solving TSP-like optimization problems, in: Männer and Manderick (1992), pp. 361-370

Beyer, H.-G. (1993), Toward a theory of evolution strategies - some asymptotical results from the (1,+λ)-theory, Evolutionary Computation 1, 165-188
Beyer, H.-G. (1994a), Towards a theory of 'evolution strategies' - results for (1,+λ)-strategies on (nearly) arbitrary fitness functions, in: Davidor, Schwefel, and Männer (1994), pp. 58-67
Beyer, H.-G. (1994b), Towards a theory of 'evolution strategies' - results from the N-dependent (μ, λ) and the multi-recombinant (μ/μ, λ) theory, technical report SYS-5/94, Systems Analysis Research Group, University of Dortmund, Department of Computer Science, Oct. 1994

Biggs, M.C. (1971), Minimization algorithms making use of non-quadratic properties of the objective function, JIMA 8, 315-327 (errata in 9 (1972))
Biggs, M.C. (1973), A note on minimization algorithms which make use of non-quadratic properties of the objective function, JIMA 12, 337-338
Birkhoff, G., S. MacLane (1965), A survey of modern algebra, 3rd ed., Macmillan, New York
Blakemore, J.W., S.H. Davis, Jr. (Eds.) (1964), Optimization techniques, AIChE Chemical Engineering Progress Symposium Series 60, no. 50
Bledsoe, W.W. (1961), A basic limitation on the speed of digital computers, IRE Trans. EC-10, 530
Blum, J.R. (1954a), Approximation methods which converge with probability one, Ann. Math. Stat. 25, 382-386
Blum, J.R. (1954b), Multidimensional stochastic approximation methods, Ann. Math. Stat. 25, 737-744
Boas, A.H. (1962), What optimization is all about, Chem. Engng. 69(25), 147-152
Boas, A.H. (1963a), How to use Lagrange multipliers, Chem. Engng. 70(1), 95-98
Boas, A.H. (1963b), How search methods locate optimum in univariable problems, Chem. Engng. 70(3), 105-108
Boas, A.H. (1963c), Optimizing multivariable functions, Chem. Engng. 70(5), 97-104
Boas, A.H. (1963d), Optimization via linear and dynamic programming, Chem. Engng. 70(7), 85-88
Bocharov, I.N., A.A. Feldbaum (1962), An automatic optimizer for the search for the smallest of several minima - a global optimizer, ARC 23, 260-270
Böhling, K.H., P.P. Spies (Eds.) (1979), 9th GI-Jahrestagung, Bonn, Oct. 1979, Springer, Berlin
Boltjanski, W.G. (1972), Mathematische Methoden der optimalen Steuerung, Hanser, Munich
Booth, A.D. (1949), An application of the method of steepest descents to the solution of systems of non-linear simultaneous equations, Quart. J. Mech. Appl. Math. 2, 460-468

Booth, A.D. (1955), Numerical methods, Butterworths, London
Booth, R.S. (1967), Location of zeros of derivatives, SIAM J. Appl. Math. 15, 1496-1501
Boothroyd, J. (1965), Certification of algorithm 2 - Fibonacci search, Comp. Bull. 9, 105, 108
Born, J. (1978), Evolutionsstrategien zur numerischen Lösung von Adaptationsaufgaben, Dr. rer. nat. Diss., Humboldt University at Berlin
Box, G.E.P. (1957), Evolutionary operation - a method for increasing industrial productivity, Appl. Stat. 6, 81-101
Box, G.E.P., D.W. Behnken (1960), Simplex-sum designs - a class of second order rotatable designs derivable from those of first order, Ann. Math. Stat. 31, 838-864
Box, G.E.P., N.R. Draper (1969), Evolutionary operation - a statistical method for process improvement, Wiley, New York
Box, G.E.P., N.R. Draper (1987), Empirical model-building and response surfaces, Wiley, New York
Box, G.E.P., J.S. Hunter (1957), Multi-factor experimental designs for exploring response surfaces, Ann. Math. Stat. 28, 195-241
Box, G.E.P., M.E. Muller (1958), A note on the generation of random normal deviates, Ann. Math. Stat. 29, 610-611
Box, G.E.P., K.B. Wilson (1951), On the experimental attainment of optimum conditions, J. of the Royal Statistical Society B, Methodological 8, 1-45
Box, M.J. (1965), A new method of constrained optimization and a comparison with other methods, Comp. J. 8, 42-52
Box, M.J. (1966), A comparison of several current optimization methods and the use of transformations in constrained problems, Comp. J. 9, 67-77
Box, M.J., D. Davies, W.H. Swann (1969), Nonlinear optimization techniques, ICI Monograph 5, Oliver Boyd, Edinburgh
Bracken, J., G.P. McCormick (1970), Ausgewählte Anwendungen nicht-linearer Programmierung, Berliner Union und Kohlhammer, Stuttgart
Brajnes, S.N., V.B. Svecinskij (1971), Probleme der Neurokybernetik und Neurobionik, 2nd ed., G. Fischer, Stuttgart
Brandl, V. (1969), Ein wirksames Monte-Carlo-Schätzverfahren zur simultanen Behandlung hinreichend eng verwandter Probleme angewandt auf Fragen der Neutronenphysik, Tagungsbericht der Reaktortagung des Deutschen Atomforums, Frankfort/Main, April 1969, Sektion 1, pp. 6-7
Branin, F.H., Jr., S.K. Hoo (1972), A method for finding multiple extrema of a function of n variables, in: Lootsma (1972a), pp. 231-237
Brazdil, P.B. (Ed.) (1993), Machine learning - ECML '93, vol. 667 of Lecture Notes in Artificial Intelligence, Springer, Berlin
Brebbia, C.A., S. Hernandez (Eds.) (1989), Computer aided optimum design of structures - applications, Proceedings of the 1st International Conference, Southampton UK, June 1989, Springer, Berlin
Bremermann, H.J. (1962), Optimization through evolution and recombination, in: Yovits, Jacobi, and Goldstein (1962), pp. 93-106
Bremermann, H.J. (1963), Limits of genetic control, IEEE Trans. MIL-7, 200-205
Bremermann, H.J. (1967), Quantitative aspects of goal-seeking self-organizing systems, in: Snell (1967), pp. 59-77
Bremermann, H.J. (1968a), Numerical optimization procedures derived from biological evolution processes, in: Oestreicher and Moore (1968), pp. 597-616
Bremermann, H.J. (1968b), Principles of natural and artificial intelligence, AGARD report AD-684-952, Sept. 1968, pp. 6c1-6c2
Bremermann, H.J. (1968c), Pattern recognition, functionals, and entropy, IEEE Trans. BME-15, 201-207
Bremermann, H.J. (1970), A method of unconstrained global optimization, Math. Biosci. 9, 1-15
Bremermann, H.J. (1971), What mathematics can and cannot do for pattern recognition, in: Grüsser and Klinke (1971), pp. 31-45
Bremermann, H.J. (1973a), On the dynamics and trajectories of evolution processes, in: Locker (1973), pp. 29-37
Bremermann, H.J. (1973b), Algorithms and complexity of evolution and self-organization, Kybernetik-Kongreß der Deutschen Gesellschaft für Kybernetik und der Nachrichtentechnischen Gesellschaft im VDE, Nuremberg, Germany, March 1973
Bremermann, H.J., L.S.-B. Lam (1970), Analysis of spectra with non-linear superposition, Math. Biosci. 8, 449-460
Bremermann, H.J., M. Rogson, S. Sala (1965), Search by evolution, in: Maxfield, Callahan, and Fogel (1965), pp. 157-167
Bremermann, H.J., M. Rogson, S. Sala (1966), Global properties of evolution processes, in: Pattee et al. (1966), pp. 3-41

Brent, R.P. (1971), An algorithm with guaranteed convergence for finding a zero of a function, Comp. J. 14, 422-425
Brent, R.P. (1973), Algorithms for minimization without derivatives, Prentice-Hall, Englewood Cliffs NJ
Bromberg, N.S. (1962), Maximization and minimization of complicated multivariable functions, AIEE Trans. I Comm. Electron. 80, 725-730
Brooks, S.H. (1958), A discussion of random methods for seeking maxima, Oper. Res. 6, 244-251
Brooks, S.H. (1959), A comparison of maximum-seeking methods, Oper. Res. 7, 430-457
Brooks, S.H., M.R. Mickey (1961), Optimum estimation of gradient direction in steepest ascent experiments, Biometrics 17, 48-56
Brown, K.M. (1969), A quadratically convergent Newton-like method based upon Gaussian elimination, SIAM J. Numer. Anal. 6, 560-569
Brown, K.M., J.E. Dennis, Jr. (1968), On Newton-like iteration functions - general convergence theorems and a specific algorithm, Numer. Math. 12, 186-191
Brown, K.M., J.E. Dennis, Jr. (1972), Derivative free analogues of the Levenberg-Marquardt and Gauss algorithms for non-linear least squares approximation, Numer. Math. 18, 289-297
Brown, R.R. (1959), A generalized computer procedure for the design of optimum systems, AIEE Trans. I Comm. Electron. 78, 285-293
Broyden, C.G. (1965), A class of methods for solving nonlinear simultaneous equations, Math. Comp. 19, 577-593
Broyden, C.G. (1967), Quasi-Newton methods and their application to function minimisation, Math. Comp. 21, 368-381
Broyden, C.G. (1969), A new method of solving nonlinear simultaneous equations, Comp. J. 12, 94-99
Broyden, C.G. (1970a), The convergence of single-rank quasi-Newton methods, Math. Comp. 24, 365-382
Broyden, C.G. (1970b), The convergence of a class of double-rank minimization algorithms, part 1 - general considerations, JIMA 6, 76-90
Broyden, C.G. (1970c), The convergence of a class of double-rank minimization algorithms, part 2 - the new algorithm, JIMA 6, 222-231
Broyden, C.G. (1971), The convergence of an algorithm for solving sparse non-linear systems, Math. Comp. 25, 285-294
Broyden, C.G. (1972), Quasi-Newton methods, in: Murray (1972a), pp. 87-106
Broyden, C.G. (1973), Some condition-number bounds for the Gaussian elimination process, JIMA 12, 273-286
Broyden, C.G., J.E. Dennis, Jr., J.J. Moré (1973), On the local and superlinear convergence of quasi-Newton methods, JIMA 12, 223-245
Broyden, C.G., M.P. Johnson (1972), A class of rank-1 optimization algorithms, in: Lootsma (1972a), pp. 35-38
Brudermann, U. (1977), Entwicklung und Anpassung eines vollständigen Ansteuersystems für fremdenergetisch angetriebene Ganzarmprothesen, Fortschrittberichte der VDI-Zeitschriften, vol. 17 (Biotechnik), no. 6, Dec. 1977
Bryson, A.E., Jr., Y.C. Ho (1969), Applied optimal control, Blaisdell, Waltham MA
Budne, T.A. (1959), The application of random balance designs, Technometrics 1, 139-155
Buehler, R.J., B.V. Shah, O. Kempthorne (1961), Some properties of steepest ascent and related procedures for finding optimum conditions, Iowa State University, Statistical Laboratory, technical report 1, Ames IA, April 1961
Buehler, R.J., B.V. Shah, O. Kempthorne (1964), Methods of parallel tangents, in: Blakemore and Davis (1964), pp. 1-7
Burkard, R.E. (1972), Methoden der Ganzzahligen Optimierung, Springer, Vienna
Campbell, D.T. (1960), Blind variation and selective survival as a general strategy in knowledge-processes, in: Yovits and Cameron (1960), pp. 205-231
Campos Pinto, I. (1989), Wissensbasierte Unterstützung bei der Lösung von Optimierungsaufgaben, Dr. rer. nat. Diss., University of Dortmund, Department of Computer Science, June 1989
Campos, I., E. Peters, H.-P. Schwefel (1989), Zwei Beiträge zum wissensbasierten Einsatz von Optimumsuchverfahren, technical report 311 (green series), University of Dortmund, Department of Computer Science
Campos, I., H.-P. Schwefel (1989), KBOPT - a knowledge based optimisation system, in: Brebbia and Hernandez (1989), pp. 211-221
Canon, M.D., C.D. Cullum, Jr., E. Polak (1970), Theory of optimal control and mathematical programming, McGraw-Hill, New York
Cantrell, J.W. (1969), Relation between the memory gradient method and the Fletcher-Reeves method, JOTA 4, 67-71

Carroll, C.W. (1961), The created response surface technique for optimizing nonlinear, restrained systems, Oper. Res. 9, 169-185
Casey, J.K., R.C. Rustay (1966), AID - a general purpose computer program for optimization, in: Lavi and Vogl (1966), pp. 81-100
Casti, J., M. Richardson, R. Larson (1973), Dynamic programming and parallel computers, JOTA 12, 423-438
Cauchy, A. (1847), Méthode générale pour la résolution des systèmes d'équations simultanées, Compt. Rend. Acad. Sci. URSS (USSR), New Ser. 25, 536-538
Cea, J. (1971), Optimisation - théorie et algorithmes, Dunod, Paris
Cerny, V. (1985), Thermodynamical approach to the traveling salesman problem - an efficient simulation algorithm, JOTA 45, 41-51
Cembrowicz, R.G., G.E. Krauter (1977), Optimization of urban and regional water supply systems, Proceedings of the Conference on Systems Approach for Development, Cairo, Egypt, Nov. 1977
Chang, S.S.L. (1961), Synthesis of optimum control systems, McGraw-Hill, New York
Chang, S.S.L. (1968), Stochastic peak tracking and the Kalman filter, IEEE Trans. AC-13, 750
Chatterji, B.N., B. Chatterjee (1971), Performance optimization of a self-organizing feedback control system in presence of inherent coupling signals, Automatica 7, 599-605
Chazan, D., W.L. Miranker (1970), A nongradient and parallel algorithm for unconstrained minimization, SIAM J. Contr. 8, 207-217
Checkland, P., I. Kiss (Eds.) (1987), Problems of constancy and change - the complementarity of systems approaches to complexity, papers presented at the 31st Annual Meeting of the International Society for General System Research, Budapest, Hungary, June 1-5, International Society for General System Research
Cheng, W.-M. (Ed.) (1988), Proceedings of the International Conference on Systems Science and Engineering (ICSSE '88), Beijing, July 25-28, 1988, International Academic Publishers/Pergamon Press, Oxford UK
Chichinadze, V.K. (1960), Logical design problems of self-optimizing and learning-optimizing control systems based on random searching, Proceedings of the 1st IFAC Congress, Moscow, June-July 1960, vol. 2, pp. 653-657
Chichinadze, V.K. (1967), Random search to determine the extremum of the functions of several variables, Engng. Cybern. 5(1), 115-123
Chichinadze, V.K. (1969), The Psi-transform for solving linear and non-linear programming problems, Automatica 5, 347-356
Cizek, F., D. Hodanova (1971), Evolution als Selbstregulation, G. Fischer, Jena, Germany
Clayton, D.G. (1971), Algorithm AS-46 - Gram-Schmidt orthogonalization, Appl. Stat. 20, 335-338
Clegg, J.C. (1970), Variationsrechnung, Teubner, Stuttgart
Cochran, W.G., G.M. Cox (1950), Experimental designs, Wiley, New York
Cockrell, L.D. (1969), A comparison of several random search techniques for multimodal surfaces, Proceedings of the National Electronics Conference, Chicago IL, Dec. 1969, pp. 18-23
Cockrell, L.D. (1970), On search techniques in adaptive systems, Ph.D. thesis, Purdue University, Lafayette IN, June 1970
Cohen, A.I. (1972), Rate of convergence of several conjugate gradient algorithms, SIAM J. Numer. Anal. 9, 248-259
Cohn, D.L. (1954), Optimal systems I - the vascular system, Bull. Math. Biophys. 16, 59-74
Collatz, L., W. Wetterling (1971), Optimierungsaufgaben, 2nd ed., Springer, Berlin
Colville, A.R. (1968), A comparative study on nonlinear programming codes, IBM New York Science Center, report 320-2949, June 1968
Colville, A.R. (1970), A comparative study of nonlinear programming codes, in: Kuhn (1970), pp. 487-501
Conrad, M. (1988), Prolegomena to evolutionary programming, in: Kochen and Hastings (1988), pp. 150-168
Converse, A.O. (1970), Optimization, Holt, Rinehart, Winston, New York
Cooper, L. (Ed.) (1962), Applied mathematics in chemical engineering, AIChE Engineering Progress Symposium Series 58, no. 37
Cooper, L., D. Steinberg (1970), Introduction to methods of optimization, W.B. Saunders, Philadelphia
Cornick, D.E., A.N. Michel (1972), Numerical optimization of distributed parameter systems by the conjugate gradient method, IEEE Trans. AC-17, 358-362
Courant, R. (1943), Variational methods for the solution of problems of equilibrium and vibrations, Bull. Amer. Math. Soc. 49, 1-23

Courant, R., D. Hilbert (1968a), Methoden der mathematischen Physik, 3rd ed., vol. 1, Springer, Berlin
Courant, R., D. Hilbert (1968b), Methoden der mathematischen Physik, 2nd ed., vol. 2, Springer, Berlin
Cowdrey, D.R., C.M. Reeves (1963), An application of the Monte Carlo method to the evaluation of some molecular integrals, Comp. J. 6, 277-286
Cox, D.R. (1958), Planning of experiments, Wiley, New York
Cragg, E.E., A.V. Levy (1969), Study on a supermemory gradient method for the minimization of functions, JOTA 4, 191-205
Crippen, G.M., H.A. Scheraga (1971), Minimization of polypeptide energy, X - a global search algorithm, Arch. Biochem. Biophys. 144, 453-461
Crockett, J.B., H. Chernoff (1955), Gradient methods of maximization, Pacif. J. Math. 5, 33-50
Crowder, H., P. Wolfe (1972), Linear convergence of the conjugate gradient method, IBM T.J. Watson Research Center, report RC-3330, Yorktown Heights NY, May 1972
Cryer, C.W. (1971), The solution of a quadratic programming problem using systematic overrelaxation, SIAM J. Contr. 9, 385-392
Cullum, J. (1972), An algorithm for minimizing a differentiable function that uses only function values, in: Balakrishnan (1972), pp. 117-127
Curry, H.B. (1944), The method of steepest descent for non-linear minimization problems, Quart. Appl. Math. 2, 258-261
Curtis, A.R., J.K. Reid (1974), The choice of step lengths when using differences to approximate Jacobian matrices, JIMA 13, 121-126
Curtiss, J.H. (1956), A theoretical comparison of the efficiencies of two classical methods and a Monte Carlo method for computing one component of the solution of a set of linear algebraic equations, in: Meyer (1956), pp. 191-233
Dambrauskas, A.P. (1970), The simplex optimization method with variable step, Engng. Cybern. 8, 28-36
Dambrauskas, A.P. (1972), Investigation of the efficiency of the simplex method of optimization with variable step in a noise situation, Engng. Cybern. 10, 590-599
Daniel, J.W. (1967a), The conjugate gradient method for linear and nonlinear operator equations, SIAM J. Numer. Anal. 4, 10-26
Daniel, J.W. (1967b), Convergence of the conjugate gradient method with computationally convenient modifications, Numer. Math. 10, 125-131
Daniel, J.W. (1969), On the approximate minimization of functionals, Math. Comp. 23, 573-581
Daniel, J.W. (1970), A correction concerning the convergence rate for the conjugate gradient method, SIAM J. Numer. Anal. 7, 277-280
Daniel, J.W. (1971), The approximate minimization of functionals, Prentice-Hall, Englewood Cliffs NJ
Daniel, J.W. (1973), Global convergence for Newton methods in mathematical programming, JOTA 12, 233-241
Dantzig, G.B. (1966), Lineare Programmierung und Erweiterungen, Springer, Berlin
Darwin, C. (1859), Die Entstehung der Arten durch natürliche Zuchtwahl, translation from "The origin of species by means of natural selection", Reclam, Stuttgart, 1974
Darwin, C. (1874), Die Abstammung des Menschen, translation of the 2nd rev. ed. of "The descent of man", Kröner, Stuttgart, 1966
Davidon, W.C. (1959), Variable metric method for minimization, Argonne National Laboratory, report ANL-5990 rev., Lemont IL, Nov. 1959
Davidon, W.C. (1968), Variance algorithm for minimization, Comp. J. 10, 406-410
Davidon, W.C. (1969), Variance algorithm for minimization, in: Fletcher (1969a), pp. 13-20
Davidor, Y. (1990), Genetic algorithms and robotics, a heuristic strategy for optimization, World Scientific, Singapore
Davidor, Y., H.-P. Schwefel (1992), An introduction to adaptive optimization algorithms based on principles of natural evolution, in: Soucek (1992), pp. 183-202
Davidor, Y., H.-P. Schwefel, R. Männer (Eds.) (1994), Parallel problem solving from nature 3, Proceedings of the 3rd PPSN Conference, Jerusalem, Oct. 9-14, 1994, vol. 866 of Lecture Notes in Computer Science, Springer, Berlin
Davies, D. (1968), The use of Davidon's method in nonlinear programming, ICI Management Service report MSDH-68-110, Middlesborough, Yorks, Aug. 1968
Davies, D. (1970), Some practical methods of optimization, in: Abadie (1970), pp. 87-118
Davies, D., W.H. Swann (1969), Review of constrained optimization, in: Fletcher (1969a), pp. 187-202

Davies, M., I.J. Whitting (1972), A modified form of Levenberg's correction, in: Lootsma (1972a), pp. 191-201
Davies, O.L. (Ed.) (1954), The design and analysis of industrial experiments, Oliver Boyd, London
Davis, L. (Ed.) (1987), Genetic algorithms and simulated annealing, Pitman, London
Davis, L. (Ed.) (1991), Handbook of genetic algorithms, Van Nostrand Reinhold, New York
Davis, R.H., P.D. Roberts (1968), Method of conjugate gradients applied to self-adaptive digital control systems, IEE Proceedings 115, 562-571
DeGraag, D.P. (1970), Parameter optimization techniques for hybrid computers, Proceedings of the VIth International Analogue Computation Meeting, Munich, Aug.-Sept. 1970, pp. 136-139
Dejon, B., P. Henrici (Eds.) (1969), Constructive aspects of the fundamental theorem of algebra, Wiley-Interscience, London
De Jong, K. (1975), An analysis of the behavior of a class of genetic adaptive systems, Ph.D. thesis, University of Michigan, Ann Arbor MI
De Jong, K. (Ed.) (1993), Evolutionary computation (journal), MIT Press, Cambridge MA
De Jong, K., W. Spears (1993), On the state of evolutionary computation, in: Forrest (1993), pp. 618-623
Dekker, L., G. Savastano, G.C. Vansteenkiste (Eds.) (1980), Simulation of systems '79, Proceedings of the 9th IMACS Congress, Sorrento, Italy, North-Holland, Amsterdam
Dekker, T.J. (1969), Finding a zero by means of successive linear interpolation, in: Dejon and Henrici (1969), pp. 37-48
Demyanov, V.F., A.M. Rubinov (1970), Approximate methods in optimization problems, Elsevier, New York
Denn, M.M. (1969), Optimization by variational methods, McGraw-Hill, New York
Dennis, J.E., Jr. (1970), On the convergence of Newton-like methods, in: Rabinowitz (1970), pp. 163-181
Dennis, J.E., Jr. (1971), On the convergence of Broyden's method for nonlinear systems of equations, Math. Comp. 25, 559-567
Dennis, J.E., Jr. (1972), On some methods based on Broyden's secant approximation to the Hessian, in: Lootsma (1972a), pp. 19-34
D'Esopo, D.A. (1956), A convex programming procedure, Nav. Res. Log. Quart. 6, 33-42
DeVogelaere, R. (1968), Remark on algorithm 178 (E4) - direct search, CACM 11, 498
Dickinson, A.W. (1964), Nonlinear optimization - some procedures and examples, Proceedings of the XIXth ACM National Conference, Philadelphia, Aug. 1964, paper E1.2
Dijkhuis, B. (1971), An adaptive algorithm for minimizing a unimodal function of one variable, ZAMM 51 (Sonderheft), T45-T46
Dinkelbach, W. (1969), Sensitivitätsanalysen und parametrische Programmierung, Springer, Berlin
Dixon, L.C.W. (1972a), Nonlinear optimization, English University Press, London
Dixon, L.C.W. (1972b), The choice of step length, a crucial factor in the performance of variable metric algorithms, in: Lootsma (1972a), pp. 149-170
Dixon, L.C.W. (1972c), Variable metric algorithms - necessary and sufficient conditions for identical behavior of nonquadratic functions, JOTA 10, 34-40
Dixon, L.C.W. (1973), Conjugate directions without linear searches, JOTA 11, 317-328
Dixon, L.C.W., M.C. Biggs (1972), The advantages of adjoint-control transformations when determining optimal trajectories by Pontryagin's Maximum Principle, Aeronautical J. 76, 169-174
Dobzhansky, T. (1965), Dynamik der menschlichen Evolution - Gene und Umwelt, S. Fischer, Frankfort/Main
Dowell, M., P. Jarratt (1972), The Pegasus method for computing the root of an equation, BIT 12, 503-508
Drenick, R.F. (1967), Die Optimierung linearer Regelsysteme, Oldenbourg, Munich
Drepper, F.R., R. Heckler, H.-P. Schwefel (1979), Ein integriertes System von Schätzverfahren, Simulations- und Optimierungstechnik zur rechnergestützten Langfristplanung, in: Böhling and Spies (1979), pp. 115-129
Dueck, G. (1993), New optimization heuristics, the great deluge algorithm and the record-to-record-travel, J. Computational Physics 104, 86-92
Dueck, G., T. Scheuer (1990), Threshold accepting - a general purpose optimization algorithm appearing superior to simulated annealing, J. Computational Physics 90, 161-175
Duffin, R.J., E.L. Peterson, C. Zener (1967), Geometric programming - theory and application, Wiley, New York

Dvoretzky, A. (1956), On stochastic approximation, in: Neyman (1956), pp. 39-56
Ebeling, W. (1992), The optimization of a class of functionals based on developmental strategies, in: Männer and Manderick (1992), pp. 463-468
Edelbaum, T.N. (1962), Theory of maxima and minima, in: Leitmann (1962), pp. 1-32
Edelman, G.B. (1987), Neural Darwinism - the theory of group selection, Basic Books, New York
Eigen, M. (1971), Self-organization of matter and the evolution of biological macromolecules, Naturwissenschaften 58, 465-523
Eisenberg, M.A., M.R. McGuire (1972), Further comments on Dijkstra's concurrent programming control problem, CACM 15, 999
Eisenhart, C., M.W. Hastay, W.A. Wallis (Eds.) (1947), Selected techniques of statistical analysis for scientific and industrial research and production and management engineering, McGraw-Hill, New York
Elkin, R.M. (1968), Convergence theorems for Gauss-Seidel and other minimization algorithms, University of Maryland, Computer Science Center, technical report 68-59, College Park MD, Jan. 1968
Elliott, D.F., D.D. Sworder (1969a), A variable metric technique for parameter optimization, Automatica 5, 811-816
Elliott, D.F., D.D. Sworder (1969b), Design of suboptimal adaptive regulator systems via stochastic approximation, Proceedings of the National Electronics Conference, Chicago IL, Dec. 1969, pp. 29-33
Elliott, D.F., D.D. Sworder (1970), Applications of a simplified multidimensional stochastic approximation algorithm, IEEE Trans. AC-15, 101-104
Elliott, D.G. (Ed.) (1970), Proceedings of the 11th Symposium on Engineering Aspects of Magnetohydrodynamics, Caltech, March 24-26, 1970, California Institute of Technology, Pasadena CA
Emery, F.E., M. O'Hagan (1966), Optimal design of matching networks for microwave transistor amplifiers, IEEE Trans. MTT-14, 696-698
Engelhardt, M. (1973), On upper bounds for variances in stochastic approximation, SIAM J. Appl. Math. 24, 145-151
Engeli, M., T. Ginsburg, H. Rutishauser, E. Stiefel (1959), Refined iterative methods for computation of the solution and the eigenvalues of self-adjoint boundary value problems, Mitteilungen des Instituts für Angewandte Mathematik, Technical University (ETH) of Zurich, Switzerland, Birkhäuser, Basle, Switzerland
Erlicki, M.S., J. Appelbaum (1970), Solution of practical optimization problems, IEEE Trans. SSC-6, 49-52
Ermakov, S. (Ed.) (1992), Int'l J. on Stochastic Optimization and Design, Nova Science, New York
Ermoliev, Yu. (1970), Random optimization and stochastic programming, in: Moiseev (1970), pp. 104-115
Ermoliev, Yu., R.J.-B. Wets (1988), Numerical techniques for stochastic optimization, Springer, Berlin
Faber, M.M. (1970), Stochastisches Programmieren, Physica-Verlag, Würzburg, Germany
Fabian, V. (1967), Stochastic approximation of minima with improved asymptotic speed, Ann. Math. Stat. 38, 191-200
Fabian, V. (1968), On the choice of design in stochastic approximation methods, Ann. Math. Stat. 39, 457-465
Faddejew, D.K., W.N. Faddejewa (1973), Numerische Methoden der linearen Algebra, 3rd ed., Oldenbourg, Munich
Falkenhausen, K. von (1980), Optimierung regionaler Entsorgungssysteme mit der Evolutionsstrategie, Proceedings in Operations Research 9, Physica-Verlag, Würzburg, Germany, pp. 46-51
Favreau, R.F., R. Franks (1958), Random optimization by analogue techniques, Proceedings of the IInd Analogue Computation Meeting, Strasbourg, Sept. 1958, pp. 437-443
Feigenbaum, E.A., J. Feldman (Eds.) (1963), Computers and thought, McGraw-Hill, New York
Feistel, R., W. Ebeling (1989), Evolution of complex systems, Kluwer, Dordrecht, The Netherlands
Feldbaum, A.A. (1958), Automatic optimalizer, ARC 19, 718-728
Feldbaum, A.A. (1960), Statistical theory of gradient systems of automatic optimization for objects with quadratic characteristics, ARC 21, 111-118
Feldbaum, A.A. (1962), Rechengeräte in automatischen Systemen, Oldenbourg, Munich
Fend, F.A., C.B. Chandler (1961), Numerical optimization for multi-dimensional problems, General Electric, General Engineering Laboratory, report 61-GL-78, March 1961

Fiacco, A.V. (1974), Convergence properties of local solutions of sequences of mathematical programming problems in general spaces, JOTA 13, 1-12
Fiacco, A.V., G.P. McCormick (1964), The sequential unconstrained minimization technique for nonlinear programming - a primal-dual method, Mgmt. Sci. 10, 360-366
Fiacco, A.V., G.P. McCormick (1968), Nonlinear programming - sequential unconstrained minimization techniques, Wiley, New York
Fiacco, A.V., G.P. McCormick (1990), Nonlinear programming - sequential unconstrained minimization techniques, vol. 63 of CBMS-NSF Regional Conference Series on Applied Mathematics and vol. 4 of Classics in Applied Mathematics, SIAM, Philadelphia
Fielding, K. (1970), Algorithm 387 (E4) - function minimization and linear search, CACM 13, 509-510
Fisher, R.A. (1966), The design of experiments, 8th ed., Oliver Boyd, Edinburgh
Fletcher, R. (1965), Function minimization without evaluating derivatives - a review, Comp. J. 8, 33-41
Fletcher, R. (1966), Certification of algorithm 251 (E4) - function minimization, CACM 9, 686-687
Fletcher, R. (1968), Generalized inverse methods for the best least squares solution of systems of non-linear equations, Comp. J. 10, 392-399
Fletcher, R. (Ed.) (1969a), Optimization, Academic Press, London
Fletcher, R. (1969b), A review of methods for unconstrained optimization, in: Fletcher (1969a), pp. 1-12
Fletcher, R. (1970a), A class of methods for nonlinear programming with termination and convergence properties, in: Abadie (1970), pp. 157-176
Fletcher, R. (1970b), A new approach to variable metric algorithms, Comp. J. 13, 317-322
Fletcher, R. (1971), A modified Marquardt subroutine for non-linear least squares, UKAEA Research Group, report AERE-R-6799, Harwell, Oxon
Fletcher, R. (1972a), Conjugate direction methods, in: Murray (1972a), pp. 73-86
Fletcher, R. (1972b), A survey of algorithms for unconstrained optimization, in: Murray (1972a), pp. 123-129
Fletcher, R. (1972c), A Fortran subroutine for minimization by the method of conjugate gradients, UKAEA Research Group, report AERE-R-7073, Harwell, Oxon
Fletcher, R. (1972d), Fortran subroutines for minimization by quasi-Newton methods, UKAEA Research Group, report AERE-R-7125, Harwell, Oxon
Fletcher, R., M.J.D. Powell (1963), A rapidly convergent descent method for minimization, Comp. J. 6, 163-168
Fletcher, R., C.M. Reeves (1964), Function minimization by conjugate gradients, Comp. J. 7, 149-154
Flood, M.M., A. Leon (1964), A generalized direct search code for optimization, University of Michigan, Mental Health Research Institute, preprint 129, Ann Arbor MI, June 1964
Flood, M.M., A. Leon (1966), A universal adaptive code for optimization - GROPE, in: Lavi and Vogl (1966), pp. 101-130
Floudas, C.A., P.M. Pardalos (1990), A collection of test problems for constrained global optimization algorithms, vol. 455 of Lecture Notes in Computer Science, Springer, Berlin
Fogarty, L.E., R.M. Howe (1968), Trajectory optimization by a direct descent process, Simulation 11, 127-135
Fogarty, L.E., R.M. Howe (1970), Hybrid computer solution of some optimization problems, Proceedings of the VIth International Analogue Computation Meeting, Munich, Aug.-Sept. 1970, pp. 127-135
Fogel, D.B. (1991), System identification through simulated evolution, Ginn Press, Needham Heights MA
Fogel, D.B. (1992), Evolving artificial intelligence, Ph.D. thesis, University of California at San Diego
Fogel, D.B., J.W. Atmar (Eds.) (1992), Proceedings of the 1st Annual Conference on Evolutionary Programming, San Diego, Feb. 21-22, 1992, Evolutionary Programming Society, La Jolla CA
Fogel, D.B., J.W. Atmar (Eds.) (1993), Proceedings of the 2nd Annual Conference on Evolutionary Programming, San Diego, Feb. 25-26, 1993, Evolutionary Programming Society, La Jolla CA
Fogel, L.J. (1962), Autonomous automata, Ind. Research 4, 14-19
Fogel, L.J., A.J. Owens, M.J. Walsh (1965), Artificial intelligence through a simulation of evolution, in: Maxfield, Callahan, and Fogel (1965), pp. 131-155
Fogel, L.J., A.J. Owens, M.J. Walsh (1966a), Adaption of evolutionary programming to the prediction of solar flares, General Dynamics-Convair, report NASA-CR-417, San Diego CA

Fogel, L.J., A.J. Owens, M.J. Walsh (1966b), Artificial intelligence through simulated evolution, Wiley, New York
Forrest, S. (Ed.) (1993), Proceedings of the 5th International Conference on Genetic Algorithms, University of Illinois, Urbana-Champaign IL, July 17-21, 1993, Morgan Kaufmann, San Mateo CA
Forsythe, G.E. (1968), On the asymptotic directions of the s-dimensional optimum gradient method, Numer. Math. 11, 57-76
Forsythe, G.E. (1969), Remarks on the paper by Dekker, in: Dejon and Henrici (1969), pp. 49-51
Forsythe, G.E., T.S. Motzkin (1951), Acceleration of the optimum gradient method, Bull. Amer. Math. Soc. 57, 304-305
Fox, R.L. (1971), Optimization methods for engineering design, Addison-Wesley, Reading MA
Frankhauser, P., H.-P. Schwefel (1992), Making use of the Weidlich-Haag model in the case of reduced data sets, in: Gritzmann et al. (1992), pp. 320-323
Frankovic, B., S. Petras, J. Skakala, B. Vykouk (1970), Automatisierung und selbsttätige Steuerung, Verlag Technik, Berlin
Fraser, A.S. (1957), Simulation of genetic systems by automatic digital computers, Australian J. Biol. Sci. 10, 484-499
Friedberg, R.M. (1958), A learning machine I, IBM J. Res. Dev. 2, 2-13
Friedberg, R.M., B. Dunham, J.H. North (1959), A learning machine II, IBM J. Res. Dev. 3, 282-287
Friedmann, M., L.J. Savage (1947), Planning experiments seeking maxima, in: Eisenhart, Hastay, and Wallis (1947), pp. 365-372
Friedrichs, K.O., O.E. Neugebauer, J.J. Stoker (Eds.) (1948), Studies and essays, Courant anniversary volume, Interscience, New York
Fu, K.S., L.D. Cockrell (1970), On search techniques for multimodal surfaces, IFAC Kyoto Symposium on Systems Engineering Approach to Computer Control, Kyoto, Japan, Aug. 1970, paper 17.3
Fu, K.S., Z.J. Nikolic (1966), On some reinforcement techniques and their relation to the stochastic approximation, IEEE Trans. AC-11, 756-758
Fürst, H., P.H. Müller, V. Nollau (1968), Eine stochastische Methode zur Ermittlung der Maximalstelle einer Funktion von mehreren Veränderlichen mit experimentell ermittelbaren Funktionswerten und ihre Anwendung bei chemischen Prozessen, Chemie-Technik 20, 400-405
Gaidukov, A.I. (1966), Primeneniye sluchainovo poiska pri optimalnom projektirovanii, Prikladnye zadachi tekhnicheskoi kibernetiki (1966), 420-436
Gal, S. (1971), Sequential minimax search for a maximum when prior information is available, SIAM J. Appl. Math. 21, 590-595
Gal, S. (1972), Multidimensional minimax search for a maximum, SIAM J. Appl. Math. 23, 513-526
Galar, R. (1994), Evolutionary simulations and insights into progress, in: Sebald and Fogel (1994), pp. 344-352
Galar, R., H. Kwasnicka, W. Kwasnicki (1980), Simulation of some processes of development, in: Dekker, Savastano, and Vansteenkiste (1980), pp. 133-142
Garfinkel, R.S., G.L. Nemhauser (1972), Integer programming, Wiley, New York
Garfinkel, R.S., G.L. Nemhauser (1973), A survey of integer programming emphasizing computation and relations among models, in: Hu and Robinson (1973), pp. 77-155
Gauss, C.F. (1809), Determinatio orbitae observationibus quotcumque quam proxime satisfacientis, Werke, Band 7 (Theoria motus corporum coelestium in sectionibus conicis solem ambientium), Liber secundus, Sectio III, pp. 236-257, Hamburgi sumtibus Frid. Perthes et I.H. Besser, 1809; reprint: Teubner, Leipzig, Germany, 1906
Gaviano, M., E. Fagiuoli (1972), Remarks on the comparison between random search methods and the gradient method, in: Szegö (1972), pp. 337-349
Gelfand, I.M., M.L. Tsetlin (1961), The principle of nonlocal search in automatic optimization systems, Soviet Physics Doklady 6(3), 192-194
Geoffrion, A.M. (Ed.) (1972), Perspectives on optimization, Addison-Wesley, Reading MA
Gerardin, L. (1968), Natur als Vorbild - die Entdeckung der Bionik, Kindler, Munich
Gersht, A.M., A.I. Kaplinskii (1971), Convergence of the continuous variant of the Robbins-Monro procedure, ARC 32, 71-75
Gessner, P., K. Spremann (1972), Optimierung in Funktionenräumen, Springer, Berlin
Gessner, P., H. Wacker (1972), Dynamische Optimierung - Einführung, Modelle, Computerprogramme, Hanser, Munich
Gilbert, E.G. (1967), A selected bibliography on parameter optimization methods suitable for hybrid computation, Simulation 8, 350-352
Gilbert, P., W.J. Chandler (1972), Interference between communicating parallel processes, CACM 15, 427-437

Gill, P.E., W. Murray (1972), Quasi-Newton methods for unconstrained optimization, JIMA 9, 91-108
Ginsburg, T. (1963), The conjugate gradient method, Numer. Math. 5, 191-200
Girsanov, I.V. (1972), Lectures on mathematical theory of extremum problems, Springer, Berlin
Glass, H., L. Cooper (1965), Sequential search - a method for solving constrained optimization problems, JACM 12, 71-82
Glover, F. (1986), Future paths for integer programming and links to artificial intelligence, Comp. Oper. Res. 13, 533-549
Glover, F. (1989), Tabu search - part I, ORSA-J. on Computing 1, 190-206
Glover, F., H.-J. Greenberg (1989), New approaches for heuristic search - a bilateral linkage with artificial intelligence, Europ. J. Oper. Res. 39, 119-130
Gnedenko, B.W. (1970), Lehrbuch der Wahrscheinlichkeitsrechnung, 6th ed., Akademie-Verlag, Berlin
Goldberg, D.E. (1989), Genetic algorithms in search, optimization, and machine learning, Addison-Wesley, Reading MA
Goldfarb, D. (1969), Sufficient conditions for the convergence of a variable metric algorithm, in: Fletcher (1969a), pp. 273-282
Goldfarb, D. (1970), A family of variable-metric methods derived by variational means, Math. Comp. 24, 23-26
Goldfeld, S.M., R.E. Quandt, H.F. Trotter (1966), Maximization by quadratic hill-climbing, Econometrica 34, 541-551
Goldfeld, S.M., R.E. Quandt, H.F. Trotter (1968), Maximization by improved quadratic hill-climbing and other methods, Princeton University, Econometric Research Program, research memo. RM-95, Princeton NJ, April 1968
Goldstein, A.A. (1962), Cauchy's method of minimization, Numer. Math. 4, 146-150
Goldstein, A.A. (1965), On Newton's method, Numer. Math. 7, 391-393
Goldstein, A.A., J.F. Price (1967), An effective algorithm for minimization, Numer. Math. 10, 184-189
Goldstein, A.A., J.F. Price (1971), On descent from local minima, Math. Comp. 25, 569-574
Golinski, J., Z.K. Lesniak (1966), Optimales Entwerfen von Konstruktionen mit Hilfe der Monte-Carlo-Methode, Bautechnik 43, 307-311


Goll, R. (1972), Der Evolutionismus - Analyse eines Grundbegriffs neuzeitlichen Denkens, Beck, Munich
Golub, G.H. (1965), Numerical methods for solving linear least squares problems, Numer. Math. 7, 206-216
Golub, G.H., M.A. Saunders (1970), Linear least squares and quadratic programming, in: Abadie (1970), pp. 229-256
Gonzalez, R.S. (1970), An optimization study on a hybrid computer, Ann. Assoc. Int'l Calcul Analog. 12, 138-148
Gorges-Schleuter, M. (1991a), Explicit parallelism of genetic algorithms through population structures, in: Schwefel and Manner (1991), pp. 150-159
Gorges-Schleuter, M. (1991b), Genetic algorithms and population structures - a massively parallel algorithm, Dr. rer. nat. Diss., University of Dortmund, Department of Computer Science, Jan. 1991
Gorvits, G.G., O.I. Larichev (1971), Comparison of search methods for the solution of nonlinear identification problems, ARC 32, 272-280
Gottfried, B.S., J. Weisman (1973), Introduction to optimization theory, Prentice-Hall, Englewood Cliffs NJ
Gould, S.J., N. Eldredge (1977), Punctuated equilibria - the tempo and mode of evolution reconsidered, Paleobiology 3, 115-151
Gould, S.J., N. Eldredge (1993), Punctuated equilibrium comes of age, Nature 366, 223-227
Gran, R. (1973), On the convergence of random search algorithms in continuous time with applications to adaptive control, IEEE Trans. SMC-3, 62-66
Grasse, P.P. (1973), Allgemeine Biologie, vol. 5 - Evolution, G. Fischer, Stuttgart
Grassmann, P. (1967), Verfahrenstechnik und Biologie, Chemie Ingenieur Technik 39, 1217-1226
Grassmann, P. (1968), Verfahrenstechnik und Medizin, Chemie Ingenieur Technik 40, 1094-1100
Grauer, M., A. Lewandowski, A.P. Wierzbicki (Eds.) (1982), Multiobjective and stochastic optimization, Proceedings of the IIASA Task Force Meeting, Nov. 30 - Dec. 4, 1981, IIASA Proceedings Series CP-82-S12, Laxenburg, Austria
Grauer, M., D.B. Pressmar (Eds.) (1991), Parallel computing and mathematical optimization, vol. 367 of Lecture Notes in Economics and Mathematical Systems, Springer, Berlin


Graves, R.L., P. Wolfe (Eds.) (1963), Recent advances in mathematical programming, McGraw-Hill, New York
Greenberg, H. (1971), Integer programming, Academic Press, New York
Greenstadt, J. (1967a), On the relative efficiencies of gradient methods, Math. Comp. 21, 360-367
Greenstadt, J. (1967b), Bestimmung der Eigenwerte einer Matrix nach der Jacobi-Methode, in: Ralston and Wilf (1967), pp. 152-168
Greenstadt, J. (1970), Variations on variable-metric methods, Math. Comp. 24, 1-22
Greenstadt, J. (1972), A quasi-Newton method with no derivatives, Math. Comp. 26, 145-166
Grefenstette, J.J. (Ed.) (1985), Proceedings of the 1st International Conference on Genetic Algorithms, Carnegie-Mellon University, Pittsburgh PA, July 24-26, 1985, Lawrence Erlbaum, Hillsdale NJ
Grefenstette, J.J. (Ed.) (1987), Proceedings of the 2nd International Conference on Genetic Algorithms, MIT, Cambridge MA, July 28-31, 1987, Lawrence Erlbaum, Hillsdale NJ
Gritzmann, P., R. Hettich, R. Horst, E. Sachs (Eds.) (1992), Operations Research '91, Extended Abstracts of the 16th Symposium on Operations Research, Trier, Sept. 9-11, 1991, Physica-Verlag, Heidelberg
Grusser, O.J., R. Klinke (Eds.) (1971), Zeichenerkennung durch biologische und technische Systeme, Springer, Berlin
Guilfoyle, G., I. Johnson, P. Wheatley (1967), One-dimensional search combining golden section and cubic fit techniques, Analytical Mechanics Associates Inc., quarterly report 67-1, Westbury, Long Island NY, Jan. 1967
Guin, J.A. (1968), Modification of the complex method of constrained optimization, Comp. J. 10, 416-417
Gurin, L.S. (1966), Random search in the presence of noise, Engng. Cybern. 4(3), 252-260
Gurin, L.S., V.P. Lobac (1963), Combination of the Monte Carlo method with the method of steepest descents for the solution of certain extremal problems, AIAA J. 1, 2708-2710
Gurin, L.S., L.A. Rastrigin (1965), Convergence of the random search method in the presence of noise, ARC 26, 1505-1511


Hadamard, J. (1908), Memoire sur le probleme d'analyse relatif a l'equilibre des plaques elastiques encastrees, Memoires presentes par divers savants a l'Academie des sciences de l'Institut national de France, 2nd Ser., vol. 33 (savants etrangers), no. 4, pp. 1-128
Hadley, G. (1962), Linear programming, Addison-Wesley, Reading MA
Hadley, G. (1969), Nichtlineare und dynamische Programmierung, Physica-Verlag, Wurzburg, Germany
Haefner, K. (Ed.) (1992), Evolution of information processing systems - an interdisciplinary approach for a new understanding of nature and society, Springer, Berlin
Hague, D.S., C.R. Glatt (1968), An introduction to multivariable search techniques for parameter optimization and program AESOP, Boeing Space Division, report NASA-CR-73200, Seattle WA, March 1968
Hamilton, P.A., J. Boothroyd (1969), Remark on algorithm 251 (E4) - function minimization, CACM 12, 512-513
Hammel, U. (1991), Cartoon - combining modular simulation, regression, and optimization in an object-oriented environment, in: Kohler (1991), pp. 854-855
Hammel, U., T. Back (1994), Evolution strategies on noisy functions - how to improve convergence properties, in: Davidor, Schwefel, and Manner (1994), pp. 159-168
Hammer, P.L. (Ed.) (1984), Stochastics and optimization, Annals of Operations Research, vol. 1, Baltzer, Basle, Switzerland
Hammersley, J.M., D.C. Handscomb (1964), Monte Carlo methods, Methuen, London
Hancock, H. (1960), Theory of maxima and minima, Dover, New York
Hansen, P.B. (1972), Structured multiprogramming, CACM 15, 574-578
Harkins, A. (1964), The use of parallel tangents in optimization, in: Blakemore and Davis (1964), pp. 35-40
Hartmann, D. (1974), Optimierung balkenartiger Zylinderschalen aus Stahlbeton mit elastischem und plastischem Werkstoffverhalten, Dr.-Ing. Diss., University of Dortmund, July 1974
Haubrich, J.G.A. (1963), Algorithm 205 (E4) - ative, CACM 6, 519
Heckler, R. (1979), OASIS - optimization and simulation integrating system - status report, technical report KFA-STE-IB-2/79, Nuclear Research Center (KFA) Julich, Germany, Dec. 1979


Heckler, R., H.-P. Schwefel (1978), Superimposing direct search methods for parameter optimization onto dynamic simulation models, in: Highland, Nielsen, and Hull (1978), pp. 173-181
Heinhold, J., K.W. Gaede (1972), Ingenieur-Statistik, 3rd ed., Oldenbourg, Munich
Henn, R., H.P. Kunzi (1968), Einfuhrung in die Unternehmensforschung I und II, Springer, Berlin
Herdy, M. (1992), Reproductive isolation as strategy parameter in hierarchically organized evolution strategies, in: Manner and Manderick (1992), pp. 207-217
Herschel, R. (1961), Automatische Optimisatoren, Elektronische Rechenanlagen 3, 30-36
Hertel, H. (1963), Biologie und Technik, Band 1: Struktur - Form - Bewegung, Krausskopf Verlag, Mainz
Hesse, R. (1973), A heuristic search procedure for estimating a global solution of nonconvex programming problems, Oper. Res. 21, 1267-1280
Hestenes, M.R. (1956), The conjugate-gradient method for solving linear systems, Proc. Symp. Appl. Math. 6, 83-102
Hestenes, M.R. (1966), Calculus of variations and optimal control theory, Wiley, New York
Hestenes, M.R. (1969), Multiplier and gradient methods, in: Zadeh, Neustadt, and Balakrishnan (1969a), pp. 143-163
Hestenes, M.R. (1973), Iterative methods for solving linear equations, JOTA 11, 323-334 (reprint of the original from 1951)
Hestenes, M.R., M.L. Stein (1973), The solution of linear equations by minimization, JOTA 11, 335-359 (reprint of the original from 1951)
Hestenes, M.R., E. Stiefel (1952), Methods of conjugate gradients for solving linear systems, NBS J. Research 49, 409-436
Heusener, G. (1970), Optimierung natriumgekuhlter schneller Brutreaktoren mit Methoden der nichtlinearen Programmierung, report KFK-1238, Nuclear Research Center (KfK) Karlsruhe, Germany, July 1970
Heydt, G.T. (1970), Directed random search, Ph.D. thesis, Purdue University, Lafayette IN, Aug. 1970
Heynert, H. (1972), Einfuhrung in die allgemeine Bionik, Deutscher Verlag der Wissenschaften, Berlin
Highland, H.J., N.R. Nielsen, L.G. Hull (Eds.) (1978), Proceedings of the Winter Simulation Conference, Miami Beach FL, Dec. 4-6, 1978


Hildebrand, F.B. (1956), Introduction to numerical analysis, McGraw-Hill, New York
Hill, J.C. (1964), A hill-climbing technique using piecewise cubic approximation, Ph.D. thesis, Purdue University, Lafayette IN, June 1964
Hill, J.C., J.E. Gibson (1965), Hill-climbing on hills with many minima, Proceedings of the IInd IFAC Symposium on the Theory of Self Adaptive Control Systems, Teddington UK, Sept. 1965, pp. 322-334
Hill, J.D. (1969), A search technique for multimodal surfaces, IEEE Trans. SSC-5, 2-8
Hill, J.D., K.S. Fu (1965), A learning control system using stochastic approximation for hill-climbing, VIth Joint Automatic Control Conference, Troy NY, June 1965, session 14, paper 2
Hill, J.D., G.J. McMurtry, K.S. Fu (1964), A computer-simulated on-line experiment in learning control systems, AFIPS Conf. Proc. 25, 315-325
Himmelblau, D.M. (1972a), A uniform evaluation of unconstrained optimization techniques, in: Lootsma (1972a), pp. 69-97
Himmelblau, D.M. (1972b), Applied nonlinear programming, McGraw-Hill, New York
Himsworth, F.R. (1962), Empirical methods of optimisation, Trans. Inst. Chem. Engrs. 40, 345-349
Hock, W., K. Schittkowski (1981), Test examples for nonlinear programming codes, vol. 187 of Lecture Notes in Economics and Mathematical Systems, Springer, Berlin
Hofestadt, R., F. Kruckeberg, T. Lengauer (Eds.) (1993), Informatik in der Biowissenschaft, Springer, Berlin
Hoffmann, U., H. Hofmann (1970), Einfuhrung in die Optimierung mit Anwendungsbeispielen aus dem Chemie-Ingenieur-Wesen, Verlag Chemie, Weinheim
Hoffmeister, F. (1991), Scalable parallelism by evolutionary algorithms, in: Grauer and Pressmar (1991), pp. 177-198
Hoffmeister, F., T. Back (1990), Genetic algorithms and evolution strategies - similarities and differences, technical report 365 (green series), University of Dortmund, Department of Computer Science, Nov. 1990
Hoffmeister, F., T. Back (1991), Genetic algorithms and evolution strategies - similarities and differences, in: Schwefel and Manner (1991), pp. 445-469
Hoffmeister, F., T. Back (1992), Genetic algorithms and evolution strategies - similarities and differences, technical report SYS-1/92, Systems Analysis Research Group, University of Dortmund, Department of Computer Science, Feb. 1992


Hoffmeister, F., H.-P. Schwefel (1990), A taxonomy of parallel evolutionary algorithms, in: Wolf, Legendi, and Schendel (1990), pp. 97-107
Hofler, A. (1976), Formoptimierung von Leichtbaufachwerken durch Einsatz einer Evolutionsstrategie, Dr.-Ing. Diss., Technical University of Berlin, Department of Transportation Technologies, June 1976
Hofler, A., U. Leyßner, J. Wiedemann (1973), Optimization of the layout of trusses combining strategies based on Michell's theorem and on the biological principles of evolution, IInd Symposium on Structural Optimization, Milan, April 1973, AGARD Conf. Proc. 123, appendix A
Holland, J.H. (1975), Adaptation in natural and artificial systems, University of Michigan Press, Ann Arbor MI
Holland, J.H. (1992), Adaptation in natural and artificial systems, 2nd ed., MIT Press, Cambridge MA
Holland, J.H., K.J. Holyoak, R.E. Nisbett, P.R. Thagard (1986), Induction - processes of inference, learning, and discovery, MIT Press, Cambridge MA
Hollstien, R.B. (1971), Artificial genetic adaptation in computer control systems, Ph.D. thesis, University of Michigan, Ann Arbor MI
Hooke, R. (1957), Control by automatic experimentation, Chem. Engng. 64(6), 284-286
Hooke, R., T.A. Jeeves (1958), Comments on Brooks' discussion of random methods, Oper. Res. 6, 881-882
Hooke, R., T.A. Jeeves (1961), Direct search solution of numerical and statistical problems, JACM 8, 212-229
Hooke, R., R.I. VanNice (1959), Optimizing control by automatic experimentation, ISA J. 6(7), 74-79
Hopper, M.J. (Ed.) (1971), Harwell subroutine library - a catalogue of subroutines, UKAEA Research Group, report AERE-R-6912, Harwell, Oxon
Horst, R. (Ed.) (1991), J. of Global Optimization, Kluwer, Dordrecht, The Netherlands
Hoshino, S. (1971), On Davies, Swann, and Campey minimisation process, Comp. J. 14, 426-427
Hoshino, S. (1972), A formulation of variable metric methods, JIMA 10, 394-403
Hotelling, H. (1941), Experimental determination of the maximum of a function, Ann. Math. Stat. 12, 20-45
House, F.R. (1971), Remark on algorithm 251 (E4) - function minimisation, CACM 14, 358


Householder, A.S. (1953), Principles of numerical analysis, McGraw-Hill, New York
Householder, A.S. (1970), The numerical treatment of a single nonlinear equation, McGraw-Hill, New York
Houston, B.F., R.A. Huffman (1971), A technique which combines modified pattern search methods with composite designs and polynomial constraints to solve constrained optimization problems, Nav. Res. Log. Quart. 18, 91-98
Hu, T.C. (1972), Ganzzahlige Programmierung und Netzwerkflüsse, Oldenbourg, Munich
Hu, T.C., S.M. Robinson (Eds.) (1973), Mathematical programming, Academic Press, New York
Huang, H.Y. (1970), Unified approach to quadratically convergent algorithms for function minimization, JOTA 5, 405-423
Huang, H.Y. (1974), Method of dual matrices for function minimization, JOTA 13, 519-537
Huang, H.Y., J.P. Chambliss (1973), Quadratically convergent algorithms and one-dimensional search schemes, JOTA 11, 175-188
Huang, H.Y., J.P. Chambliss (1974), Numerical experiments on dual matrix algorithms for function minimization, JOTA 13, 620-634
Huang, H.Y., A.V. Levy (1970), Numerical experiments on quadratically convergent algorithms for function minimization, JOTA 6, 269-282
Huberman, B.A. (Ed.) (1988), The ecology of computation, North Holland, Amsterdam
Huelsman, L.P. (1968), GOSPEL - a general optimization software package for electrical network design, University of Arizona, Department of Electrical Engineering, report, Tucson AZ, Sept. 1968
Hull, T.E. (1967), Random-number generation and Monte-Carlo methods, in: Klerer and Korn (1967), pp. 63-78
Humphrey, W.E., B.J. Cottrell (1962/66), A general minimizing routine, University of California, Lawrence Radiation Laboratory, internal memo. P-6, Livermore CA, July 1962, rev. March 1966
Hupfer, P. (1970), Optimierung von Baukonstruktionen, Teubner, Stuttgart
Hwang, C.L., A.S.M. Masud (1979), Multiple objective decision making - methods and applications, vol. 164 of Lecture Notes in Economics and Mathematical Systems, Springer, Berlin
Hyslop, J. (1972), A note on the accuracy of optimisation techniques, Comp. J. 15, 140


Idelsohn, J.M. (1964), Ten ways to find the optimum, Contr. Engng. 11(6), 97-102
Imamura, H., K. Uosaki, M. Tasaka, T. Suzuki (1970), Optimization methods in the multimodal case and their application to automatic lens design, IFAC Kyoto Symposium on Systems Engineering Approach to Computer Control, Kyoto, Japan, Aug. 1970, paper 7.4
Ivakhnenko, A.G. (1970), Heuristic self-organization in problems of engineering cybernetics, Automatica 6, 207-219
Jacobson, D.H., D.Q. Mayne (1970), Differential dynamic programming, Elsevier, New York
Jacoby, S.L.S., J.S. Kowalik, J.T. Pizzo (1972), Iterative methods for nonlinear optimization problems, Prentice-Hall, Englewood Cliffs NJ
Jahnke-Emde-Losch (1966), Tafeln hoherer Funktionen, 7th ed., Teubner, Stuttgart
Janac, K. (1971), Adaptive stochastic approximations, Simulation 16, 51-58
Jarratt, P. (1967), An iterative method for locating turning points, Comp. J. 10, 82-84
Jarratt, P. (1968), A numerical method for determining points of inflection, BIT 8, 31-35
Jarratt, P. (1970), A review of methods for solving nonlinear algebraic equations in one variable, in: Rabinowitz (1970), pp. 1-26
Jarvis, R.A. (1968), Hybrid computer simulation of adaptive strategies, Ph.D. thesis, University of Western Australia, Nedlands WA, March 1968
Jarvis, R.A. (1970), Adaptive global search in a time-variant environment using a probabilistic automaton with pattern recognition supervision, IEEE Trans. SSC-6, 209-217
Jeeves, T.A. (1958), Secant modification of Newton's method, CACM 1, 9-10
Johnk, M.D. (1969), Erzeugen und Testen von Zufallszahlen, Physica-Verlag, Wurzburg, Germany
Johannsen, G. (1970), Entwicklung und Optimierung eines vielparametrigen nichtlinearen Modells fur den Menschen als Regler in der Fahrzeugfuhrung, Dr.-Ing. Diss., Technical University of Berlin, Department of Transportation Technologies, Oct. 1970
Johannsen, G. (1973), Optimierung vielparametriger Bezugsmodelle mit Hilfe von Zufallssuchverfahren, Regelungstechnische Prozeß-Datenverarbeitung 21, 234-239
John, F. (1948), Extremum problems with inequalities as subsidiary conditions, in: Friedrichs, Neugebauer, and Stoker (1948), pp. 187-204


John, P.W.M. (1971), Statistical design and analysis of experiments, Macmillan, New York
Johnson, S.M. (1956), Best exploration for maximum is Fibonaccian, RAND Corporation, report P-856, Santa Monica CA
Jones, A. (1970), Spiral - a new algorithm for non-linear parameter estimation using least squares, Comp. J. 13, 301-308
Jones, D.S. (1973), The variable metric algorithm for non-definite quadratic functions, JIMA 12, 63-71
Joosen, W., E. Milgrom (Eds.) (1992), Parallel computing - from theory to sound practice, Proceedings of the European Workshop on Parallel Computing (EWPC '92), Barcelona, Spain, March 1992, IOS Press, Amsterdam
Jordan, P. (1970), Schopfung und Geheimnis, Stalling, Oldenburg, Germany
Kamiya, A., T. Togawa (1972), Optimal branching structure of the vascular tree, Bull. Math. Biophys. 34, 431-438
Kammerer, W.J., M.Z. Nashed (1972), On the convergence of the conjugate gradient method for singular linear operator equations, SIAM J. Numer. Anal. 9, 165-181
Kantorovich, L.V. (1940), A new method of solving of some classes of extremal problems, Compt. Rend. Acad. Sci. URSS (USSR), New Ser. 28, 211-214
Kantorovich, L.V. (1945), On an effective method of solving extremal problems for quadratic functionals, Compt. Rend. Acad. Sci. URSS (USSR), New Ser. 48, 455-460
Kantorovich, L.V. (1952), Functional analysis and applied mathematics, NBS report 1509, March 1952
Kaplinskii, A.I., A.I. Propoi (1970), Stochastic approach to non-linear programming problems, ARC 31, 448-459
Kappler, H. (1967), Gradientenverfahren der nichtlinearen Programmierung, O. Schwartz, Gottingen, Germany
Karmarkar, N. (1984), A new polynomial-time algorithm for linear programming, Combinatorica 4, 373-395
Karnopp, D.C. (1961), Search theory applied to parameter scan optimization problems, Ph.D. thesis, MIT, Cambridge MA, June 1961
Karnopp, D.C. (1963), Random search techniques for optimization problems, Automatica 1, 111-121


Karnopp, D.C. (1966), Ein direktes Rechenverfahren fur implizite Variationsprobleme bei optimalen Prozessen, Regelungstechnik 14, 366-368
Karp, R.M., W.L. Miranker (1968), Parallel minimax search for a maximum, J. Comb. Theory 4, 19-35
Karreman, H.F. (Ed.) (1968), Stochastic optimization and control, Wiley, New York
Karumidze, G.V. (1969), A method of random search for the solution of global extremum problems, Engng. Cybern. 7(6), 27-31
Katkovnik, V.Ya., O.Yu. Kulchitskii (1972), Convergence of a class of random search algorithms, ARC 33, 1321-1326
Katkovnik, V.Ya., L.I. Shimelevich (1972), A class of heuristic methods for solution of partially-integer programming problems, Engng. Cybern. 10, 390-394
Kaupe, A.F. (1963), Algorithm 178 (E4) - direct search, CACM 6, 313-314
Kaupe, A.F. (1964), On optimal search techniques, CACM 7, 38
Kavanaugh, W.P., E.C. Stewart, D.H. Brocker (1968), Optimal control of satellite attitude acquisition by a random search algorithm on a hybrid computer, AFIPS Conf. Proc. 32, 443-452
Kawamura, K., R.A. Volz (1973), On the rate of convergence of the conjugate gradient reset method with inaccurate linear minimizations, IEEE Trans. AC-18, 360-366
Kelley, H.J. (1962), Methods of gradients, in: Leitmann (1962), pp. 205-254
Kelley, H.J., G.E. Myers (1971), Conjugate direction methods for parameter optimization, Astron. Acta 16, 45-51
Kelley, H.J., J.L. Speyer (1970), Accelerated gradient projection, in: Balakrishnan et al. (1970), pp. 151-158
Kempthorne, O. (1952), The design and analysis of experiments, Wiley, New York
Kenworthy, I.C. (1967), Some examples of simplex evolutionary operation in the paper industry, Appl. Stat. 16, 211-224
Kesten, H. (1958), Accelerated stochastic approximation, Ann. Math. Stat. 29, 41-59
Khachiyan, L.G. (1979), (abstract on the ellipsoid method), Doklady Akademii Nauk SSSR (USSR) 244, 1093-1096
Khovanov, N.V. (1967), Stochastic optimization of parameters by the method of variation of the search region, Engng. Cybern. 5(4), 34-39


Kiefer, J. (1953), Sequential minimax search for a maximum, Proc. Amer. Math. Soc. 4, 502-506
Kiefer, J. (1957), Optimum sequential search and approximation methods under minimum regularity assumptions, SIAM J. 5, 105-136
Kiefer, J., J. Wolfowitz (1952), Stochastic estimation of the maximum of a regression function, Ann. Math. Stat. 23, 462-466
King, R.F. (1973), An improved Pegasus method for root finding, BIT 13, 423-427
Kirkpatrick, S., C.D. Gelatt, M.P. Vecchi (1983), Optimization by simulated annealing, Science 220, 671-680
Kivelidi, V.Kh., Ya.I. Khurgin (1970), Construction of probabilistic search, ARC 31, 1892-1894
Kiwiel, K.C. (1985), Methods of descent for nondifferentiable optimization, vol. 1133 of Lecture Notes in Mathematics, Springer, Berlin
Kjellstrom, G. (1965), Network optimization by random variation of component values, Ericsson Technics 25, 133-151
Klerer, M., G.A. Korn (Eds.) (1967), Digital computer user's handbook, McGraw-Hill, New York
Klessig, R., E. Polak (1972), Efficient implementations of the Polak-Ribiere conjugate gradient algorithm, SIAM J. Contr. 10, 524-549
Klessig, R., E. Polak (1973), An adaptive precision gradient method for optimal control, SIAM J. Contr. 11, 80-93
Klingman, W.R., D.M. Himmelblau (1964), Nonlinear programming with the aid of a multiple-gradient summation technique, JACM 11, 400-415
Klir, G.J. (Ed.) (1978), Applied general systems research, Plenum Press, New York
Klockgether, J., H.-P. Schwefel (1970), Two-phase nozzle and hollow core jet experiments, in: Elliott (1970), pp. 141-148
Klotzler, R. (1970), Mehrdimensionale Variationsrechnung, Birkhauser, Basle, Switzerland
Kobelt, D., G. Schneider (1977), Optimierung im Dialog unter Verwendung von Evolutionsstrategie und Einflußgrößenrechnung, Chemie-Technik 6, 369-372
Koch, H.W. (1973), Der Sozialdarwinismus - seine Genese und sein Einfluß auf das imperialistische Denken, Beck, Munich


Kochen, M., H.M. Hastings (Eds.) (1988), Advances in cognitive science - steps toward convergence, AAAS Selected Symposium 104
Kohler, E. (Ed.) (1991), 36th International Scientific Colloquium, Ilmenau, Oct. 21-24, 1991, Technical University of Ilmenau, Germany
Kopp, R.E. (1967), Computational algorithms in optimal control, IEEE Int'l Conv. Record 15, part 3 (Automatic Control), 5-14
Korbut, A.A., J.J. Finkelstein (1971), Diskrete Optimierung, Akademie-Verlag, Berlin
Korn, G.A. (1966), Random process simulation and measurement, McGraw-Hill, New York
Korn, G.A. (1968), Hybrid computer Monte Carlo techniques, in: McLeod (1968), pp. 223-234
Korn, G.A., T.M. Korn (1961), Mathematical handbook for scientists and engineers, McGraw-Hill, New York
Korn, G.A., T.M. Korn (1964), Electronic analog and hybrid computers, McGraw-Hill, New York
Korn, G.A., H. Kosako (1970), A proposed hybrid-computer method for functional optimization, IEEE Trans. C-19, 149-153
Kovacs, Z., S.A. Lill (1971), Note on algorithm 46 - a modified Davidon method for finding the minimum of a function, using difference approximation for derivatives, Comp. J. 14, 214-215
Kowalik, J. (1967), A note on nonlinear regression analysis, Austral. Comp. J. 1, 51-53
Kowalik, J., J.F. Morrison (1968), Analysis of kinetic data for allosteric enzyme reactions as a nonlinear regression problem, Math. Biosci. 2, 57-66
Kowalik, J., M.R. Osborne (1968), Methods for unconstrained optimization problems, Elsevier, New York
Koza, J. (1992), Genetic programming, MIT Press, Cambridge MA
Krallmann, H. (1978), Evolution strategy and social sciences, in: Klir (1978), pp. 891-903
Krasnushkin, E.V. (1970), Multichannel automatic optimizer having a variable sign for the feedback, ARC 31, 2057-2061
Krasovskii, A.A. (1962), Optimal methods of search in continuous and pulsed extremum control systems, Proceedings of the 1st IFAC Symposium on Optimization and Adaptive Control, Rome, April 1962, pp. 19-33


Krasovskii, A.A. (1963), Problems of continuous systems theory of extremal control of industrial processes, Proceedings of the IInd IFAC Congress, Basle, Switzerland, Aug.-Sept. 1963, vol. 1, pp. 519-526
Krasulina, T.P. (1972), Robbins-Monro process in the case of several roots, ARC 33, 580-585
Kregting, J., R.C. White, Jr. (1971), Adaptive random search, Eindhoven University of Technology, Department of Electrical Engineering, Group Measurement and Control, report TH-71-E-24, Eindhoven, The Netherlands, Oct. 1971
Krelle, W., H.P. Kunzi (1958), Lineare Programmierung, Verlag Industrielle Organisation, Zurich, Switzerland
Krolak, P.D. (1968), Further extensions of Fibonaccian search to linear programming problems, SIAM J. Contr. 6, 258-265
Krolak, P.D., L. Cooper (1963), An extension of Fibonaccian search to several variables, CACM 6, 639-641
Kuester, J.L., J.H. Mize (1973), Optimization techniques with Fortran, McGraw-Hill, New York
Kuhn, H.W. (Ed.) (1970), Proceedings of the Princeton Symposium on Mathematical Programming, Aug. 1967, Princeton University Press, Princeton NJ
Kuhn, H.W., A.W. Tucker (1951), Nonlinear programming, in: Neyman (1951), pp. 481-492
Kulchitskii, O.Yu. (1972), A non-gradient random search method for an extremum in a Hilbert space, Engng. Cybern. 10, 773-780
Kunzi, H.P. (1967), Mathematische Optimierung großer Systeme, Ablauf- und Planungsforschung 8, 395-407
Kunzi, H.P., W. Krelle (1969), Einfuhrung in die mathematische Optimierung, Verlag Industrielle Organisation, Zurich, Switzerland
Kunzi, H.P., W. Krelle, W. Oettli (1962), Nichtlineare Programmierung, Springer, Berlin
Kunzi, H.P., W. Oettli (1969), Nichtlineare Optimierung - neuere Verfahren - Bibliographie, Springer, Berlin
Kunzi, H.P., S.T. Tan (1966), Lineare Optimierung großer Systeme, Springer, Berlin
Kunzi, H.P., H.G. Tzschach, C.A. Zehnder (1966), Numerische Methoden der mathematischen Optimierung, Teubner, Stuttgart


Kunzi, H.P., H.G. Tzschach, C.A. Zehnder (1970), Numerische Methoden der mathematischen Optimierung mit Algol- und Fortran-Programmen - Gebrauchsversion der Computerprogramme, Teubner, Stuttgart
Kuo, F.F., J.F. Kaiser (Eds.) (1966), System analysis by digital computer, Wiley, New York
Kursawe, F. (1991), A variant of evolution strategies for vector optimization, in: Schwefel and Manner (1991), pp. 193-197
Kursawe, F. (1992), Evolution strategies for vector optimization, in: Tzeng and Yu (1992), vol. 3, pp. 187-193
Kushner, H.J. (1963), Hill climbing methods for the optimization of multiparameter noise disturbed systems, Trans. ASME D, J. Basic Engng. (1963), 157-164
Kushner, H.J. (1972), Stochastic approximation algorithms for the local optimization of functions with nonunique stationary points, IEEE Trans. AC-17, 646-654
Kussul, E., A. Luk (1971), Evolution als Optimierungsprozeß, Ideen des exakten Wissens (1971), 821-826
Kwakernaak, H. (1965), On-line iterative optimization of stochastic control systems, Automatica 2, 195-208
Kwakernaak, H. (1966), On-line dynamic optimization of stochastic control systems, Proceedings of the IIIrd IFAC Congress, London, June 1966, paper 29-D
Kwatny, H.G. (1972), A note on stochastic approximation algorithms in system identification, IEEE Trans. AC-17, 571-572
Laarhoven, P.J.M. van, E.H.L. Aarts (1987), Simulated annealing, theory and applications, Reidel, Dordrecht, The Netherlands
Land, A.H., S. Powell (1973), Fortran codes for mathematical programming - linear, quadratic and discrete, Wiley, London
Lange-Nielsen, T., G.M. Lance (1972), A pattern search algorithm for feedback-control system parameter optimization, IEEE Trans. C-21, 1222-1227
Langguth, V. (1972), Ein Identifikationsverfahren fur lineare Systeme mit Hilfe von stochastischen Suchverfahren und unter Anwendung der Sequentialanalyse fur stochastische Fehlersignale, messen-steuern-regeln 15, 293-296
Langton, C.G. (Ed.) (1989), Artificial life, Proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems, Los Alamos NM, Sept. 1987, Proceedings vol. VI of Santa Fe Institute Studies in the Science of Complexity, Addison-Wesley, Redwood City CA


Langton, C.G. (Ed.) (1994a), Artificial life III, Proceedings of the Workshop on Artificial Life, Santa Fe NM, June 1992, Proceedings vol. XVII of Santa Fe Institute Studies in the Science of Complexity, Addison-Wesley, Reading MA
Langton, C.G. (Ed.) (1994b), Artificial life (journal), MIT Press, Cambridge MA
Langton, C.G., C. Taylor, J.D. Farmer, S. Rasmussen (Eds.) (1992), Artificial life II, Proceedings of the Second Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems, Santa Fe NM, Feb. 1990, Proceedings vol. X of Santa Fe Institute Studies in the Science of Complexity, Addison-Wesley, Reading MA
Lapidus, L., E. Shapiro, S. Shapiro, R.E. Stillman (1961), Optimization of process performance, AIChE J. 7(2), 288-294
Larichev, O.I., G.G. Gorvits (1974), New approach to comparison of search methods used in nonlinear programming problems, JOTA 13, 635-659
Larson, R.E., E. Tse (1973), Parallel processing algorithms for the optimal control of nonlinear dynamic systems, IEEE Trans. C-22, 777-786
Lasdon, L.S. (1970), Conjugate direction methods for optimal control, IEEE Trans. AC-15, 267-268
Laußermair, T. (1992a), Hyperflächen-Annealing - ein paralleles Optimierungsverfahren basierend auf selbstorganisierter Musterbildung durch Relaxation auf gekrümmten Hyperflächen, Dr. rer. nat. Diss., Technical University of Munich, Department of Mathematics and Computer Science, April 1992
Laußermair, T. (1992b), Hyperplane annealing and activator-inhibitor-systems, in: Manner and Manderick (1992), pp. 521-530
Lavi, A., T.P. Vogl (Eds.) (1966), Recent advances in optimization techniques, Wiley, New York
Lawler, E.L., J.K. Lenstra, A.H.G. Rinnooy Kan, D.B. Shmoys (Eds.) (1985), The travelling salesman problem, a guided tour of combinatorial optimization, Wiley-Interscience, New York
Lawrence, J.P., III, F.P. Emad (1973), An analytic comparison of random searching for the extremum and gradient searching of a known objective function, IEEE Trans. AC-18, 669-671
Lawrence, J.P., III, K. Steiglitz (1972), Randomized pattern search, IEEE Trans. C-21, 382-385
LeCam, L.M., J. Neyman (Eds.) (1967), Proceedings of the Vth Berkeley Symposium on Mathematical Statistics and Probability, 1965/66, vol. 4: Biology and Problems of Health, University of California Press, Berkeley CA


LeCam, L.M., J. Neyman, E.L. Scott (Eds.) (1972), Proceedings of the VIth Berkeley Symposium on Mathematical Statistics and Probability, 1970/71, vol. 5: Darwinian, Neo-Darwinian and Non-Darwinian Evolution, University of California Press, Berkeley CA
Lee, R.C.K. (1964), Optimal estimation, identification, and control, MIT Press, Cambridge MA
Lehner, K. (1991), Einsatz wissensbasierter Systeme in der Strukturoptimierung dargestellt am Beispiel Fachwerkoptimierung, Dr.-Ing. Diss., University of Bochum, Faculty of Civil Engineering, May 1991
Leibniz, G.W. (1710), Theodicee, 4th rev. ed., Forster, Hannover, 1744
Leitmann, G. (Ed.) (1962), Optimization techniques with applications to aerospace systems, Academic Press, New York
Leitmann, G. (1964), Einfuhrung in die Theorie optimaler Steuerung und der Differentialspiele - eine geometrische Darstellung, Oldenbourg, Munich
Leitmann, G. (Ed.) (1967), Topics in optimization, Academic Press, New York
Lemarechal, C., R. Mifflin (Eds.) (1978), Nonsmooth optimization, vol. 3 of IIASA Proceedings Series, Pergamon Press, Oxford UK
Leon, A. (1966a), A comparison among eight known optimizing procedures, in: Lavi and Vogl (1966), pp. 23-46
Leon, A. (1966b), A classified bibliography on optimization, in: Lavi and Vogl (1966), pp. 599-649
Lerner, A.Ja., E.A. Rosenman (1973), Optimale Steuerungen, Verlag Technik, Berlin
Lesniak, Z.K. (1970), Methoden der Optimierung von Konstruktionen unter Benutzung von Rechenautomaten, W. Ernst, Berlin
Levenberg, K. (1944), A method for the solution of certain non-linear problems in least squares, Quart. Appl. Math. 2, 164-168
Levine, L. (1964), Methods for solving engineering problems using analog computers, McGraw-Hill, New York
Levine, M.D., T. Vilis (1973), On-line learning optimal control using successive approximation techniques, IEEE Trans. AC-18, 279-284
Lew, H.S. (1972), An arithmetical approach to the mechanics of blood flow in small caliber blood vessels, J. Biomech. 5, 49-69


Leyßner, U. (1974), Uber den Einsatz Linearer Programmierung beim Entwurf optimaler Leichtbaustabwerke, Dr.-Ing. Diss., Technical University of Berlin, Department of Transportation Technologies, June 1974
Lill, S.A. (1970), Algorithm 46 - a modified Davidon method for finding the minimum of a function, using difference approximation for derivatives, Comp. J. 13, 111-113
Lill, S.A. (1971), Note on algorithm 46 - a modified Davidon method, Comp. J. 14, 106
Little, W.D. (1966), Hybrid computer solutions of partial differential equations by Monte Carlo methods, AFIPS Conf. Proc. 29, 181-190
Ljapunov, A.A. (Ed.), W. Kammerer, H. Thiele (Eds.) (1964a), Probleme der Kybernetik, vol. 4, Akademie-Verlag, Berlin
Ljapunov, A.A. (Ed.), W. Kammerer, H. Thiele (Eds.) (1964b), Probleme der Kybernetik, vol. 5, Akademie-Verlag, Berlin
Locker, A. (Ed.) (1973), Biogenesis - evolution - homeostasis, Springer, Berlin
Loginov, N.V. (1966), Methods of stochastic approximation, ARC 27, 706-728
Lohmann, R. (1992), Structure evolution and incomplete induction, in: Manner and Manderick (1992), pp. 175-185
Lootsma, F.A. (Ed.) (1972a), Numerical methods for non-linear optimization, Academic Press, London
Lootsma, F.A. (1972b), A survey of methods for solving constrained minimization problems via unconstrained minimization, in: Lootsma (1972a), pp. 313-347
Lowe, C.W. (1964), Some techniques of evolutionary operation, Trans. Inst. Chem. Engrs. 42, T334-T344
Lucas, E. (1876), Note sur l'application des series recurrentes a la recherche de la loi de distribution de nombres premiers, Compt. Rend. Hebdomad. Seances Acad. Sci. Paris 82, 165-167
Luce, R.D., H. Raiffa (1957), Games and decisions, Wiley, New York
Luenberger, D.G. (1972), Mathematical programming and control theory - trends of interplay, in: Geoffrion (1972), pp. 102-133
Luenberger, D.G. (1973), Introduction to linear and nonlinear programming, Addison-Wesley, Reading MA
Machura, M., A. Mulawa (1973), Algorithm 450 (E4) - Rosenbrock function minimization, CACM 16, 482-483
Madsen, K. (1973), A root-finding algorithm based on Newton's method, BIT 13, 71-75


Mamen, R., D.Q. Mayne (1972), A pseudo Newton-Raphson method for function minimization, JOTA 10, 263-277
Mandischer, M. (1993), Representation and evolution of neural networks, in: Albrecht, Reeves, and Steele (1993), pp. 643-649
Mangasarian, O.L. (1969), Nonlinear programming, McGraw-Hill, New York
Manner, R., B. Manderick (Eds.) (1992), Parallel problem solving from nature 2, Proceedings of the 2nd PPSN Conference, Brussels, Sept. 28-30, 1992, North-Holland, Amsterdam
Marfeld, A.F. (1970), Kybernetik des Gehirns - ein Kompendium der Grundlagenforschung, Safari Verlag, Berlin
Markwich, P. (1978), Der thermische Wasserstrahlantrieb auf der Grundlage des offenen Clausius-Rankine-Prozesses - Konzeption und hydrothermodynamische Analyse, Dr.-Ing. Diss., Technical University of Berlin, Department of Transportation Technologies
Marquardt, D.W. (1963), An algorithm for least-squares estimation of nonlinear parameters, SIAM J. 11, 431-441
Marti, K. (1980), On accelerations of the convergence in random search methods, Methods of Oper. Res. 37, 391-406
Masters, C.O., H. Drucker (1971), Observations on direct search procedures, IEEE Trans. SMC-1, 182-184
Matthews, A., D. Davies (1971), A comparison of modified Newton methods for unconstrained optimisation, Comp. J. 14, 293-294
Matyas, J. (1965), Random optimization, ARC 26, 244-251
Matyas, J. (1967), Das zufallige Optimierungsverfahren und seine Konvergenz, Proceedings of the Vth International Analogue Computation Meeting, Lausanne, Aug.-Sept. 1967, vol. 1, pp. 540-544
Maxfield, M., A. Callahan, L.J. Fogel (Eds.) (1965), Biophysics and cybernetic systems, Spartan, Washington, DC
Maybach, R.L. (1966), Solution of optimal control problems on a high-speed hybrid computer, Simulation 9, 238-245
McArthur, D.S. (1961), Strategy in research - alternative methods for design of experiments, IRE Trans. EM-8, 34-40
McCormick, G.P. (1969), Anti-zig-zagging by bending, Mgmt. Sci. 15, 315-320


Meissinger, H.F., G.A. Bekey (1966), An analysis of continuous parameter identification methods, Simulation 6, 94-102
Meredith, D.L., C.L. Karr, K.K. Kumar (1992), The use of genetic algorithms in the design of fuzzy-logic controllers, 3rd Workshop on Neural Networks - Academic/Industrial/Defence (WNN '92), vol. SPIE-1721, pp. 545-555, International Society of Optical Engineering
Merzenich, W. (1972), Ein einfaches mathematisches Evolutionsmodell, GMD Mitteilungen 21, Bonn
Metropolis, N., A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, E. Teller (1953), Equation of state calculations by fast computing machines, J. Chem. Phys. 21, 1087-1092
Meyer, H.A. (Ed.) (1956), Symposium on Monte Carlo methods, Wiley, New York
Meyer, J.-A. (Ed.) (1992), Adaptive behavior (journal), MIT Press, Cambridge MA
Meyer, J.-A., H.L. Roitblat, S.W. Wilson (Eds.) (1993), From animals to animats 2, Proceedings of the 2nd International Conference on Simulation of Adaptive Behavior (SAB '92), Honolulu HI, Dec. 7-11, 1992, MIT Press, Cambridge MA
Meyer, J.-A., S.W. Wilson (Eds.) (1991), From animals to animats, Proceedings of the 1st International Conference on Simulation of Adaptive Behavior (SAB), Paris, Sept. 24-28, 1990, MIT Press, Cambridge MA
Michalewicz, Z. (1992), Genetic algorithms + data structures = evolution programs, Springer, Berlin
Michalewicz, Z. (1994), Genetic algorithms + data structures = evolution programs, 2nd ext. ed., Springer, Berlin
Michie, D. (1971), Heuristic search, Comp. J. 14, 96-102
Miele, A. (1969), Variational approach to the gradient method - theory and numerical experiments, in: Zadeh, Neustadt, and Balakrishnan (1969b), pp. 143-157
Miele, A., J.W. Cantrell (1969), Study on a memory gradient method for the minimization of functions, JOTA 3, 459-470
Miele, A., J.W. Cantrell (1970), Memory gradient method for the minimization of functions, in: Balakrishnan et al. (1970), pp. 252-263
Miele, A., J.N. Damoulakis, J.R. Cloutier, J.L. Tietze (1974), Sequential gradient-restoration algorithm for optimal control problems with nondifferential constraints, JOTA 13, 218-255


Miele, A., H.Y. Huang, J.C. Heidemann (1969), Sequential gradient-restoration algorithm for the minimization of constrained functions - ordinary and conjugate gradient versions, JOTA 4, 213-243
Miele, A., A.V. Levy, E.E. Cragg (1971), Modifications and extensions of the conjugate gradient-restoration algorithm for mathematical programming problems, JOTA 7, 450-472
Miele, A., J.L. Tietze, A.V. Levy (1972), Summary and comparison of gradient-restoration algorithms for optimal control problems, JOTA 10, 381-403
Miller, R.E. (1973), A comparison of some theoretical models of parallel computation, IEEE Trans. C-22, 710-717
Millstein, R.E. (1973), Control structures in Illiac IV Fortran, CACM 16, 621-627
Minot, O.N. (1969), Artificial intelligence and new simulations, Simulation 13, 214-215
Minsky, M. (1961), Steps toward artificial intelligence, IRE Proc. 49, 8-30
Miranker, W.L. (1969), Parallel methods for approximating the root of a function, IBM J. Res. Dev. 13, 297-301
Miranker, W.L. (1971), A survey of parallelism in numerical analysis, SIAM Review 13, 524-547
Mitchell, B.A., Jr. (1964), A hybrid analog-digital parameter optimizer for Astrac II, AFIPS Conf. Proc. 25, 271-285
Mitchell, R.A., J.L. Kaplan (1968), Nonlinear constrained optimization by a non-random complex method, NBS J. Res. C, Engng. Instr. 72, 249-258
Mlynski, D. (1964a), Der Wirkungsgrad experimenteller Optimierungsstrategien, Dr.-Ing. Diss., Technical University (RWTH) of Aachen, Germany, Dec. 1964
Mlynski, D. (1964b), Maximalisierung durch logische Suchprozesse, in: Steinbuch and Wagner (1964), pp. 82-94
Mlynski, D. (1966a), Ein Beitrag zur statistischen Theorie der Optimierungsstrategien I and II, Regelungstechnik 14, 209-215 and 325-330
Mlynski, D. (1966b), Efficiency of experimental strategies for optimising feedback control of disturbed processes, Proceedings of the IIIrd IFAC Congress, London, June 1966, paper 29-G
Mockus, J.B. see also under Motskus, I.B.
Mockus, J.B. (1971), On the optimization of power distribution systems, in: Schwarz (1971), technical papers, vol. 3, pp. 6.3.2-1 to 6.3.2-14


Moiseev, N.N. (Ed.) (1970), Colloquium on methods of optimization, Springer, Berlin
Moran, P.A.P. (1967), Unsolved problems in evolutionary theory, in: LeCam and Neyman (1967), pp. 457-480
More, J.J., S.J. Wright (1993), Optimization software guide, vol. 14 of Frontiers in Applied Mathematics, SIAM, Philadelphia
Morrison, D.D. (1968), Optimization by least squares, SIAM J. Numer. Anal. 5, 83-88
Motskus, I.B. see also under Mockus, J.B.
Motskus, I.B. (1965), Some experiments related to the capabilities of man in solving multiextremal problems heuristically, Engng. Cybern. 3(3), 40-44
Motskus, I.B. (1967), Mnogoekstremalnye sadachi v projektirovanii, Nauka, Moscow
Motskus, I.B., A.A. Feldbaum (1963), Symposium on multiextremal problems, Trakay, June 1963, Engng. Cybern. 1(5), 154-155
Movshovich, S.M. (1966), Random search and the gradient method in optimization problems, Engng. Cybern. 4(6), 39-48
Mufti, I.H. (1970), Computational methods in optimal control problems, Springer, Berlin
Mugele, R.A. (1961), A nonlinear digital optimizing program for process control systems, AFIPS Conf. Proc. 19, 15-32
Mugele, R.A. (1962), A program for optimal control of nonlinear processes, IBM Systems J. 1, 2-17
Mugele, R.A. (1966), The probe and edge theorems for non-linear optimization, in: Lavi and Vogl (1966), pp. 131-144
Muhlenbein, H., D. Schlierkamp-Voosen (1993a), Predictive models for the breeder genetic algorithm I. Continuous Parameter Optimization, Evolutionary Computation 1, 25-49
Muhlenbein, H., D. Schlierkamp-Voosen (1993b), Optimal interaction of mutation and crossover in the breeder genetic algorithm, in: Forrest (1993), p. 648
Muller-Merbach, H. (1971), Operations Research - Methoden und Modelle der Optimalplanung, 2nd ed., F. Vahlen, Berlin
Munson, J.K., A.I. Rubin (1959), Optimization by random search on the analog computer, IRE Trans. EC-8, 200-203
Murata, T. (1963), The use of adaptive constrained descent in systems design, University of Illinois, Coordinated Science Laboratory, report R-189, Urbana IL, Dec. 1963


Murray, W. (Ed.) (1972a), Numerical methods for unconstrained optimization, Academic Press, London
Murray, W. (1972b), Second derivative methods, in: Murray (1972a), pp. 57-71
Murray, W. (1972c), Failure, the causes and cures, in: Murray (1972a), pp. 107-122
Murtagh, B.A. (1970), A short description of the variable-metric method, in: Abadie (1970), pp. 525-528
Murtagh, B.A., R.W.H. Sargent (1970), Computational experience with quadratically convergent minimisation methods, Comp. J. 13, 185-194
Mutseniyeks, V.A., L.A. Rastrigin (1964), Extremal control of continuous multiparameter systems by the method of random search, Engng. Cybern. 2(1), 82-90
Myers, G.E. (1968), Properties of the conjugate-gradient and Davidon methods, JOTA 2, 209-219
Nachtigall, W. (1971), Biotechnik - statische Konstruktionen in der Natur, Quelle und Meyer, Heidelberg, Germany
Nachtigall, W. (Ed.) (1992), Technische Biologie und Bionik 1, Proceedings of the 1st Congress on Bionics, Wiesbaden, June 11-13, 1992, BIONA report 8, G. Fischer, Stuttgart
Nake, F. (1966), Zertifikat zu Algorithmus 2 - Orthonormierung von Vektoren nach E. Schmidt, Computing 1, 281
Neave, H.R. (1973), On using the Box-Muller transformation with multiplicative congruential pseudo-random number generators, Appl. Stat. 22, 92-97
Nelder, J.A., R. Mead (1965), A simplex method for function minimization, Comp. J. 7, 308-313
Nenonen, L.K., B. Pagurek (1969), Conjugate gradient optimization applied to a copper converter model, Automatica 5, 801-810
Neumann, J. von (1960), Die Rechenmaschine und das Gehirn, Oldenbourg, Munich
Neumann, J. von (1966), Theory of self-reproducing automata, University of Illinois Press, Urbana-Champaign IL
Neumann, J. von, O. Morgenstern (1961), Spieltheorie und wirtschaftliches Verhalten, Physica-Verlag, Wurzburg, Germany
Newman, D.J. (1965), Location of the maximum on unimodal surfaces, JACM 12, 395-398


Neyman, J. (Ed.) (1951), Proceedings of the IInd Berkeley Symposium on Mathematical Statistics and Probability, 1950, University of California Press, Berkeley CA
Neyman, J. (Ed.) (1956), Proceedings of the IIIrd Berkeley Symposium on Mathematical Statistics and Probability, 1954/55, University of California Press, Berkeley CA
Neyman, J. (Ed.) (1961), Proceedings of the IVth Berkeley Symposium on Mathematical Statistics and Probability, 1960, University of California Press, Berkeley CA
Nickel, K. (1967), Allgemeine Forderungen an einen numerischen Algorithmus, ZAMM 47(Sonderheft), T67-T68
Nickel, K., K. Ritter (1972), Termination criterion and numerical convergence, SIAM J. Numer. Anal. 9, 277-283
Niederreiter, H. (1992), Random number generation and quasi-Monte Carlo methods, vol. 63 of CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM, Philadelphia
Niemann, H. (1974), Methoden der Mustererkennung, Akademische Verlagsgesellschaft, Frankfort/Main
Nikolic, Z.J., K.S. Fu (1966), An algorithm for learning without external supervision and its application to learning control systems, IEEE Trans. AC-11, 414-442
Nissen, V. (1993), Evolutionary algorithms in management science, report 9303 of the European Study Group for Evolutionary Economics
Nissen, V. (1994), Evolutionare Algorithmen - Darstellung, Beispiele, betriebswirtschaftliche Anwendungsmoglichkeiten, DUV Deutscher Universitatsverlag, Wiesbaden
Norkin, K.B. (1961), On one method for automatic search for the extremum of a function of many variables, ARC 22, 534-538
North, M. (1980), Time-dependent stochastic model of floods, Proc. ASCE, J. Hydraulics Div. 106-HY5, 649-665
Nurminski, E.A. (Ed.) (1982), Progress in nondifferentiable optimization, IIASA Collaborative Proceedings Series CP-82-58, International Institute for Applied Systems Analysis, Laxenburg, Austria
Odell, P.L. (1961), An empirical study of three stochastic approximation techniques applicable to sensitivity testing, report NAVWEPS-7837
Oestreicher, H.L., D.R. Moore (Eds.) (1968), Cybernetic problems in bionics, Gordon Breach, New York
Oi, K., H. Sayama, T. Takamatsu (1973), Computational schemes of the Davidon-Fletcher-Powell method in infinite-dimensional space, JOTA 12, 447-458


Oldenburger, R. (Ed.) (1966), Optimal and self optimizing control, MIT Press, Cambridge MA
Oliver, L.T., D.J. Wilde (1964), Symmetric sequential minimax search for a maximum, Fibonacci Quart. 2, 169-175
O'Neill, R. (1971), Algorithm AS 47 – function minimization using a simplex procedure, Appl. Stat. 20, 338-345
Opacic, J. (1973), A heuristic method for finding most extrema of a nonlinear functional, IEEE Trans. SMC-3, 102-107
Oren, S.S. (1973), Self-scaling variable metric algorithms without line search for unconstrained minimization, Math. Comp. 27, 873-885
Ortega, J.M., W.C. Rheinboldt (1967), Monotone iterations for nonlinear equations with application to Gauss-Seidel methods, SIAM J. Numer. Anal. 4, 171-190
Ortega, J.M., W.C. Rheinboldt (1970), Iterative solution of nonlinear equations in several variables, Academic Press, New York
Ortega, J.M., W.C. Rheinboldt (1972), A general convergence result for unconstrained minimization methods, SIAM J. Numer. Anal. 9, 40-43
Ortega, J.M., M.L. Rockoff (1966), Nonlinear difference equations and Gauss-Seidel type iterative methods, SIAM J. Numer. Anal. 3, 497-513
Osborne, M.R. (1972), Some aspects of nonlinear least squares calculations, in: Lootsma (1972a), pp. 171-189
Osche, G. (1972), Evolution – Grundlagen, Erkenntnisse, Entwicklungen der Abstammungslehre, Herder, Freiburg, Germany
Ostermeier, A. (1992), An evolution strategy with momentum adaptation of the random number distribution, in: Männer and Manderick (1992), pp. 197-206
Ostrowski, A.M. (1966), Solution of equations and systems of equations, 2nd ed., Academic Press, New York
Ostrowski, A.M. (1967), Contributions to the theory of the method of steepest descent, Arch. Ration. Mech. Anal. 26, 257-280
Overholt, K.J. (1965), An instability in the Fibonacci and golden section search methods, BIT 5, 284-286
Overholt, K.J. (1967a), Note on algorithm 2 – Fibonacci search, and algorithm 7 – Minx, and the golden section search, Comp. J. 9, 414
Overholt, K.J. (1967b), Algorithm 16 – Gold, Comp. J. 9, 415
Overholt, K.J. (1967c), Algorithm 17 – Goldsec, Comp. J. 9, 415
Overholt, K.J. (1973), Efficiency of the Fibonacci search method, BIT 13, 92-96
Page, S.E., D.W. Richardson (1992), Walsh functions, schema variance, and deception, Complex Systems 6, 125-135
Pagurek, B., C.M. Woodside (1968), The conjugate gradient method for optimal control problems with bounded control variables, Automatica 4, 337-349
Palmer, J.R. (1969), An improved procedure for orthogonalising the search vectors in Rosenbrock's and Swann's direct search optimisation methods, Comp. J. 12, 69-71
Papageorgiou, M. (1991), Optimierung – Statische, dynamische, stochastische Verfahren für die Anwendung, Oldenbourg, Munich
Papentin, F. (1972), A Darwinian evolutionary system, Dr. rer. nat. Diss., University of Tübingen, Germany
Pardalos, P.M., J.B. Rosen (1987), Constrained global optimization – algorithms and applications, vol. 268 of Lecture Notes in Computer Science, Springer, Berlin
Parkinson, J.M., D. Hutchinson (1972a), A consideration of non-gradient algorithms for the unconstrained optimization of functions of high dimensionality, in: Lootsma (1972a), pp. 99-113
Parkinson, J.M., D. Hutchinson (1972b), An investigation into the efficiency of variants on the simplex method, in: Lootsma (1972a), pp. 115-135
Pask, G. (1962), Physical and linguistic evolution in self-organizing systems, Proceedings of the 1st IFAC Symposium on Optimization and Adaptive Control, Rome, April 1962, pp. 199-227
Pask, G. (1971), A cybernetic experimental method and its underlying philosophy, Int'l J. Man-Machine Stud. 3, 279-337
Patrick, M.L. (1972), A highly parallel algorithm for approximating all zeros of a polynomial with only real zeros, CACM 15, 952-955
Pattee, H.H., E.A. Edelsack, L. Fein, A.B. Callahan (Eds.) (1966), Natural automata and useful simulations, Spartan, Washington, DC
Paviani, D.A., D.M. Himmelblau (1969), Constrained nonlinear optimization by heuristic programming, Oper. Res. 17, 872-882
Pearson, J.D. (1969), Variable metric methods of minimization, Comp. J. 12, 171-178
Peckham, G. (1970), A new method for minimising a sum of squares without calculating gradients, Comp. J. 13, 418-420
Peschel, M. (1980), Ingenieurtechnische Entscheidungen – Modellbildung und Steuerung mit Hilfe der Polyoptimierung, Verlag Technik, Berlin
Peters, E. (1989), OptimiEst – an optimizing expert system using topologies, in: Brebbia and Hernandez (1989), pp. 222-232
Peters, E. (1991), Ein Beitrag zur wissensbasierten Auswahl und Steuerung von Optimierverfahren, Dr. rer. nat. Diss., University of Dortmund, Department of Computer Science, May 1991
Pierre, D.A. (1969), Optimization theory with applications, Wiley, New York
Pierson, B.L., S.G. Rajtora (1970), Computational experience with the Davidon method applied to optimal control problems, IEEE Trans. SSC-6, 240-242
Pike, M.C., I.D. Hill, F.D. James (1967), Note on algorithm 2 – Fibonacci search, and on algorithm 7 – Minx, and algorithm 2 modified – Fibonacci search, Comp. J. 9, 416-417
Pike, M.C., J. Pixner (1965), Algorithm 2 – Fibonacci search, Comp. Bull. 8, 147
Pincus, M. (1970), A Monte Carlo method for the approximate solution of certain types of constrained optimization problems, Oper. Res. 18, 1225-1228
Pinkham, R.S. (1964), Random root location, SIAM J. 12, 855-864
Pinsker, I.Sh., B.M. Tseitlin (1962), A nonlinear optimization problem, ARC 23, 1510-1518
Plane, D.R., C. McMillan, Jr. (1971), Discrete optimization – integer programming and network analysis for management decisions, Prentice-Hall, Englewood Cliffs NJ
Plaschko, P., K. Wagner (1973), Evolutions-Linearisierungs-Programm zur Darstellung von numerischen Daten durch beliebige Funktionen, report DLR-FB-73-55, DFVLR Porz-Wahn, Germany
Pluznikov, L.N., V.O. Andreyev, E.S. Klimenko (1971), Use of random search method in industrial planning, Engng. Cybern. 9, 229-235
Polak, E. (1971), Computational methods in optimization – a unified approach, Academic Press, New York
Polak, E. (1972), A survey of methods of feasible directions for the solution of optimal control problems, IEEE Trans. AC-17, 591-596
Polak, E. (1973), An historical survey of computational methods in optimal control, SIAM Review 15, 553-584
Polak, E., G. Ribière (1969), Note sur la convergence de méthodes de directions conjuguées, Rev. Franç. Inf. Rech. Opér. 3(16), 35-43
Polyak, B.T. (1969), The conjugate gradient method in extremal problems, USSR Comp. Math. and Math. Phys. 9(4), 94-112
Ponstein, J. (1967), Seven kinds of convexity, SIAM Review 9, 115-119
Pontrjagin, L.S., V.G. Boltjanskij, R.V. Gamkrelidze, E.F. Miscenko (1967), Mathematische Theorie optimaler Prozesse, 2nd ed., Oldenbourg, Munich
Powell, D.R., J.R. MacDonald (1972), A rapidly convergent iterative method for the solution of the generalized nonlinear least squares problem, Comp. J. 15, 148-155
Powell, M.J.D. (1962), An iterative method for finding stationary values of a function of several variables, Comp. J. 5, 147-151
Powell, M.J.D. (1964), An efficient method for finding the minimum of a function of several variables without calculating derivatives, Comp. J. 7, 155-162
Powell, M.J.D. (1965), A method for minimizing a sum of squares of nonlinear functions without calculating derivatives, Comp. J. 7, 303-307
Powell, M.J.D. (1966), Minimization of functions of several variables, in: Walsh (1966), pp. 143-158
Powell, M.J.D. (1968a), On the calculation of orthogonal vectors, Comp. J. 11, 302-304
Powell, M.J.D. (1968b), A Fortran subroutine for solving systems of non-linear algebraic equations, UKAEA Research Group, report AERE-R-5947, Harwell, Oxon
Powell, M.J.D. (1969), A theorem on rank one modifications to a matrix and its inverse, Comp. J. 12, 288-290
Powell, M.J.D. (1970a), Rank one methods for unconstrained optimization, in: Abadie (1970), pp. 139-156
Powell, M.J.D. (1970b), A survey of numerical methods for unconstrained optimization, SIAM Review 12, 79-97
Powell, M.J.D. (1970c), A Fortran subroutine for unconstrained minimization, requiring first derivatives of the objective function, UKAEA Research Group, report AERE-R-6469, Harwell, Oxon
Powell, M.J.D. (1970d), A hybrid method for nonlinear equations, in: Rabinowitz (1970), pp. 87-114
Powell, M.J.D. (1970e), A Fortran subroutine for solving systems of nonlinear algebraic equations, in: Rabinowitz (1970), pp. 115-161
Powell, M.J.D. (1970f), Subroutine VA04A (Fortran), updated May 20th, 1970, in: Hopper (1971), p. 72
Powell, M.J.D. (1970g), Recent advances in unconstrained optimization, UKAEA Research Group, technical paper AERE-TP-430, Harwell, Oxon, Nov. 1970
Powell, M.J.D. (1971), On the convergence of the variable metric algorithm, JIMA 7, 21-36
Powell, M.J.D. (1972a), Some properties of the variable metric algorithm, in: Lootsma (1972a), pp. 1-17
Powell, M.J.D. (1972b), Quadratic termination properties of minimization algorithms I – Statement and discussion of results, JIMA 10, 333-342
Powell, M.J.D. (1972c), Quadratic termination properties of minimization algorithms II – Proofs and theorems, JIMA 10, 343-357
Powell, M.J.D. (1972d), A survey of numerical methods for unconstrained optimization, in: Geoffrion (1972), pp. 3-21
Powell, M.J.D. (1972e), Problems related to unconstrained optimization, in: Murray (1972a), pp. 29-55
Powell, M.J.D. (1972f), Unconstrained minimization algorithms without computation of derivatives, UKAEA Research Group, technical paper AERE-TP-483, Harwell, Oxon, April 1972
Poznyak, A.S. (1972), Use of learning automata for the control of random search, ARC 33, 1992-2000
Press, W.H., S.A. Teukolsky, W.T. Vetterling, B.P. Flannery (1992), Numerical recipes in Fortran, 2nd ed., Cambridge University Press, Cambridge UK (especially Chap. 7, Random numbers, pp. 266-319)
Prusinkiewicz, P., A. Lindenmayer (1990), The algorithmic beauty of plants – the virtual laboratory, Springer, Berlin
Pugachev, V.N. (1970), Determination of the characteristics of complex systems using statistical trials and analytical investigation, Engng. Cybern. 8, 1109-1117
Pugh, E.L. (1966), A gradient technique of adaptive Monte Carlo, SIAM Review 8, 346-355
Pun, L. (1969), Introduction to optimization practice, Wiley, New York
Rabinowitz, P. (Ed.) (1970), Numerical methods for nonlinear algebraic equations, Gordon Breach, London
Ralston, A., H.S. Wilf (Eds.) (1967), Mathematische Methoden für Digitalrechner, Oldenbourg, Munich
Ralston, A., H.S. Wilf (Eds.) (1969), Mathematische Methoden für Digitalrechner II, Oldenbourg, Munich
Rappl, G. (1984), Konvergenzraten von Random-Search-Verfahren zur globalen Optimierung, Dr. rer. nat. Diss., Hochschule der Bundeswehr, Munich-Neubiberg, Department of Computer Science, Nov. 1984
Rastrigin, L.A. (1960), Extremal control by the method of random scanning, ARC 21, 891-896
Rastrigin, L.A. (1963), The convergence of the random search method in the extremal control of a many-parameter system, ARC 24, 1337-1342
Rastrigin, L.A. (1965a), Sluchainyi poisk v zadachakh optimisatsii mnogoparametricheskikh sistem, Zinatne, Riga (for a translation into English see next item)
Rastrigin, L.A. (1965b), Random search in optimization problems for multiparameter systems, Air Force System Command, Foreign Technical Division, FTD-HT-67-363
Rastrigin, L.A. (1966), Stochastic methods of complicated multi-parameter system optimization, Proceedings of the IIIrd IFAC Congress, London, June 1966, paper 3-F
Rastrigin, L.A. (1967), Raboty po teorii i primeneniyu statisticheskikh metodov optimisatsii v institute elektroniki i vychislitelnoi tekhniki Akademii Nauk Latviiskoi SSR, Avtomatika i Vychislitelnaya Tekhnika (1967, 5), 31-40
Rastrigin, L.A. (1968), Statisticheskiye metody poiska, Nauka, Moscow
Rastrigin, L.A. (1969), Teorija i primenenije sluchainovo poiska, Zinatne, Riga
Rastrigin, L.A. (1972), Adaptivnye sistemy, vol. 1, Zinatne, Riga
Rauch, S.W. (1973), A convergence theory for a class of nonlinear programming problems, SIAM J. Numer. Anal. 10, 207-228
Rawlins, G.J.E. (Ed.) (1991), Foundations of genetic algorithms, Morgan Kaufmann, San Mateo CA
Rechenberg, I. (1964), Cybernetic solution path of an experimental problem, Royal Aircraft Establishment, Library Translation 1122, Farnborough, Hants, Aug. 1965, English translation of the unpublished written summary of the lecture "Kybernetische Lösungsansteuerung einer experimentellen Forschungsaufgabe", delivered at the joint annual meeting of the WGLR and DGRR, Berlin, 1964
Rechenberg, I. (1973), Evolutionsstrategie – Optimierung technischer Systeme nach Prinzipien der biologischen Evolution, Frommann-Holzboog, Stuttgart
Rechenberg, I. (1978), Evolutionsstrategien, in: Schneider and Ranft (1978), pp. 83-114
Rechenberg, I. (1989), Evolution strategy – nature's way of optimization, in: Bergmann (1989), pp. 106-126
Rechenberg, I. (1994), Evolutionsstrategie '94, Frommann-Holzboog, Stuttgart
Rein, H., M. Schneider (1971), Einführung in die Physiologie des Menschen, Springer, Berlin
Rhead, D.G. (1971), Some numerical experiments on Zangwill's method for unconstrained minimization, University of London, Institute of Computer Science, working paper ICSI-319
Ribière, G. (1970), Sur la méthode de Davidon-Fletcher-Powell pour la minimisation des fonctions, Mgmt. Sci. 16, 572-592
Rice, J.R. (1966), Experiments on Gram-Schmidt orthogonalization, Math. Comp. 20, 325-328
Richardson, J.A., J.L. Kuester (1973), Algorithm 454 (E4) – the complex method for constrained optimization, CACM 16, 487-489
Riedl, R. (1976), Die Strategie der Genesis, Piper, Munich
Robbins, H., S. Monro (1951), A stochastic approximation method, Ann. Math. Stat. 22, 400-407
Roberts, P.D., R.H. Davis (1969), Conjugate gradients, Control 13, 206-210
Roberts, S.M., H.I. Lyvers (1961), The gradient method in process control, Ind. Engng. Chem. 53, 877-882
Rodloff, R.K. (1976), Bestimmung der Geschwindigkeit von Versetzungsgruppen in neutronen-bestrahlten Kupfer-Einkristallen, Dr. rer. nat. Diss., Technical University of Braunschweig, Germany, Sept. 1976
Rosen, J.B. (1960), The gradient projection method for nonlinear programming I – Linear constraints, SIAM J. 8, 181-217
Rosen, J.B. (1961), The gradient projection method for nonlinear programming II – Nonlinear constraints, SIAM J. 9, 514-532
Rosen, J.B. (1966), Iterative solution of nonlinear optimal control problems, SIAM J. Contr. 4, 223-244
Rosen, J.B., O.L. Mangasarian, K. Ritter (Eds.) (1970), Nonlinear programming, Academic Press, New York
Rosen, J.B., S. Suzuki (1965), Construction of nonlinear programming test problems, CACM 8, 113
Rosen, R. (1967), Optimality principles in biology, Butterworths, London
Rosenblatt, F. (1958), The perceptron – a probabilistic model for information storage and organization in the brain, Psychol. Rev. 65, 386-408
Rosenbrock, H.H. (1960), An automatic method for finding the greatest or least value of a function, Comp. J. 3, 175-184
Rosenbrock, H.H., C. Storey (1966), Computational techniques for chemical engineers, Pergamon Press, Oxford UK
Ross, G.J.S. (1971), The efficient use of function minimization in non-linear maximum-likelihood estimation, Appl. Stat. 19, 205-221
Rothe, R. (1959), Höhere Mathematik für Mathematiker, Physiker und Ingenieure, I – Differentialrechnung und Grundformeln der Integralrechnung nebst Anwendungen, 18th ed., Teubner, Leipzig, Germany
Roughgarden, J.W. (1979), Theory of population genetics and evolutionary ecology, Macmillan, New York
Rozvany, G. (Ed.) (1994), J. on Structural Optimization, Springer, Berlin
Rudolph, G. (1991), Global optimization by means of distributed evolution strategies, in: Schwefel and Männer (1991), pp. 209-213
Rudolph, G. (1992a), On correlated mutation in evolution strategies, in: Männer and Manderick (1992), pp. 105-114
Rudolph, G. (1992b), Parallel approaches to stochastic global optimization, in: Joosen and Milgrom (1992), pp. 256-267
Rudolph, G. (1993), Massively parallel simulated annealing and its relation to evolutionary algorithms, Evolutionary Computation 1(4), 361-383
Rudolph, G. (1994a), Convergence analysis of canonical genetic algorithms, IEEE Trans. NN-5, 96-101
Rudolph, G. (1994b), An evolutionary algorithm for integer programming, in: Davidor, Schwefel, and Männer (1994), pp. 139-148
Rutishauser, H. (1966), Algorithmus 2 – Orthonormierung von Vektoren nach E. Schmidt, Computing 1, 159-161
Rybashov, M.V. (1965a), The gradient method of solving convex programming problems on electronic analog computers, ARC 26, 1886-1898
Rybashov, M.V. (1965b), Gradient method of solving linear and quadratic programming problems on electronic analog computers, ARC 26, 2079-2089
Rybashov, M.V. (1969), Insensitivity of gradient systems in the solution of linear problems on analog computers, ARC 30, 1679-1687
Ryshik, I.M., I.S. Gradstein (1963), Summen-, Produkt- und Integraltafeln, 2nd ed., Deutscher Verlag der Wissenschaften, Berlin
Saaty, T.L. (1955), The number of vertices of a polyhedron, Amer. Math. Monthly 62, 326-331
Saaty, T.L. (1963), A conjecture concerning the smallest bound on the iterations in linear programming, Oper. Res. 11, 151-153
Saaty, T.L. (1970), Optimization in integers and related extremal problems, McGraw-Hill, New York
Saaty, T.L., J. Bram (1964), Nonlinear mathematics, McGraw-Hill, New York
Sacks, J. (1958), Asymptotic distribution of stochastic approximation procedures, Ann. Math. Stat. 29, 373-405
Sameh, A.H. (1971), On Jacobi and Jacobi-like algorithms for a parallel computer, Math. Comp. 25, 579-590
Samuel, A.L. (1963), Some studies in machine learning using the game of checkers, in: Feigenbaum and Feldman (1963), pp. 71-105
Sargent, R.W.H., D.J. Sebastian (1972), Numerical experience with algorithms for unconstrained minimization, in: Lootsma (1972a), pp. 45-68
Sargent, R.W.H., D.J. Sebastian (1973), On the convergence of sequential minimization algorithms, JOTA 12, 567-575
Saridis, G.N. (1968), Learning applied to successive approximation algorithms, Proceedings of the 1968 Joint Automatic Control Conference, Ann Arbor MI, pp. 1007-1013
Saridis, G.N. (1970), Learning applied to successive approximation algorithms, IEEE Trans. SSC-6, 97-103
Saridis, G.N., H.D. Gilbert (1970), Self-organizing approach to the stochastic fuel regulator problem, IEEE Trans. SSC-6, 186-191
Satterthwaite, F.E. (1959a), REVOP or random evolutionary operation, Merrimack College, report 10-10-59, North Andover MA
Satterthwaite, F.E. (1959b), Random balance experimentation, Technometrics 1, 111-137
Satterthwaite, F.E., D. Shainin (1959), Pinpoint important process variable with polyvariable experimentation, J. Soc. Plast. Engrs. 15, 225-230
Savage, J.M. (1966), Evolution, Bayerischer Landwirtschafts-Verlag, Munich
Sawaragi, Y., T. Takamatsu, K. Fukunaga, E. Nakanishi, H. Tamura (1971), Dynamic version of steady state optimizing control of distillation column by trial method, Automatica 7, 509-516
Schaffer, J.D. (Ed.) (1989), Proceedings of the 3rd International Conference on Genetic Algorithms, George Mason University, Fairfax VA, June 4-7, 1989, Morgan Kaufmann, San Mateo CA
Schechter, R.S. (1962), Iteration methods for nonlinear problems, Trans. Amer. Math. Soc. 104, 179-189
Schechter, R.S. (1968), Relaxation methods for convex problems, SIAM J. Numer. Anal. 5, 601-612
Schechter, R.S. (1970), Minimization of a convex function by relaxation, in: Abadie (1970), pp. 177-190
Scheeffer, L. (1886), Über die Bedeutung der Begriffe "Maximum und Minimum" in der Variationsrechnung, Mathematische Annalen 26, 197-208
Scheel, A. (1985), Beitrag zur Theorie der Evolutionsstrategie, Dr.-Ing. Diss., Technical University of Berlin, Department of Process Engineering
Scheuer, E.M., D.S. Stoller (1962), On the generation of normal random vectors, Technometrics 4, 278-281
Schinzinger, R. (1966), Optimization in electromagnetic system design, in: Lavi and Vogl (1966), pp. 163-214
Schittkowski, K. (1980), Nonlinear programming codes, vol. 183 of Lecture Notes in Economics and Mathematical Systems, Springer, Berlin
Schley, C.H., Jr. (1968), Conjugate gradient methods for optimization, General Electric Research and Development Center, report 68-C-008, Schenectady NY, Jan. 1968
Schmalhausen, I.I. (1964), Grundlagen des Evolutionsprozesses vom kybernetischen Standpunkt, in: Ljapunov, Kämmerer, and Thiele (1964a), pp. 151-188
Schmetterer, L. (1961), Stochastic approximation, in: Neyman (1961), vol. 1, pp. 587-609
Schmidt, J.W., H. Schwetlick (1968), Ableitungsfreie Verfahren mit höherer Konvergenzgeschwindigkeit, Computing 3, 215-226
Schmidt, J.W., H.F. Trinkaus (1966), Extremwertermittlung mit Funktionswerten bei Funktionen von mehreren Veränderlichen, Computing 1, 224-232
Schmidt, J.W., K. Vetters (1970), Ableitungsfreie Verfahren für nichtlineare Optimierungsprobleme, Numer. Math. 15, 263-282
Schmitt, E. (1969), Adaptive computer algorithms for optimization and root-finding, NTZ-report 6, VDE Verlag, Berlin
Schneider, B., U. Ranft (Eds.) (1978), Simulationsmethoden in der Medizin und Biologie, Springer, Berlin
Schrack, G., N. Borowski (1972), An experimental comparison of three random searches, in: Lootsma (1972a), pp. 137-147
Schumer, M.A. (1967), Optimization by adaptive random search, Ph.D. thesis, Princeton University, Princeton NJ, Nov. 1967
Schumer, M.A. (1969), Hill climbing on a sample function of a Gaussian Markov process, JOTA 4, 413-418
Schumer, M.A., K. Steiglitz (1968), Adaptive step size random search, IEEE Trans. AC-13, 270-276
Schuster, P. (1972), Vom Makromolekül zur primitiven Zelle – die Entstehung biologischer Funktion, Chemie in unserer Zeit 6(1), 1-16
Schwarz, H. (Ed.) (1971), Multivariable technical control systems, North-Holland, Amsterdam
Schwarz, H.R., H. Rutishauser, E. Stiefel (1968), Numerik symmetrischer Matrizen, Teubner, Stuttgart
Schwefel, D. et al. (1972), Gesundheitsplanung im Departamento del Valle del Cauca, report of the German Development Institute, Berlin, July 1972
Schwefel, H.-P. (1968), Experimentelle Optimierung einer Zweiphasendüse, Teil I, report 35 for the project MHD-Staustrahlrohr, AEG Research Institute, Berlin, Oct. 1968
Schwefel, H.-P. (1974), Adaptive Mechanismen in der biologischen Evolution und ihr Einfluß auf die Evolutionsgeschwindigkeit, Internal report of the Working Group of Bionics and Evolution Techniques at the Institute for Measurement and Control Technology, Technical University of Berlin, Department of Process Engineering, July 1974
Schwefel, H.-P. (1975a), Evolutionsstrategie und numerische Optimierung, Dr.-Ing. Diss., Technical University of Berlin, Department of Process Engineering
Schwefel, H.-P. (1975b), Binäre Optimierung durch somatische Mutation, Internal report of the Working Group of Bionics and Evolution Techniques at the Institute for Measurement and Control Technology, Technical University of Berlin (and the Central Animal Laboratory of the Medical High School of Hannover, SFB 146 Versuchstierforschung of the Veterinary High School of Hannover), May 1975
Schwefel, H.-P. (1980), Subroutines EVOL, GRUP, KORR – Listings and User's Guides, Internal report of the Programme Group of Systems Analysis and Technological Development, KFA-STE-IB-2/80, April 1980, Nuclear Research Center (KFA) Jülich, Germany
Schwefel, H.-P. (1981), Optimum Seeking Methods – User's Guides, Internal report of the Programme Group of Systems Analysis and Technological Development, KFA-STE-IB-7/81, Oct. 1981, Nuclear Research Center (KFA) Jülich, Germany
Schwefel, H.-P. (1987), Collective phenomena in evolutionary systems, in: Checkland and Kiss (1987), vol. 2, pp. 1025-1033
Schwefel, H.-P. (1988), Towards large-scale long-term systems analysis, in: Cheng (1988), pp. 375-381
Schwefel, H.-P., F. Kursawe (1992), Künstliche Evolution als Modell für natürliche Intelligenz, in: Nachtigall (1992), pp. 73-91
Schwefel, H.-P., R. Männer (Eds.) (1991), Parallel problem solving from nature, Proceedings of the 1st PPSN Workshop, Dortmund, Oct. 1-3, 1990, vol. 496 of Lecture Notes in Computer Science, Springer, Berlin
Schwetlick, H. (1970), Algorithmus 12 – Ein ableitungsfreies Verfahren zur Lösung endlich-dimensionaler Gleichungssysteme, Computing 5, 82-88 and 393
Sebald, A.V., L.J. Fogel (Eds.) (1994), Proceedings of the 3rd Annual Conference on Evolutionary Programming, San Diego CA, Feb. 24-26, 1994, World Scientific, Singapore
Sebastian, H.-J., K. Tammer (Eds.) (1990), System Modelling and Optimization, vol. 143 of Lecture Notes in Control and Information Sciences, Springer, Berlin
Sergiyevskiy, G.M., A.P. Ter-Saakov (1970), Factor experiments in many-dimensional stochastic approximation of an extremum, Engng. Cybern. 8, 949-954
Shah, B.V., R.J. Buehler, O. Kempthorne (1964), Some algorithms for minimizing a function of several variables, SIAM J. 12, 74-92
Shanno, D.F. (1970a), Parameter selection for modified Newton methods for function minimization, SIAM J. Numer. Anal. 7, 366-372
Shanno, D.F. (1970b), Conditioning of quasi-Newton methods for function minimization, Math. Comp. 24, 647-656
Shanno, D.F., P.C. Kettler (1970), Optimal conditioning of quasi-Newton methods, Math. Comp. 24, 657-664
Shapiro, I.J., K.S. Narendra (1969), Use of stochastic automata for parameter self-optimization with multimodal performance criteria, IEEE Trans. SSC-5, 352-360
Shedler, G.S. (1967), Parallel numerical methods for the solution of equations, CACM 10, 286-291
Shimizu, T. (1969), A stochastic approximation method for optimization problems, JACM 16, 511-516
Shubert, B.O. (1972), A sequential method seeking the global maximum of a function, SIAM J. Numer. Anal. 9, 379-388
Sigmund, K. (1993), Games of life – explorations in ecology, evolution, and behavior, Oxford University Press, Oxford UK
Silverman, G. (1969), Remark on algorithm 315 (E4) – the damped Taylor's series method for minimizing a sum of squares and for solving systems of non-linear equations, CACM 12, 513
Singer, E. (1962), Simulation and optimization of oil refinery design, in: Cooper (1962), pp. 62-74
Sirisena, H.R. (1973), Computation of optimal controls using a piecewise polynomial parameterization, IEEE Trans. AC-18, 409-411
Slagle, J.R. (1972), Einführung in die heuristische Programmierung – künstliche Intelligenz und intelligente Maschinen, Verlag Moderne Industrie, Munich
Smith, C.S. (1962), The automatic computation of maximum likelihood estimates, National Coal Board, Scientific Department, report SC-846-MR-40, London, June 1962
Smith, D.E. (1973), An empirical investigation of optimum-seeking in the computer simulation situation, Oper. Res. 21, 475-497
Smith, F.B., Jr., D.F. Shanno (1971), An improved Marquardt procedure for non-linear regressions, Technometrics 13, 63-74
Smith, J. Maynard (1982), Evolution and the theory of games, Cambridge University Press, Cambridge UK
Smith, J. Maynard (1989), Evolutionary genetics, Oxford University Press, Oxford UK
Smith, L.B. (1969), Remark on algorithm 178 (E4) – direct search, CACM 12, 638
Smith, N.H., D.F. Rudd (1964), The feasibility of directed random search, University of Wisconsin, Department of Chemical Engineering, report
Snell, F.M. (Ed.) (1967), Progress in theoretical biology, vol. 1, Academic Press, New York
Sorenson, H.W. (1969), Comparison of some conjugate direction procedures for function minimization, J. Franklin Inst. 288, 421-441
Soucek, B. and the IRIS Group (Eds.) (1992), Dynamic, genetic, and chaotic programming, vol. 5 of Sixth-Generation Computer Technology Series, Wiley-Interscience, New York
Southwell, R.V. (1940), Relaxation methods in engineering science – a treatise on approximate computation, Oxford University Press, Oxford UK
Southwell, R.V. (1946), Relaxation methods in theoretical physics, Clarendon Press, Oxford UK
Späth, H. (1967), Algorithm 315 (E4, C5) – the damped Taylor's series method for minimizing a sum of squares and for solving systems of nonlinear equations, CACM 10, 726-728
Spang, H.A., III (1962), A review of minimization techniques for nonlinear functions, SIAM Review 4, 343-365
Spears, W.M., K.A. De Jong, T. Bäck, D.B. Fogel, H. de Garis (1993), An overview of evolutionary computation, in: Brazdil (1993), pp. 442-459
Spedicato, E. (1973), Stability of Huang's update for the conjugate gradient method, JOTA 11, 469-479
Spendley, W. (1969), Nonlinear least squares fitting using a modified simplex minimization method, in: Fletcher (1969a), pp. 259-270
Spendley, W., G.R. Hext, F.R. Himsworth (1962), Sequential application of simplex designs in optimisation and evolutionary operation, Technometrics 4, 441-461
Speyer, J.L., H.J. Kelley, N. Levine, W.F. Denham (1971), Accelerated gradient projection technique with application to rocket trajectory optimization, Automatica 7, 37-43
Sprave, J. (1993), Zelluläre Evolutionäre Algorithmen zur Parameteroptimierung, in: Hofestädt, Krückeberg, and Lengauer (1993), pp. 111-120
Sprave, J. (1994), Linear neighborhood evolution strategy, in: Sebald and Fogel (1994), pp. 42-51
Stanton, E.L. (1969), A discrete element analysis of elasto-plastic plates by energy minimization, Ph.D. thesis, Case Western Reserve University, Jan. 1969
Stark, R.M., R.L. Nicholls (1972), Mathematical foundations for design – civil engineering systems, McGraw-Hill, New York
Stebbins, G.L. (1968), Evolutionsprozesse, G. Fischer, Stuttgart
Stein, M.L. (1952), Gradient methods in the solution of systems of linear equations, NBS J. Research 48, 407-413
Steinbuch, K. (1971), Automat und Mensch, 4th ed., Springer, Berlin
Steinbuch, K., S.W. Wagner (Eds.) (1964), Neuere Ergebnisse der Kybernetik, Oldenbourg, Munich
Stender, J. (Ed.) (1993), Parallel genetic algorithms – theory and applications, IOS Press, Amsterdam
Steuer, R.E. (1986), Multiple criteria optimization – theory, computation, and application, Wiley, New York
Stewart, E.C., W.P. Kavanaugh, D.H. Brocker (1967), Study of a global search algorithm for optimal control, Proceedings of the Vth International Analogue Computation Meeting, Lausanne, Aug.-Sept. 1967, pp. 207-230
Stewart, G.W. (1967), A modification of Davidon's minimization method to accept difference approximations of derivatives, JACM 14, 72-83
Stewart, G.W. (1973), Conjugate direction methods for solving systems of linear equations, Numer. Math. 21, 285-297
Stiefel, E. (1952), Über einige Methoden der Relaxationsrechnung, ZAMP 3, 1-33
Stiefel, E. (1965), Einführung in die numerische Mathematik, 4th ed., Teubner, Stuttgart
Stoer, J., C. Witzgall (1970), Convexity and optimization in finite dimensions I, Springer, Berlin
Stolz, O. (1893), Grundzüge der Differential- und Integralrechnung, erster Teil – reelle Veränderliche und Functionen, Abschnitt V – die größten und kleinsten Werte der Functionen, pp. 199-258, Teubner, Leipzig, Germany
Stone, H.S. (1973a), Parallel computation – an introduction, IEEE Trans. C-22, 709-710
Stone, H.S. (1973b), An efficient parallel algorithm for the solution of a tri-diagonal linear system of equations, JACM 20, 27-38
Storey, C. (1962), Applications of a hill climbing method of optimization, Chem. Engng. Sci. 17(1), 45-52
Storey, C., H.H. Rosenbrock (1964), On the computation of the optimal temperature profile in a tubular reaction vessel, in: Balakrishnan and Neustadt (1964), pp. 23-64
Stratonovich, R.L. (1968), Does there exist a theory of synthesis of optimal adaptive, self-learning and self-adaptive systems? ARC 29, 83-92
Stratonovich, R.L. (1970), Optimal algorithms of the stochastic approximation type, Engng. Cybern. 8, 20-27
Strongin, R.G. (1970), Multi-extremal minimization, ARC 31, 1085-1088
Strongin, R.G. (1971), Minimization of many-extremal functions of several variables, Engng. Cybern. 9, 1004-1010
Suchowitzki, S.I., L.I. Awdejewa (1969), Lineare und konvexe Programmierung, Oldenbourg, Munich
Sugie, N. (1964), An extension of Fibonaccian searching to multi-dimensional cases, IEEE Trans. AC-9, 105
Sutti, C., L. Trabattoni, P. Brughiera (1972), A method for minimization of a one-dimensional nonunimodal function, in: Szegö (1972), pp. 181-192
Svechinskii, V.B. (1971), Random search in probabilistic iterative algorithms, ARC 32, 76-80
Swann, W.H. (1964), Report on the development of a new direct searching method of optimization, ICI Central Instrument Laboratory, research note 64-3, Middlesborough, Yorks, June 1964
Swann, W.H. (1969), A survey of non-linear optimization techniques, FEBS-Letters 2(Suppl.), S39-S55
Swann, W.H. (1972), Direct search methods, in: Murray (1972a), pp. 13-28
Sweschnikow, A.A. (Ed.) (1970), Wahrscheinlichkeitsrechnung und mathematische Statistik in Aufgaben, Teubner, Leipzig, Germany
Sydow, A. (1968), Eine Methode zur exakten Realisierung des Gradientenverfahrens auf dem iterativ-arbeitenden Analogrechner, messen-steuern-regeln 11, 462-465
Sydow, A. (Ed.) (1993), Simulationstechnik, 8th Symposium Simulationstechnik, Berlin, Sept. 1993, Vieweg, Braunschweig, Germany
Synge, J.L. (1944), A geometrical interpretation of the relaxation method, Quart. Appl. Math. 2, 87-89
Szczerbicka, H., P. Ziegler (Eds.) (1993), 6th Workshop Simulation und Künstliche Intelligenz, Karlsruhe, Germany, April 22-23, 1993, Mitteilungen aus den Arbeitskreisen der ASIM, Arbeitsgemeinschaft Simulation in der Gesellschaft für Informatik (GI), Bonn
Szegö, G.P. (Ed.) (1972), Minimization algorithms, mathematical theories, and computer results, Academic Press, New York
Szegö, G.P., G. Treccani (1972), Axiomatization of minimization algorithms and a new conjugate gradient method, in: Szegö (1972), pp. 193-216
Tabak, D. (1969), Comparative study of various minimization techniques used in mathematical programming, IEEE Trans. AC-14, 572
Tabak, D. (1970), Applications of mathematical programming techniques in optimal control – a survey, IEEE Trans. AC-15, 688-690
Talkin, A.I. (1964), The negative gradient method extended to the computer programming of simultaneous systems of differential and finite equations, AFIPS Conf. Proc. 26, 539-543
Tapley, B.D., J.M. Lewallen (1967), Comparison of several numerical optimization methods, JOTA 1, 1-32
Taran, V.A. (1968a), A discrete adaptive system with random search for the optimum, Engng. Cybern. 6(4), 142-150
Taran, V.A. (1968b), Adaptive systems with random extremum search, ARC 29, 1447-1455
Tazaki, E., A. Shindo, T. Umeda (1970), Decentralized optimization of a chemical process by a feasible method, IFAC Kyoto Symposium on Systems Engineering Approach to Computer Control, Kyoto, Japan, Aug. 1970, paper 25.1
Thom, R. (1969), Topological models in biology, Topology 8, 313-336
Thomas, M.E., D.J. Wilde (1964), Feed-forward control of over-determined systems by stochastic relaxation, in: Blakemore and Davis (1964), pp. 16-22
Todd, J. (1949), The condition of certain matrices I, Quart. J. Mech. Appl. Math. 2, 469-472
Tokumaru, H., N. Adachi, K. Goto (1970), Davidon's method for minimization problems in Hilbert space with an application to control problems, SIAM J. Contr. 8, 163-178
Tolle, H. (1971), Optimierungsverfahren für Variationsaufgaben mit gewöhnlichen Differentialgleichungen als Nebenbedingungen, Springer, Berlin
Tomlin, F.K., L.B. Smith (1969), Remark on algorithm 178 (E4) – direct search, CACM 12, 637-638
Törn, A., A. Žilinskas (1989), Global optimization, vol. 350 of Lecture Notes in Computer Science, Springer, Berlin
Tovstucha, T.I. (1960), The effect of random noise on the steady-state operation of a step-type extremal system for an object with a parabolic characteristic, ARC 21, 398-404
Traub, J.F. (1964), Iterative methods for the solution of equations, Prentice-Hall, Englewood Cliffs NJ
Treccani, G., L. Trabattoni, G.P. Szegö (1972), A numerical method for the isolation of minima, in: Szegö (1972), pp. 239-289
Tsypkin, Ya.Z., see also under Zypkin, Ja.S.
Tsypkin, Ya.Z. (1968a), All the same, does a theory of synthesis of optimal adaptive systems exist? ARC 29, 93-98
Tsypkin, Ya.Z. (1968b), Optimal hybrid adaptation and learning algorithms, ARC 29, 1271-1276
Tsypkin, Ya.Z. (1968c), Self-learning, what is it? IEEE Trans. AC-13, 608-612
Tsypkin, Ya.Z. (1970a), On learning systems, IFAC Kyoto Symposium on Systems Engineering Approach to Computer Control, Kyoto, Japan, Aug. 1970, paper 34.1
Tsypkin, Ya.Z. (1970b), Generalized learning algorithms, ARC 31, 86-92
Tsypkin, Ya.Z. (1971), Smoothed randomized functionals and algorithms in adaptation and learning theory, ARC 32, 1190-1209
Tsypkin, Ya.Z., A.S. Poznyak (1972), Finite learning automata, Engng. Cybern. 10, 478-490
Tzeng, G.-H., P.L. Yu (Eds.) (1992), Proceedings of the 10th International Conference on Multiple Criteria Decision Making, Taipei, July 19-24, 1992, National Chiao Tung University, Taipei, Taiwan
Ueing, U. (1971), Zwei Lösungsmethoden für nichtkonvexe Programmierungsprobleme, Springer, Berlin
Ueing, U. (1972), A combinatorial method to compute a global solution of certain nonconvex optimization problems, in: Lootsma (1972a), pp. 223-230
Unbehauen, H. (1971), On the parameter optimization of multivariable control systems, in: Schwarz (1971), technical papers, vol. 2, pp. 2.2.10-1 to 2.2.10-11
Vagin, V.N., L.Ye. Rudelson (1968), An example of a self-organizing system, Engng. Cybern. 6(6), 33-40
Vajda, S. (1961), Mathematical programming, Addison-Wesley, Reading MA
Vajda, S. (1967), The mathematics of experimental design, Griffin, London
Vanderplaats, G.N. (1984), Numerical optimization techniques for engineering design – with applications, McGraw-Hill, New York
VanNorton, R. (1967), Lösung linearer Gleichungssysteme nach dem Verfahren von Gauss-Seidel, in: Ralston and Wilf (1967), pp. 92-105
Varah, J.M. (1965), Certification of algorithm 203 (E4) – steep 1, CACM 8, 171
Varela, F.J., P. Bourgine (Eds.) (1992), Toward a practice of autonomous systems, Proceedings of the 1st European Conference on Artificial Life (ECAL), Paris, Dec. 11-13, 1991, MIT Press, Cambridge MA
Varga, J. (1974), Praktische Optimierung – Verfahren und Anwendungen der linearen und nichtlinearen Optimierung, Oldenbourg, Munich
Varga, R.S. (1962), Matrix iterative analysis, Prentice-Hall, Englewood Cliffs NJ
Vaysbord, E.M. (1967), Asymptotic estimates of the rate of convergence of random search, Engng. Cybern. 5(4), 22-32
Vaysbord, E.M. (1968), Convergence of a method of random search, Engng. Cybern. 6(3), 44-48
Vaysbord, E.M. (1969), Convergence of a certain method of random search for a global extremum of a random function, Engng. Cybern. 7(1), 46-50
Vaysbord, E.M., D.B. Yudin (1968), Multiextremal stochastic approximation, Engng. Cybern. 6(5), 1-11
Venter, J.H. (1967), An extension of the Robbins-Monro procedure, Ann. Math. Stat. 38, 181-190
Viswanathan, R., K.S. Narendra (1972), A note on the linear reinforcement scheme for variable-structure stochastic automata, IEEE Trans. SMC-2, 292-294
Vitale, P., G. Taylor (1968), A note on the application of Davidon's method to nonlinear regression problems, Technometrics 10, 843-849
Vogelsang, R. (1963), Die mathematische Theorie der Spiele, Dümmler, Bonn
Voigt, H.-M. (1989), Evolution and optimization – an introduction to solving complex problems by replicator networks, Akademie-Verlag, Berlin
Voigt, H.-M., H. Mühlenbein, H.-P. Schwefel (Eds.) (1990), Evolution and optimization '89 – Selected papers on evolution theory, combinatorial optimization and related topics, Wartburg Castle, Eisenach, April 2-4, 1989, Akademie-Verlag, Berlin
Voltaire, F.M. Arouet de (1759), Candide oder der Optimismus, Insel Verlag, Frankfort/Main, 1973
Volz, R.A. (1965), The minimization of a function by weighted gradients, IEEE Proc. 53, 646-647
Volz, R.A. (1973), Example of function optimization via hybrid computation, Simulation 21, 43-48
Waddington, C.H. (Ed.) (1968), Towards a theoretical biology I – prolegomena, Edinburgh University Press, Edinburgh
Wald, A. (1966), Sequential analysis, 8th ed., Wiley, New York
Wallack, P. (1964), Certification of algorithm 203 (E4) – steep 1, CACM 7, 585
Walsh, J. (Ed.) (1966), Numerical analysis – an introduction, Academic Press, London
Ward, L., A. Nag, L.C.W. Dixon (1969), Hill-climbing techniques as a method of calculating the optical constants and thickness of a thin metallic film, Brit. J. Appl. Phys. (J. Phys. D), Ser. 2, 2, 301-304
Wasan, M.T. (1969), Stochastic approximation, Cambridge University Press, Cambridge UK
Wasscher, E.J. (1963a), Algorithm 203 (E4) – steep 1, CACM 6, 517-519
Wasscher, E.J. (1963b), Algorithm 204 (E4) – steep 2, CACM 6, 519
Wasscher, E.J. (1963c), Remark on algorithm 129 (E4) – minifun, CACM 6, 521
Wasscher, E.J. (1965), Remark on algorithm 205 (E4) – ative, CACM 8, 171
Weber, H.H. (1972), Einführung in Operations Research, Akademische Verlagsgesellschaft, Frankfort/Main
Wegge, L. (1966), On a discrete version of the Newton-Raphson method, SIAM J. Numer. Anal. 3, 134-142
Weinberg, F. (Ed.) (1968), Einführung in die Methode Branch and Bound, Springer, Berlin
Weinberg, F., C.A. Zehnder (Eds.) (1969), Heuristische Planungsmethoden, Springer, Berlin
Weisman, J., C.F. Wood (1966), The use of optimal search for engineering design, in: Lavi and Vogl (1966), pp. 219-228
Weisman, J., C.F. Wood, L. Rivlin (1965), Optimal design of chemical process systems, AIChE Engineering Progress Symposium Series 61, no. 55, pp. 50-63
Weiss, E.A., D.H. Archer, D.A. Burt (1961), Computer sets tower for best run, Petrol Refiner 40(10), 169-174
Wells, M. (1965), Algorithm 251 (E4) – function minimization (Flepomin), CACM 8, 169-170
Werner, J. (1974), Über die Konvergenz des Davidon-Fletcher-Powell-Verfahrens für streng konvexe Minimierungsaufgaben im Hilbert-Raum, Computing 12, 167-176
Wheeling, R.F. (1960), Optimizers – their structure, CACM 3, 632-638
White, L.J., R.G. Day (1971), An evaluation of adaptive step-size random search, IEEE Trans. AC-16, 475-478
White, R.C., Jr. (1970), Hybrid-computer optimization of systems with random parameters, Ph.D. thesis, University of Arizona, Tucson AZ, June 1970
White, R.C., Jr. (1971), A survey of random methods for parameter optimization, Simulation 17, 197-205
Whitley, L.D. (1991), Fundamental principles of deception in genetic search, in: Rawlins (1991), pp. 221-241
Whitley, L.D. (Ed.) (1993), Foundations of Genetic Algorithms 2, Morgan Kaufmann, San Mateo CA
Whitley, V.W. (1962), Algorithm 129 (E4) – minifun, CACM 5, 550-551
Whittle, P. (1971), Optimization under constraints – theory and applications of nonlinear programming, Wiley-Interscience, London
Wiener, N. (1963), Kybernetik – Regelung und Nachrichtenübertragung in Lebewesen und Maschine, Econ-Verlag, Düsseldorf, Germany
Wiener, N., J.P. Schade (Eds.) (1965), Progress in biocybernetics, vol. 2, Elsevier, Amsterdam
Wilde, D.J. (1963), Optimization by the method of contour tangents, AIChE J. 9(2), 186-190
Wilde, D.J. (1964), Optimum seeking methods, Prentice-Hall, Englewood Cliffs NJ
Wilde, D.J. (1965), A multivariable dichotomous optimum-seeking method, IEEE Trans. AC-10, 85-87
Wilde, D.J. (1966), Objective function indistinguishability in unimodal optimization, in: Lavi and Vogl (1966), pp. 341-350
Wilde, D.J., C.S. Beightler (1967), Foundations of optimization, Prentice-Hall, Englewood Cliffs NJ
Wilkinson, J.H. (1965), The algebraic eigenvalue problem, Oxford University Press, London
Wilkinson, J.H., C. Reinsch (1971), Handbook for automatic computation, vol. 2 – Linear algebra, Springer, Berlin
Wilson, E.O., W.H. Bossert (1973), Einführung in die Populationsbiologie, Springer, Berlin
Witt, U. (1992), Explaining process and change – approaches to evolutionary economics, University of Michigan Press, Ann Arbor MI
Witte, B.F.W., W.R. Holst (1964), Two new direct minimum search procedures for functions of several variables, AFIPS Conf. Proc. 25, 195-209
Witten, I.H. (1972), Comments on "Use of stochastic automata for parameter self-optimization with multimodal performance criteria", IEEE Trans. SMC-2, 289-292
Wolf, G., T. Legendi, U. Schendel (Eds.) (1990), Parcella '90, Proceedings of the 5th International Workshop on Parallel Processing by Cellular Automata and Arrays, Berlin, Sept. 17-21, 1990, vol. 2 of Research in Informatics, Akademie-Verlag, Berlin
Wolfe, P. (1959a), The simplex method for quadratic programming, Econometrica 27, 382-398
Wolfe, P. (1959b), The secant method for simultaneous nonlinear equations, CACM 2, 12-13
Wolfe, P. (1966), On the convergence of gradient methods under constraints, IBM Zurich, Switzerland, Research Laboratory report RZ-204, March 1966
Wolfe, P. (1967), Another variable metric method, IBM working paper
Wolfe, P. (1969), Convergence conditions for ascent methods, SIAM Review 11, 226-235
Wolfe, P. (1970), Convergence theory in nonlinear programming, in: Abadie (1970), pp. 1-36
Wolfe, P. (1971), Convergence conditions for ascent methods II – some corrections, SIAM Review 13, 185-188
Wolff, W., C.-J. Soeder, F.R. Drepper (Eds.) (1988), Ecodynamics – Contributions to theoretical ecology, Springer, Berlin
Wood, C.F. (1960), Application of direct search to the solution of engineering problems, Westinghouse Research Laboratory, scientific paper 6-41210-1-P1, Pittsburgh PA, Oct. 1960
Wood, C.F. (1962), Recent developments in direct search techniques, Westinghouse Research Laboratory, research paper 62-159-522-R1, Pittsburgh PA
Wood, C.F. (1965), Review of design optimization techniques, IEEE Trans. SSC-1, 14-20
Yates, F. (1967), A fresh look at the basic principles of the design and analysis of experiments, in: LeCam and Neyman (1967), pp. 777-790
Youden, W.J., O. Kempthorne, J.W. Tukey, G.E.P. Box, J.S. Hunter, F.E. Satterthwaite, T.A. Budne (1959), Discussion of the papers of Messrs. Satterthwaite and Budne, Technometrics 1, 157-193
Yovits, M.C., S. Cameron (Eds.) (1960), Self-organizing systems, Pergamon Press, Oxford UK
Yovits, M.C., G.T. Jacobi, D.G. Goldstein (Eds.) (1962), Self-organizing systems, Spartan, Washington, DC
Yudin, D.B. (1965), Quantitative analysis of complex systems I, Engng. Cybern. 3(1), 1-9
Yudin, D.B. (1966), Quantitative analysis of complex systems II, Engng. Cybern. 4(1), 1-13
Yudin, D.B. (1972), New approaches to formalizing the choice of decisions in complex situations, ARC 33, 747-756
Yvon, J.P. (1972), On some random search methods, in: Szegö (1972), pp. 313-335
Zach, F. (1974), Technisches Optimieren, Springer, Vienna
Zadeh, L.A., L.W. Neustadt, A.V. Balakrishnan (Eds.) (1969a), Computing methods in optimization problems 2, Academic Press, London
Zadeh, L.A., L.W. Neustadt, A.V. Balakrishnan (Eds.) (1969b), Computing methods in optimization problems, Springer, Berlin
Zadeh, N. (1970), A note on the cyclic coordinate ascent method, Mgmt. Sci. 16, 642-644
Zahradnik, R.L. (1971), Theory and techniques of optimization for practicing engineers, Barnes and Noble, New York
Zakharov, V.V. (1969), A random search method, Engng. Cybern. 7(2), 26-30
Zakharov, V.V. (1970), The method of integral smoothing in many-extremal and stochastic problems, Engng. Cybern. 8, 637-642
Zangwill, W.I. (1967), Minimizing a function without calculating derivatives, Comp. J. 10, 293-296
Zangwill, W.I. (1969), Nonlinear programming – a unified approach, Prentice-Hall, Englewood Cliffs NJ
Zeleznik, F.J. (1968), Quasi-Newton methods for nonlinear equations, JACM 15, 265-271
Zellnik, H.E., N.E. Sondak, R.S. Davis (1962), Gradient search optimization, Chem. Engng. Progr. 58(8), 35-41
References 321<br />

Zerbst, E.W. (1987), Bionik, Teubner, Stuttgart<br />

Zettl, G. (1970), Ein Verfahren zum Minimieren einer Funktion bei eingeschranktem<br />

Variationsbereich der Parameter, Numer. Math. 15, 415-432<br />

Zhigljavsky, A.A. (1991), Theory of global r<strong>and</strong>om search, Kluwer, Dordrecht, The<br />

Netherl<strong>and</strong>s<br />

Zigangirov, K.S. (1965), Optimal search in the presence of noise, Engng. Cybern. 3(4),<br />

112-116<br />

Zoutendijk, G. (1960), Methods of feasible directions|a study in linear <strong>and</strong> nonlinear<br />

programming, Elsevier, Amsterdam<br />

Zoutendijk, G. (1970), Nonlinear programming|computational methods, in: Abadie<br />

(1970), pp. 37-86<br />

Zurmuhl, R. (1965), Praktische Mathematik fur Ingenieure und Physiker, 5th ed., Springer,<br />

Berlin<br />

Zwart, P.B. (1970), Nonlinear programming|a quadratic analysis of ridge paralysis,<br />

JOTA 6, 331-339<br />

Zwart, P.B. (1973), Nonlinear programming|counterexample to two global optimization<br />

algorithms, Oper. Res. 21, 1260-1266<br />

Zypkin, Ja.S. see also under Tsypkin, Ya.Z.<br />

Zypkin, Ja.S. (1966), Adaption und Lernen in automatischen Systemen, Oldenbourg,<br />

Munich<br />

Zypkin, Ja.S. (1967), Probleme der Adaption in automatischen Systemen, messen-steuernregeln<br />

10, 362-365<br />

Zypkin, Ja.S. (1970), Adaption und Lernen in kybernetischen Systemen, Oldenbourg,<br />

Munich



Glossary of Abbreviations
AAAS American Association for the Advancement of Science
ACM Association for Computing Machinery
AEG Allgemeine Elektricitäts-Gesellschaft
AERE Atomic Energy Research Establishment
AFIPS American Federation of Information Processing Societies
AGARD Advisory Group for Aerospace Research and Development
AIAA American Institute of Aeronautics and Astronautics
AIChE American Institute of Chemical Engineers
AIEE American Institute of Electrical Engineers
ANL Argonne National Laboratory
ARC Automation and Remote Control (cover-to-cover translation of Avtomatika i Telemechanika)
ASME American Society of Mechanical Engineers
BIT Nordisk Tidskrift for Informationsbehandling
CACM Communications of the ACM
DFVLR Deutsche Forschungs- und Versuchsanstalt für Luft- und Raumfahrt
DGRR Deutsche Gesellschaft für Raketentechnik und Raumfahrt
DLR Deutsche Luft- und Raumfahrt
FEBS Federation of European Biochemical Societies
GI Gesellschaft für Informatik
GMD Gesellschaft für Mathematik und Datenverarbeitung
IBM International Business Machines Corporation
ICI Imperial Chemical Industries Limited
IEE Institution of Electrical Engineers
IEEE Institute of Electrical and Electronics Engineers
  Transactions AC on Automatic Control
  BME on Bio-Medical Engineering
  C on Computers
  MIL on Military Electronics
  MTT on Microwave Theory and Techniques
  NN on Neural Networks
  SMC on Systems, Man, and Cybernetics
  SSC on Systems Science and Cybernetics
IFAC International Federation of Automatic Control
IIASA International Institute for Applied Systems Analysis
IMACS International Association for Mathematics and Computers in Simulation
IRE Institute of Radio Engineers
  Transactions EC on Electronic Computers
  EM on Engineering Management
ISA Instrument Society of America
JACM Journal of the ACM
JIMA Journal of the Institute of Mathematics and Its Applications
JOTA Journal of Optimization Theory and Applications
KFA Kernforschungsanlage (Nuclear Research Center) Jülich
KfK Kernforschungszentrum (Nuclear Research Center) Karlsruhe
MIT Massachusetts Institute of Technology
NASA National Aeronautics and Space Administration
NBS National Bureau of Standards
NTZ Nachrichtentechnische Zeitschrift
PPSN Parallel Problem Solving from Nature
SIAM Society for Industrial and Applied Mathematics
UKAEA United Kingdom Atomic Energy Authority
VDE Verband Deutscher Elektrotechniker
VDI Verein Deutscher Ingenieure
WGLR Wissenschaftliche Gesellschaft für Luft- und Raumfahrt
ZAMM Zeitschrift für angewandte Mathematik und Mechanik
ZAMP Zeitschrift für angewandte Mathematik und Physik




Appendix A
Catalogue of Problems
The catalogue is divided into three groups of test problems corresponding to the three divisions of the numerical strategy comparison. The optimization problems are all formulated as minimum problems with a specified objective function F(x) and solution x*. For the second set of problems, the initial conditions x^(0) are also given. Occasionally, further local minima and other stationary points of the objective function are also indicated. Inequality constraints are formulated such that the constraint functions G_j(x) are all greater than zero within the allowed or feasible region. If a solution lies on the edge of the feasible region, then the active constraints are mentioned. The values of these constraint functions must be just equal to zero at the optimum. Where possible the structure of the minimum problem is depicted geometrically by means of a two dimensional contour diagram with lines F(x_1, x_2) = const. and as a three dimensional picture in which values of F(x_1, x_2) are plotted as elevation over the (x_1, x_2) plane. Additionally, the values of the objective function on the contour lines are specified. Constraints are shown as bold lines in the contour diagrams. In the 3D plots the objective function is mostly floored to minimal values within non-feasible regions. In some cases there is a brief mention of any especially characteristic behavior shown by individual strategies during their iterative search for the minimum.
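Read as data, such a catalogue entry maps directly onto a small program structure. The following Python sketch is illustrative only (the original test series used the FORTRAN programs of Appendix B); the class and method names are invented for this illustration, and Problem 2.46 from below serves as the example entry.

```python
# Illustrative container for one catalogue entry: a minimization problem
# with objective F, constraint functions G_j(x) >= 0, and known data.
class TestProblem:
    def __init__(self, F, constraints=(), x_star=None, x0=None):
        self.F = F                  # objective, to be minimized
        self.G = list(constraints)  # each G_j(x) >= 0 inside the feasible region
        self.x_star = x_star        # known solution (if any)
        self.x0 = x0                # prescribed starting point (if any)

    def feasible(self, x):
        return all(g(x) >= 0.0 for g in self.G)

    def active(self, x, eps=1e-10):
        # indices j of constraints with G_j(x) = 0 (numerically)
        return [j for j, g in enumerate(self.G, start=1) if abs(g(x)) <= eps]

# Example entry: Problem 2.46 below, F(x) = x1^2 + x2^2, G1 = x1 + 2*x2 - 2
p = TestProblem(lambda x: x[0]**2 + x[1]**2,
                [lambda x: x[0] + 2*x[1] - 2],
                x_star=(0.4, 0.8), x0=(10.0, 10.0))
print(p.F(p.x0), p.feasible(p.x0), p.active(p.x_star))  # 200.0 True [1]
```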

A.1 Test Problems for the First Part of the Strategy Comparison
Problem 1.1 (sphere model)
Objective function:
$$F(x) = \sum_{i=1}^{n} x_i^2$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
For n = 2 a contour diagram as well as a 3D plot are sketched under Problem 2.17. For this, the simplest of all quadratic problems, none of the strategies fails.


Problem 1.2
Objective function:
$$F(x) = \sum_{i=1}^{n} \left( \sum_{j=1}^{i} x_j \right)^2$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
A contour diagram as well as a 3D plot for n = 2 are given under Problem 2.9. The objective function of this true quadratic minimum problem can be written in matrix notation as:
$$F(x) = x^T A\, x$$
The n x n matrix of coefficients A is symmetric and positive-definite. According to Schwarz, Rutishauser, and Stiefel (1968) its condition number K is a measure of the numerical difficulty of the problem. Among other definitions, that of Todd (1949) is useful, namely:
$$K = \frac{\lambda_{\max}}{\lambda_{\min}} = \frac{a_{\max}^2}{a_{\min}^2}, \qquad \lambda_{\max} = \max_i \{ |\lambda_i|,\; i = 1(1)n \}$$
and similarly for lambda_min. The lambda_i are the eigenvalues of the matrix A, and the a_i are the lengths of the semi-axes of an n-dimensional elliptic contour surface F(x) = const.
Condition numbers for the present matrix
$$A = (a_{ij}) = \begin{pmatrix}
n & n-1 & n-2 & \cdots & n-j+1 & \cdots & 1 \\
n-1 & n-1 & n-2 & \cdots & n-j+1 & \cdots & 1 \\
n-2 & n-2 & n-2 & \cdots & n-j+1 & \cdots & 1 \\
\vdots & & & & & & \vdots \\
n-i+1 & \cdots & n-i+1 & \cdots & n-i+1 & \cdots & 1 \\
\vdots & & & & & & \vdots \\
1 & 1 & 1 & \cdots & 1 & \cdots & 1
\end{pmatrix}$$
i.e., a_ij = n - max(i, j) + 1, were calculated for various values of n by means of an algorithm of Greenstadt (1967b), which uses the Jacobi method of diagonalization. As can be seen from the following table, K increases with the number of variables as O(n^2).


n       K        K/n^2
1       1        1
2       6.85     1.71
3       16.4     1.82
6       64.9     1.80
10      175      1.75
20      678      1.69
30      1500     1.67
60      5930     1.65
100     16400    1.64

Not all the search methods achieved the required accuracy. For many variables the coordinate strategies and the complex method of Box terminated the search prematurely. Powell's method of conjugate gradients even got stuck without the termination criterion taking effect.
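The tabulated growth of K can be cross-checked numerically. A minimal sketch, assuming NumPy as a modern stand-in for the Jacobi diagonalization routine of Greenstadt used originally:

```python
import numpy as np

def condition_number(n):
    # Matrix of Problem 1.2: a_ij = n - max(i, j) + 1 (1-based indices).
    i, j = np.indices((n, n))                 # 0-based index grids
    A = (n - np.maximum(i, j)).astype(float)  # equals n - max(i,j) + 1 in 1-based terms
    lam = np.linalg.eigvalsh(A)               # A is symmetric; eigenvalues ascending
    return lam[-1] / lam[0]                   # K = lambda_max / lambda_min

for n in (2, 3, 10, 100):
    K = condition_number(n)
    print(f"n = {n:4d}   K = {K:10.4g}   K/n^2 = {K / n**2:.3f}")
# Reproduces the tabulated values, e.g. K = 16.4 for n = 3 and 16400 for n = 100.
```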

A.2 Test Problems for the Second Part of the Strategy Comparison
Problem 2.1 after Beale (1958)
Objective function:
$$F(x) = \left[ 1.5 - x_1 (1 - x_2) \right]^2 + \left[ 2.25 - x_1 (1 - x_2^2) \right]^2 + \left[ 2.625 - x_1 (1 - x_2^3) \right]^2$$
Figure A.1: Graphical representation of Problem 2.1; contour levels F(x) = 0.1, 1, 4, ~14.20, 36, 100


Minimum:
x* = (3, 0.5), F(x*) = 0
Besides the strong minimum x* there is a weak minimum at infinity:
x' -> (-infinity, 1), F(x') -> 0
Saddle point:
x'' = (0, 1), F(x'') ~ 14.20
Start:
x^(0) = (0, 0), F(x^(0)) ~ 14.20
For very large initial step lengths the (1+1) evolution strategy converged once to the weak minimum x'.

Problem 2.2
As Problem 2.1, but with:
Start:
x^(0) = (0.1, 0.1), F(x^(0)) ~ 12.99
Problem 2.3
Objective function:
$$F(x) = -\left| x \sin\left(\sqrt{|x|}\right) \right|$$
Figure A.2: Diagram of F(x) for Problem 2.3


There are infinitely many local minima, the positions of which are specified by a transcendental equation:
$$\sqrt{|x^*|} = -2 \tan\left(\sqrt{|x^*|}\right)$$
For |x*| >> 1 we have approximately
$$x^* \approx \left( \pi\,(0.5 + k) \right)^2 \quad \text{for } k = 1, 2, 3, \ldots \text{ integer}$$
and
$$F(x^*) \approx -|x^*|$$
Whereas in reality none of the finite local minima is at the same time a global minimum, the finite word length of the digital computer used, together with the system-specific method of evaluating the sine function, gives rise to an apparent global minimum at
x* = 4.44487453 * 10^16, F(x*) = -4.44487453 * 10^16
Counting from the origin it is the 67 108 864th local minimum in each direction. If x is increased above this value, the objective function value is always set to zero. (Note that this behavior is machine dependent.)
Start:
x^(0) = 0, F(x^(0)) = 0
Most strategies located the first or highest local minimum left or right of the starting point (the origin). Depending on the sequence of random numbers, the two membered evolution method found (for example) the 2nd, 9th, and 34th local minima. Only the (10, 100) evolution strategy almost always reached the apparent global minimum.
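A short illustrative sketch (plain Python) of the objective and of the approximate minimum positions derived above; as noted, the exact location of the apparent global minimum is machine dependent:

```python
import math

def F(x):
    # Objective of Problem 2.3
    return -abs(x * math.sin(math.sqrt(abs(x))))

# Approximate position of the k-th local minimum for |x| >> 1:
for k in (1, 2, 9, 34):
    xk = (math.pi * (0.5 + k)) ** 2
    print(f"k = {k:2d}   x_k ~ {xk:11.2f}   F(x_k) ~ {F(xk):12.2f}")
# F(x_k) approaches -x_k as k grows; in exact arithmetic no finite
# local minimum is global, cf. the apparent minimum near 4.44e16.
```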

Problem 2.4
Objective function:
$$F(x) = \sum_{i=1}^{n} \left[ (x_1 - x_i^2)^2 + (x_i - 1)^2 \right] \quad \text{for } n = 5$$
Minimum:
x_i* = 1 for i = 1(1)n, F(x*) = 0
Start:
x_i^(0) = 10 for i = 1(1)n, F(x^(0)) = 40905
Problem 2.5 after Booth (1949)
Objective function:
$$F(x) = (x_1 + 2 x_2 - 7)^2 + (2 x_1 + x_2 - 5)^2$$


Figure A.3: Graphical representation of Problem 2.4 for n = 2; contour levels F(x) = 10^0, 10^1, 10^2, 10^3, 10^4, 10^5, 10^6, 10^7
This minimum problem is equivalent to solving the following pair of linear equations:
$$x_1 + 2 x_2 = 7$$
$$2 x_1 + x_2 = 5$$
Figure A.4: Graphical representation of Problem 2.5; contour levels F(x) = 1, 9, 25, 49, 81, 121, 169, 225


An approach to the latter problem is to determine those values of x_1 and x_2 that minimize the error in the equations. The error is defined here in the sense of a Gaussian approximation as the sum of the squares of the components of the residual vector.
Minimum:
x* = (1, 3), F(x*) = 0
Start:
x^(0) = (0, 0), F(x^(0)) = 74
Problem 2.6
Objective function:
$$F(x) = \max\left\{ |x_1 + 2 x_2 - 7|,\; |2 x_1 + x_2 - 5| \right\}$$
This represents an attempt to solve the previous system of linear equations of Problem 2.5 in the sense of a Tchebycheff approximation. Accordingly, the error is defined as the absolute maximum component of the residual vector.
Minimum:
x* = (1, 3), F(x*) = 0
Start:
x^(0) = (0, 0), F(x^(0)) = 7
Figure A.5: Graphical representation of Problem 2.6; contour levels F(x) = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11


Several of the search procedures were unable to find the minimum. They converge to a point on the line x_1 + x_2 = 4, which joins together the sharpest corners of the rhombohedral contours. The partial derivatives of the objective function are discontinuous there; in the unit vector directions, parallel to the coordinate axes, no improvement can be made. Besides the coordinate strategies, the methods of Hooke and Jeeves and of Powell are thwarted by this property.
Problem 2.7 after Box (1966)
Objective function:
$$F(x) = \sum_{j=1}^{10} \left( e^{-0.1 j x_1} - e^{-0.1 j x_2} - x_3 \left[ e^{-0.1 j} - e^{-j} \right] \right)^2$$
Minima:
x* = (1, 10, 1), F(x*) = 0
x* = (10, 1, -1), F(x*) = 0
Besides these two equivalent, strong minima there is a weak minimum along the line:
x_1' = x_2', x_3' = 0, F(x') = 0
Because of the finite computational accuracy the weak minimum is actually broadened into a region:
x_1'' ~ x_2'', x_3'' ~ 0, F(x'') = 0 if x_1 >> 1
Figure A.6: Graphical representation of Problem 2.7 on the plane x_3 = 1; contour levels F(x) = 0.03, 0.3, 1, ~3.064, 10, 30


Figure A.7: Graphical representation of Problem 2.7 on the planes x_3 = 0 (left) and x_3 = -1 (right); contour levels F(x) = 0.03, 0.3, 1, ~3.064, 10, 30
Start:
x^(0) = (0, 20, 20), F(x^(0)) ~ 1022
Many strategies only roughly located the first of the strong minima defined above. The evolution strategies tended to converge to the weak minimum, since the minima are at equal values of the objective function. The second strong minimum, which is never referred to in the relevant literature, was sometimes found by the multimembered evolution strategy.
Problem 2.8
As Problem 2.7, but with
Start:
x^(0) = (0, 10, 20), F(x^(0)) ~ 1031
Problem 2.9
Objective function:
$$F(x) = \sum_{i=1}^{n} \left( \sum_{j=1}^{i} x_j \right)^2 \quad \text{for } n = 5$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
Start:
x_i^(0) = 10 for i = 1(1)n, F(x^(0)) = 5500


Figure A.8: Graphical representation of Problem 2.9 for n = 2; contour levels F(x) = 4, 36, 100, 196, 324, 484
Problem 2.10 after Kowalik (1967; see also Kowalik and Morrison, 1968)
Objective function:
$$F(x) = \sum_{i=1}^{11} \left( a_i - \frac{x_1 (b_i^2 + b_i x_2)}{b_i^2 + b_i x_3 + x_4} \right)^2$$
Numerical values of the constants a_i and b_i for i = 1(1)11 can be taken from the following table:

i     a_i      1/b_i
1     0.1957   0.25
2     0.1947   0.5
3     0.1735   1
4     0.1600   2
5     0.0844   4
6     0.0627   6
7     0.0456   8
8     0.0342   10
9     0.0323   12
10    0.0235   14
11    0.0246   16

In this non-linear fitting problem, formulated as a minimum problem, the free parameters alpha_j, j = 1(1)4, of a function
$$y(z) = \frac{\alpha_1 (z^2 + \alpha_2 z)}{z^2 + \alpha_3 z + \alpha_4}$$
have to be determined with reference to eleven data points {y_i, z_i} such that the error, as measured by the Euclidean norm, is minimized (Gaussian or least squares approximation).
Minimum:
x* ~ (0.1928, 0.1908, 0.1231, 0.1358), F(x*) ~ 0.0003075
Start:
x^(0) = (0, 0, 0, 0), F(x^(0)) ~ 0.1484
Near the optimum, if the variables are changed in the last decimal place (with respect to the machine accuracy), rounding errors cause the objective function to behave almost stochastically. The multimembered evolution strategy with recombination yields the best solution. It deviates significantly from the optimum solution as defined by Kowalik and Osborne (1968). Since this best value has a quasi-singular nature, it is repeatedly lost by the population of a (10, 100) evolution strategy, with the result that the termination criterion of the search sometimes only takes effect after a long time, if at all.
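Note that the table lists 1/b_i rather than b_i, which is an easy thing to get wrong in an implementation. A minimal Python sketch reproducing the quoted function values:

```python
# Data of Problem 2.10; the table above gives 1/b_i, so invert it here.
a = [0.1957, 0.1947, 0.1735, 0.1600, 0.0844, 0.0627,
     0.0456, 0.0342, 0.0323, 0.0235, 0.0246]
b = [1.0 / u for u in (0.25, 0.5, 1, 2, 4, 6, 8, 10, 12, 14, 16)]

def F(x):
    x1, x2, x3, x4 = x
    return sum((ai - x1 * (bi**2 + bi * x2) / (bi**2 + bi * x3 + x4))**2
               for ai, bi in zip(a, b))

print(F((0, 0, 0, 0)))                          # ~0.1484, the starting value
print(F((0.1928, 0.1908, 0.1231, 0.1358)))      # ~3.1e-4, near the quoted minimum
```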

Problem 2.11
As Problem 2.10, but with:
Start:
x^(0) = (0.25, 0.39, 0.415, 0.39), F(x^(0)) ~ 0.005316
Problem 2.12
As Problem 2.10, but with:
Start:
x^(0) = (0.25, 0.40, 0.40, 0.40), F(x^(0)) ~ 0.005566
Problem 2.13 after Fletcher and Powell (1963)
Objective function:
$$F(x) = \sum_{i=1}^{n} \left( A_i - B_i(x) \right)^2 \quad \text{for } n = 5$$
where
$$A_i = \sum_{j=1}^{n} (a_{ij} \sin\alpha_j + b_{ij} \cos\alpha_j), \qquad B_i(x) = \sum_{j=1}^{n} (a_{ij} \sin x_j + b_{ij} \cos x_j) \quad \text{for } i = 1(1)n$$
a_ij and b_ij are integer random numbers in the range [-100, 100], and alpha_i are random numbers in the range [-pi, pi]. A minimum of this problem is simultaneously a solution of the equivalent system of n simultaneous non-linear (transcendental) equations:


Figure A.9: Graphical representation of Problem 2.13 for n = 2, with a11 = -2, a12 = 27, a21 = -70, a22 = -48, b11 = -76, b12 = -51, b21 = 63, b22 = -50, alpha_1 = -3.0882, alpha_2 = 2.0559; contour levels F(x) = 238.864, 581.372, 1403.11, 3283.14, 7153.45, 13635.3, 21479.6, 27961.4, 31831.7, 33711.8, 34533.5
$$\sum_{j=1}^{n} (a_{ij} \sin x_j + b_{ij} \cos x_j) = A_i \quad \text{for } i = 1(1)n$$
The solution is again approximated in the least squares sense.
Minimum:
x_i* = alpha_i for i = 1(1)n, F(x*) = 0
Because the trigonometric functions are multivalued there are infinitely many equivalent minima (real solutions of the system of equations), of which up to 2^n lie in the interval
alpha_i - pi <= x_i* <= alpha_i + pi for i = 1(1)n
Start:
x_i^(0) = alpha_i + delta_i for i = 1(1)n
where delta_i are random numbers in the range [-pi/10, pi/10]. To provide the same conditions for all the search methods the same sequence of random numbers was used in each case, and hence F(x^(0)) ~ 1182.
Because of the proximity of the starting point to the one solution, x_i* = alpha_i for i = 1(1)n, all the strategies approached this minimum only.
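A sketch of how such a random instance can be generated (the seed is arbitrary and purely illustrative; the original experiments used their own fixed random sequences):

```python
import math, random

def make_problem(n, seed=1):
    # Random data as specified above: integer a_ij, b_ij in [-100, 100],
    # alpha_j in [-pi, pi]; the seed choice is an assumption for this sketch.
    rng = random.Random(seed)
    a = [[rng.randint(-100, 100) for _ in range(n)] for _ in range(n)]
    b = [[rng.randint(-100, 100) for _ in range(n)] for _ in range(n)]
    alpha = [rng.uniform(-math.pi, math.pi) for _ in range(n)]
    A = [sum(a[i][j] * math.sin(alpha[j]) + b[i][j] * math.cos(alpha[j])
             for j in range(n)) for i in range(n)]
    def F(x):
        return sum((A[i] - sum(a[i][j] * math.sin(x[j]) + b[i][j] * math.cos(x[j])
                               for j in range(n)))**2 for i in range(n))
    return F, alpha

F, alpha = make_problem(5)
print(F(alpha))   # 0.0, since x* = alpha solves the system exactly
```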


Problem 2.14 after Powell (1962)
Objective function:
$$F(x) = (x_1 + 10 x_2)^2 + 5 (x_3 - x_4)^2 + (x_2 - 2 x_3)^4 + 10 (x_1 - x_4)^4$$
Minimum:
x* = (0, 0, 0, 0), F(x*) = 0
Start:
x^(0) = (3, -1, 0, 1), F(x^(0)) = 215
The matrix of second partial derivatives of the objective function goes singular at the minimum. Thus it is not surprising that a quasi-Newton method like the variable metric method of Davidon, Fletcher, and Powell (applied here in Stewart's derivative-free form) got stuck a long way from the minimum. Geometrically speaking, there is a valley which becomes extremely narrow as it approaches the minimum. The evolution strategies therefore ended up converging very slowly with a minimum step length, and the search had to be terminated for reasons of time.
Problem 2.15
As Problem 2.14, except:
Start:
x^(0) = (1, 2, 3, 4), F(x^(0)) = 1512
Problem 2.16 after Leon (1966a)
Objective function:
$$F(x) = 100 (x_2 - x_1^3)^2 + (x_1 - 1)^2$$
Figure A.10: Graphical representation of Problem 2.16; contour levels F(x) = 0.25, 4, 64, 250, 1000, 5000, 10000


Minimum:
x* = (1, 1), F(x*) = 0
Start:
x^(0) = (-1.2, 1), F(x^(0)) ~ 749
Problem 2.17 (sphere model)
Objective function:
$$F(x) = \sum_{i=1}^{n} x_i^2 \quad \text{for } n = 5$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
Start:
x_i^(0) = 10 for i = 1(1)n, F(x^(0)) = 500
Problem 2.18 after Matyas (1965)
Objective function:
$$F(x) = 0.26 (x_1^2 + x_2^2) - 0.48 x_1 x_2$$
Minimum:
x* = (0, 0), F(x*) = 0
Start:
x^(0) = (15, 30), F(x^(0)) = 76.5
Figure A.11: Graphical representation of Problem 2.17 for n = 2; contour levels F(x) = 4, 16, 36, 64, 100, 144, 196


Figure A.12: Graphical representation of Problem 2.18; contour levels F(x) = 1, 3, 10, 30, 100, 300
The coordinate strategies terminated the search prematurely because of the lower bounds on the step lengths (as determined by the machine), which precluded making any more successful line searches in the coordinate directions.
Problem 2.19 by Wood (after Colville, 1968)
Objective function:
$$F(x) = 100 (x_1 - x_2^2)^2 + (x_2 - 1)^2 + 90 (x_3 - x_4^2)^2 + (x_4 - 1)^2 + 10.1 \left[ (x_1 - 1)^2 + (x_3 - 1)^2 \right] + 19.8 (x_1 - 1)(x_3 - 1)$$
Minimum:
x* = (1, 1, 1, 1), F(x*) = 0
There is another stationary point near
x' ~ (1, -1, 1, -1), F(x') ~ 8
According to Himmelblau (1972a,b) there are still further local minima.
Start:
x^(0) = (-1, -3, -1, -3), F(x^(0)) = 19192
A very narrow valley appears to run from the stationary point x' to the minimum. All the coordinate strategies together with the methods of Hooke and Jeeves and of Powell ended the search in this region.
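The quoted function values are easy to reproduce and useful for checking one's own implementation; a minimal Python sketch of the objective as written above:

```python
def wood(x):
    # Problem 2.19 in the form given in the text
    x1, x2, x3, x4 = x
    return (100 * (x1 - x2**2)**2 + (x2 - 1)**2
            + 90 * (x3 - x4**2)**2 + (x4 - 1)**2
            + 10.1 * ((x1 - 1)**2 + (x3 - 1)**2)
            + 19.8 * (x1 - 1) * (x3 - 1))

print(wood((-1, -3, -1, -3)))  # 19192.0, the starting value
print(wood(( 1, -1,  1, -1)))  # 8.0, at the point near the stationary point x'
print(wood(( 1,  1,  1,  1)))  # 0.0, the minimum
```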


Problem 2.20
Objective function:
$$F(x) = \sum_{i=1}^{n} |x_i| \quad \text{for } n = 5$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
Start:
x_i^(0) = 10 for i = 1(1)n, F(x^(0)) = 50
Problem 2.21
Objective function:
$$F(x) = \max_i \left\{ |x_i|,\; i = 1(1)n \right\} \quad \text{for } n = 5$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
Start:
x_i^(0) = 10 for i = 1(1)n, F(x^(0)) = 10
Since the starting point is at a corner of the cubic contour surface, none of the coordinate strategies could find a point with a lower value of the objective function.
Figure A.13: Graphical representation of Problem 2.20 for n = 2; contour levels F(x) = 2, 4, 6, 8, 10, 12, 14, 16, 18, 20


Figure A.14: Graphical representation of Problem 2.21 for n = 2; contour levels F(x) = 2, 4, 6, 8, 10
The method of Powell also ended the search without making any significant improvement on the initial condition. Both the simplex method of Nelder and Mead and the complex method of Box also had trouble in the minimum search; in their cases the initially constructed simplex or complex collapsed long before reaching the minimum, again near one of the corners.

Problem 2.22
Objective function:
$$F(x) = \sum_{i=1}^{n} |x_i| + \prod_{i=1}^{n} |x_i| \quad \text{for } n = 5$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
Start:
x_i^(0) = 10 for i = 1(1)n, F(x^(0)) = 100050
The simplex and complex methods did not find the minimum. As in the previous Problem 2.21, this is due to the sharply pointed corners of the contours. The variable metric strategy also finally got stuck at one of these corners and converged no further. In this case the discontinuity in the partial derivatives of the objective function at the corners is to blame for its failure.

to blame for its failure.


Figure A.15: Graphical representation of Problem 2.22 for n = 2; contour levels F(x) = 3, 8, 15, 24, 35, 48, 63, 80, 99
Problem 2.23
Objective function:
$$F(x) = \sum_{i=1}^{n} x_i^{10} \quad \text{for } n = 5$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
Figure A.16: Graphical representation of Problem 2.23 for n = 2; contour levels F(x) = 2, 10^4, 10^6, 10^8, 10^10


Start:
x_i^(0) = 10 for i = 1(1)n, F(x^(0)) = 5 * 10^10

Only the two strategies that have a quadratic internal model of the objective function, namely the variable metric and conjugate directions methods, failed to converge, because the function F(x) is of much higher (10th) order.
Problem 2.24 after Rosenbrock (1960)
Objective function:
$$F(x) = 100 (x_2 - x_1^2)^2 + (x_1 - 1)^2$$
Minimum:
x* = (1, 1), F(x*) = 0
Start:
x^(0) = (-1.2, 1), F(x^(0)) = 24.2
Problem 2.25
Objective function:
$$F(x) = \sum_{i=2}^{n} \left[ (x_1 - x_i^2)^2 + (x_i - 1)^2 \right] \quad \text{for } n = 5$$
Minimum:
x_i* = 1 for i = 1(1)n, F(x*) = 0
Figure A.17: Graphical representation of Problem 2.24; contour levels F(x) = 0.5, 4, 20, 100, 250, 500, 1000, 2000, 5000


Start:
x_i^(0) = 10 for i = 1(1)n, F(x^(0)) = 32724
For n = 2 this becomes nearly the same as Problem 2.24.
Problem 2.26
Objective function:
$$F(x) = -x \sin\left(\sqrt{|x|}\right)$$
This problem is the same as Problem 2.3 except for the modulus. The difference has the effect that the neighboring minima are further apart here. The positions of the local minima and maxima are described under Problem 2.3.
Start:
x^(0) = 0, F(x^(0)) = 0
Again, only the multimembered evolution strategy converged to the apparent global minimum; all the other methods only converged to the first (nearest) local minimum.
Problem 2.27 after Zettl (1970)
Objective function:
$$F(x) = (x_1^2 + x_2^2 - 2 x_1)^2 + 0.25 x_1$$
Minimum:
x* ~ (-0.02990, 0), F(x*) ~ -0.003791
Figure A.18: Diagram of F(x) for Problem 2.26


Figure A.19: Graphical representation of Problem 2.27; contour levels F(x) = 0.03, 0.3, 1, 3, 10, 30
Because of rounding errors this same objective function value is reached for various pairs of values of x_1, x_2.
Local maximum:
x' ~ (1.063, 0), F(x') ~ 1.258
Saddle point:
x'' ~ (1.967, 0), F(x'') ~ 0.4962
Start:
x^(0) = (2, 0), F(x^(0)) = 0.5

Problem 2.28 of Watson (after Kowalik and Osborne, 1968)
Objective function:
$$F(x) = \sum_{i=1}^{30} \left( \sum_{j=1}^{5} j\, a_i^{j-1} x_{j+1} - \left[ \sum_{j=1}^{6} a_i^{j-1} x_j \right]^2 - 1 \right)^2 + x_1^2$$
where
$$a_i = \frac{i - 1}{29}$$
The origin of this problem is the approximate solution of the ordinary differential equation
$$\frac{dz}{dy} - z^2 = 1$$


on the interval 0 <= y <= 1 with the boundary condition z(y = 0) = 0. The function sought, z(y), is to be approximated by a polynomial
$$\tilde{z}(c, y) = \sum_{j=1}^{n} c_j\, y^{j-1}$$
In the present case only the first six terms are considered. Suitable values of the polynomial coefficients c_j, j = 1(1)6, are to be determined. The deviation from the exact solution of the differential equation is measured in the Gaussian sense as the sum of the squares of the errors at m = 30 argument values y_i, uniformly distributed in the range [0, 1]:
$$F_1(c) = \sum_{i=1}^{m} \left( \frac{\partial \tilde{z}(c, y)}{\partial y}\bigg|_{y_i} - \tilde{z}^2(c, y_i) - 1 \right)^2$$
The boundary condition is treated as a second simultaneous equation by means of a similarly constructed term:
$$F_2(c) = \tilde{z}^2(c, y)\big|_{y=0}$$
By inserting the polynomial and redefining the parameters c_i as variables x_i we obtain the objective function F(x) = F_1(x) + F_2(x), the minimum of which is an approximate solution of the parameterized functional problem.
Minimum:
x* ~ (-0.0158, 1.012, -0.2329, 1.260, -1.513, 0.9928), F(x*) ~ 0.002288
Start:
x^(0) = (0, 0, 0, 0, 0, 0), F(x^(0)) = 30
Judging by the number of objective function evaluations, all the search methods found this a difficult problem to solve. The best solution was provided by the complex strategy.
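A minimal Python sketch of the assembled objective F = F_1 + F_2 in the variables x_1, ..., x_6:

```python
def watson(x, m=30):
    # Problem 2.28 with n = 6 polynomial coefficients x[0..5]
    total = x[0]**2                      # boundary-condition term z~(c, 0)^2
    for i in range(1, m + 1):
        ai = (i - 1) / 29.0
        deriv = sum(j * ai**(j - 1) * x[j] for j in range(1, 6))
        poly  = sum(ai**(j - 1) * x[j - 1] for j in range(1, 7))
        total += (deriv - poly**2 - 1.0)**2
    return total

print(watson([0.0] * 6))   # 30.0, the starting value
x_star = (-0.0158, 1.012, -0.2329, 1.260, -1.513, 0.9928)
print(watson(x_star))      # ~2.3e-3, near the quoted minimum 0.002288
```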

Problem 2.29 after Beale (1967)
Objective function:
$$F(x) = 2 x_1^2 + 2 x_2^2 + x_3^2 + 2 x_1 x_2 + 2 x_1 x_3 - 8 x_1 - 6 x_2 - 4 x_3 + 9$$
Constraints:
G_j(x) = x_j >= 0 for j = 1(1)3
G_4(x) = -x_1 - x_2 - 2 x_3 + 3 >= 0
Minimum:
x* = (4/3, 7/9, 4/9), F(x*) = 1/9; only G_4 active, i.e., G_4(x*) = 0
Start:
x^(0) = (0.1, 0.1, 0.1), F(x^(0)) = 7.29


Problem 2.30
As Problem 2.3, but with the constraints:
G_1(x) = -x + 300 >= 0, G_2(x) = x + 300 >= 0
The introduction of constraints gives rise to two equivalent, global minima at the edge of the feasible region:
Minima:
x* = +-300, F(x*) ~ -299.7; G_1 or G_2 active
In addition there are five local minima within the feasible region. Here too, the absolute minima were only located by the multimembered evolution strategy.
Problem 2.31
As Problem 2.4, but with constraints:
G_j(x) = x_j - 1 >= 0 for j = 1(1)n, n = 5
Minimum:
x_i* = 1 for i = 1(1)n, F(x*) = 0; all G_j active
Start:
x_i^(0) = -10 for i = 1(1)n, F(x^(0)) = 61105
The starting point is located outside of the feasible region.
Figure A.20: Diagram of F(x) for Problem 2.30
Figure A.20: Diagram F (x) for Problem 2.30


Problem 2.32 after Bracken and McCormick (1970)
Objective function:
$$F(x) = -x_1^2 - x_2^2$$
Constraints:
G_j(x) = x_j >= 0 for j = 1, 2
G_3(x) = -x_1 + 1 >= 0
G_4(x) = -x_1 - 4 x_2 + 5 >= 0
Minimum:
x* = (1, 1), F(x*) = -2; G_3 and G_4 active
Besides this global minimum there is another local one:
x' = (0, 5/4), F(x') = -25/16; G_1 and G_4 active
Start:
x^(0) = (0, 0), F(x^(0)) = 0
All the search methods converged to the global minimum.
Problem 2.33 after Zettl (1970)
As Problems 2.14 and 2.15, but with the constraints:
G_j(x) = x_{j+2} - 2 >= 0 for j = 1, 2
Figure A.21: Graphical representation of Problem 2.32; contour levels F(x) = 0.04, 0.16, 0.36, 0.64, 1.0, 1.44, 1.96, 2.56, 3.24, 4


Minimum:
x* = (1.275, 0.6348, 2.0, 2.0), F(x*) ~ 189.1; all G_j active
Start:
x^(0) = (1, 2, 3, 4), F(x^(0)) = 1512
The (1+1) evolution strategy only solved the problem very inaccurately. Due to the 1/5 success rule the mutation variances vanish prematurely.
Problem 2.34 after Fletcher and Powell (1963)
Objective function:
$$F(x) = 100 \left[ (x_3 - 10\,\theta)^2 + (R - 1)^2 \right] + x_3^2$$
where
$$x_1 = R \cos(2\pi\theta), \qquad x_2 = R \sin(2\pi\theta), \qquad R = \sqrt{x_1^2 + x_2^2}$$
or
$$\theta = \begin{cases} \frac{1}{2\pi} \arctan\frac{x_2}{x_1} & \text{if } x_2 \neq 0 \text{ and } x_1 > 0 \\[4pt] \frac{1}{2} & \text{if } x_2 = 0 \\[4pt] \frac{1}{2} + \frac{1}{2\pi} \arctan\frac{x_2}{x_1} & \text{if } x_2 \neq 0 \text{ and } x_1 < 0 \end{cases}$$
Constraints:
G_1(x) = -x_3 + 7.5 >= 0, G_2(x) = x_3 + 2.5 >= 0
Minimum:
x* = (1, ~0, 0), F(x*) = 0; no constraint is active
The objective function itself has a discontinuity at x_2 = 0, right at the minimum sought. Thus x_2 should only be allowed to approach closely to zero. Because of the multivalued trigonometric functions there are infinitely many solutions to the problem, of which only one, however, lies within the feasible region.
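The case analysis for theta translates directly into code. A minimal sketch (the case x_1 = 0 with x_2 != 0 is left undefined, as in the text):

```python
import math

def theta(x1, x2):
    # Case analysis of Problem 2.34; note the jump at x2 = 0.
    if x2 == 0.0:
        return 0.5
    t = math.atan(x2 / x1) / (2.0 * math.pi)   # assumes x1 != 0
    return t if x1 > 0 else t + 0.5

def F(x):
    x1, x2, x3 = x
    R = math.hypot(x1, x2)
    return 100.0 * ((x3 - 10.0 * theta(x1, x2))**2 + (R - 1.0)**2) + x3**2

print(F((-1.0, 0.0, 0.0)))   # 2500.0, the starting value (theta = 1/2 there)
print(F((1.0, 1e-9, 0.0)))   # ~0, approaching the minimum as x2 -> 0
```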

Start:
x^(0) = (-1, 0, 0), F(x^(0)) = 2500
Problem 2.35 after Rosenbrock (1960)
Objective function:
$$F(x) = -x_1 x_2 x_3$$


Constraints:
G_j(x) = x_j >= 0 for j = 1(1)3
G_4(x) = -x_1 - 2 x_2 - 2 x_3 + 72 >= 0
The underlying question here was: What dimensions should a parcel of maximum volume have, if the sum of its length and transverse circumference is bounded?
Minimum:
x* = (24, 12, 12), F(x*) = -3456; G_4 active
Start:
x^(0) = (0, 0, 0), F(x^(0)) = 0
All variants of the evolution strategy converged only to within the neighborhood of the minimum sought, because in the end only a fraction of all trials were feasible.
Problem 2.36
This is derived from Problem 2.35 by treating the constraint G_4, which is active at the minimum, as an equation, and thereby eliminating one of the free variables. With
$$x_1' + 2 x_2' + 2 x_3' = 72$$
we obtain
$$F'(x) = -(72 - 2 x_2' - 2 x_3')\, x_2' x_3'$$
or, by renumbering of the variables, a new objective function:
$$F(x) = -x_1 x_2 (72 - 2 x_1 - 2 x_2)$$
Figure A.22: Graphical representation of Problem 2.36; contour levels F(x) = -3400, -3000, -2000, -1000, -300, 300, 1000


Constraints:
G_j(x) = x_j >= 0 for j = 1, 2
Minimum:
x* = (12, 12), F(x*) = -3456; no constraints are active
Start:
x^(0) = (1, 1), F(x^(0)) = -68
Problem 2.37 (corridor model)
Objective function:
$$F(x) = -\sum_{i=1}^{n} x_i \quad \text{for } n = 3$$
Constraints:
$$G_j(x) = \begin{cases} -x_j + 100 \geq 0 & \text{for } j = 1(1)n \\[4pt] x_{j-n+1} - \frac{1}{j-n} \sum_{i=1}^{j-n} x_i + \sqrt{\frac{j-n+1}{j-n}} \geq 0 & \text{for } n+1 \leq j \leq 2n-1 \\[4pt] -x_{j-2n+2} + \frac{1}{j-2n+1} \sum_{i=1}^{j-2n+1} x_i + \sqrt{\frac{j-2n+2}{j-2n+1}} \geq 0 & \text{for } 2n \leq j \leq 3n-2 \end{cases}$$
Figure A.23: Graphical representation of Problem 2.37 for n = 2; contour levels F(x) = -220, -215, -210, -205, -200, -195, -190, -185, -180, -175, -170, -165, -160


The constraints form a feasible region which could be described as a corridor with a square cross section (three dimensionally speaking). The axis of the corridor runs along the diagonal in the space:
$$x_1 = x_2 = x_3 = \ldots = x_n$$
The contours of the linear objective function run perpendicular to this axis. In order to obtain a finite minimum, further constraints were added, whereby a kind of pencil point is placed on the end of the corridor. In the absence of these additional constraints the problem corresponds to the corridor model used by Rechenberg (1973), for which he derived theoretically the rate of progress (a measure of the convergence rate) of the two membered evolution strategy.
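Written out for general n, the side constraints are easier to see in code; a minimal illustrative Python sketch:

```python
import math

def corridor_constraints(n):
    # Constraints of Problem 2.37 as functions G_j(x) >= 0: the bounds
    # -x_j + 100 >= 0 plus, for k = 1..n-1, the two side walls enforcing
    # |x_{k+1} - mean(x_1..x_k)| <= sqrt((k+1)/k).
    G = [lambda x, j=j: 100.0 - x[j] for j in range(n)]
    for k in range(1, n):
        r = math.sqrt((k + 1.0) / k)
        G.append(lambda x, k=k, r=r: x[k] - sum(x[:k]) / k + r)
        G.append(lambda x, k=k, r=r: -x[k] + sum(x[:k]) / k + r)
    return G

G = corridor_constraints(3)
print(all(g([0.0, 0.0, 0.0]) >= 0 for g in G))        # True: the origin is feasible
print(all(g([100.0, 100.0, 100.0]) >= 0 for g in G))  # True: the minimum x*
```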

Minimum:
x_i* = 100 for i = 1(1)n, F(x*) = -300; G_1 to G_n active
Start:
x_i^(0) = 0 for i = 1(1)n, F(x^(0)) = 0
Problem 2.38
As Problem 2.25, but with the additional constraints:
G_j(x) = x_j - 1 >= 0 for j = 1(1)n, n = 5
Minimum:
x_i* = 1 for i = 1(1)n, F(x*) = 0; all G_j active
Start:
x_i^(0) = -10 for i = 1(1)n, F(x^(0)) = 48884
The starting point is not in the feasible region.

Problem 2.39 after Rosen and Suzuki (1965)
Objective function:
$$F(x) = x_1^2 + x_2^2 + 2 x_3^2 + x_4^2 - 5 x_1 - 5 x_2 - 21 x_3 + 7 x_4$$
Constraints:
G_1(x) = -2 x_1^2 - x_2^2 - x_3^2 - 2 x_1 + x_2 + x_4 + 5 >= 0
G_2(x) = -x_1^2 - x_2^2 - x_3^2 - x_4^2 - x_1 + x_2 - x_3 + x_4 + 8 >= 0
G_3(x) = -x_1^2 - 2 x_2^2 - x_3^2 - 2 x_4^2 + x_1 + x_4 + 10 >= 0
Minimum:
x* = (0, 1, 2, -1), F(x*) = -44; G_1 active
x =(0 1 2 ;1) F (x )=;44 G 1 active


Start:
x^(0) = (0, 0, 0, 0), F(x^(0)) = 0
None of the search methods that operate directly with constraints, i.e., without reformulating the objective functions, managed to solve the problem to satisfactory accuracy.
Problem 2.40
Objective function:
$$F(x) = -\sum_{i=1}^{5} x_i$$
Constraints:
G_j(x) = x_j >= 0 for j = 1(1)5
G_6(x) = -Sum_{i=1..5} (9 + i) x_i + 50000 >= 0
This is a simple linear programming problem. The solution is in a corner of the allowed region defined by the constraints (simplex).
Minimum:
x* = (5000, 0, 0, 0, 0), F(x*) = -5000; G_2 to G_6 active
Figure A.24: Graphical representation of Problem 2.40 on the plane x_3 = x_4 = x_5 = 0; contour levels F(x) = -10500, -9500, -8500, -7500, -6500, -5500, -4500, -3500, -2500, -1500, -500, 500


Start:
x^(0) = (250, 250, 250, 250, 250), F(x^(0)) = -1250
In terms of the values of the variables, none of the strategies tested achieved accuracies better than 10^-2. The two variants of the (10, 100) evolution strategy came closest to the exact solution.
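The corner solution can be cross-checked with any LP solver, for example with SciPy's linprog; this is a modern convenience that was of course not available for the original comparison:

```python
from scipy.optimize import linprog

# Problem 2.40: minimize -sum(x) subject to sum((9+i)*x_i) <= 50000, x >= 0
c = [-1, -1, -1, -1, -1]
A_ub = [[10, 11, 12, 13, 14]]              # coefficients 9 + i for i = 1..5
res = linprog(c, A_ub=A_ub, b_ub=[50000])  # default bounds are x >= 0
print(res.x, res.fun)                      # approx. [5000. 0. 0. 0. 0.] -5000.0
```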

Problem 2.41
Objective function:
$$F(x) = -\sum_{i=1}^{5} i\, x_i$$
Constraints: as for Problem 2.40
Minimum:
x* = (0, 0, 0, 0, 50000/14), F(x*) = -250000/14; G_j active for j = 1, 2, 3, 4, 6
Start:
x^(0) = (250, 250, 250, 250, 250), F(x^(0)) = -3750
This problem differs from the previous one only in the numerical values; regarding the accuracies achieved, the same remarks apply as for Problem 2.40.
Figure A.25: Graphical representation of Problem 2.41 on the plane x_2 = x_3 = x_4 = 0; contour levels F(x) = -30000, -25000, -20000, -15000, -10000, -5000, 0


Problem 2.42
Objective function:
$$F(x) = \sum_{i=1}^{5} i\, x_i$$
Constraints: as for Problems 2.40 and 2.41
Minimum:
x* = (0, 0, 0, 0, 0), F(x*) = 0; G_1 to G_5 active
Start:
x^(0) = (250, 250, 250, 250, 250), F(x^(0)) = 3750
The minimum is at the origin of coordinates. The evolution strategies were thus better able to approach the solution by adjusting the individual step lengths. The multimembered strategy with recombination yielded an exact solution with variable values less than 10^-38.
Problem 2.43
As Problem 2.42, except:
Start:
x^(0) = (-250, -250, -250, -250, -250), F(x^(0)) = -3750
The starting point is not in the feasible region. The solutions are the same as in Problem 2.42.
Figure A.26: Graphical representation of Problem 2.42 on the plane x_3 = x_4 = x_5 = 0; contour levels F(x) = -1000, 1000, 3000, 5000, 7000, 9000, 11000, 13000, 15000


Problem 2.44
As Problem 2.26, but with additional constraints:
G_1(x) = -x + 300 >= 0, G_2(x) = x + 300 >= 0
Minimum:
x* = -300, F(x*) ~ -299.7; G_2 active
Besides this global minimum there are five more local minima within the feasible region.
Start:
x^(0) = 0, F(x^(0)) = 0
The global minimum could only be located by the multimembered evolution. All the other search strategies converged to the local minimum nearest to the starting point.
Problem 2.45 of Smith and Rudd (after Leon, 1966a)
Objective function:
$$F(x) = \sum_{i=1}^{n} x_i^i\, e^{-x_i} \quad \text{for } n = 5$$
Constraints:
G_j(x) = x_j >= 0 for j = 1(1)n
G_j(x) = 2 - x_{j-n} >= 0 for j = n+1(1)2n
Figure A.27: Diagram of F(x) for Problem 2.44
Figure A.27: Diagram F (x) for Problem 2.44


Figure A.28: Graphical representation of Problem 2.45 for n = 2; contour levels F(x) = -1.0, 0.0, 0.3, 0.4, 0.6, 0.8, 0.9
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0; all G_1 to G_n active
Besides this global minimum there is another local one:
x' = (2, 0, ..., 0), F(x') = 2 e^-2; G_2 to G_{n+1} active
Start:
x_i^(0) = 1 for i = 1(1)n, F(x^(0)) ~ 1.84
In the neighborhood of the minimum sought, the rate of convergence of a search strategy depends strongly on its ability to make widely different individual adjustments to the step lengths for the changes in the variables. The multimembered evolution solved this problem best when working with recombination. Rosenbrock's method converged to the local minimum, as did the complex method and the simple evolution strategies.
Problem 2.46
Objective function:
$$F(x) = x_1^2 + x_2^2$$
Constraints:
G_1(x) = x_1 + 2 x_2 - 2 >= 0
Minimum:
x* = (0.4, 0.8), F(x*) = 0.8; G_1 active
Start:
x^(0) = (10, 10), F(x^(0)) = 200


Figure A.29: Graphical representation of Problem 2.46; contour levels F(x) = 0.04, 0.36, 1.00, 1.96, 3.24, 4.84, 6.76
Problem 2.47 after Ueing (1971)
Objective function:
$$F(x) = -x_1^2 - x_2^2$$
Constraints:
G_j(x) = x_j >= 0 for j = 1, 2
G_3(x) = x_1^2 + x_2^2 - 17 x_1 - 5 x_2 + 66 >= 0
G_4(x) = x_1^2 + x_2^2 - 10 x_1 - 10 x_2 + 41 >= 0
G_5(x) = x_1^2 + x_2^2 - 4 x_1 - 14 x_2 + 45 >= 0
G_6(x) = -x_1 + 7 >= 0
G_7(x) = -x_2 + 7 >= 0
Minimum:
x* = (6, 0), F(x*) = -36; G_2 and G_3 active
Besides the global minimum x* there are three other local minima:
x' ~ (2.116, 4.174), F(x') ~ -21.90
x'' = (0, 5), F(x'') = -25
x''' = (5, 2), F(x''') = -29
Start:
x^(0) = (0, 0), F(x^(0)) = 0


Figure A.30: Graphical representation of Problem 2.47; contour levels F(x) = -4, -16, -36, -64, -100, -144, -196, -256
To the original problem have been added the two constraints G_6 and G_7. Without them there are two separate feasible regions and the global minimum is at infinity, in the external, open region. Depending on the initial step lengths, the evolution strategies were sometimes able to go out from the starting point within the inner, closed region into the external region. After adding G_6 and G_7, the multimembered strategies converged to the global minimum; all other search methods located other local minima. Which of these was located by the two membered evolution strategy depended on the sequence of random numbers.
Problem 2.48 after Ueing (1971)
Objective function:
$$F(x) = -x_1^2 - x_2^2$$
Constraints:
G_j(x) = x_j >= 0 for j = 1, 2
G_3(x) = -x_1 + x_2 + 4 >= 0
G_4(x) = x_1/3 - x_2 + 4 >= 0
G_5(x) = x_1^2 + x_2^2 - 10 x_1 - 10 x_2 + 41 >= 0
Minimum:
x* = (12, 8), F(x*) = -208; G_3 and G_4 active
Besides this global minimum there are two more local minima:
x' ~ (2.018, 4.673), F(x') ~ -25.91


Figure A.31: Graphical representation of Problem 2.48; contour levels F(x) = -4, -16, -36, -64, -100, -144, -196, -256
x'' ~ (6.293, 2.293), F(x'') ~ -44.86
Start:
x^(0) = (0, 0), F(x^(0)) = 0
There are two feasible regions which are unconnected and closed. The starting point and the global minimum are separated by a non-feasible region. Only the (10, 100) evolution strategy converged to the global minimum. It sometimes happened with this strategy that one descendant of a generation would jump from one feasible region to the other; however, the group of remaining individuals would converge to one of the other local minima. All other strategies did not converge to the global minimum.
Problem 2.49 after Wolfe (1966)
Objective function:
$$F(x) = \frac{4}{3} \left( x_1^2 + x_2^2 - x_1 x_2 \right)^{3/4} + x_3$$
Constraints:
G_j(x) = x_j >= 0 for j = 1(1)3
Minimum:
x* = (0, 0, 0), F(x*) = 0; all G_j active
Start:
x^(0) = (10, 10, 10), F(x^(0)) ~ 52.16


Problem 2.50
As Problem 2.37, but with some other constraints:
G_j(x) = -x_j + 100 >= 0 for j = 1(1)n
$$G_{n+1}(x) = 1 - \sum_{i=1}^{n} \left( \frac{1}{n} \sum_{j=1}^{n} x_j - x_i \right)^2 \geq 0$$
Minimum:
x_i* = 100 for i = 1(1)n, F(x*) = -300 for n = 3; G_1 to G_n active
Start:
x_i^(0) = 0 for i = 1(1)n, F(x^(0)) = 0
Instead of the 2n - 2 linear constraints of Problem 2.37, a non-linear constraint served here to bound the corridor at its sides. From a geometrical point of view, the cross section of the corridor for n = 3 variables is now circular instead of square. For n = 2 variables the two problems become equivalent.

A.3 Test Problems for the Third Part of the Strategy Comparison
These are usually n-dimensional extensions of problems from the second set of tests, whose numbers are given in brackets after the new problem number.
Problem 3.1 (analogous to Problem 2.4)
Objective function:
$$F(x) = \sum_{i=1}^{n} \left[ (x_1 - x_i^2)^2 + (1 - x_i)^2 \right]$$
Minimum:
x_i* = 1 for i = 1(1)n, F(x*) = 0
No noteworthy difficulties arose in the solution of this and the following biquadratic problem with any of the comparison strategies. Away from the minimum, the contour patterns of the objective functions resemble those of the n-dimensional sphere problem (Problem 1.1). Nevertheless, the slight differences caused most search methods to converge much more slowly (typically by a factor 1/5). The simplex strategy was particularly affected. The computation times it required were about 10 to 30 times as long as for the sphere problem with the same number of variables. With n = 100 and greater, the required accuracy was only achieved in Problem 3.1 after at least one collapse and subsequent reconstruction of the simplex. The evolution strategies on the other hand were all practically unaffected by the difference with respect to Problem 1.1. Also for the complex method the cost was only slightly higher, although with this strategy the computation time increased very rapidly with the number of variables for all problems.


Problem 3.2 (analogous to Problem 2.25)
Objective function:
$$F(x) = \sum_{i=2}^{n} \left[ (x_1 - x_i^2)^2 + (1 - x_i)^2 \right]$$
Minimum:
x_i* = 1 for i = 1(1)n, F(x*) = 0
Problem 3.3 (analogous to Problem 2.13)
Objective function:
$$F(x) = \sum_{i=1}^{n} \left( \sum_{j=1}^{n} (a_{ij} \sin\alpha_j + b_{ij} \cos\alpha_j) - \sum_{j=1}^{n} (a_{ij} \sin x_j + b_{ij} \cos x_j) \right)^2$$
where a_ij, b_ij for i, j = 1(1)n are integer random numbers from the range [-100, 100], and alpha_j, j = 1(1)n are random numbers from the range [-pi, pi].
Minimum:
x_i* = alpha_i for i = 1(1)n, F(x*) = 0
Besides this desired minimum there are numerous others that have the same value (see Problem 2.13). The a_ij and b_ij require storage space of order O(n^2). For this reason the maximum number of variables for which this problem could be set up had to be limited to n_max = 30. The computation time per function call also increases as O(n^2). The coordinate strategies ended the search for the minimum before reaching the required accuracy when 10 or more variables were involved. The method of Davies, Swann, and Campey (DSC) with Gram-Schmidt orthogonalization and the complex method failed in the same way for 30 variables. For n = 30 the search simplex of the Nelder-Mead strategy also collapsed prematurely, but after a restart the minimum was sufficiently well approximated. Depending on the sequence of random numbers, the two membered evolution strategy converged either to the desired minimum or to one of the others. This was not seen to occur with the multimembered strategies; however, only one attempt could be made in each case because of the long computation times.

Problem 3.4 (analogous to Problem 2.20)
Objective function:
$$F(x) = \sum_{i=1}^{n} |x_i|$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
This problem presented no difficulties to those strategies having a line (one dimensional) search subroutine, since the axes-parallel minimizations are always successful. The simplex method on the other hand required several restarts even for just 30 variables, and


for n = 100 variables it had to be interrupted, as it exceeded the maximum permitted computation time (8 hours) without achieving the required accuracy. The success or failure of the (1+1) evolution strategy and the complex method depended upon the actual random numbers. Therefore, in this and the following problems, whenever there was any doubt about convergence, several (at least three) attempts were made with different sequences of random numbers. It was seen that the two membered evolution strategy sometimes spent longer near one of the corners formed by the contours of the objective function, where it converged only slowly; however, it finally escaped from this situation. Thus, although the computation times were very varied, the search was never terminated prematurely. The success of the multimembered evolution strategy depended on whether or not recombination was implemented. Without recombination the method sometimes failed for just 30 variables, whereas with recombination it converged safely and with no periods of stagnation. In the latter case the computation times taken were actually no longer than for the sphere problem with the same number of variables.
Problem 3.5 (analogous to Problem 2.21)
Objective function:
$$F(x) = \max_i \left\{ |x_i|,\; i = 1(1)n \right\}$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
Most of the methods using a one dimensional search failed here, because the value of the objective function is piecewise constant along the coordinate directions. The methods of Rosenbrock and of Davies, Swann, and Campey (whatever the method of orthogonalization) converged safely, since they consider trial steps that do not change the objective function value as successful. If only true improvements are accepted, as in the conjugate gradient, variable metric, and coordinate strategies, the search never even leaves the chosen starting point at one of the corners of the contour surface. The simplex and complex strategies failed for n > 30 variables. Even for just 10 variables the search simplex of the Nelder-Mead method had to be constructed anew after collapsing 185 times, before the desired accuracy could be achieved. For the evolution strategy with only one parent and one descendant, the probability of finding from the starting point a point with a better value of the objective function is
$$w_e = 2^{-n}$$
For this reason the (1+1) strategy failed for n >= 10. The multimembered version without recombination could solve the problem for up to n = 10 variables. With recombination, convergence was sometimes still achieved for n = 30 variables, but no longer for n = 100 in the three attempts made.
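The value w_e = 2^-n is easy to confirm by simulation, under the assumption of isotropic Gaussian mutations with a small step size; a minimal sketch:

```python
import random

def success_probability(n, sigma=0.01, trials=200000, seed=7):
    # Probability that one isotropic Gaussian mutation improves
    # F(x) = max|x_i| starting from the corner x = (10, ..., 10).
    # For small sigma every coordinate must decrease, hence p -> 2^-n.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if all(abs(10.0 + rng.gauss(0.0, sigma)) < 10.0 for _ in range(n)):
            hits += 1
    return hits / trials

for n in (2, 5, 10):
    print(n, success_probability(n), 2.0**-n)  # estimate vs. theory
```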


Problem 3.6 (analogous to Problem 2.22)
Objective function:
$$F(x) = \sum_{i=1}^{n} |x_i| + \prod_{i=1}^{n} |x_i|$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
In spite of the even sharper corners on the contour surfaces of the objective function, all the strategies behaved in much the same way as they did in the minimum search of Problem 3.4. The only notable difference was with the (10, 100) evolution strategy without recombination. For n = 30 variables the minimum search always converged; for n = 100 and above the search was no longer successful.
Problem 3.7 (analogous to Problem 2.23)
Objective function:
$$F(x) = \sum_{i=1}^{n} x_i^{10}$$
Minimum:
x_i* = 0 for i = 1(1)n, F(x*) = 0
The strategy of Powell failed for n >= 10 variables. Since all the step lengths were set to zero, the search stagnated and the internal termination criterion did not take effect. The optimization had to be interrupted externally. From n = 30, the variable metric method was also ineffective. The quadratic model of the objective function on which it is based led to completely false predictions of suitable search directions. For n = 10 the simplex method required 48 restarts, and for n = 30 as many as 181 in order to achieve the desired accuracy. None of the evolution strategies had any convergence difficulties in solving the problem. They were not tested further for n > 300 simply for reasons of computation time.

Problem 3.8 (similar to Problem 2.37) (corridor model)
Objective function:
$$F(x) = -\sum_{i=1}^{n} x_i$$
Constraints:
$$G_j(x) = \begin{cases} \sqrt{\frac{j+1}{j}} + x_{j+1} - \frac{1}{j} \sum_{i=1}^{j} x_i \geq 0 & \text{for } j = 1(1)n-1 \\[4pt] \sqrt{\frac{j-n+2}{j-n+1}} - x_{j-n+2} + \frac{1}{j-n+1} \sum_{i=1}^{j-n+1} x_i \geq 0 & \text{for } j = n(1)2n-2 \end{cases}$$
The other constraints of Problem 2.37, which bound the corridor in the direction of the minimum being sought, were omitted here. The minimum is thus at infinity.


In comparing the results of this and the following circularly bounded corridor problem with the theoretical rates of progress for this model function, the quantity of interest was the cost, not of reaching a given approximation to an objective, but of covering a given distance along the corridor axis. For the half-width of the corridor, b = 1 was taken. The search was started at the origin and terminated as soon as a distance s >= 10 b had been covered, or the objective function had reached a value F <= -10 sqrt(n):
Start:
x_i^(0) = 0 for i = 1(1)n, F(x^(0)) = 0
All the tested strategies converged satisfactorily. The number of mutations or generations required by the evolution strategies increased linearly with the number of variables, as expected. Since the number of constraints, as well as the computation time per function call, increased as O(n), the total computation time increased as O(n^3). Because of the maximum of 8 hours per search adopted as a limit on the computation time, the two membered evolution strategy could only be tested to n = 300, and the multimembered strategies to n = 100. Intermediate results for n = 300, however, confirm that the expected trend is maintained.

Problem 3.9 (similar to Problem 2.50)

Objective function:

$$F(x) = -\sum_{i=1}^{n} x_i$$

Constraint:

$$G(x) = 1 - \sum_{i=1}^{n} \left( \frac{1}{n}\sum_{j=1}^{n} x_j - x_i \right)^{2} \ge 0$$

Minimum, starting point, and convergence criterion as in Problem 3.8.

The complex method failed for n ≥ 30, but the Rosenbrock strategy simply required more objective function evaluations and orthogonalizations compared to the rectangular corridor. The evolution strategies converged safely. They too required more mutations or generations than in the previous problem. However, since only one constraint instead of 2n − 2 was to be tested and respected, the time they took only increased as O(n²). Recombination in the multimembered version was only a very slight advantage for this and the linearly bounded corridor problem.

Problem 3.10 (analogous to Problem 2.45)

Objective function:

$$F(x) = \sum_{i=1}^{n} x_i^{\,i}\, e^{-x_i}$$

Constraints:

$$G_j(x) = \begin{cases} x_j \ge 0 & \text{for } j = 1(1)n \\ 2 - x_{j-n} \ge 0 & \text{for } j = n+1(1)2n \end{cases}$$



Minimum:

$$x_i^* = 0 \ \text{ for } i = 1(1)n, \qquad F(x^*) = 0, \qquad G_1 \text{ to } G_n \text{ all active}$$

Besides this global minimum there is a local one within the feasible region:

$$x_i' = \begin{cases} 2 & \text{for } i = 1 \\ 0 & \text{for } i = 2(1)n \end{cases}, \qquad F(x') = 2 e^{-2}$$

As in the solution of Problem 2.45 with five variables, the search methods only converged if they could adjust the step lengths individually. The strategy of Rosenbrock failed even for n = 10. The complex method sometimes converged for the same number of variables after about 1,000 seconds of computation time, but occasionally not even within the allotted 8 hours. For n = 30 variables, none of the strategies reached the objective before the time limit expired. The results obtained after 8 hours showed clearly that better progress was being made by the two membered evolution strategy and the multimembered strategy with recombination. The following table gives the best objective function values obtained by each of the strategies compared.

Rosenbrock                                   $10^{-4}$ †
Complex                                      $10^{-7}$
(1+1) evolution                              $10^{-30}$
(10, 100) evolution without recombination    $10^{-12}$
(10, 100) evolution with recombination       $10^{-26}$

† The Rosenbrock strategy ended the search prematurely after about 5 hours. All the other values are intermediate results after 8 hours of computation time, when the strategy's own termination criteria were not yet satisfied. The searches could therefore still have come to a successful conclusion.


Appendix B

Program Codes

This appendix contains the two FORTRAN programs EVOL and GRUP (with option REKO) used for the test series described in Chapter 6, plus the extension KORR as of 1976, which covers all features of GRUP/REKO as well as correlated mutations (Schwefel, 1974; see also Chap. 7) introduced shortly after the first German version of this work was finished in 1974 (and reproduced as monograph by Birkhäuser, Basle, Switzerland, in 1977). GRUP and REKO thus should no longer be used or imitated.

B.1 (1+1) Evolution Strategy EVOL

1. Purpose

The EVOL subroutine is a FORTRAN coding of the two membered evolution strategy. It is an iterative direct search strategy for a parameter optimization problem. A search is made for the minimum in a non-linear function of an arbitrary but finite number of continuously variable parameters. Derivatives of the objective function are not required. Constraints in the form of inequalities can be incorporated (right hand side ≥ 0). The user must supply initial values for the variables and for the appropriate step sizes. If the initial state is not feasible, a search is made for a feasible point by minimizing the sum of the negative values for the constraints that have been violated.
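Written as a formula (a restatement of the rule just stated, not an equation printed in the original), the auxiliary objective that replaces F in this preliminary phase is

$$F_{\mathrm{aux}}(x) = -\sum_{j:\,G_j(x) < 0} G_j(x),$$

which is positive as long as at least one constraint is violated and reaches zero exactly when a feasible point has been found.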

2. Subroutine parameter list

EVOL (N,M,LF,LR,LS,TM,EA,EB,EC,ED,SN,FB,XB,SM,X,F,G,T,Z,R)

All parameters apart from LF, FB, X, and Z must be assigned values or names either before or when the subroutine is called. The variables XB and SM do not retain the values initially assigned to them.

N (integer) Number of parameters (> 0).

M (integer) Number of constraints (≥ 0).


LF (integer) Return code with the following meaning:
LF = -2: Starting point not feasible and search for a feasible state unsuccessful. Feasible region probably empty.
LF = -1: Starting point not feasible and search for a feasible state terminated because the time limit was reached.
LF = 0: Starting point not feasible, search for a feasible state successful. The final values of XB can be used as starting point for a subsequent search for a minimum if EVOL is called again.
LF = 1: Search for minimum terminated because the time limit was reached.
LF = 2: Search for minimum terminated in an orderly fashion. No further improvement in the value of the objective function could be achieved in the context of the given accuracy parameters. Probably the final state XB (extreme point) having FB (extreme value) lies near a local minimum, perhaps the global minimum.

LR (integer) Auxiliary quantity used in step size management. Normal value 1. The step sizes are adjusted so that on average one success (improvement in the value of the objective function) is obtained in 5·LR trials (objective function calls). This is computed over the last 10·N·LR trials.

LS (integer) Auxiliary quantity used in convergence testing. Minimum value 2. The search is terminated if the value of the objective function has improved by less than EC (absolute) or ED (relative) in the course of 10·N·LR·LS trials. Note: the step sizes are reduced by at most a factor SN^(10·LS) during this period. The factor is 0.2^LS if SN = 0.85 is selected.

TM (real) Parameter used in controlling the computation time, e.g., the maximum CPU time in seconds, depending on the function designated T (see below). The search is terminated if T > TM. This check is performed after each N·LR mutations (objective function calls).

EA (real) Lower bound to step sizes, absolute. EA > 0.0 must be chosen large enough to be treated as different from 0.0 within the accuracy of the computer used.

EB (real) Lower bound to step sizes relative to values of variables. EB > 0.0 must be chosen large enough for 1.0 + EB to be treated as different from 1.0 within the accuracy of the computer used.

EC (real) Parameter in convergence test, absolute. See under LS. (EC > 0.0, see EA.)

ED (real) Parameter in convergence test, relative. See under LS. (1.0 + ED > 1.0, see EB.) Convergence is assumed if the data pass one or both tests. If it is desired to suppress a test, it is possible either to set EC = 0.0 or to choose a value for ED such that 1.0 + ED = 1.0 but ED > 0.0 within the accuracy of the machine.

SN (real) Auxiliary variable for step size adjustment. Normal value 0.85. The step size can be kept constant during the trials by setting SN = 1.0. The success rate indicated by LR is used to adjust the step size by a factor SN or 1.0/SN after every N·LR trials.



FB (real) Best value of objective function obtained during the search.

XB (one dimensional real array of length N) On call: holds initial values of variables. On exit: holds best values of variables corresponding to FB.

SM (one dimensional real array of length N) On call: holds initial values of step sizes (more precisely, standard deviations of components of the mutation vector). On exit: holds current step sizes of the last (not necessarily successful) mutation. Optimum initialization: somewhere near the local radius of curvature of the objective function hypersurface divided by the number of variables. More practical suggestion: SM(I) = DX(I)/SQRT(N), where DX(I) is a crude estimate of either the distance between start and expected optimum or the maximum uncertainty range for the variable X(I). If the SM(I) are initially set too large, a certain time elapses before they are appropriately adjusted. This is advantageous as regards the probability of locating the global optimum in the presence of several local optima.

X (one dimensional real array of length N) Space for holding a variable vector.

F (real function) Name of the objective function, which is to be provided by the user.

G (real function) Name of the function used in calculating the values of the constraint functions, to be provided by the user.

T (real function) Name of the function used in controlling the computation time.

Z (real function) Name of the function used in transforming a uniform random number distribution to a normal distribution. If the nomenclature Z is retained, the function Z appended to the EVOL subroutine can be used for this purpose.

R (real function) Name of the function that generates a uniform random number distribution.

3. Method

See I. Rechenberg, Evolution Strategy: Optimization of Technical Systems in Accordance with the Principles of Biological Evolution (in German), vol. 15 of Problemata series, Verlag Frommann-Holzboog, Stuttgart, 1973; also H.-P. Schwefel, Numerical Optimization of Computer Models, Wiley, Chichester, 1981 (translated by M. W. Finnis from Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie, vol. 26 of Interdisciplinary Systems Research, Birkhäuser, Basle, Switzerland, 1977).

The method is based on a very much simplified simulation of biological evolution using the principles of mutation (random changes in variables, normal distribution for change vector) and selection (elimination of deteriorations and retention of improvements). The widths of the normal distribution (or step sizes) are controlled by reference to the ratio of the number of improvements to the number of mutations.



4. Convergence criterion

Based on the change in the value of the objective function (see under LS, EC, and ED).

5. Peripheral I/O: none.

6. Notes

If there are several (local) minima, only one is located. Which one actually is found depends on the initial values of variables and step sizes as well as on the random number sequence. In such cases it is recommended to repeat the search several times with different sets of initial values and/or random numbers. The approximation to the minimum is usually poor if the search terminates at the boundary of the feasible region defined by the constraints. Better results can then be obtained by setting LR > 1, LS > 2, and/or SN > 0.85 (maximum value 1.0). In addition, the bounds EA and EB should not be made too small. The same applies if the objective function has discontinuous first partial derivatives (e.g., in the case of Tchebycheff approximation).

7. Subroutines or functions used

The function names should be declared as external in the segment that calls EVOL.

7.1 Objective function

This is to be written by the user in the form:

-----------------------------------------------------
      FUNCTION F(N,X)
      DIMENSION X(N)
      ...
      F=...
      RETURN
      END
-----------------------------------------------------

N represents the number of parameters, and X represents the formal parameter vector. The function should be written on the basis that EVOL searches for a minimum; if a maximum is to be sought, F must be supplied with a negative sign.
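As a minimal illustration (an example of mine, not part of the original listings), the sphere function F(x) = Σ x_i² would be coded as:

-----------------------------------------------------
C     EXAMPLE (NOT FROM THE ORIGINAL LISTINGS):
C     SPHERE FUNCTION AS OBJECTIVE
      FUNCTION F(N,X)
      DIMENSION X(N)
      F=0.
      DO 1 I=1,N
    1 F=F+X(I)*X(I)
      RETURN
      END
-----------------------------------------------------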

7.2 Constraints function

This is to be written by the user in the general style:

-----------------------------------------------------
      FUNCTION G(J,N,X)
      DIMENSION X(N)
      GOTO(1,2,3,...,(M)),J
    1 G=...
      RETURN
    2 G=...



      RETURN
      ...
  (M) G=...
      RETURN
      END
-----------------------------------------------------

N and X have the meanings described for the objective function, while J (integer) is the serial number of the constraint. The statements should be written on the basis that EVOL will accept vector X as feasible if all the G values are larger than or equal to 0.0.
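For instance (again my illustration, not from the original listings), M = 2 constraints x₁ ≥ 0 and x₂ ≤ 1, both brought into the form G ≥ 0, could be coded as:

-----------------------------------------------------
C     EXAMPLE (NOT FROM THE ORIGINAL LISTINGS):
C     G1: X(1) >= 0 AND G2: X(2) <= 1, EACH WRITTEN SO
C     THAT FEASIBILITY MEANS G >= 0
      FUNCTION G(J,N,X)
      DIMENSION X(N)
      GOTO(1,2),J
    1 G=X(1)
      RETURN
    2 G=1.-X(2)
      RETURN
      END
-----------------------------------------------------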

7.3 Function for controlling the computation time

This may be defined by the user or called from the subroutine library in the particular machine. The following structure is assumed:

REAL FUNCTION T(D)

where D is a dummy parameter. T should be assigned the monitored quantity, e.g., the CPU time in seconds limited by TM. Many computers are supplied with ready-made timing software. If this is given as a function, only its name needs to be supplied to EVOL instead of T as a parameter. If it is a subroutine, the user can program the required function. For example, the subroutine might be called SECOND(I), where parameter I is an integer representing the CPU time in microseconds, in which case one could program:

-----------------------------------------------------
      FUNCTION T(D)
      CALL SECOND(I)
      T=1.E-6*FLOAT(I)
      RETURN
      END
-----------------------------------------------------

7.4 Function for transforming a uniformly distributed random number to a normally distributed one

See under Section 8.

7.5 Function for generating a uniform random number distribution in the range (0,1]

The structure must be

REAL FUNCTION R(D)

where D is dummy. R is the value of the random number. Note: The smallest value of R must be large enough for the natural logarithm to be generated without floating-point overflow. The standard library usually includes a suitable program, in which case only the appropriate name has to be supplied to EVOL.
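If no library generator is available, one portable possibility (my suggestion, not part of the original appendix) is the minimal-standard multiplicative congruential generator with Schrage's factorization to avoid integer overflow; the seed IR, held here in an assumed COMMON block /SEED/, must be initialized to a value between 1 and 2147483646. Its smallest possible value, about 5·10⁻¹⁰, is safely large enough for the logarithm in Z:

-----------------------------------------------------
C     EXAMPLE (NOT FROM THE ORIGINAL LISTINGS):
C     MINIMAL-STANDARD GENERATOR IR = 16807*IR MOD (2**31-1),
C     IMPLEMENTED WITH SCHRAGE'S FACTORIZATION; RETURNS A
C     UNIFORM RANDOM NUMBER IN (0,1)
      FUNCTION R(D)
      COMMON/SEED/IR
      K=IR/127773
      IR=16807*(IR-K*127773)-K*2836
      IF(IR.LT.0)IR=IR+2147483647
      R=FLOAT(IR)*4.6566129E-10
      RETURN
      END
-----------------------------------------------------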



8. Function Z(S,R)

This function converts a uniform random number distribution to a normal distribution pairwise by means of the Box-Muller rules. The standard deviation is supplied as parameter S, while the expectation value for the mean is always 0.0. The quantity LZ is common to EVOL and Z by virtue of a COMMON block and acts as a switch to transmit only one of the two random numbers generated in response to each second call.
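Written out (a restatement of what the listing below computes, not a formula printed in the original), each pair of uniform random numbers u₁, u₂ drawn from R yields two independent normally distributed numbers with standard deviation S:

$$z_1 = S\,\sqrt{-2\ln u_1}\,\sin(2\pi u_2), \qquad z_2 = S\,\sqrt{-2\ln u_1}\,\cos(2\pi u_2)$$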

---------------------------------------------------------
      SUBROUTINE EVOL(N,M,LF,LR,LS,TM,EA,EB,EC,ED,SN,FB,
     1XB,SM,X,F,G,T,Z,R)
      DIMENSION XB(1),SM(1),X(1),L(10)
      COMMON/EVZ/LZ
      EXTERNAL R
      TN=TM+T(D)
      LZ=1
      IF(M)4,4,1
    1 LF=-1
C
C     FEASIBILITY CHECK
C
      FB=0.
      DO 3 J=1,M
      FG=G(J,N,XB)
      IF(FG)2,3,3
    2 FB=FB-FG
    3 CONTINUE
      IF(FB)4,4,5
C
C     ALL CONSTRAINTS SATISFIED IF FB <= 0
C



    7 DO 8 I=1,N
    8 X(I)=XB(I)+Z(SM(I),R)
      IF(LF)9,9,12
C
C     AUXILIARY OBJECTIVE
C
    9 FF=0.
      DO 11 J=1,M
      FG=G(J,N,X)
      IF(FG)10,11,11
   10 FF=FF-FG
   11 CONTINUE
      IF(FF)32,32,16
C
C     ALL CONSTRAINTS SATISFIED IF FF <= 0
C



   25 L(K)=L(K+1)
      L(10)=LE
      LM=0
      LC=LC+1
      IF(LC-10*LS)31,26,26
C
C     CONVERGENCE CRITERION
C
   26 IF(FC-FB-EC)28,28,27
   27 IF((FC-FB)/ED-ABS(FC))28,28,30
   28 LF=ISIGN(2,LF)
   29 RETURN
   30 LC=0
      FC=FB
C
C     TIME CONTROL
C
   31 IF(T(D)-TN)7,29,29
   32 DO 33 I=1,N
   33 XB(I)=X(I)
      FB=F(N,XB)
      LF=0
      GOTO 29
      END
---------------------------------------------------------
      FUNCTION Z(S,R)
      COMMON/EVZ/LZ
      DATA ZP/6.28318531/
      GOTO(1,2),LZ
    1 A=SQRT(-2.*ALOG(R(D)))
      B=ZP*R(D)
      Z=S*A*SIN(B)
      LZ=2
      RETURN
    2 Z=S*A*COS(B)
      LZ=1
      RETURN
      END
---------------------------------------------------------
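To make the calling conventions concrete, a minimal driver might look as follows (a sketch of mine, not from the original listings; it assumes the example functions F, G, T, and R given above, with the random seed in the assumed COMMON block /SEED/, and minimizes over N = 2 variables under M = 2 constraints with a 60-second time limit):

---------------------------------------------------------
C     EXAMPLE DRIVER (NOT FROM THE ORIGINAL LISTINGS)
      EXTERNAL F,G,T,Z,R
      DIMENSION XB(2),SM(2),X(2)
      COMMON/SEED/IR
      IR=123456789
C     FEASIBLE STARTING POINT AND INITIAL STEP SIZES,
C     FOLLOWING THE SUGGESTION SM(I)=DX(I)/SQRT(N)
      XB(1)=0.5
      XB(2)=0.5
      SM(1)=0.35
      SM(2)=0.35
C     N=2, M=2, LR=1, LS=2, TM=60., SN=0.85
      CALL EVOL(2,2,LF,1,2,60.,1.E-6,1.E-6,1.E-7,1.E-7,0.85,
     1FB,XB,SM,X,F,G,T,Z,R)
      WRITE(6,*) LF,FB,XB
      END
---------------------------------------------------------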



B.2 (μ, λ) Evolution Strategies GRUP and REKO

1. Purpose

The GRUP subroutine is a FORTRAN program to handle a multimembered (L, LL) evolution strategy with L parents and LL descendants per generation. It is an iterative direct search strategy for handling parameter optimization problems. A search is made for the minimum in a non-linear function of an arbitrary but finite number of continuously variable parameters. Derivatives of the objective function are not required. Constraints in the form of inequalities can be incorporated (right hand side ≥ 0). The user must supply initial values for the variables and for the appropriate step sizes. If the initial state is not feasible, a search is made for a feasible point that minimizes the sum of the negative values for the constraints that have been violated.

2. Subroutine parameter list

GRUP (REKO,L,LL,N,M,LF,TM,EA,EB,EC,ED,SN,FA,FB,XB,SM,X,FK,XK,SK,F,G,T,Z,R)

All parameters apart from LF, FA, FB, X, FK, XK, SK, and Z must be assigned values or names before or when the subroutine is called. The variables XB and SM do not retain the values initially assigned to them.

REKO (logical) Switch for alternative with/without recombination.
REKO = .FALSE.: No recombination. The step sizes retain the relationship initially assigned to them.
REKO = .TRUE.: Recombination occurs. The relationships between the step sizes alter during the search.

L (integer) Number of parents (≥ 1). This parameter should not be chosen too small if recombination is to occur.

LL (integer) Number of descendants (> L). Recommended to choose a value ≥ 6·L.

N (integer) Number of parameters (> 0).

M (integer) Number of constraints (≥ 0).

LF (integer) Return code with the following meanings:
LF = -2: Starting point not feasible and search for a feasible state unsuccessful. Feasible region probably empty.
LF = -1: Starting point not feasible and search for a feasible state terminated because the time limit was reached.
LF = 0: Starting point not feasible, search for a feasible state successful. The final values of XB can be used as starting point for the search for a minimum if GRUP is called again.
LF = 1: Search for minimum terminated because the time limit was reached.
LF = 2: Search for minimum terminated in an orderly fashion.



No further improvement in the value of the objective function could be achieved within the framework of the given accuracy parameters. Probably the final state XB (extreme point) lies near a local minimum, perhaps the global minimum.

TM (real) Parameter used in monitoring the computation time, e.g., the maximum CPU time in seconds, depending on the function designated T (see below). The search is terminated if T > TM. This check is performed after every generation, i.e., LL objective function calls.

EA (real) Lower bound to step sizes, absolute. EA > 0.0 must be chosen large enough to be treated as different from 0.0 within the accuracy of the computer used.

EB (real) Lower bound to step sizes relative to values of variables. EB > 0.0 must be chosen large enough for 1.0 + EB to be treated as different from 1.0 within the accuracy of the computer used.

EC (real) Parameter in convergence test, absolute. The search is terminated if the difference between the best and worst values of the objective function within a generation is less than or equal to EC (EC > 0.0, see EA).

ED (real) Parameter in convergence test, relative. The search is terminated if the difference between the best and worst values of the objective function within a generation is less than or equal to ED multiplied by the absolute value of the mean of the objective function as taken over all L parents in a generation (1.0 + ED > 1.0, see EB). Convergence is assumed if the data pass one or both tests. If it is desired to delete a test, it is possible either to set EC = 0.0 or to choose a value for ED such that 1.0 + ED = 1.0 but ED > 0.0 within the accuracy of the machine.

SN (real) Auxiliary quantity used in step size adjustment. Normal value C/SQRT(N), with C > 0.0, e.g., C = 1.0 for L = 10 and LL = 100. C can be increased as LL increases, but it must be reduced as L increases. An approximation for L = 1 is LL proportional to SQRT(C)·EXP(C).

FA (real) Current best objective function value for the population.

FB (real) Best value of objective function attained during the whole search. The minimum found may not be unique if FB differs from FA because: (1) there is a state with an even smaller value for the objective function (e.g., near a local minimum or even near the global minimum) that has been lost over the generations, or (2) the minimum consists of several quasi-singular peaks on account of the finite accuracy of the computer used. Usually, the difference between FA and FB is larger in the first case than in the second, if EC and ED have been assigned small values.



XB (one dimensional real array of length N) On call: holds initial values of variables. On exit: holds best values of variables corresponding to FB.

SM (one dimensional real array of length N) On call: holds initial values of step sizes (more precisely, standard deviations of components of the mutation vector). On exit: holds current step sizes of the last (not necessarily successful) mutation. Optimum initialization: somewhere near the local radius of curvature of the objective function hypersurface divided by the number of variables. More practical suggestion: SM(I) = DX(I)/SQRT(N), where DX(I) is a crude estimate of either the distance between start and expected optimum or the maximum uncertainty range for the variable X(I). If the SM(I) are initially set too large, it may happen that a good starting point is lost in the first generation. This is advantageous as regards the probability of locating the global optimum in the presence of several local optima.

X (one dimensional real array of length N) Space for holding a variable vector.

FK (one dimensional real array of length 2·L) Holds objective function values for the L best individuals in each of the last two generations.

XK (one dimensional real array of length 2·L·N) Holds the variable values for N components for each of the L parents in each of the last two generations. XK(1) to XK(N) hold the state vector X for the first individual, the next N locations do the same for the second, and so on.

SK (one dimensional real array of length 2·L·N) Holds the standard deviations, structure as for XK.

F (real function) Name of the objective function, which is to be programmed by the user.

G (real function) Name of the function for calculating the values of the constraints, to be programmed by the user.

T (real function) Name of the function used in monitoring the computation time.

Z (real function) Name of the function used in transforming a uniform random number distribution to a normal distribution. If the name Z is retained, the function Z listed after the GRUP subroutine can be used for this purpose.

R (real function) Name of the function that generates a uniform random number distribution.

3. Method

GRUP has been developed from EVOL. The method is based on a very much simplified simulation of biological evolution. See I. Rechenberg, Evolution Strategy: Optimization of Technical Systems in Accordance with the Principles of Biological Evolution (in German),



vol. 15 of Problemata series, Verlag Frommann-Holzboog, Stuttgart, 1973; also H.-P. Schwefel, Numerical Optimization of Computer Models, Wiley, Chichester, 1981 (translated by M. W. Finnis from Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie, vol. 26 of Interdisciplinary Systems Research, Birkhäuser, Basle, Switzerland, 1977).

The current L parameter vectors are used to generate LL new ones by means of small random changes. The best L of these become the initial ones for the next generation (iteration). At the same time, the step sizes (standard deviations) for the changes in the variables (strategy parameters) are altered. The selection leads to adaptation to the local topology if LL/L is assigned a suitably large value, e.g., > 6. The random changes in the parameters are produced by the addition of normally distributed random numbers, while those in the step sizes are produced from random numbers with a log-normal distribution by multiplication.
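In symbols (a compact restatement of the two rules just described, not a formula printed in the original; σᵢ denotes the step size attached to variable xᵢ and N(0, v) a normal variate with variance v):

$$\sigma_i' = \sigma_i \cdot e^{\xi}, \quad \xi \sim N(0, \mathrm{SN}^2); \qquad x_i' = x_i + z_i, \quad z_i \sim N(0, {\sigma_i'}^{2}),$$

so the step sizes are varied multiplicatively by a log-normal factor, the object variables additively by normal increments.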

4. Convergence criterion

Based on the differences in value of the objective function (see under EC and ED).

5. Peripheral I/O: none.

6. Notes

The multimembered strategy represents an improvement in reliability over the two membered strategy. On the other hand, the run time is greater when an ordinary (serial) digital computer is used. The run time increases less rapidly than in proportion to LL (the number of descendants per generation), because increasing LL increases the convergence rate (over the generations). However, minima at a boundary of the feasible region or at a vertex are attained only slowly or inexactly. In any case, although the certainty of global convergence cannot be guaranteed, numerical tests have shown that the multimembered strategy is far better than other search procedures in this respect. It is capable of handling separated feasible regions provided that the number of parameters is not large and that the initial step sizes are set suitably large. In doubtful cases it is recommended to repeat the search each time with a different set of initial values and/or random numbers. If the optimum being sought lies at a boundary of the feasible region, it is probably better to choose a value for SN (the parameter governing the rates of change of the standard deviations) less than the (maximal) value suggested above.

7. Subroutines or functions used

The function names are to be declared as external in the segment that calls GRUP.

7.1 Objective function

To be written by the user in the form:



-----------------------------------------------------
      FUNCTION F(N,X)
      DIMENSION X(N)
      ...
      F=...
      RETURN
      END
-----------------------------------------------------

N represents the number of parameters, and X represents the formal parameter vector. GRUP supplies the actual values. The function should be written on the basis that GRUP searches for a minimum; if a maximum is to be sought, F must be supplied with a negative sign.

7.2 Constraints function

To be written by the user in the general style:

-----------------------------------------------------
      FUNCTION G(J,N,X)
      DIMENSION X(N)
      GOTO(1,2,3,...,(M)),J
    1 G=...
      RETURN
    2 G=...
      RETURN
      ...
  (M) G=...
      RETURN
      END
-----------------------------------------------------

N and X have the meanings described for the objective function, while J (integer) is the serial number of the constraint. The statements should be written on the basis that GRUP will accept vector X as feasible if all the G values are larger than or equal to zero.

7.3 Function for monitoring the computation time

This may be defined by the user or called from the subroutine library in the particular machine. The following structure is assumed:

REAL FUNCTION T(D)

where D is a dummy parameter. T should be assigned the monitored quantity, e.g., the CPU time in seconds limited by TM. Many computers are supplied with ready-made timing software. If this is given as a function, only its name needs to be supplied to GRUP,



instead of T, as a parameter. If it is a subroutine, the user can program the required function. For example, the subroutine might be called SECOND(I), where parameter I is an integer representing the CPU time in microseconds, in which case one could program:

-----------------------------------------------------
      FUNCTION T(D)
      CALL SECOND(I)
      T=1.E-6*FLOAT(I)
      RETURN
      END
-----------------------------------------------------

7.4 Function for transforming a uniformly distributed random number to a normally distributed one

See under 8.

7.5 Function for generating a uniform random number distribution in the range (0,1]

The structure must be

REAL FUNCTION R(D)

where D is dummy. R is the value of the random number. Note: The smallest value of R must be large enough for the natural logarithm to be generated without floating-point overflow. The standard library usually includes a suitable program, in which case only the appropriate name has to be supplied to GRUP.

8. Function Z(S,R)

This function converts a uniform random number distribution to a normal distribution pairwise by means of the Box-Muller rules. The standard deviation is supplied as parameter S, while the expectation value for the mean is always zero. The quantity LZ is common to GRUP and Z by virtue of a COMMON block and acts as a switch to transmit only one of the two random numbers generated in response to each second call.

---------------------------------------------------------
      SUBROUTINE GRUP(REKO,L,LL,N,M,LF,TM,EA,EB,EC,ED,SN,
     1FA,FB,XB,SM,X,FK,XK,SK,F,G,T,Z,R)
      LOGICAL REKO
      DIMENSION XB(1),SM(1),X(1),FK(1),XK(1),SK(1)
      COMMON/GRZ/LZ
      EXTERNAL R
      KK(RR)=(LA+IFIX(FLOAT(L)*RR))*N
C
C     THE PRECEDING LINE CONTAINS A STATEMENT FUNCTION
C
      TN=TM+T(D)



      LZ=1
      IF(M)4,4,1
    1 LF=-1
C
C     FEASIBILITY CHECK
C
      FB=0.
      DO 3 J=1,M
      FG=G(J,N,XB)
      IF(FG)2,3,3
    2 FB=FB-FG
    3 CONTINUE
      IF(FB)4,4,5
C
C     ALL CONSTRAINTS SATISFIED IF FB <= 0
C



   15 CONTINUE
   16 FF=F(N,X)
   17 IF(FF-FB)18,19,19
C
C     STORING OF BEST INTERMEDIATE RESULT
C
   18 FB=FF
      KB=K
   19 DO 20 I=1,N
      KA=KA+1
      SK(KA)=AMAX1(SM(I)*SA,ABS(X(I))*EB,EA)
   20 XK(KA)=X(I)
   21 FK(K)=FF
      IF(KB)24,24,22
   22 KB=(KB-1)*N
      DO 23 I=1,N
   23 XB(I)=XK(KB+I)
C
C     START OF MAIN LOOP
C
   24 LA=L
      LB=0
C
C     LA AND LB FORM A ROTATING COUNTER TO AVOID SHUFFLING
C     GENOTYPES WITHIN THE ARRAYS CONTAINING PARENTS AND
C     DESCENDANTS
C
   25 LC=LB
      LB=LA
      LA=LC
      LC=0
      LD=0
   26 SA=EXP(Z(SN,R))
C
C     LOG-NORMAL STEP SIZE FACTOR
C
      IF(REKO)GOTO 28
      KI=KK(R(D))
      DO 27 I=1,N
      KI=KI+1
      SM(I)=SK(KI)*SA
   27 X(I)=XK(KI)+Z(SM(I),R)
C
C     MUTATION WITHOUT RECOMBINATION ABOVE



C
      GOTO 30
   28 SA=SA*.5
C
C     MUTATION WITH RECOMBINATION BELOW
C
      DO 29 I=1,N
      SM(I)=(SK(KK(R(D))+I)+SK(KK(R(D))+I))*SA
   29 X(I)=XK(KK(R(D))+I)+Z(SM(I),R)
   30 IF(LF)31,31,34
C
C     AUXILIARY OBJECTIVE
C
   31 FF=0.
      DO 33 J=1,M
      FG=G(J,N,X)
      IF(FG)32,33,33
   32 FF=FF-FG
   33 CONTINUE
      IF(FF)60,60,38
C
C     ALL CONSTRAINTS SATISFIED IF FF <= 0
C



      SK(KS)=AMAX1(SM(I),ABS(X(I))*EB,EA)
   42 XK(KS)=X(I)
      IF(LD-L)46,43,43
C
C     DETERMINING THE CURRENT WORST
C
   43 KS=LB+1
      FS=FK(KS)
      DO 45 K=2,L
      KA=LB+K
      FF=FK(KA)
      IF(FF-FS)45,45,44
   44 FS=FF
      KS=KA
   45 CONTINUE
   46 LC=LC+1
      IF(LC-LL)26,47,47
   47 IF(LD-L)26,48,48
C
C     END OF A GENERATION
C
   48 KA=LB+1
      FA=FK(KA)
      FC=FA
C
C     DETERMINING THE CURRENT BEST AND SUM
C
      DO 50 K=2,L
      KB=LB+K
      FF=FK(KB)
      FC=FC+FF
      IF(FF-FA)49,50,50
   49 FA=FF
      KA=KB
   50 CONTINUE
      IF(FA-FB)51,51,53
C
C     DETERMINING WHETHER THE CURRENT BEST IS BETTER THAN
C     THE SO FAR OVERALL BEST
C
   51 FB=FA
      KB=(KA-1)*N
      DO 52 I=1,N
   52 XB(I)=XK(KB+I)



C
C     CONVERGENCE CRITERION
C
   53 IF(FS-FA-EC)55,55,54
   54 IF((FS-FA)*FLOAT(L)/ED-ABS(FC))55,55,59
   55 LF=ISIGN(2,LF)
   56 KB=(KA-1)*N
      DO 57 I=1,N
   57 X(I)=XK(KB+I)
   58 RETURN
C
C     TIME CONTROL
C
   59 IF(T(D)-TN)25,56,56
   60 DO 61 I=1,N
   61 XB(I)=X(I)
      FB=F(N,XB)
      FA=FB
      LF=0
      GOTO 58
      END
---------------------------------------------------------
      FUNCTION Z(S,R)
      COMMON/GRZ/LZ
      DATA ZP/6.28318531/
      GOTO(1,2),LZ
    1 A=SQRT(-2.*ALOG(R(D)))
      B=ZP*R(D)
      Z=S*A*SIN(B)
      LZ=2
      RETURN
    2 Z=S*A*COS(B)
      LZ=1
      RETURN
      END
---------------------------------------------------------



B.3 (μ + λ) Evolution Strategy KORR

Plus additional subroutines: PRUEFG, SPEICH, MUTATI, DREHNG, UMSPEI, MINMAX, GNPOOL, ABSCHA, and functions: ZULASS, GAUSSN, BLETAL.

1. Purpose

The KORR subroutine is a FORTRAN coding of a multimembered evolution strategy. It is an iterative direct search strategy for a parameter optimization problem. A search is made for the minimum in a non-linear function of an arbitrary but finite number of continuously variable parameters. Derivatives of the objective function are not required. Constraints in the form of inequalities can be incorporated (right hand side ≥ 0). The user must supply initial values for the variables and for the appropriate step sizes. If the initial state is not feasible, a search is made for a feasible point by minimizing the sum of the negative values for the constraints that have been violated.

2. Parameter list for subroutine KORR

KORR (IELTER, BKOMMA, NACHKO, IREKOM, BKORRL, KONVKR, IFALLK, TGRENZ, EPSILO, DELTAS, DELTAI, DELTAP, N, M, NS, NP, NY, ZSTERN, XSTERN, ZBEST, X, S, P, Y, ZIELFU, RESTRI, GAUSSN, GLEICH, TKONTR, KANAL)

All parameters apart from IFALLK, ZSTERN, ZBEST, X, and Y must be assigned values or names before or when the subroutine is called. The variables XSTERN, S, and P do not retain the values initially assigned to them.

IELTER (integer) Number of parents of a generation. IELTER ≥ 1 if IREKOM = 111; IELTER > 1 if IREKOM > 111.

BKOMMA (logical) Switch for comma or plus version.
BKOMMA = .FALSE.: Selection criterion applied to parents and descendants, i.e., (IELTER + NACHKO) evolution strategy.
BKOMMA = .TRUE.: Selection criterion applied only to descendants, i.e., (IELTER, NACHKO) evolution strategy.

NACHKO (integer) Number of descendants in a generation. NACHKO ≥ 1 if BKOMMA = .FALSE.; NACHKO > IELTER if BKOMMA = .TRUE.

IREKOM (integer) Switch for recombination type consisting of three digits, each of which has values between 1 and 5. The first digit applies to the object variables X, the second one to the step sizes S, and the third one to the correlation angles P. Thus 111 ≤ IREKOM ≤ 555. Each digit controls the recombination in the following way (a worked example follows the list):



1 No recombination
2 Discrete recombination of pairs of parents
3 Intermediary recombination of pairs of parents
4 Discrete recombination of all parents
5 Intermediary recombination of all parents in pairs
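For illustration (an invented setting, not an example from the text): IREKOM = 351 would select intermediary recombination of pairs of parents for the object variables X (first digit 3), intermediary recombination of all parents in pairs for the step sizes S (second digit 5), and no recombination for the correlation angles P (third digit 1).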

BKORRL (logical) Switch for variability of the mutation hyperellipsoid (locus of equal probability density).
BKORRL = .FALSE.: The ellipsoid cannot rotate.
BKORRL = .TRUE.: The ellipsoid can extend and rotate.

KONVKR (integer) Switch for the convergence criterion:
KONVKR = 1: The difference in the objective function values between the best and worst parents at the start of each generation is used to determine whether to terminate the search before the time limit is reached. It is assumed that IELTER > 1.
KONVKR > 1 (best ≥ 2·N): The change in the mean of all the parental objective function values in KONVKR generations is used as the search termination criterion.
In both cases EPSILO(3) serves as the absolute and EPSILO(4) as the relative bound for deciding to terminate the search.

IFALLK (integer) Return code with the following meaning:
IFALLK = -2: Starting point not feasible, search terminated on finding a minimal value of the auxiliary objective function without satisfying all the constraints.
IFALLK = -1: Starting point not feasible, search for a feasible parameter vector terminated because the time limit was reached.
IFALLK = 0: Starting point not feasible, search for a feasible XSTERN vector successful; the search for a minimum can be restarted with this.
IFALLK = 1: Search for a minimum terminated because the time limit was reached.
IFALLK = 2: Search for minimum terminated regularly. The convergence criterion was satisfied.
IFALLK = 3: As for IFALLK = 1, but time limit reached not at the end of a generation but in an attempt to generate NACHKO viable descendants.

TGRENZ (real) Parameter used in monitoring the computation time, e.g., the maximum CPU time in seconds. Search terminated at the latest at the end of the generation for which TKONTR ≥ TGRENZ.



EPSILO (one dimensional real array of length 4) Holds parameters that affect the attainable accuracy of approximation. The lowest possible values are machine-dependent.
EPSILO(1): Lower bound to step sizes, absolute.
EPSILO(2): Lower bound to step sizes relative to values of variables (not implemented in this program).
EPSILO(3): Limit to absolute value of objective function differences for convergence test.
EPSILO(4): As EPSILO(3), but relative.

DELTAS (real) Factor used in step-size change. All standard deviations (= step sizes) S(I) are multiplied by a common random number EXP(GAUSSN(DELTAS)), where GAUSSN(DELTAS) is a normally distributed random number with zero mean and standard deviation DELTAS. EXP(DELTAS) ≥ 1.0.

DELTAI (real) As for DELTAS, but each S(I) is multiplied by its own random factor EXP(GAUSSN(DELTAI)). EXP(DELTAI) ≥ 1.0. The S(I) retain their initial values if DELTAS = 0.0 and DELTAI = 0.0. The variables can be scaled only by recombination (IREKOM > 1) if DELTAI = 0.0. The following rules are suggested to provide the most rapid convergence for sphere models (a numerical illustration follows this entry):
DELTAS = C/SQRT(2.0·N),
DELTAI = C/SQRT(2.0·N/SQRT(NS)).
The constant C can increase sublinearly with NACHKO, but it must be reduced as IELTER increases. The empirical value C = 1.0 has been found applicable for IELTER = 10, NACHKO = 100, and BKOMMA = .TRUE., which is a (10, 100) evolution strategy.
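As a numerical illustration (example values of mine, not from the text): for N = 30, NS = 30, and C = 1.0 these rules give DELTAS = 1.0/SQRT(60.0) ≈ 0.13 and DELTAI = 1.0/SQRT(60.0/SQRT(30.0)) ≈ 0.30.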

DELTAP (real) Standard deviation in random variation of the position angles P(I) for the mutation ellipsoid. DELTAP > 0.0 if BKORRL = .TRUE. Data in radians. A suitable value has been found to be DELTAP = 5.0·0.01745 (5 degrees) in certain cases.

N (integer) Number of parameters, N > 0.

M (integer) Number of constraints, M ≥ 0.

NS (integer) Field length in array S, or number of distinct step-size parameters that can be used, 1 ≤ NS ≤ N. The mutation ellipsoid becomes a hypersphere for NS = 1. All the principal axes of the ellipsoid may be different for NS = N, whereas for 1 < NS < N the mutation ellipsoid is a hyperellipsoid of rotation (see the Notes below).



NP (integer) Field length of array P. NP = N·(NS − 1) − ((NS − 1)·NS)/2 if BKORRL = .TRUE.; NP = 1 if BKORRL = .FALSE.

NY (integer) Field length of array Y. NY = (N+NS+NP+1)·IELTER·2 if BKORRL = .TRUE.; NY = (N+NS+1)·IELTER·2 if BKORRL = .FALSE. (A worked example follows this entry.)
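For example (my arithmetic, applying the formulas above): with BKORRL = .TRUE. and full correlation, N = NS = 10, one obtains NP = 10·9 − (9·10)/2 = 45, which is just the N·(N−1)/2 rotation angles of a completely general mutation ellipsoid; with IELTER = 10 in addition, the array Y must hold NY = (10 + 10 + 45 + 1)·10·2 = 1320 real values.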

ZSTERN (real) Best value of objective function found during search for minimum.

XSTERN (one dimensional real array of length N) On call: initial parameter vector. At end of search: best values for parameters corresponding to ZSTERN, or feasible vector found for the special case IFALLK = 0.

ZBEST (real) Current best value of objective function for the population; may be different from ZSTERN if BKOMMA = .TRUE.

X (one dimensional real array of length N) Holds the variables for a descendant.

S (one dimensional real array of length NS) Holds the step sizes for a descendant. The user must supply initial values. Universally valid rules for selecting the best S(I) are not available. If the step sizes are too large, a very good starting point can be wasted (BKOMMA = .TRUE.) or the step size adjustment may be very much delayed (BKOMMA = .FALSE.). If the initial step sizes are too small, there is only a slight chance of locating the global optimum in the presence of several local ones. In general, the optimum overall step sizes vary with the number N of parameters as C/SQRT(N), so the individual standard deviations vary as C/N with C = const.

P (one dimensional real array of length NP) Holds the positional angles of the ellipsoid for a descendant. The user must supply initial values if BKORRL = .TRUE. has been selected. If no better values are known initially, one can set P(I) = ATAN(1.0) for all I = 1(1)NP.

Y (one dimensional real array of length NY) Holds the vectors X, S, P, and the objective function values for the parents of the current generation and the next generation as well.

ZIELFU (real function) Name of the objective function, to be programmed by the user.

RESTRI (real function) Name of the function for evaluating all constraints, to be programmed by the user.

GAUSSN (real function) Name of the function used in transforming a uniform random number distribution to a Gaussian one.



GLEICH (real function) Name of the function for generating uniformly distributed random numbers.

TKONTR (real function) Name of the run-time monitoring function.

KANAL (integer) Channel number for output; relates only to messages output by subroutine PRUEFG concerning formal errors detected in the parameter list of subroutine KORR when the latter is called.

3. Method

KORR is a development from EVOL, Rechenberg's two membered strategy, and GRUP, the older version of Schwefel's multimembered evolution strategy. The method is based on a very much simplified simulation of biological evolution. See I. Rechenberg, Evolution Strategy: Optimization of Technical Systems in Accordance with the Principles of Biological Evolution (in German), vol. 15 of Problemata series, Verlag Frommann-Holzboog, Stuttgart, 1973; also H.-P. Schwefel, Numerical Optimization of Computer Models, Wiley, Chichester, 1981 (translated by M. W. Finnis from Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie, vol. 26 of Interdisciplinary Systems Research, Birkhäuser, Basle, Switzerland, 1977).

The IELTER parameter vectors are used to generate NACHKO new ones by introducing small normally distributed random changes. The IELTER best of these are used as starting points for the next generation (iteration). At the same time the strategy parameters are altered. These are the parameters of the current normal distributions for the lengths of the principal axes (standard deviations = step sizes) and the angular position of the mutation ellipsoid in N-dimensional space. Selection results in adaptation to local topology if the ratio NACHKO/IELTER is set large enough, e.g., at least 6. The random variations in the angles are produced by the addition of normally distributed random numbers, while those in the step sizes are produced from random numbers with a log-normal distribution by multiplication.

4. Convergence criterion

The termination criterion is based on value differences in the objective function; see under KONVKR, EPSILO(3), and EPSILO(4).

5. Peripheral I/O

Input: none.
Output: via channel KANAL, but only if there are formal errors in the parameter list of KORR. See under KANAL.

6. Notes

The two membered strategy (EVOL) usually has the shortest run time of all these evolution strategies (the EVOL, GRUP, and KORR codings so far developed) because ordinary (serial) computers can test the descendants only one after another in a generation, whereas in nature they are tested in parallel. The run times of the multimembered strategies increase less rapidly than in proportion to NACHKO because the convergence rate taken



over the generations tends to increase with NACHKO. However, there are frequent instances where even the simpler multimembered scheme (GRUP) has a run time less than that of EVOL, because GRUP and KORR in principle allow one to adapt the step sizes individually to the local topology, which is not possible with EVOL, and this permits one to scale the variables in a flexible fashion. For this reason, the reliability and attainable accuracy are appreciably better than those given by EVOL.

The new KORR program represents further improvements on GRUP in this respect on account of the increased flexibility in the mutation ellipsoid, which improves the variability of the object variables. In addition to the lengths of the principal axes (standard deviations = step sizes), the positions of the principal axes in N-dimensional space are strategy parameters that are adjustable within the population. This, together with the scaling, provides directional adaptation to any valleys or ridges in the objective function surface. The changes in the object variables are no longer independent but linearly correlated, and this improves the convergence rate (with respect to the number of generations) quite appreciably in many instances. In special cases, however, there may be an increase in the run time arising from the storage and modification of the positional angles, and also from coordinate transformation. KORR enables the user to test how many strategy parameters (= degrees of freedom in the mutation ellipsoid) may be adequate to solve his special problem. The correlation can be suppressed completely, in which case KORR becomes equivalent to GRUP. Intermediary stages can be implemented by means of the NS parameter, the number of mutually independent step sizes. For example, for NS = 2 < N we have a hyperellipsoid of rotation with N − NS rotation axes. KORR differs from the older EVOL and GRUP in being divided into numerous small subroutines. This modular structure is disadvantageous as regards core requirement and run time, but it provides insight into the mode of operation of the program as a whole, so that it is easier for the user to modify the algorithm.

Although KORR in general allows one to improve the reliability of the optimum search, there are still two critical situations. Minima at the boundary of the feasible region or in a vertex are attained only slowly or inaccurately. In any case, certainty of global convergence cannot be guaranteed; however, numerical tests have shown that the multimembered strategy is far better than other search procedures in this respect. It is capable of handling separated feasible regions provided that the number of parameters is not large and that the initial step sizes are set suitably large. In doubtful cases it is recommended to repeat the search each time with a different set of initial values and/or random numbers.

7. Subroutines or functions used

---------------------------------------------------------
SUBROUTINES: PRUEFG, SPEICH, MUTATI, DREHNG, UMSPEI,
             MINMAX, GNPOOL, ABSCHA
FUNCTIONS  : ZIELFU, RESTRI, GAUSSN, GLEICH, TKONTR,
             ZULASS, BLETAL
---------------------------------------------------------

The segment that calls KORR should have the names of the functions ZIELFU, RESTRI,



GLEICH, and TKONTR declared as external. This applies also to the name of any function used instead of GAUSSN to convert a uniform distribution to a normal one.

ZIELFU Objective function, to be programmed by the user in the form:

-------------------------------------------------
      FUNCTION ZIELFU(N,X)
      DIMENSION X(N)
      ...
      ZIELFU=...
      RETURN
      END
-------------------------------------------------

N represents the number of parameters, and X represents the formal parameter vector. The actual values are supplied by KORR. The function should be written on the basis that KORR searches for a minimum; if a maximum is to be sought, ZIELFU must be supplied with a negative sign.

RESTRI Constraints function, to be programmed by the user in the general style:

-------------------------------------------------
      FUNCTION RESTRI(J,N,X)
      DIMENSION X(N)
      GOTO(1,2,3,...,(M)),J
    1 RESTRI=...
      RETURN
    2 RESTRI=...
      RETURN
      ...
  (M) RESTRI=...
      RETURN
      END
-------------------------------------------------

N and X have the meanings described for the objective function, while J (integer) is the serial number of the constraint. The statements should be written on the basis that KORR will accept vector X as feasible if RESTRI ≥ 0.0 for all J = 1(1)M.

TKONTR The function for monitoring the computation time may be defined by the user or called from the subroutine library in the particular machine. The following structure is assumed:

REAL FUNCTION TKONTR(D)

where D is a dummy parameter. TKONTR should be assigned the monitored quantity, e.g., the CPU time in seconds limited by TGRENZ. Many computers are supplied with



ready-made timing software. If this is given as a function, only its name needs to be<br />

supplied to KORR instead of TKONTR as a parameter.<br />

GLEICH Function for generating a uniform r<strong>and</strong>om number distribution in the range<br />

(0,1]. The structure must be:<br />

REAL FUNCTION GLEICH(D)<br />

where D is arbitrary. GLEICH returns the value of the random number. The standard library usually includes a suitable program, in which case only the appropriate name has to be supplied to KORR. The other subroutines and functions are explained briefly in the program itself.
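If no library generator is available, a portable sketch along the lines of the Park-Miller minimal standard generator (our illustration; the text above assumes a library routine) might be:

---------------------------------------------------------
REAL FUNCTION GLEICH(D)
C PARK-MILLER GENERATOR, CODED WITH SCHRAGE'S TRICK TO
C AVOID INTEGER OVERFLOW ON 32-BIT MACHINES. IT YIELDS
C VALUES IN THE OPEN INTERVAL (0,1). SKETCH ONLY; A
C TESTED LIBRARY ROUTINE IS PREFERABLE.
INTEGER ISEED,K
SAVE ISEED
DATA ISEED/123457/
K=ISEED/127773
ISEED=16807*(ISEED-K*127773)-K*2836
IF(ISEED.LT.0) ISEED=ISEED+2147483647
GLEICH=REAL(ISEED)/2147483647.
RETURN
END
---------------------------------------------------------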

---------------------------------------------------------<br />

SUBROUTINE KORR<br />

1(IELTER,BKOMMA,NACHKO,IREKOM,BKORRL,KONVKR,IFALLK,<br />

2TGRENZ,EPSILO,DELTAS,DELTAI,DELTAP,N,M,NS,NP,NY,<br />

3ZSTERN,XSTERN,ZBEST,X,S,P,Y,ZIELFU,RESTRI,GAUSSN,<br />

4GLEICH,TKONTR,KANAL)<br />

LOGICAL BKOMMA,BKORRL,BFATAL,BKONVG,BLETAL<br />

DIMENSION EPSILO(4),XSTERN(N),X(N),S(NS),P(NP),<br />

1Y(NY)<br />

COMMON/PIDATA/PIHALB,PIEINS,PIZWEI<br />

EXTERNAL RESTRI,GAUSSN,GLEICH<br />

IREKOX = IREKOM / 100<br />

IREKOS = (IREKOM - IREKOX*100) / 10<br />

IREKOP = IREKOM - IREKOX*100 - IREKOS*10<br />

D = 0.<br />

CALL PRUEFG<br />

1(IELTER,BKOMMA,NACHKO,IREKOM,BKORRL,KONVKR,TGRENZ,<br />

2EPSILO,DELTAS,DELTAI,DELTAP,N,M,NS,NP,NY,KANAL,<br />

3BFATAL)<br />

C<br />

C CHECK INPUT PARAMETERS FOR FORMAL ERRORS.<br />

C<br />

IF(BFATAL) RETURN<br />

C<br />

C PREPARE AUXILIARY QUANTITIES. TIMING MONITORED IN<br />

C ACCORDANCE WITH THE TKONTR FUNCTION FROM HERE<br />

C ONWARDS.<br />

C<br />

TMAXIM=TGRENZ+TKONTR(D)<br />

IF(.NOT.BKORRL) GOTO 1<br />

PIHALB=2.*ATAN(1.)<br />

PIEINS=PIHALB+PIHALB<br />

PIZWEI=PIEINS+PIEINS



1 NL=1+N-NS

NM=N-1<br />

NZ=NY/(IELTER+IELTER)<br />

IF(M.EQ.0) GOTO 2<br />

C
C CHECK FEASIBILITY OF INITIAL VECTOR XSTERN.
C

IFALLK=-1<br />

ZSTERN=ZULASS(N,M,XSTERN,RESTRI)<br />

IF(ZSTERN.GT.0.) GOTO 3<br />

2 IFALLK=1<br />

ZSTERN=ZIELFU(N,XSTERN)<br />

3 CALL SPEICH<br />

1(0,BKORRL,EPSILO,N,NS,NP,NY,ZSTERN,XSTERN,S,P,Y)<br />

C
C THE INITIAL VALUES SUPPLIED BY THE USER ARE STORED
C IN FIELD Y AS THE DATA OF THE FIRST PARENT.
C

IF(KONVKR.GT.1) Z1=ZSTERN<br />

ZBEST=ZSTERN<br />

LBEST=0<br />

IF(IELTER.EQ.1) GOTO 16<br />

DSMAXI=DELTAS<br />

DPMAXI=AMIN1(DELTAP*10.,PIHALB)<br />

DO 14 L=2,IELTER<br />

C IF IELTER > 1, THE OTHER IELTER - 1 INITIAL VECTORS
C ARE DERIVED FROM THE VECTOR FOR THE FIRST PARENT BY
C MUTATION (WITHOUT SELECTION). THE STRATEGY
C PARAMETERS ARE WIDELY SPREAD.
C

DO 4 I=1,NS<br />

4 S(I)=Y(N+I)<br />

5 IF(TKONTR(D).LT.TMAXIM) GOTO 501<br />

IFALLK=-3<br />

GOTO 42<br />

501 IF(.NOT.BKORRL) GOTO 7<br />

DO 6 I=1,NP<br />

6 P(I)=Y(N+NS+I)<br />

7 CALL MUTATI
1(NL,NM,BKORRL,DSMAXI,DELTAI,DPMAXI,N,NS,NP,X,S,P,
2GAUSSN,GLEICH)
C

C MUTATION IN ALL OBJECT AND STRATEGY PARAMETERS.



C<br />

DO 8 I=1,N
8 X(I)=X(I)+Y(I)

IF(IFALLK.GT.0) GOTO 9<br />

C
C IF THE STARTING POINT IS NOT FEASIBLE, EACH
C MUTATION IS CHECKED AT ONCE TO SEE WHETHER A
C FEASIBLE VECTOR HAS BEEN FOUND. THE SEARCH ENDS
C WITH IFALLK = 0 IF THIS IS SO.
C

Z=ZULASS(N,M,X,RESTRI)<br />

IF(Z)40,40,12<br />

9 IF(M.EQ.0) GOTO 11<br />

IF(.NOT.BLETAL(N,M,X,RESTRI)) GOTO 11<br />

C
C IF A MUTATION FROM A FEASIBLE STARTING POINT
C RESULTS IN A NON-FEASIBLE X VECTOR, THEN THE STEP
C SIZES ARE REDUCED (ON THE ASSUMPTION THAT THEY WERE
C INITIALLY TOO LARGE) IN ORDER TO AVOID THE
C CONSUMPTION OF EXCESSIVE TIME IN DEFINING THE
C FIRST PARENT GENERATION.
C

DO 10 I=1,NS<br />

10 S(I)=S(I)*.5<br />

GOTO 5<br />

11 Z=ZIELFU(N,X)<br />

12 IF(Z.GT.ZBEST) GOTO 13<br />

ZBEST=Z<br />

LBEST=L-1<br />

DSMAXI=DSMAXI*ALOG(2.)<br />

13 CALL SPEICH<br />

1((L-1)*NZ,BKORRL,EPSILO,N,NS,NP,NY,Z,X,S,P,Y)<br />

C
C STORE PARENT DATA IN ARRAY Y.
C

IF(KONVKR.GT.1) Z1=Z1+Z<br />

14 CONTINUE
C

C THE INITIAL PARENT GENERATION IS NOW COMPLETE.
C ZSTERN AND XSTERN, WHICH HOLD THE BEST VALUES, ARE
C OVERWRITTEN WHEN AN IMPROVEMENT OF THE INITIAL
C SITUATION IS OBTAINED.
C

IF(LBEST.EQ.0) GOTO 16



ZSTERN=ZBEST<br />

K=LBEST*NZ<br />

DO 15 I=1,N<br />

15 XSTERN(I)=Y(K+I)<br />

16 L1=IELTER<br />

L2=0<br />

IF(KONVKR.GT.1) KONVZ=0<br />

C<br />

C ALL INITIALIZATION STEPS COMPLETED AT THIS POINT.<br />

C EACH FRESH GENERATION NOW STARTS AT LABEL 17.<br />

C<br />

17 L3=L2<br />

L2=L1<br />

L1=L3<br />

IF(M.GT.0) L3=0<br />

LMUTAT=0<br />

C<br />

C LMUTAT IS THE MUTATION COUNTER WITHIN A GENERATION,<br />

C WHILE L3 IS THE COUNTER FOR LETHAL MUTATIONS WHEN<br />

C THE PROBLEM INVOLVES CONSTRAINTS.<br />

C<br />

IF(BKOMMA) GOTO 18<br />

C<br />

C IF BKOMMA=.FALSE. HAS BEEN SELECTED, THE PARENTS<br />

C MUST BE INCORPORATED IN THE SELECTION. THE DATA FOR<br />

C THESE ARE TRANSFERRED FROM THE FIRST (OR SECOND)<br />

C PART OF THE ARRAY Y TO THE SECOND (OR FIRST) PART.<br />

C IN THIS CASE THE WORST INDIVIDUAL MUST ALSO BE<br />

C KNOWN, THIS IS REPLACED BY THE FIRST BETTER<br />

C DESCENDANT.<br />

C<br />

CALL UMSPEI<br />

1(L1*NZ,L2*NZ,IELTER*NZ,NY,Y)<br />

CALL MINMAX<br />

1(-1.,L2,NZ,ZSCHL,LSCHL,IELTER,NY,Y)<br />

C<br />

C THE GENERATION OF EACH DESCENDANT STARTS AT LABEL 18<br />

C<br />

18 K1=L1+IELTER*GLEICH(D)<br />

C<br />

C RANDOM CHOICE OF A PARENT OR OF A PAIR OF PARENTS<br />

C IN ACCORDANCE WITH THE VALUE CHOSEN FOR IREKOM. IF<br />

C IREKOM=3 OR IREKOM=5, THE CHOICE OF PARENTS IS MADE<br />

C WITHIN GNPOOL.



C
19 K2=L1+IELTER*GLEICH(D)
CALL GNPOOL
1(1,L1,K1,K2,NZ,N,IELTER,IREKOS,NS,NY,S,Y,GLEICH)
C
C STEP SIZES SUPPLIED FOR THE DESCENDANT FROM THE
C POOL OF GENES.
C

IF(BKORRL) CALL GNPOOL<br />

1(2,L1,K1,K2,NZ,N+NS,IELTER,IREKOP,NP,NY,P,Y,GLEICH)<br />

C POSITIONAL ANGLES OF ELLIPSOID SUPPLIED FOR THE
C DESCENDANT FROM THE POOL OF GENES WHEN CORRELATION
C IS REQUIRED.
C

CALL MUTATI<br />

1(NL,NM,BKORRL,DELTAS,DELTAI,DELTAP,N,NS,NP,X,S,P,<br />

2GAUSSN,GLEICH)<br />

C<br />

C CALL TO MUTATION SUBROUTINE FOR ALL VARIABLES,<br />

C INCLUDING POSSIBLY COORDINATE TRANSFORMATION. S<br />

C (AND P) ARE ALREADY THE NEW ATTRIBUTES OF THE<br />

C DESCENDANT, WHILE X REPRESENTS THE CHANGES TO BE<br />

C MADE IN THE OBJECT VARIABLES.
C

CALL GNPOOL<br />

1(3,L1,K1,K2,NZ,0,IELTER,IREKOX,N,NY,X,Y,GLEICH)<br />

C OBJECT VARIABLES SUPPLIED FOR THE DESCENDANT FROM<br />

C THE POOL OF GENES AND ADDITION OF THE MODIFICATION<br />

C VECTOR. X NOW REPRESENTS THE NEW STATE OF THE<br />

C DESCENDANT.
C

LMUTAT=LMUTAT+1<br />

IF(IFALLK.GT.0) GOTO 20<br />

C
C EVALUATION OF THE AUXILIARY OBJECTIVE FUNCTION FOR
C THE SEARCH FOR A FEASIBLE VECTOR.
C

Z=ZULASS(N,M,X,RESTRI)<br />

IF(Z)40,40,22<br />

20 IF(M.EQ.0) GOTO 21
C

C CHECK FEASIBILITY OF DESCENDANT. IF THE RESULT IS



C NEGATIVE (LETHAL MUTATION), THE MUTATION IS NOT<br />

C COUNTED AS REGARDS THE NACHKO PARAMETER.<br />

C<br />

IF(.NOT.BLETAL(N,M,X,RESTRI)) GOTO 21<br />

IF(.NOT.BKOMMA) GOTO 25<br />

LMUTAT=LMUTAT-1<br />

L3=L3+1<br />

IF(L3.LT.NACHKO) GOTO 18<br />

L3=0<br />

C<br />

C TIME CHECK MADE NOT ONLY AFTER EACH GENERATION BUT<br />

C ALSO AFTER EVERY NACHKO LETHAL MUTATIONS FOR<br />

C CERTAINTY.<br />

C<br />

IF(TKONTR(D).LT.TMAXIM) GOTO 18<br />

IFALLK=3<br />

GOTO 26<br />

21 Z=ZIELFU(N,X)<br />

C<br />

C EVALUATION OF OBJECTIVE FUNCTION VALUE FOR THE<br />

C DESCENDANT.<br />

C<br />

22 IF(BKOMMA.AND.LMUTAT.LE.IELTER) GOTO 23<br />

IF(Z-ZSCHL)24,24,25<br />

23 LSCHL=L2+LMUTAT-1<br />

24 CALL SPEICH<br />

1(LSCHL*NZ,BKORRL,EPSILO,N,NS,NP,NY,Z,X,S,P,Y)<br />

C<br />

C TRANSFER OF DATA OF DESCENDANT TO PART OF ARRAY Y<br />

C HOLDING THE PARENTS FOR THE NEXT GENERATION.<br />

C<br />

IF(.NOT.BKOMMA.OR.LMUTAT.GE.IELTER) CALL MINMAX<br />

1(-1.,L2,NZ,ZSCHL,LSCHL,IELTER,NY,Y)<br />

C<br />

C LOOK FOR THE CURRENTLY WORST INDIVIDUAL STORED IN<br />

C ARRAY Y WITHOUT CONSIDERING THE PARENTS THAT STILL<br />

C CAN PRODUCE DESCENDANTS IN THIS GENERATION.<br />

C<br />

25 IF(LMUTAT.LT.NACHKO) GOTO 18<br />

C<br />

C END OF GENERATION.<br />

C<br />

26 CALL MINMAX<br />

1(1.,L2,NZ,ZBEST,LBEST,IELTER,NY,Y)



C<br />

C LOOK FOR THE BEST OF THE INDIVIDUALS HELD AS<br />

C PARENTS FOR THE NEXT GENERATION. IF THIS IS BETTER<br />

C THAN ANY DESCENDANT PREVIOUSLY GENERATED, THE DATA<br />

C ARE WRITTEN INTO ZSTERN AND XSTERN.<br />

C<br />

IF(ZBEST.GT.ZSTERN) GOTO 28<br />

ZSTERN=ZBEST<br />

K=LBEST*NZ<br />

DO 27 I=1,N<br />

27 XSTERN(I)=Y(K+I)<br />

28 IF(IFALLK.EQ.3) GOTO 30<br />

Z2=0.<br />

K=L2*NZ<br />

DO 29 L=1,IELTER<br />

K=K+NZ<br />

29 Z2=Z2+Y(K)<br />

CALL ABSCHA<br />

1(IELTER,KONVKR,IFALLK,EPSILO,ZBEST,ZSCHL,Z1,Z2,<br />

2KONVZ,BKONVG)<br />

C<br />

C TEST CONVERGENCE CRITERION.<br />

C<br />

IF(BKONVG) GOTO 30<br />

C<br />

C CHECK TIME ELAPSED.<br />

C<br />

IF(TKONTR(D).LT.TMAXIM) GOTO 17<br />

C<br />

C PREPARE FINAL DATA FOR RETURN FROM KORR IF THE<br />

C STARTING POINT WAS FEASIBLE.<br />

C<br />

30 K=LBEST*NZ<br />

DO 31 I=1,N<br />

K=K+1<br />

31 X(I)=Y(K)<br />

DO 32 I=1,NS<br />

K=K+1<br />

32 S(I)=Y(K)<br />

IF(.NOT.BKORRL) RETURN<br />

DO 33 I=1,NP<br />

K=K+1<br />

33 P(I)=Y(K)<br />

RETURN



C<br />

C PREPARE FINAL DATA FOR RETURN FROM KORR IF THE<br />

C STARTING POINT WAS NOT FEASIBLE.<br />

C<br />

40 DO 41 I=1,N<br />

41 XSTERN(I)=X(I)<br />

ZSTERN=ZIELFU(N,XSTERN)<br />

ZBEST=ZSTERN<br />

IFALLK=0<br />

42 RETURN<br />

END<br />

---------------------------------------------------------<br />

Subroutine PRUEFG<br />

PRUEFG checks the values given in the parameter list on calling KORR. If discrepancies are found, an attempt is made to eliminate them. If this is not possible, e.g., because required arrays are not appropriately dimensioned, the search for the minimum is not initiated. PRUEFG then outputs to the peripheral unit denoted by KANAL either a message on the correction of the error or a warning message. BFATAL supplies KORR with information on the outcome of the check as a Boolean value.

---------------------------------------------------------<br />

SUBROUTINE PRUEFG<br />

1(IELTER,BKOMMA,NACHKO,IREKOM,BKORRL,KONVKR,TGRENZ,<br />

2EPSILO,DELTAS,DELTAI,DELTAP,N,M,NS,NP,NY,KANAL,<br />

3BFATAL)<br />

LOGICAL BKOMMA,BKORRL,BFATAL<br />

DIMENSION EPSILO(4)<br />

IREKOX = IREKOM / 100<br />

IREKOS = (IREKOM - IREKOX*100) / 10<br />

IREKOP = IREKOM - IREKOX*100 - IREKOS*10<br />

100 FORMAT(1H ,' CORRECTION. IELTER > 0 . ASSUMED: 2 AND<br />

1 KONVKR = ',I5)<br />

101 FORMAT(1H ,' CORRECTION. NACHKO > 0 . ASSUMED: ',I5)<br />

102 FORMAT(1H ,' WARNING. BETTER VALUE NACHKO >= 6*IELTER')<br />

103 FORMAT(1H ,' CORRECTION. IF BKOMMA = .TRUE., THEN<br />

1 NACHKO > IELTER . ASSUMED: ',I3)<br />

1041 FORMAT(1H ,' CORRECTION. 0 < IREKOX < 6 . ASSUMED: 1')<br />

1042 FORMAT(1H ,' CORRECTION. 0 < IREKOS < 6 . ASSUMED: 1')<br />

1043 FORMAT(1H ,' CORRECTION. 0 < IREKOP < 6 . ASSUMED: 1')<br />

105 FORMAT(1H ,' CORRECTION. IF IELTER = 1, THEN<br />

1 IREKOM = 111 . ASSUMED: 111')<br />

106 FORMAT(1H ,' CORRECTION. IF N = 1 OR NS = 1, THEN<br />

1 BKORRL = .FALSE. . ASSUMED: .FALSE.')<br />

107 FORMAT(1H ,' CORRECTION. KONVKR > 0 . ASSUMED: ',I5)



108 FORMAT(1H ,' CORRECTION. IF IELTER = 1, THEN<br />

1 KONVKR > 1 . ASSUMED: ',I5)<br />

109 FORMAT(1H ,' CORRECTION. EPSILO(',I1,') > 0. .<br />

1 SIGN REVERSED')<br />

110 FORMAT(1H ,' WARNING. EPSILO(',I1,') TOO SMALL.<br />

1 TREATED AS 0. .')<br />

111 FORMAT(1H ,' CORRECTION. DELTAS >= 0. .<br />

1 SIGN REVERSED')<br />

112 FORMAT(1H ,' WARNING. EXP(DELTAS) = 1.<br />

1 OVER-ALL STEP SIZE CONSTANT')<br />

113 FORMAT(1H ,' CORRECTION. DELTAI >= 0. .<br />

1 SIGN REVERSED')<br />

114 FORMAT(1H ,' WARNING. EXP(DELTAI) = 1.<br />

1 STEP-SIZE RELATIONS CONSTANT')<br />

115 FORMAT(1H ,' CORRECTION. DELTAP >= 0. .<br />

1 SIGN REVERSED')<br />

116 FORMAT(1H ,' WARNING. DELTAP = 0.<br />

1 CORRELATION REMAINS FIXED')<br />

117 FORMAT(1H ,' WARNING. TGRENZ = 0 . ASSUMED: 0')<br />

119 FORMAT(1H ,' FATAL ERROR. N



2 IF(.NOT.BKOMMA.OR.NACHKO.GE.6*IELTER) GOTO 3<br />

WRITE(KANAL,102)<br />

IF(NACHKO.GT.IELTER) GOTO 3<br />

NACHKO=6*IELTER<br />

WRITE(KANAL,103)NACHKO<br />

3 IF(IREKOX.GT.0.AND.IREKOX.LT.6) GOTO 301<br />

IREKOX=1<br />

WRITE(KANAL,1041)<br />

301 IF(IREKOS.GT.0.AND.IREKOS.LT.6) GOTO 302<br />

IREKOS=1<br />

WRITE(KANAL,1042)<br />

302 IF(IREKOP.GT.0.AND.IREKOP.LT.6) GOTO 4<br />

IREKOP=1<br />

WRITE(KANAL,1043)<br />

4 IF(IREKOM.EQ.111.OR.IELTER.NE.1) GOTO 5<br />

IREKOM=111<br />

IREKOX=1<br />

IREKOS=1<br />

IREKOP=1<br />

WRITE(KANAL,105)<br />

5 IF(.NOT.BKORRL.OR.(N.GT.1.AND.NS.GT.1)) GOTO 6<br />

BKORRL=.FALSE.<br />

WRITE(KANAL,106)<br />

6 IF(KONVKR.GT.0) GOTO 7<br />

IF(IELTER.EQ.1) KONVKR=N+N<br />

IF(IELTER.GT.1) KONVKR=1<br />

WRITE(KANAL,107)KONVKR<br />

GOTO 8<br />

7 IF(KONVKR.GT.1.OR.IELTER.GT.1) GOTO 8<br />

KONVKR=N+N<br />

WRITE(KANAL,108)KONVKR<br />

8 DO 12 I=1,4<br />

IF(I.EQ.2.OR.I.EQ.4) GOTO 9<br />

IF(EPSILO(I))10,11,12<br />

9 IF((1.+EPSILO(I))-1.)10,11,12<br />

10 EPSILO(I)=-EPSILO(I)<br />

WRITE(KANAL,109)I<br />

GOTO 12<br />

11 WRITE(KANAL,110)I<br />

12 CONTINUE<br />

IF(EXP(DELTAS)-1.)13,14,15<br />

13 DELTAS=-DELTAS<br />

WRITE(KANAL,111)<br />

GOTO 15



14 IF(EXP(DELTAI).NE.1.) GOTO 15<br />

WRITE(KANAL,112)<br />

15 IF(EXP(DELTAI)-1.)16,17,18<br />

16 DELTAI=-DELTAI<br />

WRITE(KANAL,113)<br />

GOTO 18<br />

17 IF(IREKOS.GT.1.AND.EXP(DELTAS).GT.1.) GOTO 18<br />

WRITE(KANAL,114)<br />

18 IF(.NOT.BKORRL) GOTO 21<br />

IF(DELTAP)19,20,21<br />

19 DELTAP=-DELTAP<br />

WRITE(KANAL,115)<br />

GOTO 21<br />

20 WRITE(KANAL,116)<br />

21 IF(TGRENZ.GT.0.) GOTO 22<br />

WRITE(KANAL,117)<br />

22 IF(M.GE.0) GOTO 23<br />

M=0<br />

WRITE(KANAL,118)<br />

23 IF(N.GT.0) GOTO 24<br />

WRITE(KANAL,119)<br />

RETURN<br />

24 IF(NS.GT.0) GOTO 25<br />

WRITE(KANAL,120)<br />

RETURN<br />

25 IF(NP.GT.0) GOTO 26<br />

WRITE(KANAL,121)<br />

RETURN<br />

26 IF(NS.LE.N) GOTO 27<br />

NS=N<br />

WRITE(KANAL,122)N<br />

27 IF(BKORRL) GOTO 31<br />

IF(NP.EQ.1) GOTO 28<br />

NP=1<br />

WRITE(KANAL,123)<br />

28 NYY=(N+NS+1)*IELTER*2<br />

IF(NY-NYY)29,37,30<br />

29 WRITE(KANAL,124)<br />

RETURN<br />

30 NY=NYY<br />

WRITE(KANAL,125)NY<br />

GOTO 37<br />

31 NPP=N*(NS-1)-((NS-1)*NS)/2<br />

IF(NP-NPP)32,34,33



32 WRITE(KANAL,126)<br />

RETURN<br />

33 NP=NPP<br />

WRITE(KANAL,127)NP<br />

34 NYY=(N+NS+NP+1)*IELTER*2<br />

IF(NY-NYY)35,37,36<br />

35 WRITE(KANAL,128)<br />

RETURN<br />

36 NY=NYY<br />

WRITE(KANAL,129)NY<br />

37 BFATAL=.FALSE.<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Function ZULASS<br />

This function is required only if there are constraints. If the starting point does not lie<br />

in the feasible region, ZULASS generates an auxiliary objective function that is used to<br />

search for a feasible initial vector.<br />

If ZULASS, the negative sum of the values for the functions representing constraints<br />

that have been violated, is zero, then X represents a feasible vector that can be used in<br />

restarting the search with KORR.<br />

XX represents XSTERN or X.<br />
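In formula form (read off from the listing below), the auxiliary objective is

ZULASS(x) = - \sum_{j : R_j(x) < 0} R_j(x),  with R_j(x) = RESTRI(J,N,x),

so that ZULASS(x) = 0 holds exactly for feasible x.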

---------------------------------------------------------<br />

FUNCTION ZULASS<br />

1(N,M,XX,RESTRI)<br />

DIMENSION XX(N)<br />

ZULASS=0.<br />

DO 1 J=1,M<br />

R=RESTRI(J,N,XX)<br />

IF(R.LT.0.) ZULASS=ZULASS-R<br />

1 CONTINUE<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Subroutine UMSPEI<br />

UMSPEI is required only if BKOMMA = .FALSE., whereupon the parents in the source<br />

generation have to be subject to selection. UMSPEI transposes the data on the parents<br />

within array Y.<br />

K1, K2, and KK are auxiliary quantities transmitted from KORR that define the number and addresses of the data to be transposed.



---------------------------------------------------------<br />

SUBROUTINE UMSPEI<br />

1(K1,K2,KK,NY,Y)<br />

DIMENSION Y(NY)<br />

DO 1 K=1,KK<br />

1 Y(K2+K)=Y(K1+K)<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Subroutine GNPOOL<br />

GNPOOL supplies a set of variables for a descendant by drawing on the pool of parents taken together in accordance with the type of recombination selected. This subroutine is called once each for the object variables X, the strategy variables S, and possibly also P. To minimize storage demand, the changes in the object variables by mutation are added immediately (J = 3). In intermediary recombination for the positional angles (J = 2), a check must be made on the difference between the parental angles to establish suitable mean values. J = 1 denotes the case where step sizes are involved.
L1 denotes the part of the gene pool from which the parent data are to be drawn if IREKO = 4 or IREKO = 5 (in these cases the parents are chosen within GNPOOL itself). K1 denotes the parent selected by KORR whose data are to be used when IREKO = 1 (no recombination). K1 and K2 denote the two parents whose data are to be recombined if IREKO = 2 or IREKO = 3 has been selected.
NZ and NN are auxiliary quantities for deriving the addresses in array Y.
NX represents N or NS or NP, XX represents X or S or P, and IREKO represents one of the digits of IREKOM, i.e., IREKOX or IREKOS or IREKOP.

---------------------------------------------------------<br />

SUBROUTINE GNPOOL<br />

1(J,L1,K1,K2,NZ,NN,IELTER,IREKO,NX,NY,XX,Y,GLEICH)<br />

DIMENSION XX(NX),Y(NY)<br />

COMMON/PIDATA/PIHALB,PIEINS,PIZWEI<br />

EXTERNAL GLEICH<br />

IF(J.EQ.3) GOTO 11<br />

GOTO(1,1,1,7,9),IREKO<br />

1 KI1=K1*NZ+NN<br />

IF(IREKO.GT.1) GOTO 3<br />

DO 2 I=1,NX<br />

2 XX(I)=Y(KI1+I)<br />

RETURN<br />

3 KI2=K2*NZ+NN<br />

IF(IREKO.EQ.3) GOTO 5



DO 4 I=1,NX<br />

KI=KI1<br />

IF(GLEICH(D).GE..5) KI=KI2<br />

4 XX(I)=Y(KI+I)<br />

RETURN<br />

5 DO 6 I=1,NX<br />

XX1=Y(KI1+I)<br />

XX2=Y(KI2+I)<br />

XXI=(XX1+XX2)*.5<br />

IF(J.EQ.1) GOTO 6<br />

DXX=XX1-XX2<br />

IF(ABS(DXX).GT.PIEINS) XXI=XXI+SIGN(PIEINS,DXX)<br />

6 XX(I)=XXI<br />

RETURN<br />

7 DO 8 I=1,NX<br />

8 XX(I)=Y((L1+IFIX(IELTER*GLEICH(D)))*NZ+NN+I)<br />

RETURN<br />

9 DO 10 I=1,NX<br />

XX1=Y((L1+IFIX(IELTER*GLEICH(D)))*NZ+NN+I)<br />

XX2=Y((L1+IFIX(IELTER*GLEICH(D)))*NZ+NN+I)<br />

XXI=(XX1+XX2)*.5<br />

IF(J.EQ.1) GOTO 10<br />

DXX=XX1-XX2<br />

IF(ABS(DXX).GT.PIEINS) XXI=XXI+SIGN(PIEINS,DXX)<br />

10 XX(I)=XXI<br />

RETURN<br />

11 GOTO(12,12,12,18,20),IREKO<br />

12 KI1=K1*NZ+NN<br />

IF(IREKO.GT.1) GOTO 14<br />

DO 13 I=1,NX<br />

13 XX(I)=XX(I)+Y(KI1+I)<br />

RETURN<br />

14 KI2=K2*NZ+NN<br />

IF(IREKO.EQ.3) GOTO 16<br />

DO 15 I=1,NX<br />

KI=KI1<br />

IF(GLEICH(D).GE..5) KI=KI2<br />

15 XX(I)=XX(I)+Y(KI+I)<br />

RETURN<br />

16 DO 17 I=1,NX<br />

17 XX(I)=XX(I)+(Y(KI1+I)+Y(KI2+I))*.5<br />

RETURN<br />

18 DO 19 I=1,NX<br />

19 XX(I)=XX(I)+Y((L1+IFIX(IELTER*GLEICH(D)))*NZ+NN+I)



RETURN<br />

20 DO 21 I=1,NX<br />

21 XX(I)=XX(I)+(Y((L1+IFIX(IELTER*GLEICH(D)))*NZ+NN+I)<br />

1+Y((L1+IFIX(IELTER*GLEICH(D)))*NZ+NN+I))*.5<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Subroutine SPEICH<br />

SPEICH transfers to the data pool Y for the parents of the next generation the data of a descendant representing a successful mutation: the object variables X and the strategy parameters S (and P, if used), together with the corresponding value of the objective function. A check is made that S (and P) fall within specified bounds.
J is the address in array Y from which point onwards the data are to be written and is provided by KORR.
ZZ represents ZSTERN or Z, XX represents XSTERN or X.
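The resulting storage layout of one individual in array Y, as can be read off from the listing below (NZ = NY/(2*IELTER) is computed in KORR), is:

Y(J+1) ... Y(J+N)             object variables X
Y(J+N+1) ... Y(J+N+NS)        step sizes S (bounded below by EPSILO(1))
Y(J+N+NS+1) ... Y(J+N+NS+NP)  angles P (only if BKORRL = .TRUE.)
Y(J+NZ)                       objective function value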

---------------------------------------------------------<br />

SUBROUTINE SPEICH<br />

1(J,BKORRL,EPSILO,N,NS,NP,NY,ZZ,XX,S,P,Y)<br />

LOGICAL BKORRL<br />

DIMENSION EPSILO(4),XX(N),S(NS),P(NP),Y(NY)<br />

COMMON/PIDATA/PIHALB,PIEINS,PIZWEI<br />

K=J<br />

DO 1 I=1,N<br />

K=K+1<br />

1 Y(K)=XX(I)<br />

DO 2 I=1,NS<br />

K=K+1<br />

2 Y(K)=AMAX1(S(I),EPSILO(1))<br />

IF(.NOT.BKORRL) GOTO 4<br />

DO 3 I=1,NP<br />

K=K+1<br />

PI=P(I)<br />

IF(ABS(PI).GT.PIEINS) PI=PI-SIGN(PIZWEI,PI)<br />

3 Y(K)=PI<br />

4 K=K+1<br />

Y(K)=ZZ<br />

RETURN<br />

END<br />

---------------------------------------------------------



Subroutine MINMAX<br />

MINMAX searches for the smallest or largest value in a series of values of the objective function held in an array. KORR calls this subroutine to determine the best or worst parent, in the first case in order to transfer its data to the location ZBEST (and perhaps also ZSTERN and XSTERN) and in the other case in order to give space for a better descendant. C = 1.0 initiates a search for the best (smallest) value of the function, while C = -1.0 does the same for the worst (largest) value.
LL and NZ are auxiliary quantities used to transmit information on the position of the required values within array Y. ZM and LM contain the best (or worst) value of the objective function and the number of the corresponding parent minus one.

---------------------------------------------------------<br />

SUBROUTINE MINMAX<br />

1(C,LL,NZ,ZM,LM,IELTER,NY,Y)<br />

DIMENSION Y(NY)<br />

LM=LL<br />

K1=LL*NZ+NZ<br />

ZM=Y(K1)<br />

IF(IELTER.EQ.1) RETURN<br />

K1=K1+NZ<br />

K2=(LL+IELTER)*NZ<br />

KM=LL<br />

DO 1 K=K1,K2,NZ<br />

KM=KM+1<br />

ZZ=Y(K)<br />

IF((ZZ-ZM)*C.GT.0.) GOTO 1<br />

ZM=ZZ<br />

LM=KM<br />

1 CONTINUE<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Subroutine ABSCHA<br />

ABSCHA tests the convergence criterion. If KONVKR = 1 has been selected, the difference between the objective function values representing the best and worst parents (ZBEST and ZSCHL) must be less than the limits set by EPSILO(3) (absolute) and EPSILO(4) (relative). Then the assignment BKONVG = .TRUE. is made.
Alternatively, the current difference ZSCHL - ZBEST is replaced by the change Z1 - Z2 in the sum of all the parent objective function values occurring after KONVKR generations, divided by IELTER.
The Boolean variable BKONVG transmits the result of the convergence test to KORR.
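Summarized from the listing below, the quantity tested is

DELTAF = (ZSCHL - ZBEST) * IELTER    if KONVKR = 1,
DELTAF = Z1 - Z2                     if KONVKR > 1 (checked every KONVKR generations),

and convergence is signalled when both DELTAF <= EPSILO(3) * IELTER and DELTAF <= EPSILO(4) * |Z2| hold.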



KONVZ is the generation counter if KONVKR > 1.<br />

---------------------------------------------------------<br />

SUBROUTINE ABSCHA<br />

1(IELTER,KONVKR,IFALLK,EPSILO,ZBEST,ZSCHL,Z1,Z2,<br />

2KONVZ,BKONVG)<br />

LOGICAL BKONVG<br />

DIMENSION EPSILO(4)<br />

IF(KONVKR.EQ.1) GOTO 1<br />

KONVZ=KONVZ+1<br />

IF(KONVZ.LT.KONVKR) GOTO 3<br />

KONVZ=0<br />

DELTAF=Z1-Z2<br />

Z1=Z2<br />

GOTO 2<br />

1 DELTAF=(ZSCHL-ZBEST)*IELTER<br />

2 IF(DELTAF.GT.EPSILO(3)*IELTER) GOTO 3<br />

IF(DELTAF.GT.EPSILO(4)*ABS(Z2)) GOTO 3<br />

IFALLK=ISIGN(2,IFALLK)<br />

BKONVG=.TRUE.<br />

RETURN<br />

3 BKONVG=.FALSE.<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Function GAUSSN<br />

GAUSSN converts a uniform random number distribution to a normal one. The function has been programmed for the trapezium algorithm (J. H. Ahrens and U. Dieter, Computer Methods for Sampling from the Exponential and Normal Distributions, Communications of the Association for Computing Machinery, vol. 15 (1972), pp. 873-882 and 1047). The Box-Muller rules require in many cases (machine-dependent) a longer run time, even if both numbers of the generated pair can be used.
SIGMA is the standard deviation, which is multiplied by the random number derived from a (0.0,1.0) normal distribution.

---------------------------------------------------------<br />

FUNCTION GAUSSN<br />

1(SIGMA,GLEICH)<br />

1 U=GLEICH(D)<br />

U0=GLEICH(D)<br />

IF(U.GE..919544406) GOTO 2<br />

X=2.40375766*(U0+U*.825339283)-2.11402808<br />

GOTO 10



2 IF(U.LT..965487131) GOTO 4<br />

3 U1=GLEICH(D)<br />

Y=SQRT(4.46911474-2.*ALOG(U1))<br />

U2=GLEICH(D)<br />

IF(Y*U2.GT.2.11402808) GOTO 3<br />

GOTO 9<br />

4 IF(U.LT..949990709) GOTO 6<br />

5 U1=GLEICH(D)<br />

Y=1.84039875+U1*.273629336<br />

U2=GLEICH(D)<br />

IF(.398942280*EXP(-.5*Y*Y)-.443299126+Y*.209694057<br />

1.LT.U2*.0427025816) GOTO 5<br />

GOTO 9<br />

6 IF(U.LT..925852334) GOTO 8<br />

7 U1=GLEICH(D)<br />

Y=.289729574+U1*1.55066917<br />

U2=GLEICH(D)<br />

IF(.398942280*EXP(-.5*Y*Y)-.443299126+Y*.209694057<br />

1.LT.U2*.0159745227) GOTO 7<br />

GOTO 9<br />

8 U1=GLEICH(D)<br />

Y=U1*.289729574<br />

U2=GLEICH(D)<br />

IF(.398942280*EXP(-.5*Y*Y)-.382544556<br />

1.LT.U2*.0163977244) GOTO 8<br />

9 X=Y<br />

IF(U0.GE..5) X=-Y<br />

10 GAUSSN=SIGMA*X<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Subroutine DREHNG<br />

DREHNG is called from MUTATI only if BKORRL = .TRUE. and N > 1. DREHNG performs the coordinate transformation of the modification vector for the object variables. Although the components of this vector are initially mutually independent, they become linearly related on account of the rotation specified by the positional angles P and so are correlated. The transformation involves NP partial rotations, in each of which only two of the components of the modification vector are involved.
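Each partial rotation acts on just two components; in the notation of the listing below, with p = P(NQ),

X(N1) <- X(N1) * cos(p) - X(N2) * sin(p)
X(N2) <- X(N1) * sin(p) + X(N2) * cos(p)

where the right-hand sides use the old values of X(N1) and X(N2).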



---------------------------------------------------------<br />

SUBROUTINE DREHNG<br />

1(NL,NM,N,NP,X,P)<br />

DIMENSION X(N),P(NP)<br />

NQ=NP<br />

DO 1 II=NL,NM<br />

N1=N-II<br />

N2=N<br />

DO 1 I=1,II<br />

X1=X(N1)<br />

X2=X(N2)<br />

SI=SIN(P(NQ))<br />

CO=COS(P(NQ))<br />

X(N2)=X1*SI+X2*CO<br />

X(N1)=X1*CO-X2*SI<br />

N2=N2-1<br />

1 NQ=NQ-1<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Logical function BLETAL<br />

BLETAL tests the feasibility of an object variable vector immediately on production if constraints are imposed. The first constraint to be violated causes BLETAL to signal to KORR via the function name (declared as a Boolean variable) that the mutation was lethal.

---------------------------------------------------------<br />

LOGICAL FUNCTION BLETAL<br />

1(N,M,X,RESTRI)<br />

DIMENSION X(N)<br />

DO 1 J=1,M<br />

IF(RESTRI(J,N,X).LT.0.) GOTO 2<br />

1 CONTINUE<br />

BLETAL=.FALSE.<br />

RETURN<br />

2 BLETAL=.TRUE.<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Subroutine MUTATI<br />

MUTATI handles the random alteration of the strategy variables and the object variables. First, the step sizes are altered in accordance with the DELTAS and DELTAI parameters by multiplication by two random factors with log-normal distributions. The resulting normal distribution is used to generate a random vector X that represents the changes in the object variables. If BKORRL = .TRUE. is set when KORR is called, i.e., linear correlation is required, the positional angles P are also mutated, with random numbers from a (0.0,DELTAP) normal distribution added to the original values. Also, DREHNG is called in that case to transform the vector of modifications to the object variables.
NL and NM are auxiliary quantities transmitted from KORR via MUTATI to DREHNG.
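In formula form (summarized from the listing below), with z_0 drawn once per call from a (0.0,DELTAS) normal distribution and z_i, w_j drawn anew for each component:

s_i <- s_i * exp(z_0 + z_i),  z_i from a (0.0,DELTAI) normal distribution,  i = 1(1)NS
x_i <- random number from a (0.0, s_min(i,NS)) normal distribution,  i = 1(1)N
p_j <- p_j + w_j,  w_j from a (0.0,DELTAP) normal distribution,  j = 1(1)NP (only if BKORRL)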

---------------------------------------------------------<br />

SUBROUTINE MUTATI<br />

1(NL,NM,BKORRL,DELTAS,DELTAI,DELTAP,N,NS,NP,X,S,P,<br />

2GAUSSN,GLEICH)<br />

LOGICAL BKORRL<br />

DIMENSION X(N),S(NS),P(NP)<br />

EXTERNAL GLEICH<br />

DS=GAUSSN(DELTAS,GLEICH)<br />

DO 1 I=1,NS<br />

1 S(I)=S(I)*EXP(DS+GAUSSN(DELTAI,GLEICH))<br />

DO 2 I=1,N<br />

2 X(I)=GAUSSN(S(MIN0(I,NS)),GLEICH)<br />

IF(.NOT.BKORRL) RETURN<br />

DO 3 I=1,NP<br />

3 P(I)=P(I)+GAUSSN(DELTAP,GLEICH)<br />

CALL DREHNG<br />

1(NL,NM,N,NP,X,P)<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Note<br />

Without modifications, the subroutines EVOL, GRUP, and KORR may be used to solve optimization problems with integer (or discrete) and mixed-integer variables. The search for an optimum then, however, will only lead into the vicinity of the exact solution.
The discreteness may be induced by the user when formulating the objective function, by merely rounding the corresponding variables to integers or by attributing discrete values to them.
The following two examples merely hint at possible formulations. In order to get the results in the desired form, the variables will have to be transformed at the end of the optimum search with EVOL, GRUP, or KORR in the same manner as is done within the objective function; a sketch of such a transformation follows.
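For Example 1 below, such a final transformation might be performed by a small subroutine like the following sketch (the name RUNDEN is our invention; it simply repeats the rounding used inside the objective function):

---------------------------------------------------------
SUBROUTINE RUNDEN(N,X)
C MAP THE REAL-VALUED RESULT X ONTO THE INTEGER GRID
C EXACTLY AS THE OBJECTIVE FUNCTION OF EXAMPLE 1 DOES
C INTERNALLY.
DIMENSION X(N)
DO 1 I=1,N
1 X(I)=FLOAT(IFIX(ABS(X(I))))
RETURN
END
---------------------------------------------------------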



Example 1

Minimize

F(x) = \sum_{i=1}^{n} (x_i - i)^2

with x_i >= 0, integer, for all i = 1(1)n

---------------------------------------------------------<br />

FUNCTION F(N,X)<br />

DIMENSION X(N)<br />

F=0.<br />

DO 1 I=1,N<br />

IX=IFIX(ABS(X(I)))<br />

XI=FLOAT(IX-I)<br />

F=F+XI*XI<br />

1 CONTINUE<br />

RETURN<br />

END<br />

---------------------------------------------------------<br />

Example 2

Minimize

F(x) = (x_1 - 2)^2 + (x_1 - 2 x_2)^2

with x_1 from {1.3, 1.5, 2.2, 2.8} only

---------------------------------------------------------<br />

FUNCTION F(N,X)<br />

DIMENSION X(N), Y(4)<br />

DATA Y /1.3,1.5,2.2,2.8/<br />

DO 1 I=1,4<br />

X1=Y(I)<br />

IF (X(1)-X1) 2,2,1<br />

1 CONTINUE<br />

2 F1=X1-2.<br />

F2=X1-X(2)-X(2)<br />

F =F1*F1+F2*F2<br />

RETURN<br />

END<br />

---------------------------------------------------------




Appendix C<br />

Programs<br />

C.1 Contents of the Floppy Disk<br />

The floppy disk that accompanies this book contains:

Sources of FORTRAN subroutines of the following direct optimization procedures as described in Chapters 3, 5, and 7 of the book.

- FIBO Coordinate strategy with Fibonacci division
fiboh (fiboh.f) calls subroutine fibo (fibo.f)
- GOLD Coordinate strategy with Golden section
goldh (goldh.f) calls subroutine gold (gold.f)
- LAGR Coordinate strategy with Lagrangian interpolation
lagrh (lagrh.f) calls subroutine lagr (lagr.f)
- HOJE Strategy of Hooke and Jeeves (pattern search)
hoje (hoje.f) calls subroutine hilf (hilf.f)
- ROSE Strategy of Rosenbrock (rotating coordinates search)
rose (rose.f) calls subroutine grsmr (grsmr.f)
- DSCG Strategy of Davies, Swann, and Campey
with Gram-Schmidt orthogonalization
dscg (dscg.f) calls subroutine lineg (lineg.f)
subroutine grsmd (grsmd.f)
- DSCP Strategy of Davies, Swann, and Campey
with Palmer orthogonalization
dscp (dscp.f) calls subroutine linep (linep.f)
subroutine palm (palm.f)
- POWE Powell's strategy of conjugate directions
powe (powe.f) calls -
- DFPS Davidon, Fletcher, Powell strategy (Variable metric)
dfps (dfps.f) calls subroutine seth (seth.f)
subroutine grad (grad.f)
function updot (updot.f) calls dot (dot.f)
function dot (dot.f)
- SIMP Simplex strategy of Nelder and Mead
simp (simp.f) calls -
- COMP Complex strategy of M. J. Box
comp (comp.f) calls -
- EVOL Two membered evolution strategy
evol (evol.f) calls function z (included in evol.f)
- KORR Multimembered evolution strategy
korr2 (korr2.f) calls function zulass (included in korr2.f)

function gaussn (included in korr2.f)<br />

function bletal (included in korr2.f)<br />

subroutine pruefg (included in korr2.f)<br />

subroutine speich (included in korr2.f)<br />

subroutine mutati (included in korr2.f)<br />

subroutine umspei (included in korr2.f)<br />

subroutine minmax (included in korr2.f)<br />

subroutine gnpool (included in korr2.f)<br />

subroutine abscha (included in korr2.f)<br />

subroutine drehng (included in korr2.f)<br />

Additionally, FORTRAN function sources of the 50 test problems are included:<br />

- ZIELFU(N,X) one objective function with a computed GOTO for 50 entries.
- RESTRI(J,N,X) one constraints function with a computed GOTO for 50 entries and J as the current number of the single restriction.
No runtime package is provided for this set, however.

C sources for all strategies mentioned above and C sources for the 50 test problems (GRUP with option REKO is missing since it has become one special case within KORR).

A set of simple interfaces to run 13 of the above-mentioned optimization routines with the above-mentioned 50 test problems on a PC or workstation.

C.2 About the Program Disk<br />

The floppy disk contains both FORTRAN and C sources for each of the strategies described in the book. All test problems presented in the catalogue of problems (see Appendix A) exist as C code. A set of simple interfaces, easy to understand and to expand, combines the strategies and functions into OptimA, a ready-to-use program package.
The programs are designed to run on a minimally configured PC using a math coprocessor or having an 80486 CPU and running the DOS or LINUX operating system. To accomplish semantic equivalence with the well tested original FORTRAN codes, all strategies have been translated via f2c, a Fortran-to-C converter of AT&T Bell Laboratories. All C codes can be compiled and linked via gcc (GNU C compiler, version 2.4). Of course, any other ANSI C compiler such as Borland C++ that supports 4-byte integers should produce correct results as well.

LINUX and gcc are freely available under the conditions of the GNU General Public License. Information about ordering the GNU C compiler in the United States is available through the Free Software Foundation by calling 617 876 3296.

All C programs should compile and run on any UNIX workstation having gcc or another ANSI C compiler installed.

C.3 Running the C Programs<br />

The following instructions are appropriate for installing and running the C programs on your PC or workstation. Installation as well as compilation and linking can be carried out automatically.

C.3.1 How to Install OptimA on a PC Using LINUX<br />

or on a UNIX Workstation<br />

First, enter the directory where you want OptimA to be installed. Then copy the installation file via mtools by typing the command:

mcopy a:install.sh .<br />

If you don't have mtools, copy wb-1p?.tar from the floppy to your workspace and untar it. The instruction

sh install.sh<br />

will copy the whole tree of directories from the disk to your local directory. The following directories and subdirectories will be created:

fortran<br />

funct<br />

include<br />

lib<br />

rstrct<br />

strat<br />

util<br />

To compile, link, and run OptimA, go to the workbench directory and type

make<br />

to start a recursive compilation and linking of all C sources.



C.3.2 How to Install OptimA on a PC Under DOS<br />

First, enter the directory where you want OptimA to be installed. The instruction<br />

a:INSTALL<br />

or<br />

b:INSTALLB<br />

will copy the whole tree of directories from the disk to your local directory. The same directories and subdirectories as mentioned above will be created. To compile, link, and run OptimA, go to the workbench directory and type

mkOptimA<br />

to start a recursive compilation and linking of all C sources. This will take a while, depending on how fast your machine works.

C.3.3 Running OptimA<br />

After the successful execution of make or mkOptimA, respectively, the executable file OptimA is located in the subdirectory bin. Here you can run the program package by issuing the command

OptimA<br />

First, the program will list the available strategies. After choosing a strategy by typing its number, a list of test problems is displayed. Type a number or continue the listing by hitting the return key. Depending on the method and the problem, the program will ask for the parameters to configure the strategy. Please refer to Chapter 6 and Appendix A to choose appropriate values. Of course, you are free to define your own parameter values, but please remember that the behavior of each strategy strongly depends on its parameter settings.

Warnings during the process will inform the user of inappropriate parameter definitions or abnormal program behavior. For example, the message timeout reached warns the user that the strategy might find a better result if the user-defined maximal time were set to a larger value. The strategies COMP, EVOL, and KORR will try at most five restarts after the first timeout has occurred.

If a strategy that can process only unrestricted problems is applied to a restricted problem, a warning will be displayed, too. After acknowledging this message by hitting the return key, the user can choose another function.

C.4 Description of the Programs<br />

The following pages briefly describe the programs on which this package is based. A short description of how to incorporate self-defined problem functions into OptimA follows.



The directory FORTRAN lists all the original codes described in the book. The reader may<br />

write his own interfaces to these programs. For further information please refer to the C<br />

sources or to Schwefel (1980, 1981).<br />

All C source codes of the strategies have been translated from FORTRAN to C via f2c. Some modifications were made in the C sources to gain higher portability and to achieve homogeneous program behavior. For example, all strategies are minimizing, use standard output functions, and perform operations on the same data types. None of the modifications changed the semantics of any strategy.

To each optimization method a dialogue interface has been added. Here the strategy's specific parameter definition takes place. The meaning and usage of each parameter is briefly described in the comments within the program listings. All names of the dialogue interfaces end with the suffix "_mod.c". The strategies together with the interfaces are listed in the directory named strat.

The whole catalogue of problems (see Appendix A) has been coded as C functions. They<br />

are collected in the subdirectory funct.<br />

The problems 2.29 to 2.50 (see Appendix A) are restricted. Therefore, constraints functions for these problems were written and listed in the directory rstrct. Because in some problems the number of constraints to be applied depends on the dimension of the function to be optimized, this number has to be calculated. This task is performed by the programs with prefix "rsn_". The evaluation of the constraint itself is done in the modules with prefix "rst_". A restriction is violated if its value is negative.

All strategies perform operations on vectors of varying dimensions. Therefore a set of tools to allocate and to define vectors is compiled in the package vec_util, which is located in the subdirectory util. The procedures from this package are used only in the dialogue interfaces. All other programs perform operations on vectors as if they were using arrays of arbitrary but fixed length.

The main program "OptimA.c" performs only initialization tasks and runs the dialogue within which the user can choose a strategy and a function number.

The strategies and functions are listed in tables, namely "func_tab.c" and "strt_tab.c".

If the user wants to incorporate new problems into OptimA, the table "func_tab.c" has to be extended. This task is relatively simple for a programmer with a little C knowledge if he follows the next instructions carefully.

C.4.1 How to Incorporate New Functions<br />

The following template is typical for every function definition:

#include "f2c.h"<br />

#include "math.h"<br />

doublereal probl_2_18(int n,doublereal *x)
{
    return(0.26*(x[0]*x[0] + x[1]*x[1]) - 0.48*x[0]*x[1]);
}

Please add your own function into the directory funct. Here you will find the file "func_tab.c". Include the formal description of your problem into this table. A typical template looks like:

{
    5,
    rs_nm_x_x,
    restr_x_x,
    "Problem x_x (restricted problem):\n\t x[1]+x[2]+... ",
    probl_x_x
},

with the data type definition:

struct functions {
    long int dim;                 /* Problem's dimension   */
    long int (*rs_num)();         /* Calculates the number */
                                  /* of constraints        */
    doublereal (*restrictions)(); /* Constraints function  */
    char* name;                   /* Mathem. description   */
    doublereal (*function)();     /* Objective function    */
};

typedef struct functions funct_t;

The first item denotes the number of dimensions of the problem. A problem with variable dimension is denoted by a -1. In this case the program should inquire the dimension from the user.

The second entry denotes the function that calculates the number of constraints to be applied to the problem. If no constraints are needed, a NULL pointer has to be inserted.

The next line will be displayed to the user during an OptimA session. This string<br />

provides a short description of the problem, typically in mathematical notation.<br />

The last item is a function-pointer to the objective function.<br />

Please do not add a new formal problem description into the func_tab behind the last table entry. The latter denotes the end of the table and should not be displaced.



To make the new function known to all modules, its prototype must be included in the header file func_names.h.

As a last step the Makefile has to be extended. The lists FUNCTSRCS and FUNCTOBJS denote the files that make up the list of problems. These lists have to be extended by the filename of your program code.

Now step back to the directory C and issue the command make or mkOptimA, respectively, to compile "OptimA".

Restrictions can be incorporated into OptimA like functions. Every C code from the directory rstrct can be taken as a template. The name of the constraints function and the name of the function that calculates the number of constraints have to be included in the formal problem description.

C.5 Examples<br />

Here two examples of how OptimA works in real life will be presented. The first one describes an application of the multimembered evolution strategy KORR to the corridor model (problem 2.37, function number 32). The second example demonstrates a batch run. The batch mode enables the user to apply a set of methods to a set of functions in one task.

C.5.1 An Application of the Multimembered Evolution Strategy to the Corridor Model

After calling OptimA and choosing problem 2.37 by typing 32, a typical dialogue will look like:

Multimembered evolution strategy applied to function:<br />

Problem 2.37 (Corridor model) (restricted problem):<br />

Sum[-x[i],{i,1,n}]<br />

Please enter the parameters for the algorithm:<br />

Dimension of the problem : 3<br />

Number of restrictions : 7<br />

Number of parents : 10<br />

Number of descendants : 100<br />

Plus (p) or the comma (c) strategy : c<br />

Should the ellipsoid be able to rotate (y/n) : y<br />

You can choose under several recombination types:



1 No recombination<br />

2 Discrete recombination of pairs of parents<br />

3 Intermediary recombination of pairs of parents<br />

4 Discrete recombination of all parents<br />

5 Intermediary recombination of all parents in pairs<br />

Recombination type for the parameter vector : 2<br />

Recombination type for the sigma vector : 3<br />

Recombination type for the alpha vector : 1<br />

Check for convergence after how many generations (> 2*Dim.) : 10<br />

Maximal computation time in sec. : 30<br />

Lower bound to step sizes, absolute : 1e-6<br />

Lower bound to step sizes, relative : 1e-7<br />

Parameter in convergence test, absolute : 1e-6<br />

Parameter in convergence test, relative : 1e-7<br />

Common factor used in step-size changes (e.g. 1) : 1<br />

Standard deviation for the angles

of the mutation ellipsoid (degrees) : 5.0<br />

Number of distinct step-sizes : 3<br />

Initial values of the variables :<br />

0<br />

0<br />

0<br />

Initial step lengths :<br />

1<br />

1<br />

1<br />

Common factor used in step-size changes : 0.408248<br />

Individual factor used in step-size changes : 0.537285<br />

Starting at : F(x) = 0<br />

Time elapsed : 18.099276<br />

Minimum found : -300.000000<br />

at point : 99.999992 100.000000 99.999992<br />

Current best value of population: -300.000000<br />

C.5.2 OptimA Working in Batch Mode<br />

OptimA also supports a batch mode option. This option was introduced to enable a user to test the behavior of any strategy by varying parameter settings automatically. Of course, any function or method may be changed during a run as well. The batch file that will be processed should contain the list of input data you would type in manually during a whole session in non-batch mode. OptimA in batch mode suppresses the listing of the strategies and functions. That greatly reduces the output and makes it more readable.

A typical batch run looks like:

OptimA -b < bat_file > results<br />

With a "bat_file" like:

8<br />

1<br />

100.100<br />

0.98e-6<br />

0.0e+0<br />

5<br />

5<br />

1<br />

1<br />

0.8e-6<br />

0.8e-6<br />

0.111<br />

0.111<br />

y<br />

the file "results" may look like:

Method # : 8<br />

Function # : 1<br />

DFPS strategy (Variable metric) applied to function:<br />

Problem 2.1 (Beale):<br />

(1.5-x*(1-y))^2 + (2.25-x*(1-y^2))^2 + (2.625-x*(1-y^3))^2<br />

Dimension of the problem : 2<br />

Maximal computation time in sec. : 100.100000<br />

Accuracy required : 9.8e-07<br />

Expected value of the objective function<br />

at the optimum : 0<br />

Initial values of the variables :<br />

5<br />

5<br />

Initial step lengths :<br />

1<br />

1<br />

Lower bounds of the step lengths :



8e-07<br />

8e-07<br />

Initial step lengths for construction of derivatives :<br />

0.111<br />

0.111<br />

Starting at : F(x) = 403069<br />

Time elapsed : 0.033332<br />

Minimum found : 0.000000<br />

at point : 3.000000 0.500000<br />

Both examples have been run on a SUN SPARC S10/40 workstation.

The floppy disk included with this book may not be copied, sold, or redistributed without the permission of John Wiley & Sons, Inc., New York.



Index<br />

Aarts, E.H.L., 161<br />

Abadie, J., 17, 24<br />

Abe, K., 239<br />

Ablay, P., 163<br />

Absolute minimum, see global minimum<br />

Accuracy of approximation, 26, 27, 29,<br />

32, 38, 41, 70, 76, 78, 81, 91, 92,<br />

94, 116, 146, 167, 168, 173, 175,<br />

206-208, 213, 214, 235

Accuracy of computation, 12, 14, 32, 35,<br />

54, 57, 66, 67, 71, 78, 81, 83, 88,<br />

89, 99, 112-114, 145, 170, 173-175, 206, 209, 236, 329

Ackley, D.H., 152<br />

Adachi, N., 77, 81, 82<br />

Adams, R.J., 96<br />

Adaptation, 5, 6, 9, 100, 102, 105, 142,<br />

147, 152<br />

Adaptive step size random search, 96, 97,

200<br />

AESOP program package, 68<br />

Ahrens, J.H., 116<br />

AID program package, 68<br />

Aizerman, M.A., 90<br />

Akaike, H., 66, 67, 203<br />

Alander, J.T., 152, 246

Aleksandrov, V.M., 95

Algebra, 5, 14, 41, 69, 75, 239<br />

Alland, A., Jr., 244

Allen, P., 102<br />

Allometry, 243<br />

Allowed region, see feasible region<br />

Altman, M., 68<br />

Amann, H., 93<br />

Analogue computers, 12, 15, 65, 68, 89,<br />

99, 236<br />

Analytic optimization, see indirect optimization<br />

Anders, U., 246<br />

Anderson, N., 35<br />

Anderson, R.L., 91<br />

Andrews, H.C., 5<br />

Andreyev, V.O., 94<br />

Animats, 103<br />

Anscombe, F.J., 101<br />

Antonov, G.E., 90<br />

Aoki, M., 23, 93<br />

Apostol, T.M., 17<br />

Appelbaum, J., 48<br />

Applications, 48, 53, 64, 68, 69, 99, 151,<br />

245-246

Approximation problems, 5, 14, see also<br />

sum of squares minimization<br />

Archer, D.H., 48

Arrow, K.J., 17, 18, 165<br />

Artificial intelligence, 102, 103
Artificial life, 103

Asai, K., 94<br />

Ashby, W.R., 9, 91, 100, 105<br />

Atmar, J.W., 151<br />

Automata, 6, 9, 44, 48, 94, 99, 102<br />

Avriel, M., 29, 31, 33<br />

Awdejewa, L.I., 18<br />

Axelrod, R., 21<br />

Azencott, R., 161<br />

Bach, H., 23<br />

Bäck, T., 118, 134, 147, 151, 155, 159,

245, 246, 248<br />

Baer, R.M., 67<br />

Balakrishnan, A.V., 11, 18<br />

Balas, E., 18<br />

Balinski, M.L., 19


426 Index<br />

Banach, S., 10<br />

Bandler, J.W., 48, 115

Banzhaf, W., 103<br />

Bard, Y., 78, 83, 205<br />

Barnes, G.H., 233, 239<br />

Barnes, J.G.P., 84<br />

Barnes, J.L., 102<br />

Barr, D.R., 241<br />

Barrier penalty functions (barrier methods),<br />

16, 107<br />

Bass, R., 81<br />

Bauer, F.L., 84<br />

Bauer, W.F., 93<br />

Beale, E.M.L., 18, 70, 84, 166, 327, 346<br />

Beamer, J.H., 26, 29, 39<br />

Beckman, F.S., 69<br />

Beckmann, M., 19<br />

Behnken, D.W., 65<br />

Beier, W., 105<br />

Beightler, C.S., 1, 23, 27, 32, 38, 87<br />

Bekey, G.A., 12, 65, 89, 95, 96, 98, 99<br />

Belew, R.K., 152<br />

Bell, D.E., 20<br />

Bell, M., 44, 178<br />

Bellman, R.W., 11, 38, 102<br />

Beltrami, E.J., 87<br />

Bendin, F., 248<br />

Berg, R.L., 101<br />

Berlin, V.G., 90<br />

Berman, G., 29, 39<br />

Bernard, J.W., 48<br />

Bernoulli, Joh., 2<br />

Bertram, J.E., 20<br />

Bessel function, 129, 130<br />

Beveridge, G.S.G., 15, 23, 28, 32, 37, 64,<br />

65<br />

Beyer, H.-G., 118, 134, 149, 159<br />

Biasing, 98, 156, 174<br />

Biggs, M.C., 76<br />

Binary optimization, 18, 247<br />

Binomial distribution, 7, 108, 243<br />

Bionics, 99, 102, 105, 238<br />

Birkhoff, G., 48

Bisection method, 33, 34<br />

Bjorck, A., 35<br />

Blakemore, J.W., 23<br />

Bledsoe, W.W., 239<br />

Blind random search, see pure random search

Blum, J.R., 19, 20<br />

Boas, A.H., 26<br />

Bocharov, I.N., 89, 90<br />

Boltjanski, W.G. (Boltjanskij, V.G.), 18<br />

Boltzmann, L., 160<br />

Bolzano method, 33, 34, 38<br />

Booker, L.B., 152<br />

Booth, A.D., 67, 329<br />

Booth, R.S., 27<br />

Boothroyd, J., 33, 77, 178<br />

Born, J., 118, 149<br />

Borowski, N., 98, 240<br />

Bossert, W.H., 146<br />

Bourgine, P., 103<br />

Box, G.E.P., 6, 7, 65, 68, 69, 89, 101, 115,<br />

see also EVOP method<br />

Box, M.J., 17, 23, 28, 54, 56-58, 61, 68,

89, 115, 174, 332, see also complex<br />

strategy<br />

Boxing in the minimum, 28, 29, 32, 36,<br />

41, 56, 209<br />

Brachistochrone problem, 11<br />

Bracken, J., 348<br />

Brajnes, S.N., 102<br />

Bram, J., 27<br />

Branch and bound methods, 18

Brandl, V., 93

Branin, F.H., Jr., 88<br />

Braverman, E.M., 90<br />

Bremermann, H.J., 100, 101, 105, 238<br />

Brent, R.P., 23, 27, 34, 35, 74, 84, 88, 89,<br />

174<br />

Brocker, D.H., 95, 98, 99<br />

Broken rational programming, 20<br />

Bromberg, N.S., 89<br />

Brooks, S.H., 58, 87, 89, 91-95, 100, 174

Brown, K.M., 75, 81, 84<br />

Brown, R.R., 66<br />

Broyden, C.G., 14, 77, 81-84, 172, 205



Broyden-Fletcher-Shanno formula, 83<br />

Brudermann, U., 246<br />

Brughiera, P., 88<br />

Bryson, A.E., Jr., 68<br />

Budne, T.A., 101<br />

Buehler, R.J., 67, 68<br />

Bunny-hop search, 48<br />

Burkard, R.E., 18<br />

Burt, D.A., 48<br />

Calculus of observations, see observational<br />

calculus<br />

Campbell, D.T., 102<br />

Campey, I.G., 54, see also DSC strategy<br />

Campos, I., 248<br />

Canon, M.D., 18<br />

Cantrell, J.W., 70<br />

Carroll, C.W., 16, 57, 115<br />

Cartesian coordinates, 10<br />

Casey, J.K., 68, 89<br />

Casti, J., 239<br />

Catalogue of problems, 110, 205, 325-366

Cauchy, A., 66<br />

Causality, 237<br />

Cea, J., 23, 47, 68<br />

Cembrowicz, R.G., 246<br />

Cerny, V., 160<br />

Chambliss, J.P., 81<br />

Chandler, C.B., 48
Chandler, W.J., 239

Chang, S.S.L., 11, 90<br />

Charalambous, C., 115<br />

Chatterji, B.N. and Chatterjee, B., 99

Chazan, D., 239<br />

Chernoff, H., 75

Chichinadze, V.K., 88, 91<br />

χ2 distribution, 108

Cholesky, matrix decomposition, 14, 75<br />

Chromosome mutations, 106, 148<br />

Circumferential distribution, 95-97, 109

Cizek, F., 106<br />

Clayton, D.G., 54<br />

Clegg, J.C., 11<br />

Cochran, W.G., 7<br />

Cockrell, L.D., 93, 99<br />

Cohen, A.I., 70<br />

Cohn, D.L., 100<br />

Collatz, L., 5<br />

Colville, A.R., 68, 174, 175, 339<br />

Combinatorial optimization, 152<br />

Complex strategy, 17, 61-65, 89, 115, 177,
179, 185, 190, 191, 201, 202, 210,
212, 213, 216, 217, 228-230, 232,
327, 341, 346, 357, 361-363, 365, 366

Computational intelligence, 152<br />

Computer-aided design (CAD), 5, 6, 23<br />

Computers, see analogue, digital, hybrid,<br />

parallel, <strong>and</strong> process computers<br />

Concave, see convex<br />

Conceptual algorithms, 167<br />

Condition of a matrix, 67, 180, 203, 242,<br />

326<br />

Conjugate directions, 54, 69, 74, 82, 88,<br />

170{172, 202, see also Powell<br />

strategy<br />

Conjugate gradients, 38, 68, 69, 77, 81,<br />

169{172, 204, 235, see also<br />

Fletcher-Reeves strategy<br />

Conrad, M., 103<br />

Constraints, 8, 12, 14{18, 24, 44, 48, 49,<br />

57, 62, 87, 90{93, 105, 107, 115,<br />

119, 134, 150, 176, 212{214, 216,<br />

236<br />

Constraints, active, 17, 44, 62, 116, 118,<br />

213, 215<br />

Constraints satisfaction problem (CSP),<br />

91<br />

Contour tangent method, 39<br />

Control theory, 9, 11, 18, 23, 70, 88, 89,<br />

99, 112<br />

Convergence criterion, 113{114, 145{146,<br />

see also termination of the search<br />

Converse, A.O., 23<br />

Convex, 17, 34, 39, 47, 66, 101, 166, 169,<br />

236, 239<br />

Cooper, L., 23, 38, 48, 87


Coordinate strategy, 41-44, 47, 48, 67, 87, 100, 164, 167, 172, 177, 200, 202-204, 207, 209, 228-230, 233, 327, 332, 339, 340, 362, 363, see also Fibonacci division, golden section, and Lagrangian interpolation
Coordinate transformation, 241
Cornick, D.E., 70
Correlation, 118, 240, 241, 243, 246
Corridor model objective function, 110, 116, 120, 123, 124, 134-142, 215, 231, 232, 351, 352, 361, 364, 365
Cost of computation, 12, 38, 39, 64, 66, 74, 89, 90, 92, 168, 170, 179, 204, 230, 232, 234, see also rate of convergence
Cottrell, B.J., 67
Courant, R., 11, 66
Covariances, 155, 204, 240, 241
Cowdrey, D.R., 93
Cox, D.R., 7
Cox, G.M., 7
Cragg, E.E., 70
Created response surface technique, 16, 57
Creeping random search, 94, 95, 99, 100, 236, 237
Crippen, G.M., 89
Criterion of merit, 2, 7
Crockett, J.B., 75
Crossover, 154
Crowder, H., 70
Cryer, C.W., 43
Cubic interpolation, 34, 37, see also Lagrangian and Hermitian interpolation
Cullum, C.D., Jr., 18
Cullum, J., 83
Curry, H.B., 66, 67
Curse of dimensions, Bellman's, 38
Curtis, A.R., 66
Curtiss, J.H., 93
Curve fitting, 35, 64, 84, 151, 246
Cybernetics, 9, 101, 102, 322
Dambrauskas, A.P., 58, 64
Daniel, J.W., 15, 23, 68, 70
Dantzig, G.B., 17, 57, 88, 166
Darwin, C., 106, 109, 244
Davidon, W.C., 77, 81, 82, 170
Davidon-Fletcher-Powell strategy, see DFP strategy
Davidor, Y., 152
Davies, D., 23, 28, 54, 56, 57, 76, 81, see also Davies-Swann-Campey strategy
Davies, M., 84
Davies, O.L., 7, 58, 68
Davies-Swann-Campey strategy, see DSC strategy
Davis, L., 152
Davis, R.H., 70
Davis, R.S., 66, 89
Davis, S.H., Jr., 23
Day, R.G., 97
Debye series, 130
Decision theory, 94
Decision tree methods, 18
De Graag, D.P., 95, 98
De Jong, K., 152
Dekker, T.J., 34
Demyanov, V.F., 11
Denn, M.M., 11
Dennis, J.E., Jr., 75, 81, 84
Derivative-free methods, 15, 40, 80, 83, 172, 174, see also direct search strategies
Derivatives, numerical evaluation of, 19, 23, 35, 66, 68, 71, 76, 78, 81, 83, 95, 97, 170-172
Descendants, number of, 126, 142-144
Descent, theory of, 100, 109
Design and analysis of experiments, 6, 58, 65, 89
D'Esopo, D.A., 41
DeVogelaere, R., 44, 178
DFP strategy, 77-78, 83, 97, 170-172, 243
DFP-Stewart strategy, 78-81, 177, 178, 184, 189, 195, 200, 201, 209, 210, 219, 228-231, 337, 341, 343, 363, 364
Diblock search, 33
Dichotomous search, 27, 29, 33, 39
Dickinson, A.W., 93, 98, 174
Dieter, U., 116
Differential calculus, 2, 11
Digital computers, 6, 10-12, 14, 15, 32, 33, 92, 99, 110, 173, 236
Dijkhuis, B., 37
Dinkelbach, W., 17
Diploidy, 106, 148
Direct optimization, 13-15, 20
Direct search strategies, 40-65, 68, 90
Directed random search, 98
Discontinuity, 13, 23, 25, 42, 88, 91, 116, 176, 211, 214, 231, 236, 341, 349
Discovery, 2
Discrete distribution, 110, 243
Discrete optimization, 11, 18, 32, 39, 44, 64, 88, 91, 108, 152, 160, 243, 247
Discrete recombination, 148, 153, 156
Discretization, see parameterization
Divergence, 35, 76, 169
Dixon, L.C.W., 15, 23, 29, 34, 35, 58, 71, 76, 78, 81-83
Dobzhansky, T., 101
Dominance and recessiveness, 101, 106, 148
Dowell, M., 35
Draper, N.R., 7, 65, 69
Drenick, R.F., 48
Drepper, F.R., 103, 246
Drucker, H., 61
DSC strategy, 54-57, 74, 89, 177, 183, 188, 194, 200-202, 209, 228-230, 362, 363
Dubovitskii, A.Ya., 11
Dueck, G., 98, 164
Duffin, R.J., 14
Dunham, B., 102
Dvoretzky, A., 20
Dynamic optimization, 7, 9, 10, 48, 64, 89-91, 94, 99, 102, 245, 248
Dynamic programming, 11, 12, 18, 149
Ebeling, W., 102, 163
Edelbaum, T.N., 13
Edelman, G.B., 103
Effectivity of a method, see robustness
Efficiency of a method, see rate of convergence
Eigen, M., 101
Eigenvalue problems, 5
Eigenvalues of a matrix, 76, 83, 326
Eisenberg, M.A., 239
Eldredge, N., 148
Elimination methods, see interval division methods
Elitist strategy, 157
Elkin, R.M., 44, 66, 67
Elliott, D.F., 83
Ellipsoid method, 166
Emad, F.P., 98
Emery, F.E., 48, 87
Engelhardt, M., 20
Engeli, M., 43
Enumeration methods, see grid method
Epigenetic apparatus, 153, 154
Equation, differential, 15, 65, 68, 93, 246, 345, 346
Equations, system of, 5, 13, 14, 23, 39, 65, 66, 75, 83, 93, 172, 235, 336
Equidistant search, see grid method
Erlicki, M.S., 48
Ermakov, S., 19
Ermoliev, Yu., 19, 90
Errors, computational, 47, 174, 205, 209, 210, 212, 219, 228, 229, 236
Euclid of Alexandria, 32
Euclidean norm, 167, 335
Euclidean space, 10, 24, 49, 97
Euler, L., 2, 15
Even block search, 27
Evolution, cultural, 244
Evolution, organic, 1, 3, 100, 102, 105, 106, 109, 142, 153, 237, 238
Evolution strategy, 3, 6, 7, 16, 105-151, 168, 173, 175, 177, 179, 200, 203, 210, 213, 219, 228-230, 232-235, 248, 333, 337, 350, 354, 355, 359, 361, 364, 365, 367, 413, see also two membered and multimembered evolution strategies
Evolution strategy, asynchronous parallel, 248
Evolution strategy, parallel, 248
Evolution strategy, 1/5 success rule, 110, 112, 114, 116, 118, 142, 200, 213-215, 237, 349, 361
Evolution strategy (1+1), 105-119, 125, 163, 177, 185, 191, 200, 203, 212, 213, 216, 217, 228, 231-233, 328, 349, 363
Evolution strategy (1+λ), 123, 134, 145
Evolution strategy (1 , λ), 145
Evolution strategy (10 , 100), 177, 186, 191, 200, 203, 211-215, 217, 228, 231-233
Evolution strategy (μ+1), 119
Evolution strategy (μ+λ), 119
Evolution strategy (μ , λ), 119, 145, 148, 238, 244, 248
Evolution strategy ( ), 247
Evolution, synthetic theory, 106
Evolutionary algorithms, 151, 152, 161
Evolutionary computation, 152
Evolutionary operation, see EVOP method
Evolutionary principles, 3, 100, 106, 118, 146, 244
Evolutionary programming, 151
Evolutionism, 244
EVOP method, 6, 7, 9, 64, 68, 69, 89, 101
Experimental optimization, 6-9, 36, 44, 68, 89, 91, 92, 95, 110, 113, 245, 247, see also design and analysis of experiments
Expert system, 248
Extreme value controller, see optimizer
Extremum, see minimum
Faber, M.M., 18
Fabian, V., 20, 90
Factorial design, 38, 58, 65, 68, 246
Faddejew, D.K. and Faddejewa, W.N., 27, 67, 240
Fagiuoli, E., 96
Falkenhausen, K. von, 246
Favreau, R.F., 95, 96, 98, 100
Feasible region, 8, 9, 12, 16, 17, 25, 101
Feasible region, not connected, 217, 239, 360
Feasible starting point, search for, 62, 91, 115
Feistel, R., 102, 163
Feldbaum, A.A., 6, 9, 88-90, 99
Fend, F.A., 48
Fiacco, A.V., 16, 76, 81, 115, see also SUMT method
Fibonacci division, 29-32, 38, 177, 178, 181, 187, 192, 200, 202
Fielding, K., 83
Finiteness of a sequence of iterations, 68, 166, 172
Finkelstein, J.J., 18
Fisher, R.A., 7
Fletcher, R., 24, 38, 68-71, 74, 77, 80-84, 97, 170, 171, 204, 205, 335, 349
Fletcher-Powell strategy, see DFP strategy
Fletcher-Reeves strategy, 69, 70, 78, 170-172, 204, 233, see also conjugate gradients
Flood, M.M., 68, 89
Floudas, C.A., 91
Fogarty, L.E., 68
Fogel, D.B., 151
Fogel, L.J., 102, 105, 151
Forrest, S., 152
Forsythe, G.E., 34, 66, 67
Fox, R.L., 23, 34, 205
Frankhauser, P., 246
Frankovic, B., 9
Franks, R., 95, 96, 98, 100
Fraser, A.S., 152
Friedberg, R.M., 102, 152
Friedmann, M., 41
Fu, K.S., 94, 99
Function space, 10
Functional analysis theory, 11
Functional optimization, 10-12, 15, 23, 54, 68, 70, 85, 89, 90, 151, 174
Fürst, H., 98
Gaede, K.W., 8, 108, 144
Gaidukov, A.L., 98
Gal, S., 31
Galar, R., 102
Game theory, 5, 6, 20
Garfinkel, R.S., 18
Gauss, C.F., 41, 84
Gauss-Newton method, 84
Gauss-Seidel strategy, see coordinate strategy
Gaussian approximation, see sum of squares minimization
Gaussian distribution, see normal distribution
Gaussian elimination, 14, 75, 172
Gaviano, M., 96
Gelatt, C.D., 160
Gelfand, I.M., 89
Gene duplication and deletion, 247
Gene pool, 146, 148
Generalized least squares, 84
Genetic algorithms, 151-160
Genetic code, 153, 154, 243
Genotype, 106, 152, 153, 157
Geoffrion, A.M., 24
Geometric programming, 14
Gerardin, L., 105
Gersht, A.M., 90
Gessner, P., 11
Gibson, J.E., 88, 90
Gilbert, E.G., 68, 89
Gilbert, H.D., 90, 98
Gilbert, P., 239
Gill, P.E., 81
Ginsburg, T., 43, 69
Girsanov, I.V., 11
Glass, H., 48, 87
Glaß, K., 105
Glatt, C.R., 68
Global convergence, 39, 88, 94, 96, 98, 117, 118, 149, 216, 217, 238, 239
Global minimum, 24-26, 90, 168, 329, 344, 348, 356, 357, 359, 360
Global optimization, 19, 29, 84, 88-91, 236, 244
Global penalty function, 16
Glover, F., 162, 163
Gnedenko, B.W., 137
Goldberg, D.E., 152, 154
Golden section, 32, 33, 177, 178, 181, 187, 192, 200, 202
Goldfarb, D., 81
Goldfeld, S.M., 76
Goldstein, A.A., 66, 67, 76, 81, 88
Golinski, J., 92
Goll, R., 244
Golub, G.H., 57, 84
Gomory, R.E., 18
Gonzalez, R.S., 95
Gorges-Schleuter, M., 159, 247
Gorvits, G.G., 174
GOSPEL program package, 68
Goto, K., 82
Gottfried, B.S., 23
Gould, S.J., 148
Gradient strategies, 6, 15, 19, 37, 40, 65-69, 88-90, 94, 95, 98, 166, 167, 171, 172, 174, 235
Gradient strategies, second order, see Newton strategies
Gradstein, I.S., 136
Gram-Schmidt orthogonalization, 48, 53, 54, 57, 69, 177, 178, 183, 188, 194, 201, 202, 209, 229, 230, 362
Gran, R., 88
Graphical methods, 20
Grassé, P.P., 243
Grassmann, P., 100
Grauer, M., 20
Graves, R.L., 23
Great deluge algorithm, 164
Greedy algorithm, 162, 248
Greenberg, H., 18
Greenberg, H.-J., 162
Greenstadt, J., 70, 76, 81, 83, 326
Grefenstette, J.J., 152
Grid method, 12, 26, 27, 32, 38, 39, 65, 92, 93, 100, 149, 168, 236
GROPE program package, 68
Guilfoyle, G., 38
Guin, J.A., 64
Gurin, L.S., 89, 97, 98
Hadamard, J., 66
Hadley, G., 12, 17, 166
Haeckel strategy, 163
Haefner, K., 103
Hague, D.S., 68
Haimes, Y.Y., 10
Hamilton, P.A., 77
Hamilton, W.R., 15
Hammel, U., 245, 248
Hammer, P.L., 19
Hammersley, J.M., 93
Hamming cliffs, 154, 155
Hancock, H., 14
Handscomb, D.C., 93
Hansen, P.B., 239
Haploidy, 148
Harkins, A., 89
Harmonic division, 32
Hartmann, D., 151, 246
Haubrich, J.G.A., 68
Heckler, R., 246, 248
Heidemann, J.C., 70
Heinhold, J., 8, 108, 144
Hemstitching, 16
Henn, R., 20
Herdy, M., 164
Hermitian interpolation, 37, 38, 69, 77, 88
Herschel, R., 99
Hertel, H., 105
Hesse, R., 88
Hessian matrix (Hesse, L.O.), 13, 69, 75, 169, 170
Hestenes, M.R., 11, 14, 69, 70, 81, 172
Heuristic methods, 7, 18, 40, 88, 91, 98, 102, 162, 173
Heusener, G., 245
Hext, G.R., 57, 58, 64, 68, 89
Heydt, G.T., 93, 98, 99
Heynert, H., 105
Hilbert, D., 10, 11
Hildebrand, F.B., 66
Hill climbing strategies, 23 ff., 85, 87
Hill, I.D., 33, 178
Hill, J.C., 88, 90
Hill, J.D., 94
Himmelblau, D.M., 23, 48, 81, 87, 174, 176, 229, 339
Himsworth, F.R., 57, 58, 64, 68, 89
History vector method, 98
Hit-or-miss method, 93
Ho, Y.C., 68
Hock, W., 174
Hodanova, D., 106
Hoffmann, U., 23, 74
Hoffmeister, F., 151, 234, 246, 248
Höfler, A., 151, 246
Hofmann, H., 23, 74
Holland, J.H., 105, 152, 154
Hollstien, R.B., 152
Holst, W.R., 67
Homeostat, 9, 91, 100
Hoo, S.K., 88
Hooke, R., 44, 87, 90, 92
Hooke-Jeeves strategy, 44-48, 87, 90, 177, 178, 182, 188, 193, 200, 202, 210, 228, 230, 233, 332, 339
Hopper, M.J., 178
Horner, computational scheme of, 14
Horst, R., 91
Hoshino, S., 57, 81
Hotelling, H., 36
House, F.R., 77
Householder, A.S., 27, 75
Householder method, 57
Houston, B.F., 48
Howe, R.M., 68
Hu, T.C., 18
Huang, H.Y., 70, 78, 81, 82
Huberman, B.A., 103
Huelsman, L.P., 68
Huffman, R.A., 48
Hull, T.E., 93
Human brain, 6, 102
Humphrey, W.E., 67
Hunter, J.S., 65
Hupfer, P., 92, 94, 98
Hurwicz, L., 17, 18, 165
Hutchinson, D., 61
Hwang, C.L., 20
Hybrid computers, 12, 15, 68, 89, 99, 236
Hybrid methods, 38, 162-164, 169
Hyperplane annealing, 162
Hyslop, J., 206
Idelsohn, J.M., 93, 94
Illiac IV, 239
Imamura, H., 89
Indirect optimization, 13-15, 27, 35, 75, 170, 235
Indusi, J.P., 87
Infimum, 9
Information theory, 5
Integer optimization, 18, 247
Interior point method, 166
Intermediary recombination, 148, 153, 156
Interpolation methods, 14, 27, 33-38
Interval division methods, 27, 29-33, 41
Invention, 2
Inverse Hessian matrix, 77, 78
Inversion of a matrix, 76, 170, 175
Isolation, 106, 244
Iterative methods, 11, 13
Ivakhnenko, A.G., 102
Jacobi, C.G.J., 15
Jacobi method, 65, 326
Jacobian matrix, 16, 84
Jacobson, D.H., 12
Jacoby, S.L.S., 23, 67, 174
James, F.D., 33, 178
Janac, K., 90
Jarratt, P., 34, 35, 84
Jarvis, R.A., 91, 93, 94, 99
Jeeves, T.A., 44, 84, 87, 90, 92, see also Hooke-Jeeves strategy
Johannsen, G., 99
John, F., 166
John, P.W.M., 7
Jöhnk, M.D., 115
Johnson, I., 38
Johnson, M.P., 81
Johnson, S.M., 31, 32
Jones, A., 84
Jones, D.S., 81
Jordan, P., 109
Kamiya, A., 100
Kammerer, W.J., 70
Kantorovich, L.V., 66, 67
Kaplan, J.L., 64
Kaplinskii, A.I., 90
Kappler, H., 18, 166
Karmarkar, N., 166, 167
Karnopp, D.C., 93, 94, 96
Karp, R.M., 239
Karplus, W.I., 12, 89
Karr, C.L., 160
Karreman, H.F., 11
Karumidze, G.V., 94
Katkovnik, V.Ya., 88, 90
Kaupe, A.F., Jr., 39, 44, 178
Kavanaugh, W.P., 95, 98, 99
Kawamura, K., 70
Keeney, R.E., 20
Kelley, H.J., 15, 68, 70, 81
Kempthorne, O., 7, 67, 68
Kenworthy, I.C., 69
Kesten, H., 20
Kettler, P.C., 82, 83
Khachiyan, L.G., 166, 167
Khovanov, N.V., 96, 102
Khurgin, Ya.I., 89
Kiefer, J., 19, 29, 31, 32, 178
Kimura, M., 239
King, R.F., 35
Kirkpatrick, S., 160
Kitajima, S., 94
Kivelidi, V.Kh., 89
Kiwiel, K.C., 19
Kjellström, G., 98
Klerer, M., 24
Klessig, R., 15, 70
Klimenko, E.S., 94
Klingman, W.R., 48, 87
Klockgether, J., 7, 245
Klötzler, R., 11
Kobelt, D., 246
Koch, H.W., 244
Kopp, R.E., 18
Korbut, A.A., 18
Korn, G.A., 12, 24, 89, 93, 99
Korn, T.M., 12, 89
Korst, J., 161
Kosako, H., 99
Kovacs, Z., 80, 179
Kowalik, J.S., 23, 42, 67-69, 84, 174, 334, 335, 345
Koza, J., 152
Krallmann, H., 246
Krasnushkin, E.V., 99
Krasovskii, A.A., 89
Krasulina, T.P., 20
Krauter, G.E., 246
Kregting, J., 93
Krelle, W., 17, 18, 166
Krolak, P.D., 38
Kuester, J.L., 18, 58, 179
Kuhn, H.W., 17, 166
Kuhn-Tucker theorem, 17, 166
Kulchitskii, O.Yu., 90
Kumar, K.K., 160
Künzi, H.P., 17, 18, 20, 166
Kursawe, F., 102, 148, 245, 248
Kushner, H.J., 20, 90
Kussul, E., 101
Kwakernaak, H., 89
Kwasnicka, H. and Kwasnicki, W., 102
Kwatny, H.G., 90
Laarhoven, P.J.M. van, 161
Lagrange multipliers, 15, 17
Lagrange, J.L., 2, 15
Lagrangian interpolation, 27, 35-37, 41, 56, 64, 73, 80, 89, 101, 177, 182, 187, 193, 200, 202
Lam, L.S.-B., 100
Lance, G.M., 54
Land, A.H., 18
Lange-Nielsen, T., 54
Langguth, V., 89
Langton, C.G., 103
Lapidus, L., 68
Larichev, O.I., 174
Larson, R.E., 239
Lasdon, L.S., 70
Lattice search, see grid method
Lauffermair, T., 162
Lavi, A., 23, 48, 93
Lawler, E.L., 160
Lawrence, J.P., 87, 98
Learning (and forgetting), 9, 54, 70, 78, 98, 101, 103, 162, 236
Least squares method, see sum of squares minimization
LeCam, L.M., 102
Lee, R.C.K., 11
Lehner, K., 248
Leibniz, G.W., 1
Leitmann, G., 11, 18
Lemaréchal, C., 19
Leon, A., 68, 89, 174, 337, 356
Leonardo of Pisa, 29
Lerner, A.Ja., 11
Lesniak, Z.K., 92
Lethal mutation, 115, 136, 137, 158
Levenberg, K., 66, 84
Levenberg-Marquardt method, 84
Levine, L., 65
Levine, M.D., 10
Levy, A.V., 70, 78, 81
Lew, A.Y., 96
Lew, H.S., 100
Lewallen, J.M., 174
Lewandowski, A., 20
Leyßner, U., 151, 246
Lilienthal, O., 238
Lill, S.A., 80, 178, 179
Lindenmayer, A., 103
Line search, 25-38, 42, 54, 66, 70, 71, 77, 89, 101, 167, 170, 171, 173, 180, 214, 228, see also interval division and interpolation methods
Linear convergence, 34, 168, 169, 172, 173, 236, 365
Linear model objective function, 96, 124-127
Linear programming, 17, 57, 88, 100, 101, 151, 166, 212, 235, 353
Little, W.D., 93, 244
Lobac, V.P., 89
Local minimum, 13, 23-26, 88, 90, 329
Locker, A., 102
Log-normal distribution, 143, 144, 150
Loginov, N.V., 90
Lohmann, R., 164
Long step methods, 66
Longest step procedure, 66
Lootsma, F.A., 24, 81, 174
Lowe, C.W., 69, 101
Lucas, E., 32
Luce, A.D., 21
Luenberger, D.G., 18
Luk, A., 101
Lyvers, H.I., 16
MacDonald, J.R., 84
MacDonald, P.A., 48
Machura, M., 54, 179
MacLane, S., 48
MacLaurin, C., 13
Madsen, K., 35
Mamen, R., 81
Manderick, B., 152
Mandischer, M., 160
Mangasarian, O.L., 18, 24
Männer, R., 152
Marfeld, A.F., 6
Markwich, P., 246
Marquardt, D.W., 84
Marti, K., 118
Masters, C.O., 61
Masud, A.S.M., 20
Mathematical biosciences, 102
Mathematical optimization, 6-9
Mathematical programming, 15-17, 23, 85, see also linear, quadratic, and non-linear programming
Mathematization, 102
Matthews, A., 76, 81
Matyas, J., 97-99, 240, 338
Maximum likelihood method, 8
Maximum, see minimum
Maybach, R.L., 97
Mayne, D.Q., 12, 81
Maze method, 44
McArthur, D.S., 92, 94, 98
McCormick, G.P., 16, 67, 70, 76, 78, 81, 82, 88, 115, 348, see also SUMT method
McGhee, R.B., 65, 68, 89, 93
McGlade, J.M., 102
McGrew, D.R., 10
McGuire, M.R., 239
McMillan, C., Jr., 18
McMurtry, G.J., 94, 99
Mead, R., 58, 84, 97, see also simplex strategy
Medvedev, G.A., 89, 99
Meerkov, S.M., 94
Meissinger, H.F., 99
Meliorization, 1
Memory gradient method, 70
Meredith, D.L., 160
Merzenich, W., 101
Metropolis, N., 160
Meyer, J.-A., 103
Michalewicz, Z., 152, 159
Michel, A.N., 70
Michie, D., 102
Mickey, M.R., 58, 89, 95
Midpoint method, 33
Miele, A., 68, 70
Mifflin, R., 19
Migration, 106, 248
Miller, R.E., 239
Millstein, R.E., 239
Milyutin, A.A., 11
Minima and maxima, theory of, see optimality conditions
Minimax concept, 26, 27, 31, 34, 92
Minimum, 8, 13, 16, 24, 36
Minimum χ² method, 8
Minot, O.N., 102
Minsky, M., 102
Miranker, W.L., 233, 239
Missing links, 1
Mitchell, B.A., Jr., 99
Mitchell, R.A., 64
Mixed integer optimization, 18, 164, 243
Mize, J.H., 18
Mlynski, D., 69, 89
Mockus, J.B., see Motskus, I.B.
Model, internal (of a strategy), 9, 10, 28, 38, 41, 90, 169, 204, 231, 235-237
Model, mathematical (of a system), 7, 8, 65, 68, 160, 235
Modified Newton methods, 76
Moler, C., 87
Moment rosetta search, 48
Monro, S., 19
Monte-Carlo methods, 92-94, 109, 149, 160, 168
Moran, P.A.P., 101
Moré, J.J., 81, 179
Morgenstern, O., 6
Morrison, D.D., 84
Morrison, J.F., 334
Motskus, I.B., 88, 94
Motzkin, T.S., 67
Movshovich, S.M., 96
Mufti, I.H., 18
Mugele, R.A., 44
Mühlenbein, H., 163
Mulawa, A., 54, 179
Muller, M.E., 115
Müller, P.H., 98
Müller-Merbach, H., 17, 166
Multicellular individuals, 247
Multidimensional optimization, 2, 38 ff., 85
Multimembered evolution strategy, 101, 103, 118-151, 153, 158, 235-248, 329, 333, 335, 344, 347, 355-357, 359, 360, 362, 363, 365, 366, 375, 413, see also evolution strategy (μ , λ) and (μ+λ)
Multimodality, 12, 24, 85, 88, 157, 159, 239, 245, 248
Multiple criteria decision making (MCDM), 2, 20, 148, 245
Munson, J.K., 95
Murata, T., 44
Murray, W., 24, 76, 81, 82
Murtagh, B.A., 78, 82
Mutation, 3, 100-102, 106-108, 154, 155, 237
Mutation rate, 100, 101, 154, 237
Mutator genes, 142, 238
Mutseniyeks, V.A., 99
Myers, G.E., 70, 78, 81
Nabla operator, 13
Nachtigall, W., 105
Nag, A., 58
Nake, F., 49
Narendra, K.S., 94
Nashed, M.Z., 70
Neave, H.R., 116
Neighborhood model, 247
Nelder, J.A., 58, 84, 97
Nelder-Mead strategy, see simplex strategy
Nemhauser, G.L., 18
Nenonen, L.K., 70
Network planning, 20
Neumann, J. von, 6
Neustadt, L.W., 11, 18
Newman, D.J., 39
Newton, I., 2, 14
Newton direction, 70, 75-77, 84
Newtonian interpolation, 27, 35
Newton-Raphson method, 35, 75, 76, 97, 167, 169-171
Newton strategies, 40, 71, 74-85, 89, 171, 235
Neyman, J., 102
Niching, 100, 106, 238, 248
Nicholls, R.L., 23
Nickel, K., 168
Niederreiter, H., 115
Niemann, H., 5
Nikolic, Z.J., 94
Nissen, V., 103
Nollau, V., 98
Non-linear programming, 17, 18, 166
Non-smooth or non-differentiable optimization, 19
Nonstationary optimum, 248
Norkin, K.B., 88
Normal distribution, 7, 90, 94, 95, 97, 101, 108, 116, 120, 128, 153, 236, 240, 243
North, J.H., 102
North, M., 246
Numerical mathematics, 5, 27, 239
Numerical optimization, see direct optimization
Nurminski, E.A., 19
Objective function, 2, 8
Observational calculus, 5, 7
Odd block search, 27
Odell, P.L., 20
Oettli, W., 18, 167
O'Hagan, M., 48, 87
Oi, K., 82
Oldenburger, R., 9
Oliver, L.T., 31
One dimensional optimization, 25-38, see also line search
One step methods, see relaxation methods
O'Neill, R., 58, 179
Ontogenetic learning, 163
Opacic, J., 88
Operations research, 5, 17, 20
Optimal control, see control theory
Optimality conditions, 2, 13-15, 23, 167 ff., 235
Optimality of organic systems, 99, 100, 105
Optimization, prerequisites for, 1
Optimization problem, 2, 5-8, 14, 20, 24
Optimizer, 9, 10, 48, 99, 248
Optimum, see minimum
Optimum, maintaining (and hunting), see dynamic optimization
Optimum gradient method, 66
Optimum principle of Bellman, 11, 12
Oren, S.S., 82
Ortega, J.M., 5, 27, 41, 42, 82, 84
Orthogonalization, see Gram-Schmidt and Palmer orthogonalization
Osborne, M.R., 23, 42, 68, 69, 84, 174, 335, 345
Osche, G., 106, 119
Ostermeier, A., 118
Ostrowski, A.M., 34, 66
Overadaptation, 148
Overholt, K.J., 31-33, 178
Overrelaxation and underrelaxation, 43, 67
Owens, A.J., 102, 105, 151
Page, S.E., 155
Pagurek, B., 70
Palmer, J.R., 57, 178
Palmer orthogonalization, 57, 177, 178, 183, 188, 194, 202, 209, 230
Papageorgiou, M., 23
Papentin, F., 102
Parallel computers, 161, 163, 234, 239, 243, 245, 247, 248
Parameter optimization, 6, 8, 10-13, 15, 16, 20, 23, 105
Parameterization, 15, 151, 346
Pardalos, P.M., 91
Pareto-optimal, 20, 245
Parkinson, J.M., 61
Partan (parallel tangents) method, 67-69
Pask, G., 101
Path-oriented strategies, 98, 160, 236, 248
Patrick, M.L., 239
Pattern recognition, 5
Pattern search, see Hooke-Jeeves strategy
Paviani, D.A., 87
Pearson, J.D., 38, 70, 76, 78, 81, 82, 205
Peckham, G., 84
Penalty function, 15, 16, 48, 49, 57, 207
Perceptron, 102
Peschel, M., 20
Peters, E., 163, 248
Peterson, E.L., 14
Phenotype, 106, 153-155, 157, 158
Pierre, D.A., 23, 48, 68, 95
Pierson, B.L., 82
Pike, M.C., 33, 44, 178
Pincus, M., 93
Pinkham, R.S., 93
Pinsker, I.Sh., 44
Pixner, J., 33, 178
Pizzo, J.T., 23, 67, 174
Plane, D.R., 18
Plaschko, P., 151, 246
Pleiotropy, 243
Pluznikov, L.N., 94
Polak, E., 15, 18, 70, 76, 77, 167, 169
Policy, 11
Polyak, B.T., 70
Polygeny, 243
Polyhedron strategies, see simplex and complex strategies
Ponstein, J., 17
Pontrjagin, L.S., 18
Poor man's optimizer, 44
Population principle, 101, 119, 238
Posynomes, 14
Powell, D.R., 84
Powell, M.J.D., 57, 70, 71, 74, 77, 82, 84, 88, 97, 170, 202, 205, 335, 337, 349, see also DFP, DFP-Stewart, and Powell strategies
Powell, S., 18
Powell strategy, 69-74, 88, 163, 170-172, 177, 178, 183, 189, 195, 200, 202, 204, 209, 210, 219, 228-230, 327, 332, 339, 341, 343, 364
Poznyak, A.S., 90
Practical algorithms, 167
Predator-prey model, 247
Press, W.H., 115
Price, J.F., 76, 81, 88
Probabilistic automaton, 94
Problem catalogue, see catalogue of problems
Process computers, 10
Projected gradient method, 57, 70
Proofs of convergence, 42, 47, 66, 77, 97, 167, 168
Propoi, A.I., 90
Prusinkiewicz, P., 103
Pseudo-random numbers, see random number generation
Pugachev, V.N., 95
Pugh, E.L., 89
Pun, L., 23
Punctuated equilibrium, 148
Pure random search, 91, 92, 100, 237
Q-properties, 169, 170, 172, 179, 243
Quadratic convergence, 68, 69, 74, 76, 78, 81-83, 168, 169, 200, 202, 236
Quadratic interpolation, see Lagrangian and Hermitian interpolation
Quadratic programming, 166, 233, 235
Quandt, R.E., 76
Quasi-Newton method, 37, 70, 76, 83, 89, 170, 172, 205, 233, 235, see also DFP and DFP-Stewart strategies
Rabinowitz, P., 84
Raiffa, H., 20, 21
Rajtora, S.G., 82
Ralston, A., 27
Random direction, 20, 88, 90, 98, 101, 202
Random evolutionary operation, see REVOP method
Random exchange step, 88, 166
Random number generation, 115, 150, 210, 212, 217, 237
Random sequence, 87, 93
Random step length, 95, 96, 108
Random strategies, 3, 12, 19, 87-103, 105, 240
Random walk, 247
Randomness, 87, 91, 93, 237
Rank one methods, 82, 83, 172
Raphson, J., see Newton-Raphson method
Rappl, G., 118
Raster method, see grid method
Rastrigin, L.A., 93, 95, 96, 98, 99
Rate of convergence, 7, 38, 39, 64, 66, 67, 69, 90, 94-98, 101, 110, 118, 120-141, 167-169, 179-204, 217-232, 234, 236, 239, 240, 242, see also linear and quadratic convergence
Rauch, S.W., 82
Rawlins, G.J.E., 152
Rayleigh-Ritz method, 15
Razor search, 48
Rechenberg, I., 6, 7, 97, 100, 105, 107, 118-120, 130, 142, 149, 164, 168, 172, 179, 231, 238, 245, 352
Recognition processes, 102
Recombination, 3, 101, 106, 146-148, 153-159, 186, 191, 200, 203, 204, 211-213, 215-217, 228, 231, 232, 240, 335, 355, 357, 363, 365, 366, see also discrete and intermediary recombination
Reeves, C.M., 38, 69, 93, 170, 204, see also Fletcher-Reeves strategy
References, 249-323
Regression, 8, 19, 84, 235, 246
Regression, non-linear, 84
Regula falsi (falsorum), 27, 34, 35, 39
Reid, J.K., 66
Rein, H., 100
Reinsch, C., 14
Relative minimum, 38, 42, 43, 66, 209
Relaxation methods, 14, 20, 41, 172, see also coordinate strategy
Reliability, see robustness
Repair enzymes, 142, 238
Replicator algorithm, 163
Restart of a search, 61, 67, 70, 71, 88, 89, 169, 201, 202, 205, 210, 219, 228-230, 362, 364
REVOP method, 101
Reynolds, O., 238
Rhead, D.G., 74
Rheinboldt, W.C., 5, 27, 41, 42, 82, 84
Ribière, G., 70, 82
Rice, J.R., 57
Richardson, D.W., 155
Richardson, J.A., 58, 179
Richardson, M., 239
Riding the constraints, 16
Riedl, R., 102, 153
Ritter, K., 24, 70, 82, 88, 168
Rivlin, L., 48
Robbins, H., 19
Roberts, P.D., 70
Roberts, S.M., 16
Robots, 6, 9, 103
Robustness, 3, 13, 34, 37-39, 53, 61, 64, 70, 90, 94, 118, 178, 204-217, 236, 238
Rockoff, M.L., 41
Rodloff, R.K., 246
Rogson, M., 100
Roitblat, H., 103
Rosen, J.B., 18, 24, 57, 91, 352
Rosen, R., 100
Rosenblatt, F., 102
Rosenbrock, H.H., 23, 29, 48, 50, 54, 343, 349
Rosenbrock strategy, 16, 48-54, 64, 177, 179, 184, 190, 196, 201, 202, 207, 209, 212, 213, 216, 228, 230-232, 357, 363, 365, 366
Rosenman, E.A., 11
Ross, G.J.S., 84
Rotating coordinates method, see Rosenbrock and DSC strategies
Rothe, R., 25
Roughgarden, J.W., 102
Rounding error, see accuracy of computation
Rozonoer, L.I., 90
Rozvany, G., 247
Ruban, A.I., 99
Rubin, A.I., 95
Rubinov, A.M., 11
Rudd, D.F., 98, 356
Rudelson, L.Ye., 102
Rudolph, G., 91, 118, 134, 151, 154, 161, 162, 241, 243, 248
Rustay, R.C., 68, 89
Rutishauser, H., 5, 41, 43, 48, 65, 75, 172, 326
Rybashov, M.V., 68
Ryshik, I.M., 136
Saaty, T.L., 20, 27, 166
Sacks, J., 20
Saddle point, 13, 14, 17, 23, 25, 35, 36, 39, 66, 76, 88, 168, 176, 209, 211, 345
Salaff, S., 100
Sameh, A.H., 239
Samuel, A.L., 102
Sargent, R.W.H., 78, 82
Saridis, G.N., 90, 98
Satterthwaite, F.E., 98, 101
Saunders, M.A., 57
Savage, J.M., 119
Savage, L.J., 41
Sawaragi, Y., 48
Sayama, H., 82
Scaling of the variables, 7, 44, 54, 58, 74, 146-148, 232, 239
Schaffer, J.D., 152
Schechter, R.S., 15, 23, 28, 32, 37, 41-43, 64, 65
Scheeffer, L., 14
Scheel, A., 118
Schema theorem, 154
Scheraga, H.A., 89
Scheuer, E.M., 241
Scheuer, T., 98, 164
Schinzinger, R., 67
Schittkowski, K., 174
Schley, C.H., Jr., 70
Schlierkamp-Voosen, D., 163
Schmalhausen, I.I., 101
Schmetterer, L., 90
Schmidt, E., see Gram-Schmidt orthogonalization
Schmidt, J.W., 35, 39
Schmitt, E., 20, 90
Schmutz, M., 103
Schneider, G., 246
Schneider, M., 100
Schrack, G., 98, 240
Schumer, M.A., 89, 93, 96-99, 101, 200, 240
Schuster, P., 101
Schwarz, H.R., 5, 41, 65, 75, 172, 326
Schwefel, D., 246
Schwefel, H.-P., 7, 102, 103, 118, 134, 148, 151, 152, 155, 163, 204, 234, 239, 242, 245-248
Schwetlick, H., 39
Scott, E.L., 102
Sebald, A.V., 151
Sebastian, D.J., 82
Sebastian, H.-J., 24
Secant method, 34, 39, 84
Second order gradient strategies, see Newton strategies
Sectioning algorithms, 14
Seidel, P.L., 41, see also coordinate strategy
Selection, 3, 100-102, 106, 142, 153, 157
Sensitivity analysis, 17
Separable objective function, 12, 42
Sequential methods, 27 ff., 38 ff., 88, 237
Sequential unconstrained minimization technique, see SUMT method
Sergiyevskiy, G.M., 69
Sexual propagation, 3, 101, 106, 146, 147
Shah, B.V., 67, 68
Shanno, D.F., 76, 82-84
Shapiro, I.J., 94
Shedler, G.S., 239
Shemeneva, V.V., 95
Shimelevich, L.I., 88
Shimizu, T., 92
Shindo, A., 64
Short step methods, 66
Shrinkage random search, 94
Shubert, B.O., 29
Sigmund, K., 21
Silverman, G., 84
Simplex method, see linear programming
Simplex strategy, 57-61, 64, 84, 89, 97, 177, 179, 184, 190, 196, 201, 202, 208, 210, 228-231, 341, 361-364
Simplex, 17, 58, 353
Simulated annealing, 160-162
Simulation, 13, 93, 102, 103, 152, 245, 246
Simultaneous methods, 26-27, 92, 168, 237
Singer, E., 44
Single step methods, see relaxation methods
Singularity, 70, 74, 78, 82, 205, 209
Sirisena, H.R., 15
Slagle, J.R., 102
Slezak, N.L., 241
Smith, C.S., 54, 71, 74
Smith, D.E., 174
Smith, F.B., Jr., 84
Smith, J. Maynard, 21, 102
Smith, L.B., 44, 178
Smith, N.H., 98, 356
Soeder, C.-J., 103
Somatic mutations, 247
Sondak, N.E., 66, 89
Sonderquist, F.J., 48
Sorenson, H.W., 68, 70
Southwell, R.V., 20, 41, 43, 65
Spang, H.A., 93, 174
Späth, H., 84
Spears, W., 152
Spedicato, E., 82
Spendley, W., 57, 58, 61, 64, 68, 84, 89
Speyer, J.L., 70
Sphere model objective function, 110, 117, 120, 123, 124, 127-134, 142, 173, 179, 203, 215, 325, 338
Spider method, 48
Sprave, J., 247, 248
Spremann, K., 11
Stagnation, 47, 58, 61, 64, 67, 87, 88, 100, 157, 201, 205, 238, 341
Standard deviation, see variance
Stanton, E.L., 34
Stark, R.M., 23
Static optimization, 9, 10
Stebbins, G.L., 106
Steepest descent/ascent, 66-68, 166, 169, 235
Steiglitz, K., 87, 96, 98, 99, 101, 200, 240
Stein, M.L., 14, 67
Steinberg, D., 23
Steinbuch, K., 6
Stender, J., 152
Step length control, 110-113, 142-145, 168, 172, 237, see also evolution strategy, 1/5 success rule
Steuer, R.E., 20
Stewart, E.C., 95, 98, 99
Stewart, G.W., 78, 84, see also DFP-Stewart strategy
Stiefel, E., 5, 41, 43, 65, 67, 69, 75, 172, 326
Stochastic approximation, 19, 20, 64, 83, 90, 94, 99, 236
Stochastic optimization, 18
Stochastic perturbations, 9, 20, 36, 58, 68, 69, 89, 91, 92, 94, 95, 97, 99, 236, 245
Stoer, J., 18
Stoller, D.S., 241
Stolz, O., 14
Stone, H.S., 239
Storage requirement, 47, 53, 57, 180, 232-234, 236
Storey, C., 23, 50, 54
Strategy, 2, 6, 100
Strategy comparison, 57, 64, 68, 71, 78, 80, 83, 84, 92, 97, 165-234
Strategy parameter, 144, 204, 238, 240-242
Stratonovich, R.L., 90
Strong minimum, 24, 328, 333
Strongin, R.G., 94
Structural optimization, 247
Struggle for existence, 100, 106
Suboptimum, 15
Subpopulations, 248
Success/failure routine, 29
Suchowitzki, S.I., 18
Sugie, N., 38
Sum of squares minimization, 5, 83, 331, 335, 346
SUMT method, 16
Supremum, 9
Sutti, C., 88
Suzuki, S., 352
Svechinskii, V.B. (Svecinskij, V.B.), 90, 102
Swann, W.H., 23, 28, 54, 56, 57, see also DSC strategy
Sweschnikow, A.A., 137
Sworder, D.D., 83
Sydow, A., 68
Sylvester, criterion of, 240
Synge, J.L., 44
Sysoyev, V.V., 95
Szegö, G.P., 24, 70, 88
Tabak, D., 18, 78, 82
Tabu search, 162-164
Tabulation method, see grid method
Takamatsu, T., 82
Talkin, A.I., 68
Tammer, K., 24
Tan, S.T., 17
Tapley, B.D., 174
Taran, V.A., 94
Taylor, G., 84
Taylor series (Taylor, B.), 75, 84
Tazaki, E., 64
Tchebycheff approximation (Tschebyschow, P.L.), 5, 331, 370
Termination of the search, 35, 38, 49, 54, 59, 64, 67, 71, 96, 113, 114, 117, 145, 146, 150, 167, 168, 175, 176, 180, 212, 238
Ter-Saakov, A.P., 69
Theodicee, 1
Theory of maxima and minima, 11
Thom, R., 102
Thomas, M.E., 20
Three point scheme, 29
Threshold strategy, 98, 164
Tietze, J.L., 70
Timofejew-Ressowski, N.W., 101
Todd, J., 326
Togawa, T., 100
Tokumaru, H., 82
Tolle, H., 18, 68
Tomlin, F.K., 44, 178
Törn, A., 91
Total step procedure, 65
Tovstucha, T.I., 89
Trabattoni, L., 88
Trajectory optimization, see functional optimization
Traub, J.F., 14, 27
Travelling salesperson problem (TSP), 159, 161
Treccani, G., 70, 88
Trial and error, 13, 41
Trial polynomial, 27, 33-35, 37, 68, 235
Trinkaus, H.F., 35
Trotter, H.F., 76
Tschebyschow, P.L., see Tchebycheff approximation
Tse, E., 239
Tseitlin, B.M., 44
Tsetlin, M.L., 89
Tsypkin, Ya.Z., 6, 9, 89, 90
Tucker, A.W., 17, 166
Tui, H., 88
Turning point, see saddle point
Two membered evolution strategy, 97, 101, 105-118, 172, 238, 329, 352, 357, 359, 363, 366, 367, 374, see also evolution strategy (1+1)
Tzschach, H.G., 18
Ueing, U., 88, 358, 359
Umeda, T., 64
Unbehauen, H., 18
Uncertainty, interval of, 26-28, 32, 39, 92, 180
Uniform distribution, 91, 92, 95, 115
Unimodality, 24, 27, 28, 39, 168, 236
Uzawa, H., 18
Vagin, V.N., 102
Vajda, S., 7, 18
Vanderplaats, G.N., 23
VanNice, R.I., 44
VanNorton, R., 41
Varah, J.M., 68
Varela, F.J., 103
Varga, J., 18
Varga, R.S., 43
Variable metric, 70, 77, 83, 169-172, 178, 233, 242, 243, 246, see also DFP and DFP-Stewart strategies
Variables, 2, 8, 11
Variance analysis, 8
Variance ellipse, 109
Variance methods, 82
Variational calculus, 2, 11, 15, 66
Vaysbord, E.M., 90, 94, 99
Vecchi, M.P., 160
Venter, J.H., 20
Vetters, K., 39
Vilis, T., 10
Viswanathan, R., 94
Vitale, P., 84
Vogelsang, R., 5
Vogl, T.P., 24, 48, 93
Voigt, H.-M., 163
Voltaire, F.M., 1
Volume-oriented strategies, 98, 160, 236, 248
Volz, R.A., 12, 70
Wacker, H., 12
Waddington, C.H., 102
Wagner, K., 151, 246
Wald, A., 7, 89
Walford, R.B., 93
Wallack, P., 68
Walsh, J., 27
Walsh, M.J., 102, 105, 151
Ward, L., 58
Wasan, M.T., 19
Wasscher, E.J., 68, 76
Weak minimum, 24, 25, 113, 328, 332, 333
Weber, H.H., 17
Wegge, L., 76
Weierstrass, K., theorem of, 25
Weinberg, F., 18, 91
Weisman, J., 23, 47, 89
Weiss, E.A., 48
Wells, M., 77
Werner, J., 82
Wets, R.J.-B., 19
Wetterling, W., 5
Wheatley, P., 38
Wheeling, R.F., 95, 98
White, L.J., 97
White, R.C., Jr., 93, 95
Whitley, L.D., 152, 155
Whitley, V.W., 76
Whitting, I.J., 84
Whittle, P., 18
Wiedemann, J., 151
Wiener, N., 6
Wierzbicki, A.P., 20
Wilde, D.J., 1, 20, 23, 26, 27, 29, 31-33, 38, 39, 87
Wilf, H.S., 27
Wilkinson, J.H., 14, 75
Wilson, E.O., 146
Wilson, K.B., 6, 65, 68, 89
Wilson, S.W., 103
Witt, U., 103
Witte, B.F.W., 67
Witten, I.H., 94
Witzgall, C., 18
Wolfe, P., 19, 23, 39, 66, 70, 82, 84, 166, 360
Wolff, W., 103
Wolfowitz, J., 19
Wood, C.F., 47, 48, 89
Woodside, C.M., 70
Wright, S.J., 179
Yates, F., 7
Youden, W.J., 101
Yudin, D.B., 90, 94, 99
Yvon, J.P., 96
Zach, F., 9
Zadeh, N., 41
Zahradnik, R.L., 23
Zakharov, V.V., 94
Zangwill, W.I., 18, 41, 66, 71, 74, 170, 202
Zehnder, C.A., 18, 91
Zeleznik, F.J., 84
Zellnik, H.E., 66, 89
Zener, C., 14
Zerbst, E.W., 105
Zero-one optimization, see binary optimization
Zettl, G., 344, 348
Zhigljavsky, A.A., 91
Zigangirov, K.S., 89
Zilinskas, A., 91
Zoutendijk, G., 18, 70
Zurmühl, R., 27, 35, 172
Zwart, P.B., 88
Zypkin, Ja.S., see Tsypkin, Ya.Z.
