
TUNING EVOLUTIONARY SEARCH

FOR CLOSED-LOOP OPTIMIZATION

A thesis submitted to the University of Manchester

for the degree of Doctor of Philosophy

in the Faculty of Engineering and Physical Sciences

2012

By

Richard Allmendinger

School of Computer Science

Contents

Abstract
Declaration
Copyright
Acknowledgements
Nomenclature

1 Introduction
1.1 Objectives and contributions
1.2 Outline of the thesis
1.2.1 Publications resulting from the thesis

2 Closed-Loop Optimization
2.1 Concepts of closed-loop optimization
2.2 Applications of closed-loop optimization
2.3 Challenges of closed-loop optimization
2.4 Design of experiments
2.5 Response surface methods
2.6 Simulated evolution in closed-loop optimization
2.7 Chapter summary

3 Evolutionary Optimization
3.1 Evolutionary Algorithms
3.1.1 Analyzing and explaining evolutionary algorithms
3.1.2 Tuning evolutionary algorithms
3.2 Optimization in uncertain environments
3.2.1 Optimization subject to noise
3.2.2 Searching for robust and reliable solutions
3.2.3 Further forms of uncertainties in optimization
3.3 Constrained optimization
3.4 Dynamic optimization
3.4.1 Dynamically-constrained optimization
3.5 Online optimization
3.6 Reinforcement learning
3.6.1 Reinforcement learning in evolutionary optimization
3.7 Chapter summary

4 ERCs: Definitions and Concepts
4.1 ERCOPs
4.1.1 General problem definition of an ERCOP
4.1.2 Further comments on ERCOPs
4.2 Specific types of ERCs
4.2.1 Fundamental elements of ERCs
4.2.2 Commitment relaxation ERCs
4.2.3 Periodic ERCs
4.2.4 Random ERCs
4.2.5 Commitment composite ERCs
4.3 Theoretical analysis
4.3.1 Markov chains
4.3.2 Modeling ERCs with Markov models
4.3.3 Simulation results
4.3.4 Summary of theoretical study
4.4 Chapter summary

5 Strategies for Coping with ERCs
5.1 Static constraint-handling strategies
5.1.1 Specific static constraint-handling strategies
5.1.2 Experimental setup
5.1.3 Experimental results
5.1.4 Case study
5.1.5 Summary and conclusion
5.2 Learning-based constraint-handling strategies
5.2.1 Offline learning-based strategy
5.2.2 Online learning-based strategy
5.2.3 Experimental analysis
5.2.4 Summary and conclusion
5.3 Online resource-purchasing strategies
5.3.1 Specific online purchasing strategies
5.3.2 Experimental setup
5.3.3 Experimental results
5.3.4 Summary and conclusion
5.4 Chapter summary

6 Further Resourcing Issues
6.1 Changes of variables
6.1.1 Motivation
6.1.2 Fair mutation
6.1.3 Testing Environment
6.1.4 Experimental setup
6.1.5 Experimental results
6.1.6 Summary and conclusion
6.2 Lethal environments
6.2.1 Motivation
6.2.2 Experimental setup
6.2.3 Experimental results
6.2.4 Summary and conclusion
6.3 Chapter summary

7 Conclusion
7.1 Future work

A Further Results of Strategies Applied to ERCOPs
A.1 Commitment composite ERCs
A.2 Periodic ERCs

B The D-MAB Algorithm

Word Count: 56459

List of Tables

2.1 Domains of closed-loop applications.
3.1 Nomenclature used in evolutionary algorithms
4.1 Different types of ephemeral resource constraints (ERCs) and examples.
5.1 EA parameter settings as used in the study of static constraint-handling strategies.
5.2 Parameter settings of static constraint-handling strategies.
5.3 Parameter settings of test functions f as used in the study of static constraint-handling strategies.
5.4 Parameter settings of static and learning-based constraint-handling strategies.
5.5 Parameter settings of online resource-purchasing strategies.
6.1 Parameter settings of dynamic optimization techniques.
6.2 Parameter settings of test functions f modelling problems subject to changes of variables.
6.3 Accuracy and adaptability results of various dynamic optimization strategies on problems subject to changes of variables
6.4 Default parameter settings of search algorithms.
6.5 Average best solution fitness obtained with various parameter settings of search algorithms optimizing problems subject to lethal environments (N = 50, K = 4, α = 2.0)
6.6 Average best solution fitness obtained with various parameter settings of search algorithms optimizing problems subject to lethal environments (N = 50, K = 10, α = 2.0)

List of Figures

1.1 Schematic of the experimental setup as used by Schwefel (1968) in the shape optimization of a flashing nozzle
2.1 Schematic of closed-loop optimization
2.2 Experimental setup as used in the configuration optimization of an FPGA chip
2.3 Experimental setup as used in an application for quantum control
2.4 Experimental setup as used in the robot scientist study
2.5 Schematic of the closed loop as employed with EVOP
2.6 Box-Behnken experimental design
2.7 A response surface plot
3.1 Visualization of one-point crossover and bit flip mutation
3.2 Modeling genetic algorithms with a dynamical systems approach
3.3 Classification of parameter setting methods for EAs
3.4 Comparison between a brute-force and a statistical racing method
3.5 The general scheme of an EA employing an adaptive operator selection scheme
3.6 Visualization of the effect of a noisy fitness function
3.7 Evolution strategy with rescaled mutation
3.8 Visualization of the effect of approximation error in meta-models
3.9 A dynamically changing fitness landscape
3.10 A schematic illustration of the effect of anticipation
3.11 The agent-environment interaction in reinforcement learning
4.1 Distribution of evaluable search space $E_t$ and feasible region X
4.2 Schematic of the constraint time frame, preparation and recovery period
4.3 An illustration of a commitment relaxation ERC
4.4 An illustration of a periodic ERC
4.5 An illustration of a random ERC
4.6 A visual example of a commitment composite ERC
4.7 Theoretical simulation result of periodic ERCs as a function of selection steps (f(A) = f(B))
4.8 Theoretical simulation result of periodic ERCs as a function of the activation period (f(A) = f(B))
4.9 Theoretical simulation result of periodic ERCs as a function of selection steps (f(A) ≠ f(B))
4.10 Theoretical simulation result of periodic ERCs as a function of the preparation time and activation period (f(A) ≠ f(B))
4.11 Theoretical simulation result of periodic ERCs as a function of the recovery time (f(A) ≠ f(B))
4.12 Theoretical simulation result of periodic ERCs as a function of the fitness ratio f(A)/f(B)
5.1 Drift-effect caused by repairing
5.2 Illustration of repairing strategies
5.3 A TwoMax function
5.4 Empirical results of commitment relaxation ERCs on OneMax
5.5 Empirical results of commitment relaxation ERCs on OneMax (heat map I)
5.6 Empirical results of commitment relaxation ERCs on OneMax (heat map II)
5.7 Empirical results of commitment relaxation ERCs on MAX-SAT (heat map)
5.8 Empirical results of periodic ERCs on OneMax
5.9 Empirical results of periodic ERCs on OneMax (heat map I)
5.10 Empirical results of periodic ERCs on OneMax (heat map II)
5.11 Empirical results of commitment relaxation ERCs on NKα landscapes
5.12 Empirical results of commitment relaxation ERCs on a fitness landscape interpolated from real-world data
5.13 Empirical results of learning-based constraint-handling strategies
5.14 Optimal actions learnt by a reinforcement learning agent
5.15 Arms played by the multi-armed bandit algorithm
5.16 Performance of reinforcement learning agent as a function of different parameter settings
5.17 Learning-based strategies performing on an unknown fitness landscape
5.18 Visualization of an online resource-purchasing strategy
5.19 Empirical results of commitment composite ERCs on MAX-SAT (heat map)
5.20 Empirical results of commitment composite ERCs on MAX-SAT as a function of available budget
5.21 Empirical results of commitment composite ERCs on MAX-SAT (heat map-comparison)
6.1 An illustration of the effect of changing variables
6.2 Empirical results of dynamic optimization strategies optimizing MJ models subject to frequent changes of few variables
6.3 Empirical results of dynamic optimization strategies optimizing NK landscapes subject to frequent changes of few variables
6.4 Empirical results of dynamic optimization strategies optimizing MJ models subject to frequent changes of few variables
6.5 Empirical results of dynamic optimization strategies optimizing NK landscapes subject to frequent changes of few variables
6.6 Empirical results of dynamic optimization strategies optimizing MJ models subject to frequent changes of few variables
6.7 Empirical results of dynamic optimization strategies optimizing NK landscapes subject to frequent changes of few variables
6.8 Empirical results of search algorithms on problems subject to lethal environments (heat map)
6.9 Empirical results of search algorithms on problems subject to lethal environments (heat map-comparison)
6.10 Effect of population size on the average best solution fitness obtained with various search algorithms optimizing problems subject to lethal environments
A.1 Empirical results of commitment relaxation ERCs on MAX-SAT
A.2 Empirical results of commitment relaxation ERCs on TwoMax
A.3 Empirical results of periodic ERCs on MAX-SAT
A.4 Empirical results of periodic ERCs on TwoMax
A.5 Empirical results of periodic ERCs on TwoMax (heat map I)
A.6 Empirical results of periodic ERCs on TwoMax (heat map II)

List of Algorithms

3.1 A basic generational evolutionary algorithm
3.2 A single generation of deterministic crowding (Mahfoud, 1995)
3.3 Tabular TD(λ) for estimating $V^\pi$
3.4 Tabular Sarsa(λ), an on-policy TD control method
4.1 Implementation of a commitment relaxation ERC
4.2 Illustration of how we manage ERCs and the evaluation of solutions
4.3 Implementation of a periodic ERC
4.4 Implementation of a random ERC
4.5 Implementation of a commitment composite ERC
5.1 Generational EA with static constraint-handling strategies
5.2 Steady-state EA with static and learning-based constraint-handling strategies
5.3 Just-in-time strategy with repairing
5.4 Generational EA with online resource-purchasing strategies
6.1 Fair mutation
6.2 Generational EA with dynamic optimization strategies for handling problems subject to changes of variables
6.3 Tournament selection based GA (TGA)
6.4 Population of hill-climbers (PHC)

Abstract

Closed-loop optimization deals with problems in which candidate solutions are
evaluated by conducting experiments, e.g. physical or biochemical experiments.
Although this form of optimization is becoming more popular across the sciences,
it may be subject to largely unexplored resourcing issues, as any experiment may
require resources in order to be conducted. In this thesis we are concerned with
understanding how evolutionary search is affected by three particular resourcing
issues, namely ephemeral resource constraints (ERCs), changes of variables, and
lethal environments, and with developing search strategies to combat these issues.

The thesis makes three broad contributions. First, we motivate and formally
define the resourcing issues considered, giving concrete examples from a range of
applications. Secondly, we theoretically and empirically investigate the
effect of the resourcing issues considered on evolutionary search. This investigation
reveals that resourcing issues affect optimization in general, and that clear
patterns emerge relating specific properties of the different resourcing issues to
performance effects. Thirdly, we develop and analyze various search strategies,
built on top of an evolutionary algorithm (EA), for coping with resourcing issues.
To cope specifically with ERCs, we develop several static constraint-handling
strategies, and investigate the application of reinforcement learning techniques
to learn when to switch between these static strategies during an optimization
process. We also develop several online resource-purchasing strategies to cope
with ERCs that leave the arrangement of resources in the hands of the optimizer.
For problems subject to changes of variables relating to the resources, we find
that knowing which variables are changed provides an optimizer with valuable
information, which we exploit using a novel dynamic strategy. Finally, for lethal
environments, where visiting parts of the search space can cause the permanent
loss of resources, we observe that a standard EA’s population may be reduced in
size rapidly, complicating the search for innovative solutions. To cope with such
scenarios, we consider some non-standard EA setups that are able to innovate
genetically whilst simultaneously mitigating risks to the evolving population.

Declaration

No portion of the work referred to in this thesis has been
submitted in support of an application for another degree
or qualification of this or any other university or other
institute of learning.

Copyright

i. The author of this thesis (including any appendices and/or schedules to
this thesis) owns certain copyright or related rights in it (the “Copyright”)
and s/he has given The University of Manchester certain rights to use such
Copyright, including for administrative purposes.

ii. Copies of this thesis, either in full or in extracts and whether in hard or
electronic copy, may be made only in accordance with the Copyright, Designs
and Patents Act 1988 (as amended) and regulations issued under it
or, where appropriate, in accordance with licensing agreements which the
University has from time to time. This page must form part of any such
copies made.

iii. The ownership of certain Copyright, patents, designs, trade marks and other
intellectual property (the “Intellectual Property”) and any reproductions of
copyright works in the thesis, for example graphs and tables (“Reproductions”),
which may be described in this thesis, may not be owned by the
author and may be owned by third parties. Such Intellectual Property and
Reproductions cannot and must not be made available for use without the
prior written permission of the owner(s) of the relevant Intellectual Property
and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication
and commercialisation of this thesis, the Copyright and any Intellectual
Property and/or Reproductions described in it may take place is available
in the University IP Policy (see http://documents.manchester.ac.uk/DocuInfo.aspx?DocID=487),
in any relevant Thesis restriction declarations
deposited in the University Library, The University Library’s regulations
(see http://www.manchester.ac.uk/library/aboutus/regulations) and
in The University’s policy on presentation of Theses.

Acknowledgements

First and foremost I would like to thank my PhD supervisor, Joshua Knowles, for
his valuable moral support, constructive criticism, help and feedback throughout
the progress of this work, and for being a good friend too.

I am also very grateful to Hans-Paul Schwefel, Alan Reynolds, David Corne,
and Warwick Dunn for answering many questions about their experience with
resource constraints.

Thanks to Arjun Chandra and Julia Handl for reading portions of my thesis
and making useful suggestions.

Thanks also to Mauricio, Michalis, Kevin, Adam, Ahmad, Peter, Ruofei, Nicolo,
Manuela, Stefan, Freddy, and Richard for making the office more fun and
enjoyable!

To Neil Lawrence for being a fair opponent in internal swimming and cycling
competitions, interesting conversations about sports, and the BBQs.

Warm thanks to all the people I have met and lived with, who made my time
in Manchester more enjoyable, particularly Eyad, Liz, Claudia, Hagen, Abdul,
Chris, Dan, Steve, Zoe, Lotti, Freddy, and Karla.

Finally, many thanks to my family, without whom I would never have been
able to do what I did.

Nomenclature

$\sigma_t$: Set of problem-specific and time-evolving parameters
$E_t$: Set of evaluable solutions at time step $t$
$k$: Length of activation period
$t^{start}_{ctf}$: Time step at which the constraint time frame of an ERC starts
$t^{end}_{ctf}$: Time step at which the constraint time frame of an ERC is finished
$T$: Total optimization time
$H$: Constraint schema
$l(H)$: Length of a constraint schema
$o(H)$: Order of a constraint schema
$V$: Duration of an epoch
commRelaxERC($t^{start}_{ctf}, t^{end}_{ctf}, V, H$): Representation of a commitment relaxation ERC
$P$: Period length
perERC($t^{start}_{ctf}, t^{end}_{ctf}, k, P, H$): Representation of a periodic ERC
$p$: Activation probability
randERC($t^{start}_{ctf}, t^{end}_{ctf}, k, o(H), p$): Representation of a random ERC
$H_{\#}$: High-level constraint schema
$\#SC$: Number of storage cells
$TL$: Time lag
$RN$: Reuse number
$SL$: Shelf life
$c_{order}$: Cost per submitted composite order
$c_{time\ step}$: Cost per time step
$C$: Budget
commCompERC($H_{\#}, \#SC, TL, RN, SL$): Representation of a commitment composite ERC

Chapter 1

Introduction

In the late 1960s, Hans-Paul Schwefel reported an ingenious set of experiments
designed to optimize the shape of a flashing nozzle (Schwefel, 1968; Klockgether
and Schwefel, 1970). Figure 1.1 illustrates the setup employed. The aim was to
set aside traditional design principles, and the equations of fluid dynamics,
and instead turn the design process over to a series of trial-and-error experiments
akin to natural evolution. Schwefel was using an early form of evolutionary
algorithm (EA) and evaluating designs not through simulation but by conducting
real (physical) experiments. Although expensive (i.e. time-consuming and/or
resource-intensive), this setup is effective because experiments remove the need
to have available, or to design, adequate mathematical models of the problem
being solved.

This thesis considers problems featuring experimental setups very similar in
character to Schwefel’s, which are nowadays commonly referred to as
closed-loop optimization problems. The distinct difference between these problems
and standard optimization problems is that, while candidate solutions (genotypes)
to a problem (e.g. a set of parameter values specifying nozzle shapes) are planned on
a computer, their phenotypes (e.g. an actual flashing nozzle) are realized or
prototyped ex-silico (e.g. relying on a physical experiment of some sort). In general, the
process of measuring the fitness (or quality) of the phenotype involves conducting
a physical experiment too, as was the case in Schwefel’s setup. Over
the years, many scientific and technological problems, in areas including
quantum control, analytical biochemistry, robotics, electronics design, medicine,
food science, neuroscience, and other sciences (an overview of closed-loop
applications is given in Section 2.2), have been tackled using an experimental

procedure similar to that of Schwefel’s.

[Figure 1.1 (schematic; labels: boiler, superheated water, conical rings, fluid steam, measuring device, velocity of fluid steam, shape of nozzle, computer selects next nozzle shape)]

Figure 1.1: Schematic of the experimental setup as used by Schwefel (1968) in the
shape optimization of a flashing nozzle. The nozzle consisted of a series of conical brass
rings, each having its own diameter. The quality or fitness of the nozzle was measured
by injecting superheated water into one side of the nozzle using a pressurized boiler,
and measuring the velocity of the fluid steam at the other side of the nozzle. Based on
this quality measure, a computer then selected the next nozzle shape for testing.

However, the reliance on physical experiments comes with a variety of
challenges to be faced by an experimentalist/optimizer. Schwefel’s application
features several challenges that are also common to other problems in this domain,
such as noisy or imprecise measurements, uncontrolled environmental factors, and
expensive fitness evaluations; an overview of common challenges in closed-loop
optimization is provided in Section 2.3. To cope with these challenges, Schwefel
and other pioneering researchers in closed-loop optimization (Box, 1957; Schwefel,
1968; Klockgether and Schwefel, 1970; Rechenberg, 1973; Schwefel, 1975; Rechenberg,
2000) have opted for methods based on simulated evolution, such as EAs.[1]
EAs have several advantages over other optimizers when it comes to closed-loop
optimization (see Section 2.6 for examples), but one important practical reason for
their usage is that they employ a set of solutions rather than a single one. This
property is convenient when experimental equipment is capable of running many
experiments in parallel.
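The batch-oriented loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the setup used by Schwefel or in this thesis: `run_experiments` is a hypothetical stand-in for the laboratory (it returns a noisy count of ones for each bit string), and the names, parameter values, tournament selection, and plus-style survivor selection are all choices made for the example.

```python
import random


def run_experiments(batch, rng):
    """Hypothetical stand-in for the experimental platform.

    A real closed loop would build and measure every phenotype physically;
    here the 'measurement' is the number of ones plus Gaussian noise.
    """
    return [sum(x) + rng.gauss(0.0, 0.5) for x in batch]


def mutate(x, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in x]


def closed_loop_ea(n_bits=20, pop_size=8, generations=30, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    fitness = run_experiments(population, rng)  # one batch of experiments
    for _ in range(generations):
        # Plan the next batch of candidates on the computer
        # (binary tournament selection followed by mutation).
        parents = [max(rng.sample(range(pop_size), 2), key=lambda i: fitness[i])
                   for _ in range(pop_size)]
        offspring = [mutate(population[i], 1.0 / n_bits, rng) for i in parents]
        # Hand the whole batch to the platform in one go.
        offspring_fitness = run_experiments(offspring, rng)
        # Keep the pop_size best of parents and offspring, by measured fitness.
        merged = sorted(zip(population + offspring, fitness + offspring_fitness),
                        key=lambda pair: pair[1], reverse=True)[:pop_size]
        population = [x for x, _ in merged]
        fitness = [f for _, f in merged]
    best = max(range(pop_size), key=lambda i: fitness[i])
    return population[best], fitness[best]
```

Calling `closed_loop_ea(seed=1)` returns the best bit string found together with its measured (noisy) fitness. The point of the sketch is the shape of the loop: the optimizer proposes a whole population, and each generation costs one round of parallel experiments.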

There seems to be an increasing interest in applying EAs to closed-loop optimization
(Calzolari et al., 2008; Wong et al., 2008; Chen et al., 2009b; Knowles,
2009; Shir et al., 2009; Caschera et al., 2010), due perhaps to the increasing
popularity of EAs as optimization methods across the sciences, the greater availability
of automation of experiments in areas like physics, chemistry and the biosciences,
and a push towards more interdisciplinary working. The tutorials by Shir and
Bäck (2009) and by Bäck et al. (2010), and recent keynote talks at international
conferences such as GECCO (Bedau, 2010) and PPSN (Michalewicz, 2010), are
further evidence of a burgeoning interest.

[1] When an EA is used, closed-loop optimization is sometimes referred to as evolutionary
experimentation or experimental evolution.

Whilst there is growing evidence that closed-loop optimization works, there
are various unexplored resourcing issues an experimentalist may face during an
iterative experimental loop. The aim of this thesis is to understand how
evolutionary search is affected by three particular resourcing issues in a closed-loop
optimization scenario, namely ephemeral resource constraints (ERCs), changes
of variables, and lethal environments, and to develop search strategies
to combat these issues. The resourcing issue around ERCs was encountered by
Knowles in several closed-loop optimization settings (see O’Hagan et al. (2005,
2007); Knowles (2009); Jarvis et al. (2010)), while changing variables were
encountered in our own collaborative work (Small et al., 2011).[2] The third issue,
lethal environments, is a challenge that may occur when equipment is potentially
destructible, as in software testing of autonomous robots or nanotechnology
testing (we have not encountered this particular issue ourselves but believe that
similar scenarios will arise in future).

The primary resourcing issue we are focusing on is related to the temporary
non-availability of resources required in the evaluation process of solutions. This
situation may cause solutions that are perfectly feasible candidate solutions to
the problem to be temporarily non-realizable and thus not available for fitness
assessment; we will refer to such solutions as (temporarily) non-evaluable. We
refer to the constraints specifying which solutions are not evaluable at a given
time as ephemeral resource constraints (ERCs), and any optimization problem
that involves ERCs as an ephemeral resource-constrained optimization problem
(ERCOP).
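To make the distinction between feasible and evaluable concrete, the following Python sketch models one possible time-dependent constraint. It is an illustration only, and its details are assumptions rather than the formal definitions given later in the thesis: the class name, the `'*'` wildcard schema, and the rule that the constraint is active during the first `active_steps` steps of every `period` steps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodicResourceConstraint:
    """Hypothetical ERC: while active, only solutions matching `schema`
    can be evaluated; '*' in the schema matches either bit value."""
    schema: str
    period: int
    active_steps: int

    def is_active(self, t: int) -> bool:
        # Active during the first `active_steps` steps of each period.
        return t % self.period < self.active_steps

    def is_evaluable(self, solution: str, t: int) -> bool:
        if not self.is_active(t):
            return True  # no resource restriction at this time step
        return all(s == "*" or s == x for s, x in zip(self.schema, solution))


erc = PeriodicResourceConstraint(schema="1**0", period=5, active_steps=2)
print(erc.is_evaluable("0110", 0))  # constraint active, schema violated
print(erc.is_evaluable("0110", 3))  # constraint inactive
```

The solution `"0110"` is the same feasible candidate at both time steps; only its evaluability changes with `t`. An optimizer facing such a constraint therefore has to decide what to do with a non-evaluable candidate, for instance wait, modify it, or discard it.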

Although the resourcing issue around ERCs has not been raised much in the
literature to date, at least in terms similar to those we use in this thesis,
precedents do exist. We have spoken to several people who do, or have done, closed-loop
work, and they have recounted how they dealt with the issue. For example, from
discussions with Schwefel (April 2009 to January 2010), we know that he faced
ERCs in his flashing nozzle work. In fact, not knowing in advance which shape
would be realized, Schwefel sometimes ran out of brass rings of certain diameters.
In that case, additional brass rings were ordered, produced and delivered,
causing the optimization process to wait until these resources arrived; that is,
here ERCs prevented the genotype-phenotype mapping. In modern closed-loop
problems, such as combinatorial drug discovery and optimization of analytical
instrument configurations (Weuster-Botz and Wandrey, 1995; Davies et al., 2000;
King et al., 2004; O’Hagan et al., 2007; Knowles, 2009), there are several aspects
that may further complicate the process of ensuring resource availability. For
instance, the storage space for resources may be limited, there may be time lags
between ordering resources and receiving them, and resources may be associated
with shelf lives or may be usable only a limited number of times.

[2] Note that ERCs are not discussed specifically in the work of O’Hagan et al. (2005, 2007);
Jarvis et al. (2010), but they existed and were circumvented using waiting or other
resource-wasteful strategies.

In another problem encountered by Schwefel, the fitness of a single solution
was measured by running a time-consuming simulation on a computer (while there
were no resources required in the genotype-phenotype mapping). During some
simulations, the process ended prematurely (i.e. an execution error or exception
occurred) and no fitness was returned. From a discussion with Reynolds and Corne
(September 2010) we know that they faced a similar problem recently; execution
errors were also encountered by other researchers including Booker et al. (1999);
Büche et al. (2002).

Motivated by a closed-loop problem involving the identification of effective

drug combinations drawn from a non-static drug library, the next resourcing issue

considered in this thesis is related to problems featuring dynamically changing

variables. The effect of changing variables is that we introduce a change in the

fitness landscape we are optimizing over, which may cause the global optimum to

shift; notice the difference to ERCOPs, which feature a static fitness landscape

but dynamic constraints. Problems that are subject to changing variables can be

seen as traditional dynamic optimization problems (Branke, 2001). However, in

the scenario considered in this thesis an optimizer knows and may even be allowed

to control when and which variables (drugs) are replaced. Compared to standard

dynamic optimization problems, this property yields two important advantages:

(i) changes in the fitness landscape do not need to be detected explicitly before

a potentially shifting optimum can be tracked, and (ii) tracking of optima is

simplified because there is a correlation between the changed variables and the

parts of the landscape that undergo a change. While these two aspects provide


clearly some useful extra knowledge, the challenge remaining is how to exploit

this knowledge efficiently to accelerate tracking of optima.

The final resourcing issue we are considering is related to optimization in

lethal environments. The general setup is that we wish to evolve (i.e. improve)

a finite population of entities, but when a lethal solution (i.e. one that damages

equipment) is evaluated, it is immediately removed from the population and

the population size is reduced by one. This models certain closed-loop evolution

scenarios that may be encountered when the hardware on which individuals are

tested is reconfigurable, destructible and non-replaceable. A motivational example

of an optimization problem subject to lethal environments will be given in

Section 6.2.1. In contrast to the other two resourcing issues (ERCs and changes

of variables) lethal environments feature a static fitness landscape and feasible

region; the dynamic component is associated with the fact that the population

size may reduce over the course of the optimization.
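The shrinking-population mechanism described above can be illustrated with a minimal sketch. It is not the model studied later in the thesis: the function name `run_lethal_ea`, the one-offspring-per-slot scheme and the elitist replacement rule are assumptions made purely for illustration. The only property taken from the text is that evaluating a lethal solution removes it and reduces the population size by one.

```python
import random


def run_lethal_ea(population, fitness, is_lethal, mutate, generations, rng):
    """Evolve `population`; a lethal evaluation permanently removes a slot.

    Hypothetical scheme: each slot produces one mutated child per generation.
    If the child is lethal, the slot is lost and nothing replaces it.
    Otherwise the better of parent and child occupies the slot.
    """
    for _ in range(generations):
        survivors = []
        for parent in population:
            child = mutate(parent, rng)
            if is_lethal(child):
                # Lethal solution evaluated: the population shrinks by one.
                continue
            survivors.append(max(parent, child, key=fitness))
        population = survivors
        if not population:
            break  # every slot has been destroyed
    return population


if __name__ == "__main__":
    rng = random.Random(0)
    final = run_lethal_ea(
        population=[0, 1, 2, 3],
        fitness=lambda x: x,
        is_lethal=lambda x: x > 3,          # values above 3 damage the equipment
        mutate=lambda x, rng: x + 1,        # deterministic stand-in for mutation
        generations=2,
        rng=rng,
    )
    print(final)
```

In this toy setting the fitness landscape and the lethal region stay fixed throughout the run; only the number of available slots changes, which mirrors the distinction drawn in the text between lethal environments and the other two resourcing issues.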

1.1 Objectives and contributions

Our goal is to gain a better understanding of why and how the three resourcing

issues mentioned above affect the application of evolutionary search for closed-loop

optimization, and consequently devise search strategies for combating these

issues.

We claim the thesis makes the following contributions:

• A general problem definition of ERCOPs, including the development of

various types of ERCs. As our primary focus is on the investigation of the

effects of ERCs, a large body of the thesis is devoted to advancing this

subject (Chapter 4).

• An initial theoretical analysis of the effect of ERCs on simple EAs using

Markov chains. The major aim of this analysis is to understand whether and

how ERCs should affect configuration choices of an EA; the configuration

choices considered relate to different selection and reproduction schemes

commonly used within EAs (Chapter 4).

• Various static constraint-handling strategies, including repairing, waiting,

and penalty strategies, for coping with ERCs. The empirical study of these

strategies provides evidence that knowing about the type of ERC may be


sufficient to select an effective constraint-handling strategy for efficiently

coping with it a priori, even when knowledge of the fitness landscape is

limited (Chapter 5).

• Empirical investigation of two learning-based approaches for coping with

ERCs. Both approaches aim at learning when to switch between various

static constraint-handling strategies during the optimization process. However,

while one approach learns this task offline using a reinforcement learning

method, the other learns it online using a multi-armed bandit algorithm

(Chapter 5).

• Various online resource-purchasing strategies to cope with ERCs that leave

the arrangement of resources to the hands of the optimizer. We consider a

specific scenario motivated by a real-world ERC (Chapter 5).

• A novel technique, which we call fair mutation, for dealing with problems

featuring changing variables. To track potentially shifting optima upon a

variable change, fair mutation exploits the correlation existing between

the changed variables and the parts of the landscape undergoing a change

(Chapter 6).

• Guidelines for tuning simple EA configuration parameters (degree of elitism,

population size, selection pressure, mutation mode and strength, and crossover

rate) for coping with lethal environments. The guidelines are validated

on different fitness landscape topologies, helping us understand the underlying

challenging features of landscapes in lethal environments (Chapter 6).

The thesis contributions in a wider scope are:

• A review of the field of closed-loop optimization including general concepts,

applications, challenges, and optimization algorithms employed to address

these challenges (Chapter 2).

• A taxonomy for describing ERCOPs and the different ERC types we introduce

(Chapter 4).

• A variety of detailed suggestions on directions of future research that we

believe are necessary and fruitful in combating resourcing issues in closed-loop

optimization (Chapter 7).


1.2 Outline of the thesis

In Chapter 2, we review literature specifically relating to closed-loop optimization.

After defining general concepts of closed-loop optimization, we review a

variety of closed-loop applications we found in the literature across the sciences

and engineering. Following this review we summarize the main challenges common

to closed-loop applications, and finally give an overview of methods used to

tackle closed-loop problems. In particular, we introduce the design of experiments

discipline, response surface methods, and methods based on simulated evolution

and EAs. This thesis favours the application of EAs to closed-loop problems.

The reasons for this choice will be described in detail in Section 2.6.

In Chapter 3, we review the application of EAs to problem classes that are related

to closed-loop problems with resourcing issues, including noisy, constrained,

dynamic, and online optimization problems. Prior to this review we give a more

detailed description of the basic principles and concepts of evolutionary search,

and outline techniques employed for: (i) theoretically analyzing and explaining

the working of EAs, and (ii) tuning the performance of EAs. The final part of

this chapter is devoted to the field of reinforcement learning (RL), which offers a

variety of tools for tuning EAs. We introduce the fundamental ideas of RL and

review several studies that looked at extending EAs with RL concepts to improve

performance. This concludes the review of literature; the other chapters present

original research.

In Chapter 4, we give a general problem definition of an ERCOP and define

several types of ERCs derived from the experimental work of Knowles (2009).

The second part of this chapter presents an initial theoretical analysis, using

Markov chains, of the effect of one of the ERC types on common selection and

reproduction schemes used within an EA. The theoretical analysis concludes that

an order relation-based selection operator, such as tournament selection, is more

robust to simple ERCs than a fitness proportionate-based selection operator. A

perhaps more surprising result is that, while an EA with a non-elitist generational

reproduction scheme converges more quickly to some optimal population state

than with a non-elitist steady state scheme during unconstrained optimization

periods, the opposite is the case during activation periods of an ERC. This result

implies that ERCs should be accounted for when tuning EAs for ERCOPs. An

empirical study of ERCOPs follows in the subsequent chapter.

In Chapter 5, we develop and empirically analyze various constraint-handling


strategies for coping with the ERCs introduced in Chapter 4. First we develop

several general static constraint-handling strategies including repairing, penalizing

and waiting strategies. Some of these strategies are inspired by existing

strategies used for coping with standard (static) constraints, and some others are

realizations of the tested approaches of Schwefel, Reynolds and Corne, described

in the preface of this chapter. The empirical analysis reveals that different strategies should be favoured depending on the properties of an ERC, and gives evidence

that it is possible to select a suitable strategy for an ERCOP offline, if the ERCs

are known in advance. Inspired by this observation, we investigate whether an

EA that learns offline (using a reinforcement learning approach) when to switch

between the static strategies during the optimization process can do better than

the static strategies themselves. We consider also a strategy that learns the same

switching task online using a multi-armed bandit algorithm. Finally, we investigate strategies for coping with an ERC type that requires an optimizer to purchase

resources online and pre-emptively so that they arrive in time to be used in the

evaluation process of solutions. This ERC imposes three additional limitations

related to the storage space, the shelf life and reuse number of the resources purchased.

We show that this challenging ERC affects EA performance significantly

but that certain online purchasing strategies that we propose and investigate here

can be effective against it.

In Chapter 6, we consider the resourcing issues related to changing variables

in the first part of the chapter, and lethal environments in the latter. A change

in the variables causes parts of the fitness landscape to change in unpredictable

ways; the fact that we know which variables are changed means that tracking

of a potentially shifting optimum is slightly easier than in a standard dynamic

optimization scenario. Based on this, we propose a novel strategy, which we call

fair mutation, and compare it against standard techniques in dynamic optimization.

For coping with lethal environments, we do not propose novel strategies

per se but look at how this issue can be dealt with best by tuning some of the

basic EA configuration parameters: degree of elitism, population size, selection

pressure, mutation mode and strength, and crossover rate. We also vary aspects

of the fitness landscape we are optimizing to observe which landscape topologies

pose a particular challenge when optimizing in a lethal environment.

In Chapter 7, we conclude the thesis, summarizing what we have learnt about


the effects of resourcing issues on evolutionary search when applied within closed-loop

optimization, and how EAs can be extended to cope with these issues. In

addition we summarize directions for further research.

1.2.1 Publications resulting from the thesis

Refereed journal papers

B. G. Small, B. W. McColl, R. Allmendinger, J. Pahle, G. López-Castejón, N. J.

Rothwell, J. Knowles, P. Mendes, D. Brough, and D. B. Kell (2011). Efficient

discovery of anti-inflammatory small molecule combinations using evolutionary

computing. Nature Chemical Biology, 7:902–908.

Refereed conference papers

R. Allmendinger and J. Knowles (2011). Evolutionary search in lethal environments.

In Proceedings of the Evolutionary Computation Theory and Applications

Conference, to appear.

R. Allmendinger and J. Knowles (2011). Policy learning in resource-constrained

optimization. In Proceedings of the Genetic and Evolutionary Computation Conference.

ACM Press, pages 1971-1978.

R. Allmendinger and J. Knowles (2010). Evolutionary optimization on problems

subject to changes of variables. In Proceedings of Parallel Problem Solving from

Nature – PPSN XI. Springer LNCS 6239, pages 151-160.

R. Allmendinger and J. Knowles (2010). On-line purchasing strategies for an evolutionary algorithm performing resource-constrained optimization.

In Proceedings

of Parallel Problem Solving from Nature – PPSN XI. Springer LNCS 6239, pages

161-170.

Under review

R. Allmendinger and J. Knowles (2011). Policies for dealing with ephemeral resource

constraints in closed-loop optimization. IEEE Transactions on Evolutionary

Computation, under review.

Chapter 2

Closed-Loop Optimization

In the previous chapter, we introduced the class of closed-loop optimization problems

in which solutions are evaluated through physical experimentation rather

than simulation. The objective of this chapter is to give a more comprehensive

introduction to the field of closed-loop optimization and its historic development.

We begin this introduction with a more detailed description of the general concepts

of closed-loop optimization in Section 2.1. An overview of applications of

closed-loop optimization from various fields of science and engineering is provided

in Section 2.2, followed by a summary of common challenges faced by experimentalists

and algorithmic designers in these applications (Section 2.3). Finally, we

give a historical review of approaches used to cope with closed-loop optimization

problems. These include approaches based on design of experiments (Section 2.4),

response surface fitting (Section 2.5), and simulated evolution (Section 2.6).

2.1 Concepts of closed-loop optimization

Optimization is concerned with finding an answer ⃗x from a set of alternatives X

that is best with respect to some criterion f. A more technical definition of an

optimization problem of generic form is

maximize y = f(⃗x)    (2.1)

subject to ⃗x ∈ X,

where ⃗x = (x_1, ..., x_l) is a solution vector (also referred to as solution or candidate

solution) and its l components for which values must be found are called


decision variables. The variables may be integer, real, or a mixture. The feasible

search space X is the set of solutions over which the search is performed. In

a standard constrained optimization problem, the set X is defined by a set of

equality and inequality constraints. The static objective function (also known

as fitness function) f : X → Y represents a mapping from X into the objective

space Y ⊂ R.

In many standard optimization problems, the function f is available in algebraic

form allowing an optimizer to evaluate solutions by simply computing f.

However, for some problems the mapping between the decision variables and their

effects on the fitness of a solution is cost prohibitive to define or simply unknown;

in this case we also say that the function f is black box. For instance, when optimizing

combinations of drugs, the drugs interact in non-linear and complex ways

that are difficult to predict and express in terms of a model. The approach taken

to deal with such problems is to evaluate solutions by a process of physical experimentation

or alternatively expensive computer simulations (Schütz and Schwefel,

2000). This is the type of optimization problem we consider in this thesis, and

such problems are commonly referred to as closed-loop or experimental problems. The term

closed-loop, first used by Box (1957), suggests here that the setup in such problems

establishes an interactive loop between an optimizer and the experimental

platform, potentially including a (human) experimentalist.

Figure 2.1 illustrates a closed-loop setup commonly used in experimental optimization;

Schwefel employed a very similar setup in his flashing nozzle problem

(see Figure 1.1). To make the description of this loop clearer we will distinguish

between two concepts: the genotype or structure of a solution (e.g. nozzle

shape), and the phenotype or actual solution that can be evaluated (e.g. an actual

nozzle). The genotype of a solution ⃗x is planned on a computer using an

optimization algorithm. Then, the process of evaluating this genotype involves

two steps: (i) realizing or prototyping the phenotype of ⃗x, and (ii) measuring the

(inherently noisy) fitness f(⃗x) of the phenotype. Both steps may require experiments

to be conducted (although a strict separation between the two steps is not

always possible). Upon obtaining the fitness value of a solution, this value is fed

back to the computer and considered in the planning of the next solution. This

closed loop is repeated until some termination criterion is met. The real-world

nature of experimental problems means that the termination criterion may be as

simple as reaching a pre-determined fixed number of experiments conducted, but


[Figure 2.1: diagram. An optimization algorithm running on a computer plans a new solution ⃗x (ranking, selection, variation; preparing performance statistics) and passes its decision variables (genotype) to physical experimentation or expensive computer simulations. There ⃗x is prototyped (e.g. mix drugs, adjust instrument, run simulation) and the fitness of its phenotype is measured (e.g. try drug combination in vivo, run sample through instrument, aggregate simulated data). The noisy measurements of quality f(⃗x) are fed back to the optimization algorithm.]

Figure 2.1: Schematic of closed-loop optimization. The genotype of a candidate solution

⃗x is generated on the computer but its phenotype is experimentally prototyped. The

quality or fitness f(⃗x) of a solution may be obtained experimentally too and thus may

be subject to measurement errors (noise).

time and budgetary limitations may enforce premature termination too.
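The loop described above (plan a genotype, realize its phenotype, measure its fitness, feed the result back, stop after a fixed number of experiments) can be sketched as follows. This is an illustrative skeleton under stated assumptions, not the setup used in any of the cited studies: the names `closed_loop`, `plan`, `prototype` and `measure` are hypothetical, and the termination criterion is simply a fixed experiment budget of at least one.

```python
def closed_loop(plan, prototype, measure, budget):
    """Run `budget` experiments and return the best (genotype, fitness) pair.

    plan(history)      -> next genotype, given all earlier (genotype, fitness) pairs
    prototype(genotype) -> phenotype (e.g. a mixed drug cocktail, an adjusted instrument)
    measure(phenotype)  -> measured fitness, possibly noisy
    """
    history = []
    for _ in range(budget):
        genotype = plan(history)           # planned on the computer
        phenotype = prototype(genotype)    # realized experimentally
        fitness = measure(phenotype)       # measured experimentally
        history.append((genotype, fitness))  # fed back for the next plan
    return max(history, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Stand-ins for the laboratory steps: try the settings 0, 1, 2, ... in turn
    # and score each by closeness to an (unknown to the planner) optimum at 3.
    best = closed_loop(
        plan=lambda history: len(history),
        prototype=lambda genotype: genotype,
        measure=lambda phenotype: -(phenotype - 3) ** 2,
        budget=6,
    )
    print(best)
```

In a real application `prototype` and `measure` would be laboratory procedures, possibly carried out by a human experimentalist, and the planner would typically be an EA rather than the enumeration used here.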

It is common that an experimental loop involves human intervention. For

example, biologists may be involved in the fitness measurement of drug cocktails,

while optimizing the configuration of an instrument may rely on engineers

for replacing and adjusting instrument parts. Typically, human intervention is

needed when dealing with large-scale optimization problems, such as optimization

of industrial processes, or problems involving a complex and meticulous setup,

such as applications in biochemistry. An alternative way of experimenting is to

employ a fully automated closed-loop setup. Here, a host computer is connected

to all the hardware possibly needed in the experimental loop, allowing it to autonomously

plan, prepare and launch experiments, process their outcomes and

subsequently use them to plan new experiments if needed. 1 Whilst this is a convenient

way of experimenting, it may incur additional costs and effort for setting

up the loop. The next section takes a closer look at experimental setups employed

in applications from different areas of science and engineering.

1 Other researchers, such as Hornung et al. (2001), refer to a fully automated closed loop as

a self-learning closed loop.


Table 2.1: Domains of closed-loop applications.

Application domain | Literature
Shape design optimization | Rechenberg (1964, 1973, 2000); Schwefel (1968, 1975); Klockgether and Schwefel (1970); Olhofer et al. (2011)
Evolvable hardware | Thompson (1997, 1996a,b); Thompson et al. (1996); Thompson and Layzell (1999)
Evolving controllers for robots | Nolfi et al. (1994); Floreano and Mondada (1994); Matarić and Cliff (1996); Harvey et al. (1996); Husbands and Meyer (1998); Nolfi and Floreano (2004); Nelson et al. (2009)
Embodied evolution | Ficici et al. (1999); Watson et al. (1999, 2002); Pollack et al. (1999, 2001); Lipson and Pollack (2000); Zykov et al. (2005)
Experimental quantum control | Judson and Rabitz (1992); Baumert et al. (1997); Bardeen et al. (1997); Hornung et al. (2001); Zeidler et al. (2001); Pearson et al. (2001); Shir et al. (2007, 2009); Shir (2008); Roslund et al. (2009)
Adaptive optics | Sherman et al. (2002); Lubeigt et al. (2002); Marsh et al. (2003); Wright et al. (2005b,a)
Fermentation optimization | Rincon et al. (1993); Montserrat et al. (1993); Poorna and Kulkarni (1995); Weuster-Botz and Wandrey (1995); Weuster-Botz et al. (1995); Weuster-Botz (2000); Davies et al. (2000)
Functional genomics | Zytkow et al. (1990); King et al. (2004); Byrne (2007)
Optimization of analytical instrument configurations | Vaidyanathan et al. (2003); O’Hagan et al. (2005, 2007); Wallace et al. (2007); Jarvis et al. (2010)
Drug discovery | Singh et al. (1996); Calzolari et al. (2008); Wong et al. (2008); Caschera et al. (2010); Small et al. (2011)
Optimizing taste and aroma of food | Herdy (1997); Ferrier and Block (2001); Yang and Vickers (2004); Ferreira et al. (2007); Knowles (2009); Veeramachaneni et al. (2010)
Optimization of running industrial processes | Box (1957); Hunter and Kittrell (1966)

2.2 Applications of closed-loop optimization

This section gives a historical overview of applications in closed-loop optimization.

The main applications are also summarized in Table 2.1.

In the 1960s and 1970s, closed-loop optimization problems were mainly concerned

with shape design problems for fluid dynamics. Schwefel’s work on shape

optimization of a flashing nozzle, which we described in the previous chapter,

is an example of this type. Another example is the work of Rechenberg (1973)

on optimizing the shape of a pipe bend so as to obtain minimal flow losses. The

bend was held by 6 manually adjustable shafts, and an optimization algorithm

running on a computer provided the shaft setting (genotype) to be tested. The

shafts were then adjusted accordingly (i.e. the phenotype was constructed) by


an experimentalist. In a more modern version of the experiment, this task was

carried out by an industrial robot. Similar to Schwefel’s application, the form of

a bend was assessed by injecting pressurized air into one side of the pipe using a

kettle, and measuring the pressure of the air at the other side of the pipe. There

are many further examples of shape design problems; for detailed descriptions

please refer to (Rechenberg, 1964, 1973; Schwefel, 1975; Rechenberg, 2000).

Due to the increasing computational power and the constant development of

innovative simulation software, problems related to fluid dynamics can nowadays

often be optimized in silico. This technological progress affected the prominence

of closed-loop optimization. However, in the last 20 years or so, closed-loop optimization

regained its popularity thanks to applications in areas like evolvable

hardware, evolution of controllers for robots, and applications in biochemistry.

The general objective in the field of evolvable hardware (Greenwood and

Tyrrell, 2006) is to improve the setup of an electric circuit using an approach based

on simulated evolution. The work of Thompson and his collaborators (Thompson,

1996a,b; Thompson et al., 1996; Thompson and Layzell, 1999) was seminal to

this field. One of his studies (Thompson, 1996a) was related to the configuration

of an FPGA consisting of 10×10 reconfigurable cells. The genotype of a particular

configuration was encoded onto a bit string of length 1800, with 18 configurable

bits allocated to each cell (the bits are related to the Boolean function a cell can

perform). A computer running an optimizer was connected to

the chip, allowing for automatic reading in and updating of the current configuration

(see Figure 2.2). The experimental setup was such that no configuration

could damage the chip and no legality constraints had to be checked. The aim was

to design a configuration that is able to discriminate between two power levels of

square waves presented at the input using a tone generator by accordingly changing

the output voltage (which was visually inspected by an oscilloscope). The

quality of a configuration was measured by calculating the mismatch between

outputs observed over a random sequence of bursts of the two input waves. As

an example where such a circuit could be used, Thompson (1996a) mentions the

demodulation of frequency-modulated binary data received over a telephone line.

Later, Thompson (1997) extended this work by considering difficult implementation

constraints such as fault-tolerance. The interested reader is referred to Yao

and Higuchi (1999) for a general review of challenges in the interesting and still

very active field of evolvable hardware.


[Figure 2.2: diagram showing the tone generator, the FPGA with its configuration input, the analogue integrator, and the output (to oscilloscope).]

Figure 2.2: Experimental arrangement as used by Thompson (1996a) in the configuration

optimization of an FPGA chip. A computer running an optimizer

was connected to the chip, allowing for automatic reading in and updating of the current

configuration. A tone generator drove the circuit’s input, and an oscilloscope was

employed to visually inspect the voltage of the circuit’s output.

Compared to the pioneering design problems tackled by Schwefel and Rechenberg,

we notice that the experimental **loop** employed in Thompson’s FPGA

work is fully automated and works without human intervention. A fully automated

closed loop (although sometimes supported by simulation experiments

conducted beforehand) has also been adopted in the evolution of controllers

for robots (Nolfi et al., 1994; Floreano and Mondada, 1994; Matarić and Cliff,

1996; Harvey et al., 1996; Husbands and Meyer, 1998; Nolfi and Floreano, 2004;

Nelson et al., 2009). The general objective in this application is to improve the

structure (e.g. weights and thresholds of activation functions) of a neural network

(Bishop, 1995). The neural network is then used to transform the sensory

inputs describing the environment into motor outputs of the robot. The quality

of a controller is measured by assessing the behavior of the robot when applied

to a particular task. For example, when the task is to reach the centre of a room

from a fixed starting point, then a control system can be considered better the

smaller the average distance between the robot and the centre is. As the fitness

is expressed in terms of a robot’s behaviour, the design of an appropriate fitness

function becomes increasingly more challenging with the complexity of tasks to

be performed by the robot (Harvey et al., 1996).
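The room-centre example above can be written as a small behaviour-based fitness function. The sketch below is only one possible reading of "average distance between the robot and the centre": the function name `controller_fitness`, the representation of a trial as a list of recorded (x, y) positions, and the sign convention (negated so that larger values are better) are assumptions for illustration.

```python
import math


def controller_fitness(trajectory, centre):
    """Score a controller by the robot's mean distance to `centre`.

    `trajectory` is a non-empty list of (x, y) positions recorded during one
    trial. The mean Euclidean distance is negated so that a controller that
    keeps the robot closer to the centre receives a higher fitness.
    """
    distances = [math.dist(position, centre) for position in trajectory]
    return -sum(distances) / len(distances)


if __name__ == "__main__":
    # Two recorded positions: one at the centre, one 5 units away.
    print(controller_fitness([(0.0, 0.0), (3.0, 4.0)], (0.0, 0.0)))
```

For more complex tasks no such simple summary of behaviour may exist, which is the difficulty noted by Harvey et al. (1996).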

Some researchers believe that the future in evolving controllers for robots

lies in a fully automated closed-loop procedure referred to as embodied evolution

(Ficici et al., 1999; Watson et al., 2002). In this procedure, a population


of physical robots autonomously evolves in the task environment by reproducing

with one another. A robot evaluates its own performance and broadcasts it along

with information related to its controller setup to other robots in the population.

These can then decide whether they want to use this information to update their

own controller setup. Despite the difficulty of realizing embodied evolution in

the real world (e.g. due to power and communication constraints), preliminary and

promising results have been published e.g. by Pollack et al. (1999); Watson et al.

(1999); Lipson and Pollack (2000); Pollack et al. (2001); Zykov et al. (2005).

A fully automated closed loop that is rather inexpensive to execute is employed

in experimental quantum control. The aim in this application is to

tailor laser light fields by means of spectral amplitude and phase variables in order

to control things like the position, orientation, velocity, and quantum states

of the sample of interest (Zeidler et al., 2001; Pearson et al., 2001; Roslund et al.,

2009). The ultimate objective can be, for example, establishing a laser light field

that yields maximal molecular alignment (Shir et al., 2007, 2009). The convenient

property of quantum control experiments is their short duration, which is of the

order of 1 second (Shir et al., 2009). This property allows for testing of different

optimization algorithms and setups at rather low cost; Figure 2.3 shows a setup

commonly employed in experimental quantum control. Additional information

related to applications in experimental quantum control including physical background

can be found in (Judson and Rabitz, 1992; Baumert et al., 1997; Bardeen

et al., 1997; Hornung et al., 2001; Shir, 2008).

Inexpensive experiments within a fully automated closed-loop setup can also

be conducted in adaptive optics. The general aim in this field is to design microscopy techniques that achieve optically sectioned images with high resolution of

the biological sample under investigation. However, when imaging at depth within

a biological sample, the resolution may suffer due to optical aberrations. Active

optical elements, such as deformable mirrors, aim at counteracting this effect by

altering the wavefront of the light source. Using a fully automated closed-loop

approach, Wright et al. (2005a) showed that the resolution, or, more precisely,

the brightness of the image of a sample can be optimized by appropriately configuring

the shape of the deformable mirrors. In this particular application, the

shape of the mirrors was controlled by 37 electrostatic actuators resulting in a

continuous 37-dimensional search space. To test a mirror shape, a test sample

was used instead of a real one in order to allow for accurate controlling of the


Figure 2.3: Experimental arrangement of an application in quantum control. A dynamic

pulse shaper is used to generate modulated laser pulses. Experimental feedback

signals are then collected from the liquid-phase emission spectroscopy, the Cs gas cell,

and filtered optically and detected by the amplifier. The computer is used for reading

in the processed signals and for updating the modulator. Figure taken from Meshulach

and Silberberg (1998).

degree of aberration. Wright et al. report that an entire optimization run takes

on average between 1 and 12 minutes, depending on the algorithm. Researchers

in adaptive optics relying on a similar closed-loop setup include e.g. Sherman

et al. (2002); Lubeigt et al. (2002); Marsh et al. (2003); Wright et al. (2005b).

When dealing with large-scale applications where experimentation involves

complicated and/or extensive preparation and testing steps, it can be difficult

and costly to set up a fully automated closed loop. Applications in fermentation

optimization as considered by Davies et al. (2000) belong to this class of

closed-loop problems. The goal in the work of Davies et al. was to design silage

additive combinations that yield herbage with high nutritional values and good

storage properties. Before the herbage could be analyzed, it had to be grown for

4 weeks, cut, additively treated and packed at the start of the ensilage process,

and after 2 days again unpacked. The evaluation of an entire generation, which

was limited by logistical constraints to 50 additive treatments, took two weeks


(as two plots of grass were available); and, the number of generations was limited

due to the constraints of herbage production over the normal growth season.

To account for the apparent risk of measurement errors coming from biological,

experimental and machine variability, each generation included replicates of control

silage (the same herbage but without any additives) and replicate testing of

silage additive combinations was applied too. An interesting aspect of the work

is that costs were associated with additive concentrations in order to discourage

the search for silages containing additives of high concentrations (which would

be difficult to realize on a farm scale). Similar closed-loop approaches to fermentation

optimization have been considered by Weuster-Botz and Wandrey (1995);

Weuster-Botz et al. (1995); Weuster-Botz (2000); while these studies use optimization

algorithms based on simulated evolution, the work of Rincon et al.

(1993); Montserrat et al. (1993); Poorna and Kulkarni (1995) employs response

surface methods (see Section 2.5).

Closed-loop optimization is often the only way of experimenting in biochemistry

applications, such as combinatorial drug discovery and optimization of analytical

instrument configurations. A seminal closed-loop study in this field is

the robot scientist work of King et al. (2004). In this work a system has

been developed (see Figure 2.4) to autonomously solve an inference problem

in the field of functional genomics by a process of originating hypotheses to

explain experimental data, devising and running experiments to test these hypotheses

using a laboratory robot, interpreting the results to falsify hypotheses

inconsistent with the data, and originating new hypotheses. A fully automated

closed loop for tackling inference problems has already been employed by Zytkow

et al. (1990), though on a limited scale. In addition to achieving a high inference

accuracy, the robot scientist also accounts for variable experimental costs. This

gives rise to the problem of how to select an informative batch of experiments that

can be tested with little expense; Byrne (2007) has opted for a multi-objective

optimization approach to tackle this problem.

Modern analytical instruments such as mass spectrometers involve a large

number of configuration parameters, and testing them systematically in order to

optimize some objective is often impossible because of the large number of possible

combinations. Consider the instrumentation optimization work of O’Hagan

et al. (2005, 2007) aimed at detecting as many metabolites in a biological sample

as possible using mass spectrometry. This application involved 15 integer

Figure 2.4: Experimental arrangement as used in the robot scientist study of King et al.

(2004). An inference problem is solved using a computer-controlled closed-loop setup:

hypotheses are originated to explain experimental data; experiments are devised and

executed by a robot to investigate these hypotheses; results are interpreted to falsify

hypotheses inconsistent with the data, and new hypotheses are originated.

[Diagram components: background knowledge, machine learning, analysis, consistent hypotheses, experiment selection, experiment(s), robot, results, final hypothesis.]

variables or instrument parameters that made up a search space of 7.32 × 10^9

different instrument configurations. A typical closed-loop cycle in analytical experimentation

involves setting up the instrument (phenotype) according to some

configuration (genotype), and then evaluating the quality of the configuration

by running a biological sample through the instrument and measuring properties

of the instrument’s output signal, such as the number of peaks detected,

signal-to-noise ratio of the peaks, and run time. O’Hagan et al. (2005, 2007) fully

automated this cycle, while the studies presented by Vaidyanathan et al. (2003);

Wallace et al. (2007); Jarvis et al. (2010) involved human intervention, with the

human being involved in tasks like changing instrumentation settings.

Combinations of pharmaceuticals (also referred to as drugs, reagents, or compounds)

interact in complex ways that typically cannot be predicted. Hence, when

optimizing combinations of drugs and/or concentrations of these drugs according

to some criterion, such as the expression of pro-inflammatory cytokines (Small

et al., 2011), an experimental approach needs to be adopted. In a common drug

discovery application, a human experimentalist arranges a library of promising

or relevant drugs to be considered by an optimizer running on a computer.

The library size may range from 10 drugs (Wong et al., 2008) to a few dozen

drugs (Small et al., 2011), and sometimes be as large as 100 drugs (Calzolari

et al., 2008). Drug mixtures (genotypes) devised by the optimizer are then mixed

together (phenotyped) commonly using a laboratory robot, and their potency is

measured using analytical tests. However, this process may involve several complex

and manually performed steps related to the storage and maintenance of

individual drugs, setup of analytical tests, and the actual testing. A glance at

the method section of the above cited studies including the work of Singh et al.

(1996); Caschera et al. (2010) will confirm the sophistication of experimental

setups in drug discovery applications.

Recent studies indicate that closed-loop optimization has also been used by

the food industry for improving the taste and aroma of food. Knowles (2009)

reports on the optimization of a cocoa roasting process employed by a chocolate

manufacturer. The precise objective in this application is to optimize the design

of a roast curve so as to create cocoa of a specific desired flavour. A genotype

defines the properties of a roast curve including the temperature at which

cocoa nibs (the center of the beans) are roasted, and the placement and duration

of water injections during the roasting. Upon physically realizing a roast curve,

the quality of the resulting cocoa is subjectively assessed by a human taster.

Obvious challenges in this application are the variability in the quality of the

raw cocoa nibs, and the subjective fitness assessment, which greatly depends on

the experience of the human taster. Unfortunately, the optimization of taste and

aroma is often a business secret, limiting the number of studies publicly available.

Nevertheless, we have found applications related to the closed-loop optimization

of blends of coffee (Herdy, 1997), the blending of wine (Ferrier and Block, 2001),

and the taste of cheddar (Yang and Vickers, 2004). More accessible is research

related to the optimization of procedures for the isolation of taste in a food

sample (Ferreira et al., 2007). Of interest is the recent study of Veeramachaneni

et al. (2010), which mentions the application of closed-loop optimization in the

design of food flavors. More precisely, a small set of mixtures of ingredients is

subjectively evaluated by a panel of consumers using a closed-loop approach. A

model (surrogate) of the consumer data is then generated and used to optimally

determine a set of flavors liked by some target group of panellists.

Further and perhaps more isolated applications of closed-loop optimization

can be found, for example, in the field of combinatorial chemistry (Schlögl,

1998; Gobin et al., 2007; Wolf et al., 2000), ceramic ink-jet printing (Evans

et al., 2001), combustion process optimization (Büche et al., 2002; Paschereit

et al., 2003; Hansen et al., 2009), directed evolution (Arnold, 1998; Knight

et al., 2009; Platt et al., 2009), and deep brain stimulation therapy (Feng

Figure 2.5: Schematic of the closed loop as employed with evolutionary operation

(EVOP) (Box, 1957) to optimize a running manufacturing process. The EVOP committee

contains specialists on EVOP that serve as advisors but the ultimate responsibility

for running the scheme rests with the plant manager. She sets up the plant configuration

to be tested. The resulting responses are then used to update the information

board on which future decisions are made.

[Diagram components: plant, responses, information board, EVOP committee, plant manager, plant configuration.]

et al., 2007).[2] Detailed case studies of selected applications in instrument optimization,

drug discovery, directed evolution, and cocoa roasting have also been

presented by Knowles (2009).

The above closed-loop applications have in common that the search for an

optimal solution is terminated after some fixed period of time (e.g. due to time

and/or budgetary limitations) or once a solution of high quality has been found.

Typically, the aim is then to manufacture the solution found (e.g. a drug combination,

nozzle, software, electric circuit, etc.) many times and sell it to the public

or other companies; perhaps improving it further at some future point in time

when the quality of raw material and technological equipment has advanced.

However, closed-loop setups have also been adopted in the continuous improvement

of a running industrial process, similar to the cocoa roasting process

mentioned above but on a larger scale. The goal here is to optimize some cumulative

performance of the process over an infinite or finite time horizon, rather than

finding a single optimal (and ultimate) solution, and this needs to be done in a

controlled and restricted fashion to reduce the risk of producing off-quality material.

To achieve this goal, Box (1957) suggested an approach termed evolutionary

operation (EVOP) (see Figure 2.5). The objective of EVOP is to continuously

improve a running industrial process by applying small changes (mutations) to

[2] Although the work of Feng et al. (2007) describes a closed-loop optimization framework

(which the authors hope to realize in future) for achieving deep brain stimulation therapy of

Parkinsonian motor symptoms, for validation purposes a computational model is used.

it. Rather than taking random mutation steps, EVOP is guided by a committee

including human experimentalists and specialists, such as statisticians and

experts on EVOP. Since its introduction in 1957, EVOP has been applied in numerous

sectors including chemical manufacturers, automotive industry, natural

gas producers, food industry, and canning industry. Thanks to EVOP, savings

were made, for example, due to reduced usage of raw material, reduced time cycles

in batch operations, and equipment modifications. In addition to savings and

the resulting profits, EVOP contributed also to the understanding of certain processes,

and pointed engineers to the importance of specific control variables. An

extensive review of EVOP applications and benefits of EVOP has been published

by Hunter and Kittrell (1966).

Nowadays, the movement of continuously improving a process including products

and services with the aim to reduce waste and time is referred to as lean

manufacturing or lean management (Shah and Ward, 2003). Tools for realizing

lean manufacturing include EVOP but also techniques such as just-in-time production

(Sugimori et al., 1977), design of experiments (Montgomery, 1976) and

Taguchi methods (Taguchi, 1989; Peace, 1993).

2.3 Challenges of closed-loop optimization

From the description of some of the applications listed above it is apparent that

closed-loop problems feature challenges that may not be prevalent in standard

problems tackled by computer simulation. We will briefly recall the main challenges

here; for a more detailed description please refer to Knowles (2009).

Arguably the most obvious challenge of closed-loop optimization is that evaluations

of solutions are generally expensive. This concerns both the time and

costs needed to conduct experiments. In the presence of time and/or budgetary

limitations this factor can restrict the number of evaluations available to several

tens or hundreds only.

Experiments may be of different durations and associated with non-homogeneous

experimental costs. This scenario has been encountered e.g. in the robot

scientist work (King et al., 2004) and in fermentation optimization (Davies et al.,

2000). In the presence of a limited budget, this challenge requires an optimizer to

account for both fitness gradients and differential costs of evaluations.

Designing a suitable fitness function may be difficult. This challenge exists

in both computational and physical experimentation, but the complexity of real

world systems means often that fitness function design is more delicate and laborious

in practice (Mondada and Floreano, 1995; Matarić and Cliff, 1996; Knowles,

2009). As an example, consider the design of a fitness function that evaluates

the behavior of a robot in achieving a certain task. The difficulty of designing a

suitable fitness function tends to increase with the complexity of the task to be

fulfilled (Harvey et al., 1996).

Experiments are inherently noisy, and may be subject to uncertainty and uncontrolled

factors. Noise is a challenge that is much studied, and several methods

have been proposed to account for it. In Section 3.2 we will introduce some of

these methods in the context of evolutionary optimization.

Experimental setup and environmental factors may dictate settings of the optimization

algorithm. Recall that this situation was present, for instance, in

the fermentation optimization work of Davies et al. (2000). A population-based

optimizer was employed in this application. Its population size was limited by

logistical constraints to 50 additive treatments, and the number of generations

the optimizer could be employed for was also limited due to the constraints of

herbage production over the normal growth season. In biochemistry applications,

such as in drug discovery (Small et al., 2011), the use of standard laboratory tools

like microwell plates may dictate the population size too.

Experimentation may be subject to encoding issues and constraints. It is common

for closed-loop problems to have an encoding that mixes discrete (or binary)

and real-valued components. Constraints may restrict the values certain

variables may take but also complicate the optimization process in other ways

(as investigated in this thesis in Chapters 4 to 6). For example, in the evolution

of robot controllers, the lifetime of the robot battery and the robot itself may

slow down the optimization procedure due to recharging, maintenance and repair

duties (Matarić and Cliff, 1996). Constraints may also be challenging to identify

and define as reported by Knowles (2009).

2.4 Design of experiments

Traditionally, closed-loop problems have been dealt with using methods from the

design of experiments (DoE) (also known as experimental design (ED)) (Montgomery,

1976; Mason et al., 1989; Box et al., 2005) discipline. These are statistical

Figure 2.6: Plot showing factor combinations to be tested with a Box-Behnken experimental

design for a system with l = 3 factors (axes x_1, x_2, x_3) with each factor having three levels.

The factor combinations are specifically chosen to allow the efficient estimation of the

coefficients of a quadratic model as defined in Equation (2.2).

methods traditionally concerned with explaining a system or process in terms of

a model using as few informative experiments as possible.[3] This model can then

be used to study different properties of a system, such as the most important

factors of the system or the performance of the system in the presence of noise.

Figure 2.6 shows an example of a particular experimental design. For exhaustive

reviews of experimental designs, including conditions when their application

is recommended, and instructions to construct them, please refer to the books

(ibid.).
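As an illustration of the kind of design shown in Figure 2.6, the following is a minimal sketch of how the coded trials of a three-factor Box-Behnken design could be enumerated. The function name `box_behnken_three_factors` and the coded levels -1, 0, +1 are assumptions made for this example; the pairwise construction used here is only claimed for l = 3 factors.

```python
from itertools import combinations, product


def box_behnken_three_factors(center_points=1):
    """Return coded trials (levels -1, 0, +1) of a three-factor Box-Behnken design."""
    num_factors = 3
    trials = []
    # For every pair of factors, test the four (+/-1, +/-1) combinations
    # while holding the remaining factor at its middle level 0.
    for i, j in combinations(range(num_factors), 2):
        for level_i, level_j in product((-1, 1), repeat=2):
            point = [0] * num_factors
            point[i], point[j] = level_i, level_j
            trials.append(tuple(point))
    # Add the centre point (often replicated in practice to estimate noise).
    trials.extend([(0, 0, 0)] * center_points)
    return trials


design = box_behnken_three_factors()
for trial in design:
    print(trial)
```

With a single centre point this gives 13 trials, compared with 27 for a full three-level factorial design, while a quadratic model in three factors has 10 coefficients to estimate.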

The three basic principles employed by many DoE techniques are replication,

randomization, and blocking (also known as local control) (Preece, 1990; Box

et al., 2005). Replication (i.e. repeating the entire experiment or a portion of

it under two or more sets of conditions) is a way of estimating and accounting for

noise, while randomization and blocking are used to deal with variabilities caused

by unavoidable sources, or nuisance factors (e.g. ambient temperature or the time

of the day the experiment was conducted). Randomization refers to the process

by which factor combinations for testing are allocated to individual experiments

in a random or unbiased manner. Blocking is the process by which the total set

of trials is partitioned into subsets (blocks) that are as homogeneous as possible.
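The three principles can be combined in a simple schedule. The sketch below is one possible arrangement (a randomized complete block layout), not a procedure taken from the cited references: each block, for instance one day of laboratory work, contains one replicate of every factor combination, and the run order inside each block is randomized. The function name `blocked_run_order` and the treatment labels are hypothetical.

```python
import random


def blocked_run_order(treatments, replicates, seed=None):
    """Return a list of (block, treatment) pairs.

    Each block holds one replicate of every treatment (blocking and
    replication); the order within a block is shuffled (randomization).
    """
    rng = random.Random(seed)
    schedule = []
    for block in range(replicates):
        order = list(treatments)
        rng.shuffle(order)  # randomize run order within the block
        schedule.extend((block, treatment) for treatment in order)
    return schedule


schedule = blocked_run_order(["A", "B", "C"], replicates=2, seed=1)
for block, treatment in schedule:
    print(f"block {block}: run treatment {treatment}")
```

Differences between blocks (for example day-to-day drift) can then be separated from differences between treatments, because every treatment appears once in every block.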

The DoE discipline has also produced methods for seeking optimal factor combinations:

factorial, univariate, steepest-ascent, and random methods (Brooks,

1958, 1959). The univariate and steepest-ascent methods are sequential procedures,

while the other two methods generate an entire set of trials for testing

[3] The terminology in the field of DoE differs slightly from the one in optimization. Most

notably, decision variables are referred to as factors, their values as levels, fitness as response,

and an experiment testing a particular factor combination (candidate solution) is called a trial.

beforehand. The random method tests factor combinations selected at random

in the factor space. Compared to factorial and univariate methods, which test

levels of all factors in some systematic manner, random methods can be applied

to systems involving a larger number of factors. Compared to steepest-ascent

methods, random methods are also less likely to find only locally optimal factor

combinations. In any of these methods, there are two options by which an optimal

factor combination can be estimated. The first option is to simply select

the factor combination with the highest response from the set of all trials made.

The other option is to construct a response surface by fitting a polynomial surface

to the responses of the trials made, and then estimating the optimal factor

combination by finding the highest point of this surface. We will describe this

methodology in more detail in the next section.
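The two non-sequential methods and the first of the two estimation options can be sketched as follows. This is an illustrative sketch only: `toy_experiment` is a hypothetical stand-in for a physical experiment, and all function names are assumptions made for this example.

```python
import itertools
import random


def full_factorial(levels_per_factor):
    """Factorial method: every combination of the given factor levels."""
    return list(itertools.product(*levels_per_factor))


def random_design(levels_per_factor, num_trials, seed=None):
    """Random method: factor combinations drawn at random from the factor space."""
    rng = random.Random(seed)
    return [tuple(rng.choice(levels) for levels in levels_per_factor)
            for _ in range(num_trials)]


def run_trials(trials, experiment):
    """Evaluate each trial exactly once and keep (trial, response) pairs."""
    return [(trial, experiment(trial)) for trial in trials]


def best_observed(results):
    """First estimation option: pick the trial with the highest response."""
    return max(results, key=lambda pair: pair[1])[0]


def toy_experiment(x):
    # Hypothetical noise-free response with its maximum at (1, -1).
    return -(x[0] - 1) ** 2 - (x[1] + 1) ** 2


levels = [(-1, 0, 1), (-1, 0, 1)]
factorial_trials = full_factorial(levels)
random_trials = random_design(levels, num_trials=5, seed=0)
best_factorial = best_observed(run_trials(factorial_trials, toy_experiment))
print(len(factorial_trials), best_factorial)
```

The factorial set grows exponentially with the number of factors (here 3^2 = 9 trials), whereas the size of the random set is chosen freely, which is why random methods scale to more factors.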

2.5 Response surface methods

Traditionally, a response surface is fitted using polynomial models (in the factors)

for the responses y (Box and Wilson, 1951). As an example, consider the

quadratic model

\[ y = b_0 + \sum_{i=1}^{l} b_i x_i + \sum_{i=1}^{l-1} \sum_{j=i+1}^{l} b_{ij} x_i x_j + \sum_{i=1}^{l} b_{ii} x_i^2 + \epsilon . \qquad (2.2) \]

This model is expressed in terms of main effects of factor levels (x_1, ..., x_l), their

interactions or joint effects with other factor levels (x_1 x_2, ..., x_{l-1} x_l), and their

quadratic components (x_1^2, ..., x_l^2). To account for noise and experimental errors,

a random error ε (assumed to be uncorrelated and distributed with mean 0 and

constant variance) may be considered in the model as well. The normal procedure

is to fit a surface to the available data using the least squares method; Figure 2.7

shows an example of a response surface. Following this, the validity of the fitted

surface may be investigated using an analysis of variance (ANOVA), whereby the

observed variability of the responses (measured by sums of squared deviations

from mean response) is partitioned into independent, or orthogonal, components

from identifiable sources. A comprehensive introduction to least squares, and

the application of ANOVA in the context of response surfaces can be found in

Chapter 10 of (Box et al., 2005).
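To make the fitting step concrete, the following is a minimal sketch of a least squares fit of the quadratic model in Equation (2.2), solved via the normal equations with plain Gaussian elimination. All function names and the example coefficients are assumptions made for illustration; in practice a numerical library routine (which is typically more stable than the normal equations) would be used instead.

```python
from itertools import combinations


def quadratic_terms(x):
    """Model terms of Eq. (2.2): 1, x_i, x_i * x_j (i < j), x_i ** 2."""
    interactions = [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    squares = [xi * xi for xi in x]
    return [1.0] + list(x) + interactions + squares


def solve_linear_system(a, b):
    """Solve a * z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("the design does not identify all coefficients")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_quadratic(trials, responses):
    """Least squares estimate of the coefficients via (X^T X) b = X^T y."""
    rows = [quadratic_terms(x) for x in trials]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, responses)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Nine trials of a 3 x 3 factorial design for l = 2 factors.
levels = (-1, 0, 1)
trials = [(x1, x2) for x1 in levels for x2 in levels]
# Hypothetical coefficients in the order b_0, b_1, b_2, b_12, b_11, b_22.
true_coefficients = [2.0, 1.0, -3.0, 0.5, -1.0, -2.0]
responses = [sum(b * t for b, t in zip(true_coefficients, quadratic_terms(x)))
             for x in trials]
fitted = fit_quadratic(trials, responses)
print([round(b, 6) for b in fitted])
```

Because the example responses are generated without noise, the fit recovers the generating coefficients up to rounding error; with noisy responses the estimates would deviate, which is what the subsequent ANOVA is meant to quantify.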

Response surface methods (RSMs) employing the above described approach

Figure 2.7: A response surface plot for a system with l = 2 factors (axes x_1 and x_2). The surface is

fitted to the nine data points representing different experiments.

generate a response surface that tends to be non-interpolating because they minimize

only the sum of squared deviations from some pre-determined (polynomial)

function. Such a surface may be misleading for seeking maxima as it may not

sufficiently approximate the shape of the actual underlying model (Jones, 2001).

This drawback encouraged many researchers to develop RSMs that approximate

the actual model more accurately, ideally, also in fewer trials. Modern RSMs

tend to generate a surface that interpolates the observed data points using a linear

combination of ‘basis functions’ (one ‘basis function’ term is centred around

one data point), such as thin-plate splines and kriging (Matheron, 1963; Sacks

et al., 1989).

RSMs can be further classified into two-stage and one-stage (or sequential)

DoE methods (Robbins, 1952b). Two-stage approaches fit a surface to observed

data in stage one, estimating any required parameters (related to the surface). In

the second stage, these parameters are assumed to be correct and the surface is

used to select the next data point for evaluation. The drawback of this approach

is that the search can be misguided if the estimated surface from stage one is

incorrect. Sequential DoE methods aim at overcoming this issue in that they

skip the first stage of fitting a surface to observed data. Instead, hypotheses

about the location of the optimum (its variables ⃗x and fitness f(⃗x)) are made

and evaluated by determining properties of the best-fitting response surface that

passes through the observed data and the point (⃗x,f(⃗x)) (Jones, 2001). However,

this approach comes at a cost of high computational demands, particularly when

kriging is used (because a matrix of estimated correlations between responses

needs to be inverted). For a more comprehensive review of one- and two-stage

RSMs please refer to Jones (2001); Myers et al. (2009).

An alternative approach to RSMs are methods based on simulated evolution,

which we will discuss in the next section.

2.6 Simulated evolution in closed-loop optimization

Proposals to consider ideas from nature, in particular, the concept of natural selection

and variation, for closed-loop optimization were already made in the late

1950s. Brooks (1958) proposed a creeping random method that extends the DoE-based

random method introduced earlier (see Section 2.4) with the concept

of natural selection. The experimental design technique of Box (1957), EVOP

(see Section 2.2), employed the concepts of natural selection and variation too,

although the variation steps are proposed by some committee rather than being

random. Remember, however, that the objective of EVOP is to optimize a cumulative

score over an infinite or finite time horizon rather than to find a single

optimal (and ultimate) solution.

The work of Bremermann (1962); Fraser (1957); Friedberg (1958); Friedberg

et al. (1959) and others has further popularized approaches based on simulated

evolution and initiated the field of evolutionary computation. In particular,

this pioneering work influenced the development of the three related approaches,

genetic algorithms (GAs) (Holland, 1975; De Jong, 1975), evolution

strategies (ESs) (Rechenberg, 1973; Schwefel, 1975), and evolutionary programming

(EP) (Fogel, 1962, 1964), nowadays commonly referred to as evolutionary

algorithms (EAs). All three approaches are population-based search techniques

that generate solutions based on a process of simulated evolution. For a more

comprehensive historical overview of the development of EAs please refer to (Fogel,

1994; Bäck et al., 1997; Fogel, 1998).

Whilst certain DoE methods, such as the creeping random method and EVOP,

and EAs may share similar working principles, there are differences in the objectives

pursued by the two algorithm types: DoE is typically applied to low-dimensional

search spaces with the aim of obtaining statistically robust results

using as few experiments as possible. EAs, on the other hand, are approaches

commonly used for finding a single optimal solution, using typically many evaluation

steps, to problems featuring high-dimensional search spaces, e.g. combinatorial

optimization problems such as the travelling salesman problem (TSP). Closed-loop optimization problems

often fall in the intersection of the two scenarios and thus require ideas from

both areas. In particular, we believe that closed-loop evolution methods should

draw on DoE in areas like noise-handling, replication, and blocking (e.g. the pipe

bending setup of Rechenberg (2000) used blocking). To reduce the number of

expensive evaluations, ideas from sequential DoE and RSMs can also be incorporated

into evolutionary search, where they are known variously as surrogate

models, meta-models or fitness approximation methods; this step has already

been realized in recent years (see e.g. Jin et al. (2002); Ong et al. (2004); Jin

(2005) for surveys on this subject). To our knowledge, however, the DoE field has

not so far considered resourcing issues as a perturbing influence on conducting

the most informative experiments, as we do here.

There are also several aspects that make EAs suitable candidates for closed-loop

optimization: Evolutionary search relies solely on the fitness of previously

evaluated solutions and not on information related to the problem instance being

solved, making EAs suitable for problems featuring a black-box function, such as

closed-loop problems. In fact, evolution does not even need to be given specific

fitness values; searching for behavioral novelty may be sufficient to solve a

problem (Lehman and Stanley, 2011). EAs can also be hybridized easily if prior

knowledge about the problem at hand, gradient information, or other methods are

available (Blum and Roli, 2003). The ability to employ a population of solutions

allows EAs to conveniently adjust the population size to meet a given experimental

setup, and, moreover, to cope with problems potentially featuring multiple objectives

(Deb, 2001), complex (non-linear and dynamic) constraints (Michalewicz

and Schoenauer, 1996; Coello, 2002; Nguyen and Yao, 2009a), and a dynamically

changing fitness landscape (Branke, 2001).

2.7 Chapter summary

In this chapter we have reviewed the field of closed-loop optimization, which is

characterized by the property that solutions are evaluated by conducting experiments

(as opposed to evaluating a closed-form mathematical expression). We first

introduced the general concepts of closed-loop optimization. Then, we reviewed a

variety of closed-loop applications across the sciences and engineering, and summarized

the main challenges of closed-loop optimization. Finally, we provided an

overview of techniques employed in closed-loop optimization including approaches

based on design of experiments, response surfaces and simulated evolution. We

favour the application of simulated evolution approaches and, in particular, EAs,

to closed-loop problems and focus on them in this thesis. The reasons for this

choice have been stated in the previous section. The next chapter will introduce

EAs in more detail, and review techniques employed within EAs for coping with

many of the challenges common to closed-loop optimization, such as noise, constraints,

and dynamically changing environments. These challenges are related

to the resourcing issues considered in this thesis, and understanding how they are

dealt with is important and beneficial for the development of our own strategies.

Chapter 3

Evolutionary Optimization

In the previous chapter we introduced the field of closed-loop optimization and

motivated the choice of evolutionary algorithms (EAs) as optimizers in this

domain. The next section introduces some of the basic principles of evolutionary

search and reviews various approaches for tuning and controlling parameters of an

EA. The application domain of EAs includes several problem categories related

to closed-loop problems and the particular resourcing issues (ephemeral resource

constraints, changing variables, and lethal environments) considered in this thesis:

the main categories are noisy, constrained, dynamic, and online optimization

problems. Sections 3.2 to 3.5 review how EAs have been tuned to tackle these

problem categories (these sections will also highlight the relationship between the

resourcing issues considered in this thesis and the different problem categories, but

for a more detailed description of the resourcing issues themselves please refer to

the subsequent chapters). Tuning of EAs has also been realized with techniques

from machine learning, such as reinforcement learning (RL) techniques. Section 3.6

gives a brief introduction to the RL field, and reviews methods for extending

EAs with RL to improve performance.

3.1 Evolutionary Algorithms

From Section 2.6 we know that evolutionary algorithms (EAs) (Fogel, 1962;

Rechenberg, 1973; Schwefel, 1975; Holland, 1975; Goldberg, 1989) are population-based

techniques that tackle optimization problems based on models of biological

evolution. We can think of biological evolution as a process where fit individuals

of a population are passed on from one generation to the next upon undergoing

Table 3.1: Nomenclature used in evolutionary algorithms

Term                       Meaning
Individual                 Solution
Population                 Set of individuals
Fitness                    In the context of EAs, the term fitness is used to refer to an individual’s quality, while in biology it refers to the reproductive success of an individual in passing its genetic information to the next generation
Objective function         Function that computes the fitness of an individual
Chromosome/Genome          Encoded individual; often represented as a string of parameters
Gene                       Basic unit of an individual; in optimization, a gene can be seen as a decision variable
Allele                     Value of a gene
Locus                      Position of a gene in a string; this term is often used interchangeably with the term gene
Genotype                   Genetic structure that represents an individual
Phenotype                  Individual that can be evaluated
Crossover/Recombination    Process by which two or more individuals exchange some of their genetic information; alternatively, the term(s) refer to the operator that performs this process
Mutation                   Process by which an individual’s genetic information undergoes random changes; alternatively, the term refers to the operator that performs this process
Variation                  Crossover and/or mutation
Parent                     Individual used to generate new individuals (offspring)
Offspring/Child            Individual resulting after applying crossover and/or mutation to parents
(Parental) selection       Process by which parents of the current population are selected for crossover and/or mutation
Generation                 One iteration through parental selection, variation and environmental selection, to arrive at an updated population
Environmental selection    Process by which individuals from the combined pool of offspring and/or current population are selected to form the new population

variation in the form of mutation and/or crossover. Due to the relationship to biology,

many biological terms have been adopted to describe EAs; these terms and

their meanings are summarized in Table 3.1.

Let us describe the working scheme of a basic EA (see Algorithm 3.1) in more

detail; modifications of this scheme will be introduced in the remainder of this

section. This thesis considers EAs featuring a very similar structure to that shown

in Algorithm 3.1. In the algorithm, µ denotes the size of the current population of

solutions Pop, also referred to as parent population. Upon initializing Pop with,

Algorithm 3.1 A basic generational evolutionary algorithm

Require: µ (parent population size), λ (offspring population size), f (objective function)

1: g = 0 (generation counter), Pop = ∅ (parent population), OffPop = ∅ (offspring population)

2: initialize Pop

3: evaluate all solutions in Pop using f

4: while termination condition not satisfied do

5: repeat

6: select parents from Pop

7: generate offspring by applying variation operators to selected parents

8: evaluate offspring using f and add them to OffPop

9: until λ offspring are created

10: form new population Pop by selecting solutions from Pop and OffPop

11: g++

12: return best solution of Pop

for example, µ randomly generated individuals, an offspring population OffPop

of size λ is incrementally generated. This is done by first selecting parents for

reproduction from Pop, and then applying variation operators to them in order

to produce offspring solutions; commonly used variation operators are crossover

and/or mutation. Before offspring individuals are added to OffPop, they are evaluated

using the objective function f. Once the offspring population is complete,

a new population Pop is formed by selecting fit solutions from the combined pool

of solutions from Pop and OffPop. This step is referred to as environmental selection,

and may involve selecting the fittest µ solutions from Pop and OffPop. This

variant is referred to as (µ+λ) reproduction, and the special case where λ = 1

is also referred to as steady-state reproduction. The concept where the fittest

individuals found so far are allowed to survive to the next generation is known

as elitism (De Jong, 1975). Alternatively, if λ ≥ µ, a new population Pop can be

formed by selecting the fittest µ offspring individuals from OffPop; this scheme is

referred to as (µ,λ) reproduction, and the special case where λ = µ is also referred

to as generational reproduction. The notations (µ+λ) and (µ,λ) were originally

used within evolution strategies (ESs) (Rechenberg, 1973; Schwefel, 1975).
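
The working scheme of Algorithm 3.1 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in this thesis: the OneMax objective, binary tournament selection, bit flip mutation without crossover, the fixed generation budget, and all function and parameter names (`basic_ea`, `scheme`, `seed`, and so on) are choices made here for the example.

```python
import random


def onemax(bits):
    """Toy objective (an assumption for illustration): number of ones."""
    return sum(bits)


def basic_ea(f, length=20, mu=10, lam=10, scheme="plus", generations=50, seed=0):
    """Sketch of Algorithm 3.1 with (mu+lam) or (mu,lam) environmental selection."""
    if scheme == "comma" and lam < mu:
        raise ValueError("(mu,lam) reproduction requires lam >= mu")
    rng = random.Random(seed)
    # Steps 2-3: initialize Pop with mu random individuals and evaluate them.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(mu)]
    pop = [(f(ind), ind) for ind in pop]
    for _ in range(generations):  # step 4: a fixed budget is the termination condition
        offpop = []  # the offspring population is rebuilt every generation
        while len(offpop) < lam:  # steps 5-9
            # Step 6: binary tournament selection of a single parent.
            a, b = rng.sample(pop, 2)
            parent = max(a, b, key=lambda s: s[0])[1]
            # Step 7: bit flip mutation with rate 1/length (no crossover here).
            child = [1 - g if rng.random() < 1.0 / length else g for g in parent]
            # Step 8: evaluate the offspring and add it to OffPop.
            offpop.append((f(child), child))
        # Step 10: environmental selection keeps the fittest mu solutions.
        pool = pop + offpop if scheme == "plus" else offpop
        pop = sorted(pool, key=lambda s: s[0], reverse=True)[:mu]
    # Step 12: return the best (fitness, solution) pair of the final population.
    return max(pop, key=lambda s: s[0])
```

With `scheme="plus"` the best fitness in the population can never decrease from one generation to the next (elitism), whereas `scheme="comma"` discards all parents and may lose the best solution found so far.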

The variety of EAs is very large. In addition to different reproduction schemes,

there are different schemes for performing parental selection, recombination, and

mutation. We will give examples of common EA setups. Two commonly used

selection operators are fitness proportional selection (also known as roulette-wheel

selection) and tournament selection. The former operator selects a parent with a

probability that is proportional to its fitness. The latter operator first chooses a

set of TS solutions (TS is referred to as the tournament size) at random from the

[Figure 3.1, panel contents: Parent 1 = 1 0 1 1 0 1 and Parent 2 = 0 0 0 1 1 0; one-point crossover at e = 2 yields Offspring 1 = 1 0 0 1 1 0 and Offspring 2 = 0 0 1 1 0 1; bit flip mutation then yields 1 1 0 1 1 0 and 0 0 0 1 1 1.]

Figure 3.1: Visualization of one-point crossover (left) and bit flip mutation (right).

One-point crossover selects a single crossover point e (here e = 2) at random, and

then swaps all genes beyond that point in either parent’s chromosome between the two

parents. Bit flip mutation flips each gene in an offspring’s chromosome independently

with some probability p_m.

current population, compares these solutions, and then selects the best one as a

parent. Note, in a standard EA, this selection step is carried out twice to obtain

two parents. Typically, with probability p_c the two parents are then recombined

by, for example, swapping each of their genes with a probability of 0.5; this

form of crossover is known as uniform crossover (Syswerda, 1989). Alternatively,

parents can be recombined using one-point crossover, in which case a random

crossover point e ∈ {1,...,l − 1} is chosen and all genes of the chromosome

after that point are swapped; analogously, one can define a two-point or, more

generally, a multi-point crossover operator. The idea of mutation is to introduce

new genetic material into the population by altering an individual’s chromosome

in some random fashion. Commonly used mutation operators are ones that flip

each gene in a chromosome independently with some probability p_m, or flip a fixed

number of randomly selected genes; Figure 3.1 visualizes the interplay between

(one-point) crossover and (bit flip) mutation. Different variation operators may

be applied depending on the encoding of solutions. For a more comprehensive

introduction to EAs including an overview of different selection and variation

operators please refer to e.g. Bäck (1996); Bäck et al. (1997); Eiben and Smith

(2003); Reeves and Rowe (2003).
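
The selection and variation operators described above can be written down compactly for binary chromosomes stored as Python lists. The following is a sketch for illustration only; the function names and signatures are assumptions made here, and the roulette-wheel variant assumes non-negative fitness values with a positive sum.

```python
import random


def tournament_selection(pop, fitness, ts, rng):
    """Pick ts solutions at random and return the fittest of them."""
    contestants = rng.sample(range(len(pop)), ts)
    return pop[max(contestants, key=lambda i: fitness[i])]


def roulette_wheel_selection(pop, fitness, rng):
    """Select one parent with probability proportional to its fitness."""
    return rng.choices(pop, weights=fitness, k=1)[0]


def one_point_crossover(p1, p2, e):
    """Swap all genes after crossover point e, where 1 <= e <= l - 1."""
    return p1[:e] + p2[e:], p2[:e] + p1[e:]


def uniform_crossover(p1, p2, rng):
    """Swap each gene between the two parents with probability 0.5."""
    c1, c2 = list(p1), list(p2)
    for i in range(len(c1)):
        if rng.random() < 0.5:
            c1[i], c2[i] = c2[i], c1[i]
    return c1, c2


def bit_flip_mutation(bits, p_m, rng):
    """Flip each gene independently with probability p_m."""
    return [1 - g if rng.random() < p_m else g for g in bits]
```

Calling `one_point_crossover([1, 0, 1, 1, 0, 1], [0, 0, 0, 1, 1, 0], 2)` returns the two offspring `[1, 0, 0, 1, 1, 0]` and `[0, 0, 1, 1, 0, 1]`, which matches the crossover example of Figure 3.1.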

3.1.1 Analyzing and explaining evolutionary algorithms

In the above, we have taken a rather operational view to introduce EAs. However,

we did not answer the question of how EAs work under the hood or how new, on

average, fitter individuals are evolved. There are several alternative viewpoints

on this subject and we introduce some of them in the following; for a more

comprehensive overview please refer to e.g. Reeves and Rowe (2003).

The schema theorem of Holland (1975) was the first attempt to explain EAs,

in particular genetic algorithms (GAs), from a theoretical point of view. This

theorem is based on the concept of schemata, which are subsets of solutions

that share some common properties. For instance, consider solution vectors to be

binary strings of length l = 5. Then, the schema H = (∗1 ∗ ∗0) describes the

set of all solutions with value x_2 = 1 at gene 2 and value x_5 = 0 at gene 5; the

∗ is a wildcard symbol, meaning that a gene can have any possible value (thus

in the binary case either value 0 or 1). The two properties of a schema are its

order o(H) and its length l(H), representing the number of defined genes and

the distance between the first and last defined gene position, respectively (Reeves

and Rowe, 2003). Thus, for the above example we have o(H) = 2 and l(H) = 3.

The schema theorem relies on the assumption of implicit (or intrinsic) parallelism,

the idea that information on many schemata can be processed in parallel.

Based on this assumption, Holland argues that there are two competing forces in

the GA: the force of selection increases the representation of fit schemata in line

with their fitness, while the destructive force of crossover and mutation tends to

break schemata up, in particular schemata of long length and high order. Technically,

we can define the theorem as

\[ E[N(H,t+1)] \geq \left\{ 1 - p_c \frac{l(H)}{l-1} - p_m\, o(H) \right\} r(H,t)\, N(H,t), \tag{3.1} \]

where E[.] is the expectation, N(H,t) is the number of instances of schema H

at time t, p_c and p_m are the probabilities of applying crossover and mutation, respectively,

and r(H,t) the fitness ratio expressing the average fitness of schema H

relative to the average fitness of the population at time t. The theorem says that

the expected number of instances of a schema H at time t+1 is greater than at

time t if the schema is fitter than average, its order is low and its length is short. The

third idea related to the theorem is that the surviving schemata, which tend to be

fit schemata of short length and low order, are recombined into larger schemata (building

blocks) of potentially higher fitness; Goldberg (1989) calls this idea the building

block hypothesis. Although Holland’s schema theorem has been sometimes criticized,

often because it was overinterpreted (Radcliffe, 1997), fundamental ideas

Figure 3.2: The simplex represents the space of populations containing three individuals.

The flow indicates the movement of populations under the effect of selection using

a dynamical systems approach. The trajectory represents the movement of a single

population. Figure taken from Reeves and Rowe (2003).

have been proved to be valid. Some clarification of the schema theorem has been

given by the work of Altenberg (1995). He uses Price’s theorem (Price, 1970) to

show that macroscopic properties of the population, such as the population average

fitness, can be derived by accounting for properties related to the variation

and selection operators employed, and the encoding of solutions.
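
To make the quantities in Equation (3.1) concrete, the short sketch below computes the order and length of a schema written as a string over {0, 1, *} and evaluates the right-hand side of the bound. The helper names and the parameter values used in the usage note are assumptions chosen purely for illustration.

```python
def schema_order(schema):
    """o(H): number of defined (non-wildcard) gene positions."""
    return sum(1 for s in schema if s != "*")


def schema_length(schema):
    """l(H): distance between the first and last defined gene position."""
    defined = [i for i, s in enumerate(schema) if s != "*"]
    return defined[-1] - defined[0] if defined else 0


def schema_bound(schema, p_c, p_m, fitness_ratio, count):
    """Right-hand side of Equation (3.1) for a schema with `count` instances."""
    l = len(schema)
    survival = 1.0 - p_c * schema_length(schema) / (l - 1) - p_m * schema_order(schema)
    return survival * fitness_ratio * count
```

For the schema `"*1**0"` from the text, `schema_order` gives 2 and `schema_length` gives 3. With the assumed values p_c = 0.8, p_m = 0.05, r(H,t) = 1.5 and N(H,t) = 10, the bound is 4.5, whereas the shorter schema `"*10**"` of the same order gives 10.5. Under these assumptions only the short schema is guaranteed to grow in expectation, in line with the interpretation given above.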

An alternative perspective on EAs is to view transitions between populations

as a stochastic process (Nix and Vose, 1992; Davis and Principe, 1993). In the

simplest case, the population at time t + 1 depends only on the population at

time t. This kind of stochastic process is referred to as a Markov process, and the

sequence of population transitions forms a Markov chain. With this approach,

a transition matrix needs to be calculated to model the likelihood of an EA

moving from a population at time t to any other possible population at time

t+1 after applying the stochastic effects of selection, crossover and/or mutation.

Markov chains are a convenient tool for analyzing EAs, which we will consider in

Section 4.3 to investigate the impact of ephemeral resource constraints (ERCs)

on evolutionary search; this section will also give a more detailed introduction to

Markov chains.
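
As a toy illustration of the Markov chain view, consider a deliberately simplified model that is an assumption of this sketch and not the model analyzed in Section 4.3: each of the µ individuals carries a single binary gene, the state is the number of individuals carrying allele 1, and the next population is obtained by µ independent fitness proportional draws followed by mutation. The transition matrix then has binomial rows.

```python
from math import comb


def transition_matrix(mu, w0=1.0, w1=1.5, p_m=0.0):
    """T[i][j]: probability of moving from i to j copies of allele 1.

    w0 and w1 are the (positive) fitness values of the two alleles and
    p_m is the per-individual mutation probability; all are assumed values.
    """
    T = []
    for i in range(mu + 1):
        # Probability that one fitness proportional draw picks allele 1.
        p_sel = i * w1 / (i * w1 + (mu - i) * w0)
        # Mutation flips the drawn allele with probability p_m.
        p = p_sel * (1 - p_m) + (1 - p_sel) * p_m
        T.append([comb(mu, j) * p**j * (1 - p) ** (mu - j) for j in range(mu + 1)])
    return T


def step(dist, T):
    """One generation: multiply the state distribution by the transition matrix."""
    n = len(T)
    return [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]
```

Without mutation (`p_m=0.0`) the states 0 and µ are absorbing, so repeated application of `step` moves all probability mass to populations that have lost one of the two alleles. The number of states grows quickly with the population size and chromosome length, which is why exact Markov chain models are usually restricted to small problems.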

A perspective on EAs related to Markov chains is to model EAs in terms

of a dynamical system (Whitley, 1993; Vose, 1995, 1999). In essence, dynamical

systems model the population as a state (expressed in terms of a probability

vector) whose trajectory through the state space (see Figure 3.2) evolves according

to deterministic, mathematical equations (defined by the effects of selection,

crossover and mutation). The trajectory of the population is exact in the infinite

population limit. However, as the number of equations to be solved depends on

the population size µ, this approach becomes impractical for large

µ. Techniques that approximate the working of an EA in terms of large-scale

or average properties of the system modelled, such as a statistical mechanics approach

(Prügel-Bennett and Shapiro, 1994; Rattray, 1996), can help circumvent

this complexity problem. Although, strictly speaking, this approach yields

approximate results only (Reeves and Rowe, 2003), predictions of EA dynamics

tend to be accurate, and this is also true for complex problems (Shapiro, 2001).

Various other perspectives on the working principles of EAs can be found in the

literature. For instance, underlying working principles of an EA can be viewed

from an experimental design perspective (Reeves and Wright, 1995, 1996). The

perspective taken by Mühlenbein (1991) is that the schema theorem cannot be

used to explain the search strategy of a GA. Instead, he emphasizes the importance of constraining

selection and mutation by a spatial population structure, and

allowing some or all solutions in the population to improve their fitness by local hillclimbing

(Mühlenbein, 1992); these concepts have been realized by Mühlenbein

et al. (1991) in the parallel GA. A perhaps more applied perspective is related

to how EAs balance exploration and exploitation necessary for effective optimization.

It is assumed that conventional EAs perform a highly explorative search at

the beginning of the optimization process, which, as the optimization progresses,

becomes a more exploitative one focused on the best solutions found so far. When

balancing exploration against exploitation it is crucial to maintain a diverse population,

i.e. solutions with different properties, when performing an explorative

search (please refer to Section 3.4 for an overview of diversity-maintaining techniques).

However, there are several aspects that may complicate the process of

maintaining diversity in the population. Two common aspects are hitchhiking

and genetic drift, which we describe next.

Hitchhiking (Mitchell, 1996) describes the scenario where newly discovered

high-order schemata spread quickly through the population, slowing down the

discovery of schemata in other positions; especially those that are close to the

defined positions of highly fit schemata. Genetic drift (De Jong, 1975), on the

other hand, refers to the statistical drift of gene frequencies in a population due

to random sampling effects. It occurs because of the finite size of a population

and the stochastic choices made in selection, population updates and genetic

operations, whereby, due to accidents of sampling among the surviving individuals,

highly fit individuals may be lost from the population or the population may

suffer from a loss of variety concerning some alleles. One effect of drift is that a

population might be pushed down a hill and find itself in a less fit valley. Drift

in combination with selection enables a population to move uphill (through the

effect of selection) as well as downhill (through the effect of drift), and, of course,

whether the population will find its way back to the same hill is not guaranteed.

These concepts have been suggested by Wright (1932) and form part of his shifting

balance theory of evolution. Geographic isolation of individuals and speciation

are other parts of his model.

The choice of optimal parameter settings of EAs has also been approached

from several theoretical viewpoints. For example, Goldberg et al. (1992) suggest

that a linear dependence of the population size on the string length is appropriate.

Empirical results of e.g. Grefenstette (1986), on the other hand, have indicated

that a population size of 30 is sufficient for many applications. The choice of the

mutation rate p_m has also been the focus of many investigations. The theoretical

analysis of Mühlenbein (1992) suggests that a setting of p_m = 1/l (with l being the

genotype length) is generally adequate. Similar theoretical analyses have been

conducted for the choice of the recombination rate and the selection pressure,

which refers to the bias towards selecting fit individuals. The general conclusion

is that the mutation and recombination rate should increase with the selection

pressure (Bäck, 1996; Ochoa et al., 2000).

Recent theoretical studies on EAs focus rather on estimating the running

time complexity of an algorithm as a function of the input space size l of a

problem (see e.g. the work of Doerr et al. (2011); Auger and Doerr (2011); Oliveto

and Witt (2011)). Such analyses can be performed using, for example, a drift

analysis (Hajek, 1982; He and Yao, 2001).

Despite the drawbacks that some analytical methods might have, theoretical

investigations of EAs contribute to the understanding of evolutionary optimization

techniques. This, in turn, helps us with the tasks of selecting and designing

good strategy variants and operators. In other words, an improved understanding

of evolutionary search allows for more effective tuning of EAs, a subject we

consider next.

3.1.2 Tuning evolutionary algorithms

The objective of tuning an EA is to enhance its search performance. This process

involves selecting suitable operators and their parameter settings, a

[Figure 3.3, diagram contents: parameter setting divides into parameter tuning (before the run, offline) and parameter control (during the run, online); parameter control divides into deterministic, adaptive and self-adaptive.]

Figure 3.3: A classification of different parameter setting methodologies for EAs according

to Eiben et al. (1999). A similar classification can be made for other design

choices involved in EAs, such as operator selection, population size, etc.

representation of individuals, the population size and other EA parameters. We

will now review a number of techniques that facilitate this tuning process.

Before tuning of an EA can be commenced, however, the goal of the experiment

needs to be specified, such as designing a more robust, faster and/or cheaper

algorithm. This helps to identify the hypotheses to test, appropriate test problems,

and the performance measures to use. Let us assume that all these aspects

have been determined, and we are now faced with the design and tuning of an

efficient EA. We can adopt two fundamental methodologies to realize this tuning

step: (i) tuning offline before running an algorithm on the actual problem at

hand (this method is referred to as parameter tuning), and (ii) adapting parameter

values online during an algorithmic run (this method is referred to as parameter

control). Parameter control is sometimes further divided into deterministic,

adaptive, and self-adaptive approaches (Eiben et al., 1999); this classification is

also shown in Figure 3.3. In the following we will introduce commonly used techniques

for each method, but a more complete overview has been given e.g. by Lobo

et al. (2007); Fialho (2010); Eiben and Smit (2011).

A naïve technique for parameter tuning is to try out many different parameter

settings by hand and then select the best setup (De Jong, 1975). Such an ad hoc

approach, however, can be time-consuming and requires an a priori definition

of certain aspects, such as the number of different configurations to be tested

and the number of runs to be performed for each configuration. Statistical racing

methods (Maron and Moore, 1997; Birattari, 2005) aim at eliminating some

of these drawbacks. The idea here is to empirically evaluate a set of candidate

configurations and successively discard bad ones as soon as sufficient statistical

[Figure 3.4, plot contents: computational effort plotted against parameter configurations, for brute-force and for statistical racing.]

Figure 3.4: Comparison of the computational effort needed by a statistical racing

method (dashed region) and a brute-force approach (rectangular area) to find an optimal

parameter setting. While brute-force spends an equal amount of computation for

all parameter configurations, a statistical racing method discards poor configurations

as soon as sufficient statistical evidence is gathered against them. Figure reproduced

from Birattari (2005).

evidence is gathered against them (Birattari et al., 2002); a visualization of this

procedure is given in Figure 3.4. Another branch of techniques for parameter

tuning is inspired by the experimental design discipline introduced in Section 2.4.

Popular techniques adopted from this discipline include space-filling designs (Myers

and Hancock, 2001) and response surface methods (Coy et al., 2001; François

and Lavergne, 2001; Czarn et al., 2004; Bartz-Beielstein et al., 2004); in this context,

a response surface represents the performance of an algorithm as a function

of different parameter configurations. An approach that, similar to response surface

methods, aims at tackling the optimization of expensive functions has been

proposed by Corne et al. (2003); Rowe et al. (2006). The authors suggest

modelling a landscape as a finite state machine, inferred from preliminary sampling

runs, and then running different EAs offline on this model to choose the most efficient

one. An early drawback of this approach was its limitation to algorithms

employing mutation only, but more recent versions of the technique also model

crossover (Rowe et al., 2010), allowing both mutation and crossover to be tuned

(in addition to population size and selection pressure). However, before (offline)

tuning can be started, one needs to conduct a certain number of (expensive)

evaluations on the real-world landscape to establish a sufficiently accurate model

of the landscape. Finally, some researchers, such as Grefenstette (1986); Nannen

and Eiben (2006); Branke and Elomari (2011), have formulated parameter tuning

as an optimization problem and employed an EA as a meta-algorithm to tackle

this problem. The use of multi-objective EAs is particularly convenient when the

aim is to find configurations that are optimal with respect to several performance

measures (Dréo, 2009; Smit et al., 2010).
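
The racing idea discussed above can be sketched as follows. The example uses a Hoeffding bound as the statistical test, in the spirit of the races of Maron and Moore (1997); the F-Race method of Birattari relies on a different (rank-based) test, so this should be read as one possible instance rather than as the procedure of either reference. The function name, the assumption that performance values lie in [0, value_range] with larger values being better, and the default parameters are all choices made for this illustration.

```python
import math


def hoeffding_race(evaluate, n_configs, max_rounds=100, delta=0.05, value_range=1.0):
    """Race configurations and discard those that are very likely worse than the best.

    evaluate(c) runs configuration c once and returns a performance value
    in [0, value_range]; higher values are better.
    """
    sums = [0.0] * n_configs
    alive = list(range(n_configs))
    evaluations = 0
    for n in range(1, max_rounds + 1):
        # Every surviving configuration receives one more evaluation per round.
        for c in alive:
            sums[c] += evaluate(c)
            evaluations += 1
        # Hoeffding confidence radius after n evaluations per configuration.
        eps = value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
        best_lower = max(sums[c] / n for c in alive) - eps
        # Keep a configuration only if its upper bound reaches the best lower bound.
        alive = [c for c in alive if sums[c] / n + eps >= best_lower]
        if len(alive) == 1:
            break
    return alive, evaluations
```

A brute-force comparison would spend `n_configs * max_rounds` evaluations; the race spends fewer whenever some configurations are clearly inferior, which corresponds to the smaller dashed region in Figure 3.4.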

A general property of methods for parameter tuning is that the parameter

configuration found (offline) is kept fixed during the optimization process of the

actual problem at hand. This property is a drawback if the dynamic of an evolutionary

search procedure is such that different parameter settings are more

suitable at different stages of the search procedure. As an example, one would

anticipate that a large mutation rate is beneficial at the beginning of the search to

facilitate exploration, while a small mutation rate might be more suitable towards

the end of a search procedure to promote exploitation of promising search regions.

Methods that control parameters (online) during a run account for such dynamic

scenarios. For instance, in the motivating example given above, the mutation rate

may be changed according to some deterministic rule based on the evaluation or

generation counter (Holland, 1975; Fogarty, 1989) (deterministic parameter control).

More sophisticated techniques account for the state in which an algorithm

is, for example, by monitoring the population diversity or the fitness improvement

between generations, and then use these measurements to trigger changes

in the parameter settings according to some heuristic rule (Rechenberg, 1973;

Bäck, 1996) (adaptive parameter control); an example of an adaptive scheme is

the famous “1/5 success rule” of Rechenberg (1973) (see footnote 1). A different approach, which

has its origin in the evolution strategy community, is to consider parameters like

the mutation rate as additional genes in the chromosome and evolve them alongside

with the actual decision variables of a problem (Schwefel, 1977; Hansen and

Ostermeier, 2001; Beyer and Schwefel, 2002). This form of parameter control

can be seen as a self-adaptive scheme; the obvious advantage of this scheme is

that it does not require the algorithm designer to come up with a heuristic rule

according to which parameters are changed.
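
As an illustration of adaptive parameter control, the sketch below applies the 1/5 success rule to the mutation step size of a simple (1+1) evolution strategy minimizing a sphere function. The test function, the observation window of 20 mutations, the adaptation factor 0.82 and all names are assumptions made for this example; other settings are used in the literature.

```python
import random


def sphere(x):
    """Toy minimization problem (an assumption for illustration)."""
    return sum(v * v for v in x)


def one_plus_one_es(f, dim=5, sigma=1.0, iterations=500, window=20, factor=0.82, seed=1):
    """(1+1)-ES whose step size sigma is adapted by the 1/5 success rule."""
    rng = random.Random(seed)
    parent = [rng.uniform(-5, 5) for _ in range(dim)]
    f_parent = f(parent)
    successes = 0
    for t in range(1, iterations + 1):
        child = [v + sigma * rng.gauss(0, 1) for v in parent]
        f_child = f(child)
        if f_child < f_parent:  # successful mutation: offspring fitter than parent
            parent, f_parent = child, f_child
            successes += 1
        if t % window == 0:
            rate = successes / window
            if rate > 0.2:
                sigma /= factor  # too many successes: increase the step size
            elif rate < 0.2:
                sigma *= factor  # too few successes: decrease the step size
            successes = 0
    return parent, f_parent, sigma
```

Because the parent is only replaced by a strictly better offspring, the returned fitness is never worse than that of the initial solution; the step size typically shrinks as the search approaches the optimum.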

Another example of self-adaptive parameter control is the dynamic selection of

parameter configurations or operators based on their recent performance. Davis

(1989) developed the first approach of this kind. He assigned each operator a

probability of being selected in a reproduction process, and this probability was

proportional to the fitness of individuals created with a particular operator. That

Footnote 1: This rule states that the ratio of successful mutations to all mutations should be 1/5; a

successful mutation is one where the offspring is fitter than its parent. If the ratio is greater

than 1/5, then the mutation step size should be increased, otherwise decreased.

[Figure 3.5, diagram labels: EA; AOS; operator selection (quality op 1, quality op 2, ...); impact evaluation; impact (e.g. fitness difference); credit assignment; credit (e.g. average of fitness differences).]

Figure 3.5: The general scheme of an EA employing an adaptive operator selection

scheme: the EA asks for an operator and the operator selection mechanism returns one

based on the empirical quality estimates of the operators; after applying the operator,

its impact is assessed (e.g. by the fitness difference between the generated individual

and its parent); the credit assignment scheme transforms the impact into a credit,

which in turn is used to update the quality estimates of operators. Figure reproduced

from Fialho (2010).

is, operators that are more successful in creating fitter solutions are assigned a

greater probability of being selected in future. There is a variety of self-adaptive

parameter control approaches of this form, and, sometimes, they are also referred

to as adaptive operator selection (AOS) approaches (Fialho, 2010). AOS

approaches are defined by two concepts (see Figure 3.5): (i) the way credit is assigned

to operators upon generating an offspring (credit assignment mechanism),

and (ii) the way this credit is considered in the selection of an operator (operator

selection rule). The credit assigned to an operator can, for instance, be the fitness

difference between the generated offspring and its parents (Tuson and Ross,

1998; Lobo and Goldberg, 1997) or the current best individual (Davis, 1989). To

account for delayed credit effects, it has also been proposed to pass credit back

to operators involved in the generation of ancestors of the offspring (Davis, 1989;

Julstrom, 1995; Pearson et al., 2001). The operator selection rule first updates

the empirical quality estimates of operators based on the credits obtained (e.g.

by taking the average of credits obtained so far), and then uses these estimates to

select the next operator. Operators can be selected, for instance, with a probability

that is proportional to the quality estimates. Two popular operator selection

rules that follow this procedure are probability matching (Goldberg, 1990) and

adaptive pursuit (Thierens, 2005).
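
The two operator selection rules named above can be sketched as follows. The quality update uses an exponential recency-weighted average with rate `alpha` rather than the plain average mentioned in the text, credits are assumed to be non-negative, and the minimum selection probability `p_min`, the learning rate `beta` and all function names are assumptions of this illustration.

```python
def update_quality(q, op, credit, alpha=0.3):
    """Move the quality estimate of operator `op` towards the received credit."""
    q = list(q)
    q[op] += alpha * (credit - q[op])
    return q


def probability_matching(q, p_min=0.05):
    """Selection probabilities proportional to quality, each at least p_min."""
    k = len(q)
    total = sum(q)
    if total == 0:
        return [1.0 / k] * k
    return [p_min + (1 - k * p_min) * qi / total for qi in q]


def adaptive_pursuit(p, q, p_min=0.05, beta=0.3):
    """Push the probability of the current best operator towards p_max."""
    k = len(q)
    p_max = 1 - (k - 1) * p_min
    best = max(range(k), key=lambda i: q[i])
    return [pi + beta * ((p_max if i == best else p_min) - pi) for i, pi in enumerate(p)]
```

Probability matching keeps the selection probabilities in proportion to the quality estimates, so a clearly superior operator may still be chosen only moderately often. Adaptive pursuit instead moves the probability of the currently best operator towards `p_max` and that of all others towards `p_min`, which exploits the best operator more aggressively while retaining a minimum level of exploration.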

Other AOS approaches transform the operator selection rule into a multi-armed

bandit (MAB) problem (Mahajan and Teneketzis, 2008). A MAB problem

is a non-associative, purely selectional learning problem originally designed to test

the ability of an algorithm to trade off exploration against exploitation (Robbins,

1952a); MAB problems are related to problems in the field of reinforcement learning,

which we introduce in Section 3.6. In a MAB problem one is faced with a

number of actions, each action being associated with an unknown reward distribution

from which a reward is drawn upon taking the action. The goal of the

algorithm is to maximize the sum of rewards received over a number of actions

taken. When considering operator selection as an MAB problem, the available

actions would correspond to the operators, and the reward to some credit associated

with the application of an operator. Fialho (2010) provides a comprehensive

review of AOS approaches that consider operator selection as an MAB problem.

In general, these approaches employ the same credit assignment schemes as other

AOS approaches. However, their operator selection scheme employs a bandit algorithm,

such as the upper confidence bound (UCB) algorithm (Auer et al., 2002).

We will introduce this algorithm in detail in Section 5.2.2, where we investigate

an AOS approach to deal with one of the resourcing issues (ephemeral resource

constraints) considered in this thesis.
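
Ahead of the detailed treatment in Section 5.2.2, the basic UCB1 selection rule of Auer et al. (2002) can be sketched as a generic bandit loop. In this illustration the arms play the role of operators and the rewards play the role of credits; rewards are assumed to lie in [0, 1], and the function names and the reward callback are hypothetical.

```python
import math


def ucb1_select(counts, means):
    """Return the arm maximizing mean reward plus an exploration bonus."""
    # Each arm is tried once before the confidence bounds are used.
    for arm, n_arm in enumerate(counts):
        if n_arm == 0:
            return arm
    total = sum(counts)
    return max(
        range(len(counts)),
        key=lambda i: means[i] + math.sqrt(2.0 * math.log(total) / counts[i]),
    )


def run_bandit(reward_fn, n_arms, steps):
    """Play `steps` rounds; reward_fn(arm) returns a reward in [0, 1]."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for _ in range(steps):
        arm = ucb1_select(counts, means)
        reward = reward_fn(arm)
        counts[arm] += 1
        # Incremental update of the empirical mean reward of the chosen arm.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means
```

The exploration bonus shrinks as an arm is played more often, so poorly performing operators are still sampled occasionally but at a rate that decreases over time.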

Note that parameter tuning and control have not only been considered for EAs

but also for other metaheuristics, in particular local search algorithms. However,

although these metaheuristics share many common features with EAs, they are

not as deeply researched as EAs. In general, the field around self-tuning search

heuristics has become increasingly popular in recent years. The recent self-* search

workshop (Ochoa et al., 2010) and track (Ochoa and Schoenauer, 2011) held at the

international conferences PPSN and GECCO, respectively, are further evidence

of the burgeoning interest in this subject within the evolutionary computation

community.

The pure task of tuning an EA can be accompanied by the task of actually

trying to understand how different parameter configurations might affect search.

For instance, to investigate the effect of different representations one may look at

measures characterizing the problem difficulty (Rothlauf, 2006); examples of such

measures include the fitness-distance correlation (Jones and Forrest, 1995) and

the negative slope coefficient (Vanneschi et al., 2006). Distance measures that

account for the choice of operators have been analyzed e.g. by Altenberg (1997).

The concept of heritability (Mühlenbein and Paaß, 1996) is another measure

that has been employed in the analysis of operators. For detailed descriptions of

these concepts please refer to the references provided. Of course, an understanding

of the performance effect of operators and their parameter settings can also

be gained from a theoretical point of view; the previous section has introduced

various approaches that can be employed for this purpose.

Finally, we want to remark that when tuning and comparing algorithms,

one must not forget that according to the no free lunch theorem (Wolpert and

Macready, 1997), all search algorithms have identical performance when their

performance is averaged over all possible search spaces. Although this implies

that there is no single general-purpose search algorithm, it also implies that,

over a restricted set or class of problems (one which is not closed under permutation)

(Schumacher et al., 2001; Igel and Toussaint, 2004), performance differences

must exist and therefore a best algorithm or set of algorithms must also exist.

In the following sections we will look at different problem classes, and successful

strategies employed to tackle these problems; we begin with problems subject to

some form of uncertainty.

3.2 Optimization in uncertain environments

Uncertainty in optimization can be due to a variety of sources. In this section, we

focus on sources related to noise, and the search for robust and reliable solutions.

3.2.1 Optimization subject to noise

Uncertainty in the form of noisy (inaccurate) fitness values is a common side

effect in problems where solutions are evaluated through physical experimentation

or computer simulations (e.g. Monte Carlo simulations). For instance, when

trying to find potent combinations of drugs, replicating the experiment of a drug

combination will yield variations in the fitness measurements even if identical

drugs and concentrations of drugs are used.

Noise can be attributed to a number of sources: inaccuracies in realizing the

decision variables of a genotype, e.g. concentrations of drugs, can introduce noise

during the genotype-phenotype mapping, while measurement errors, human errors,

and sampling errors can introduce noise during the fitness measurement

process. Additional perturbations in the fitness may be due to uncontrolled environmental

factors, such as ambient temperature and deviations in the performance

of experimental equipment (e.g. instruments and robots).

[Figure 3.6, plot contents: noisy fitness f(⃗x) plotted against ⃗x with error bars; the local minimum and the global minimum are marked on the ⃗x axis.]

Figure 3.6: Visualization of the effect of a noisy fitness function f. The error bars

indicate the level of noise for different solutions ⃗x. There does not need to be a correlation

between the noise level and the fitness. The large noise level at the local

minimum ⃗x_{local minimum} may mislead the search to this region and thus away from the

region containing the true optimal solution ⃗x_{global minimum}.

Arguably the most common noise model considered is additive white Gaussian

noise (Beyer, 2000), where noise is added to the true fitness value according

to a Gaussian distribution N(0,σ) with zero mean and standard deviation σ.

Alternative noise models include uniform and Laplacian noise but these tend to

be less common in real-world systems.

The challenge associated with noise is that one cannot be sure whether an improvement obtained by a decision variable change is a real improvement or the effect of noise (see Figure 3.6). In the extreme case of large noise, there is no useful selection information (because fitness improvements can be equally due to parameter changes and noise), causing the parameters to be optimized to perform a random walk (Beyer, 2000). However, noise is not always bad. A certain amount of it might be helpful during the initial phases of the search (Rana et al., 1996) and might improve the reliability of convergence to the global optimum (Bäck and Hammel, 1994; Levitan and Kauffman, 1995), because noise can allow an optimizer to escape from local optima.
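The difficulty of telling a real improvement from noise can be made concrete with a short worked calculation. It is only a sketch under stated assumptions: minimization, two solutions $\vec{x}_1$ and $\vec{x}_2$ that are each evaluated once, independent additive Gaussian noise with standard deviation $\sigma$, and a true fitness gap $\Delta$ (a symbol introduced here for illustration).

```latex
\begin{align*}
\tilde{f}(\vec{x}) &= f(\vec{x}) + \varepsilon,
  \qquad \varepsilon \text{ Gaussian with mean } 0 \text{ and standard deviation } \sigma, \\
\Delta &= f(\vec{x}_2) - f(\vec{x}_1) > 0
  \qquad (\vec{x}_1 \text{ is truly better}), \\
\tilde{f}(\vec{x}_2) - \tilde{f}(\vec{x}_1)
  &\text{ is Gaussian with mean } \Delta \text{ and standard deviation } \sqrt{2}\,\sigma, \\
P\bigl(\tilde{f}(\vec{x}_2) < \tilde{f}(\vec{x}_1)\bigr)
  &= \Phi\!\left(-\frac{\Delta}{\sqrt{2}\,\sigma}\right).
\end{align*}
```

Here $\Phi$ denotes the standard normal distribution function. For $\Delta = \sigma$ the two solutions are ranked the wrong way round with probability of about 0.24, and as $\sigma/\Delta$ grows this probability approaches 1/2, which corresponds to the random-walk regime described above.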

The standard approach for coping with noisy fitness functions within EAs is to repeat the evaluation of a solution several times (the number of repetitions is referred to as the sampling rate) and then take the average over the fitness values to approximate its true fitness. This approach is referred to as resampling (Fitzpatrick and Grefenstette, 1988) in the evolutionary computation community; recall that the experimental design discipline refers to the same approach as replication (see Section 2.4). Obviously, an increase in the sampling rate results in a more accurate fitness approximation but this comes at the cost of a larger number of (potentially expensive) evaluations. Suggestions on how to set or adapt the sampling rate, such as increasing the sampling rate for individuals exhibiting high variances in their fitness value, and more sophisticated techniques, have been proposed e.g. by Aizawa and Wah (1994); Stagge (1998); Sano and Kita (2000); Branke and Schmidt (2004).
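Resampling can be sketched in a few lines. The code below is a minimal illustration, not a procedure taken from the cited works: the sphere test function, the Gaussian noise level `sigma`, and the function names are all assumptions made for the example. It returns the averaged fitness together with its estimated standard error, which shrinks roughly with the square root of the sampling rate.

```python
import random
import statistics


def noisy_fitness(x, sigma, rng):
    """Hypothetical noisy objective: sphere function plus additive Gaussian noise."""
    true_value = sum(xi * xi for xi in x)
    return true_value + rng.gauss(0.0, sigma)


def resampled_fitness(x, sampling_rate, sigma, rng):
    """Average `sampling_rate` noisy evaluations of x.

    Returns the sample mean and its estimated standard error
    (infinite when only a single sample is available).
    """
    samples = [noisy_fitness(x, sigma, rng) for _ in range(sampling_rate)]
    mean = statistics.fmean(samples)
    if sampling_rate < 2:
        return mean, float("inf")
    std_err = statistics.stdev(samples) / sampling_rate ** 0.5
    return mean, std_err
```

An adaptive scheme of the kind mentioned above could, for instance, keep drawing samples for an individual until the returned standard error falls below a chosen tolerance; that tolerance would be a further assumption.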

The alternative to obtaining highly accurate fitness values is to perform ‘rough’

evaluations but allow an EA to consider many more candidate solutions per generation;

this effect is achieved by an increase of the population size (Fitzpatrick

and Grefenstette, 1988). It is unclear which of the two alternatives, accurate or

rough evaluations, is more efficient, given a fixed number of fitness evaluations.

Theoretical and empirical studies analyzing the trade-off between the sampling

rate and the population size have been conducted e.g. by Fitzpatrick and Grefenstette

(1988); Beyer (1993); Bäck and Hammel (1994); Hammel and Bäck (1994);

Rattray (1996); Miller (1997); Beyer (2000).

The degree of the selection pressure (i.e. the degree to which fitter solutions are favoured in parental selection) has a significant performance impact in the presence of noise too. Several researchers, including Miller (1997); Beyer (2000), have shown that fitness proportionate selection is less sensitive to the level of noise than tournament selection. On the other hand, selection schemes with high selection pressure, such as tournament selection with relatively large tournament sizes (Beyer (2000) considers a tournament size of 5), can yield better final results.

Some researchers suggest diverging from common parental and environmental selection schemes in the presence of noise. Markon et al. (2001), for instance, suggest replacing a parent by its offspring only if the offspring is fitter than the parent by more than some pre-defined threshold. On the other hand, Hughes (2001) suggests selecting not the individual that has a higher mean fitness than some other individual but the one that has a lower probability of being worse than some other individual. In other words, Hughes recommends minimizing the probability of making a wrong decision. He analyzed this approach in a multi-objective optimization setup. Unfortunately, the number of studies considering noise and other forms of uncertainty within multi-objective optimization is limited, and so is our understanding in this domain.
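The two selection ideas can be illustrated with a small sketch. It is a simplified single-objective reading under stated assumptions, not the formulation of either cited paper: minimization, fitness estimates treated as independent Gaussians described by a mean and a standard error, and hypothetical function names and a hypothetical threshold `tau`.

```python
import math


def prob_a_worse_than_b(mean_a, se_a, mean_b, se_b):
    """Probability that A is truly worse (larger) than B under minimization.

    Assumes independent Gaussian estimates with the given means and
    standard errors; this modelling choice is an assumption of the sketch.
    """
    spread = math.hypot(se_a, se_b)
    if spread == 0.0:
        if mean_a == mean_b:
            return 0.5
        return 1.0 if mean_a > mean_b else 0.0
    z = (mean_a - mean_b) / spread
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def threshold_accept(parent_fitness, offspring_fitness, tau):
    """Accept the offspring only if it beats the parent by more than tau."""
    return offspring_fitness < parent_fitness - tau
```

Selecting the individual with the lower probability of being worse reduces to comparing means when both standard errors are equal, but it can prefer a slightly worse mean that has been measured much more precisely.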

The use of elitism to keep track of the best solutions found does not seem to improve the convergence behavior of an EA in the presence of noise, at least for some versions of evolution strategies (ESs) (Beyer, 1993, 2000). The reason is the risk of overemphasising elite solutions, which may misguide the search.

Figure 3.7: A (1,λ)-ES with λ = 5 using rescaled mutations. The five (gray) offspring candidate solutions are generated by mutating the (black) candidate solution in the center. All offspring are inferior to their parent. The parent of the next generation is constructed by taking a reduced mutation step in the direction of the best offspring (circled). Figure reproduced from Arnold (2005).

To circumvent this issue, Büche et al. (2002) looked at various re-evaluation

techniques and strategies that limit an individual’s lifetime.

A noise-handling approach that has its origin in the ES community is the

method of rescaled mutations (Rechenberg, 1994). The idea of this approach

is to make large mutation steps when generating offspring in order to cancel

out the effect of noise and thus increase the chance of selecting the single, truly

best offspring (see Figure 3.7). Since the mutation steps are large, the offspring

are likely to have a poorer fitness than the parent. The hope is that a reduced

(rescaled) step towards the best offspring will yield a solution that is fit enough

to replace the current parent and be carried over to the next generation. The

obvious challenge with this approach is to appropriately set the rescaling factor; theoretical and empirical work addressing this challenge includes e.g. (Beyer, 1998, 2000; Arnold, 2005).
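One generation of a (1,λ)-ES with rescaled mutations might look as follows. This is a minimal sketch under assumptions: minimization, isotropic Gaussian mutations with a fixed step size `sigma`, and a rescaling factor called `kappa` here (the name and all parameter values are illustrative, not taken from the cited works).

```python
import random


def sphere(x):
    """Noise-free test function used only for illustration."""
    return sum(xi * xi for xi in x)


def rescaled_mutation_generation(parent, noisy_f, sigma, kappa, lam, rng):
    """One generation of a (1, lam)-ES with rescaled mutations (minimization).

    Trial offspring use the enlarged step size kappa * sigma; the new parent
    takes only 1 / kappa of the step that led to the best-looking offspring.
    Returns the new parent and the selected (large) mutation step.
    """
    best_value, best_step = None, None
    for _ in range(lam):
        step = [kappa * sigma * rng.gauss(0.0, 1.0) for _ in parent]
        trial = [p + s for p, s in zip(parent, step)]
        value = noisy_f(trial)
        if best_value is None or value < best_value:
            best_value, best_step = value, step
    new_parent = [p + s / kappa for p, s in zip(parent, best_step)]
    return new_parent, best_step
```

The large trial steps make fitness differences between offspring large relative to the noise, while the reduced step keeps the parent from overshooting. How to choose `kappa` is exactly the open question raised above.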

Several other noise-handling strategies have been proposed over the years.

The interested reader is referred to Jin and Branke (2005) for a review.

3.2.2 Searching for robust and reliable solutions

Let us now have a closer look at uncertainties associated with the search for robust and reliable solutions. We can define robust solutions as ones whose fitness is insensitive to small perturbations in the decision variables or environmental conditions (e.g. manufacturing tolerances), and reliable solutions as ones that remain feasible or safe under such variation (Branke, 2001; Knowles, 2009). Perturbations in the optimization environment may result in a changing fitness landscape and thus cause a shift in the optimal solution. It is the general focus of the field of dynamic optimization (see Section 3.4) to track these shifting optima. However, if shifts in global optima are random and small, one may prefer to seek a fit robust solution instead of tracking an optimal solution that may be located on a shaky peak (Branke, 2001). To search for robust solutions one can apply similar strategies as for noisy optimization, due to the similarities between the two optimization tasks (Jin and Branke, 2005): Taguchi’s robust design methods are a popular approach originating from the experimental design discipline, while averaging techniques, population up-sizing, and other approximation techniques are more common within EAs. For an overview of approaches to robust optimization and robustness measurement techniques please refer to Beyer and Sendhoff (2007).

The search for reliable solutions is a task that has been considered only recently by the evolutionary computation community. In fact, this task may arise only when the optimal solution is located close to the constraint boundary. The common approach taken in this domain is to assume that the uncertainty of design variables and problem parameters is known in the form of some probability distribution. This assumption then allows the constraints to be transformed into probabilistic constraints, and a solution to be considered feasible if its probability of satisfying each constraint is greater than some pre-defined threshold. The advantage of this approach is that the constraints can be considered as deterministic, and standard techniques for coping with constraints (see Section 3.3) can be applied to find reliable solutions. However, commonly one is interested in the overall reliability (joint probability) of a solution in terms of satisfying all constraints rather than each single constraint on its own. It remains a challenge to compute this joint probability, and several approximation methods have been proposed to do this calculation. The interested reader is referred to Deb et al. (2009) for an overview of these techniques and the application of EAs to reliability-based optimization.
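The difference between per-constraint and joint reliability can be illustrated with a plain Monte Carlo estimate. This is a sketch under assumptions rather than one of the approximation methods reviewed in the literature: constraints are written as functions `g` with `g(x) <= 0` meaning satisfied, and `perturb` is a hypothetical function that samples the uncertain design variables.

```python
import random


def reliability(x, constraints, perturb, n_samples, rng):
    """Monte Carlo estimate of constraint satisfaction probabilities.

    Returns a list with one probability per constraint and the joint
    probability that all constraints hold at once.
    """
    per_constraint = [0] * len(constraints)
    joint = 0
    for _ in range(n_samples):
        xp = perturb(x, rng)
        satisfied = [g(xp) <= 0.0 for g in constraints]
        for j, ok in enumerate(satisfied):
            per_constraint[j] += ok
        joint += all(satisfied)
    return [c / n_samples for c in per_constraint], joint / n_samples
```

With the two constraints `x[0] - 1 <= 0` and `-x[0] - 1 <= 0` and unit Gaussian perturbations around `x = [0.0]`, each constraint alone holds with probability of about 0.84, whereas both hold jointly with probability of only about 0.68. A solution can therefore pass a per-constraint threshold and still fail a joint one.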

3.2.3 Further forms of uncertainties in optimization

There are two more main sources that can introduce uncertainty during an optimization process. The first is related to the problem of tracking shifting optima, mentioned above in the context of robust solutions. Here, uncertainty is due to the fact that it is often unknown when the environment changes, and in which direction the optimal solution shifts upon an environmental change. It is the focus of the dynamic optimization field to deal with these issues, and we will introduce this field in more detail in Section 3.4.

Figure 3.8: Visualization showing how an approximation error in a meta-model may lead to a false minimum. The solid line represents the true fitness function, the dashed line the approximated function, and the six dots the sample points. Figure reproduced from Jin (2005).

Another common source of uncertainty is associated with the application of response surface techniques (also known as surrogates or meta-models in the context of evolutionary search). Remember that the main purpose of meta-models is to approximate the fitness function when fitness evaluations are expensive (see Sections 2.5 and 2.6). However, whilst evaluating the real fitness function returns the true fitness of a solution, albeit sometimes with noise, fitness values taken from the meta-model may be subject to approximation errors (see Figure 3.8). For a review of techniques for coping with such approximation errors please refer to e.g. Jin et al. (2002); Ong et al. (2004).
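A false minimum of the kind shown in Figure 3.8 is easy to reproduce. The sketch below is a toy construction and not the example from Jin (2005): the true function `x**4 - 4*x**2`, the three sample points, and the quadratic meta-model are all assumptions chosen so that the numbers can be checked by hand.

```python
import math


def true_fitness(x):
    """Hypothetical expensive fitness function with minima at x = +/- sqrt(2)."""
    return x ** 4 - 4.0 * x ** 2


def fit_quadratic(p0, p1, p2):
    """Coefficients (a, b, c) of the parabola through three (x, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    d01 = (y1 - y0) / (x1 - x0)          # first divided differences
    d12 = (y2 - y1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)          # second divided difference
    b = d01 - a * (x0 + x1)
    c = y0 - a * x0 * x0 - b * x0
    return a, b, c


samples = [(x, true_fitness(x)) for x in (0.0, 1.0, 2.0)]
a, b, c = fit_quadratic(*samples)        # meta-model 3x^2 - 6x
surrogate_minimizer = -b / (2.0 * a)     # x = 1.0, predicted fitness -3
true_minimizer = math.sqrt(2.0)          # true fitness -4
```

The meta-model places its minimum at `x = 1.0`, where the true function is still decreasing, so an optimizer that trusts the model would stop short of the true minimum near `x = 1.414`.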

Having gained an overview of different forms of uncertainty that may be encountered in optimization, we can generally conclude that in the presence of limited evaluation numbers one has to accept a lower solution quality. The question that arises is whether it is worth risking expensive evaluations to find the optimal solution, or whether it is rather advisable to search for a robust or reliable but perhaps lower quality solution; work addressing this question includes e.g. (Jin and Sendhoff, 2003). Although many real-world problems are subject to different forms of uncertainties simultaneously, there is unfortunately only little work in the literature considering approaches for such problems.

3.3 Constrained optimization

So far, we did not pay much attention to how constraints on the decision variables are dealt with during an evolutionary search. Roughly speaking, an optimization problem may be subject to two types of (static) constraints: box constraints (also known as parametric constraints or domain constraints) on individual decision variables, which are almost always present, and a set of linear and/or non-linear constraints that further restricts the values of the decision variables. The issue when applying EAs to constrained problems is that (simple) search operators (and encodings) are blind to constraints and thus do not guarantee that feasible parent individuals will provide feasible offspring. This issue led to the development of many different constraint-handling methods, of which we briefly discuss some; for a more detailed description we refer e.g. to Michalewicz and Schoenauer (1996); Coello (2002); Mezura-Montes (2009). Although dynamic constraints, such as ERCs considered in this thesis, are materially different to standard (linear and non-linear) constraints, as we shall see later, it is nevertheless informative to consider how constraints in EAs are usually handled.

Methods based on preserving feasibility of solutions. Some methods falling into this category preserve feasibility by making use of special representation schemes (Bean, 1994) and/or special operators (suitable for these schemes) (Michalewicz and Nazhiyath, 1995; Davis, 1991). For instance, Davidor (1991) used an EA with a variable-length representation to generate trajectories of a robot; to cope with this representation he designed an analogous crossover, which defines crossover points in parent chromosomes based on their phenotypic similarities. Other methods in this category utilize decoders (Koziel and Michalewicz, 1999; Kime and Husbands, 1997), which aim at mapping the original feasible region into another easier to optimize space, or scan the boundary of the feasible region using some strategy that coordinates jumps between the feasible and infeasible region (Glover, 1977; Schoenauer and Michalewicz, 1996). Note that the latter approach relies on the (often true) assumption that the global optimal solution is located at the boundary of the feasible region. The drawback of some of the strategies falling into this category is that they require an initial feasible solution. Michalewicz and Nazhiyath (1995), for instance, find this solution by iteratively generating random solutions in the search space until a feasible one is found. It is easy to see that this method can be time-consuming for applications where the feasible region is relatively small compared to the infeasible region.

Methods based on penalty functions. Although preserving feasibility is sometimes preferred, most evolutionary computation techniques use constraint-handling techniques that are based on the concept of (exterior) penalty functions, which in some way penalize infeasible individuals. The degree of the penalty often depends on either the distance of an infeasible individual from the feasible region or on the effort to repair an infeasible individual. Different designs have been proposed for maintaining an appropriate balance between objective and penalty function. The simplest designs always maintain a constant penalty strength (static penalties) (see e.g. Homaifar et al. (1994); Michalewicz (1995)), or simply reject an infeasible individual and then generate new individuals until a feasible one is found (death penalties) (Rechenberg, 1973; Schwefel, 1975; Bäck et al., 1991). Other designs use a more sophisticated approach and increase the penalty strength gradually over time (dynamic penalties) (Joines and Houck, 1994; Kazarlis and Petridis, 1998), use concepts inspired by simulated annealing (Kirkpatrick et al., 1983) (annealing penalties) (Michalewicz and Attia, 1994; Carlson and Shonkwiler, 1998), or change the penalty strength depending on the quality of individuals in the population (adaptive penalties) (see e.g. Bean and Hadj-Alouane (1992)). Yet another interesting approach, known as the segregated genetic algorithm (Riche et al., 1995), aims at approaching the feasible optimal solution from both sides of the boundary of the feasible region by evolving a population that maintains roughly two subpopulations, one containing (more likely) infeasible individuals and the other one (more likely) feasible individuals. Recently, a (static) penalty function based method has been proposed that ranks individuals stochastically according only to their penalty function or to their fitness values (Runarsson and Yao, 2000). To reduce the number of expensive evaluations, Runarsson (2004) extended the stochastic ranking method further by fitting a response surface to both the penalty and the fitness function.
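The simplest penalty designs can be sketched compactly. The code assumes minimization and inequality constraints written as `g(x) <= 0`; the function names, the summed violation measure, and the default parameter values `c` and `alpha` in the dynamic penalty are illustrative assumptions rather than settings prescribed by the cited works.

```python
import random


def total_violation(x, constraints):
    """Sum of constraint violations; zero for a feasible solution."""
    return sum(max(0.0, g(x)) for g in constraints)


def static_penalty(f, x, constraints, weight):
    """Objective plus a constant-strength penalty."""
    return f(x) + weight * total_violation(x, constraints)


def dynamic_penalty(f, x, constraints, generation, c=0.5, alpha=2.0):
    """Objective plus a penalty whose strength grows with the generation counter."""
    return f(x) + (c * generation) ** alpha * total_violation(x, constraints)


def death_penalty_sample(sample, constraints, rng, max_tries=10_000):
    """Reject infeasible candidates and resample until a feasible one is found."""
    for _ in range(max_tries):
        x = sample(rng)
        if total_violation(x, constraints) == 0.0:
            return x
    raise RuntimeError("no feasible individual found within max_tries")
```

The `max_tries` guard is an addition of this sketch; it makes explicit that rejection can become expensive when the feasible region is small.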

Methods based on a search for feasible solutions. There are three main techniques falling into this category. One technique, known as the behavioral memory approach (Schoenauer and Xanthakis, 1993), tackles the constraints of a problem one after another, whereby a new constraint is only tackled if there is a sufficient number of individuals in the population satisfying all previously tackled constraints. A second group of techniques (see e.g. Powell and Skolnick (1993)) is similar to penalty functions and also penalizes infeasible individuals but at the same time strictly favors feasible individuals over infeasible ones. The third main technique in this category incorporates the concept of repairing infeasible individuals. To date, many different repairing schemes as well as replacement strategies for dealing with the repaired (feasible) individuals have been proposed (see e.g. Liepins and Vose (1990); Nakano and Yamada (1991); Liepins and Potter (1991); Nakano (1991); Orvosh and Davis (1994)); we will describe one repairing procedure, which is employed by the GENOCOP III approach of Michalewicz and Nazhiyath (1995), in more detail in Section 3.4.1. Typically, a replacement strategy is concerned with the question whether evolution should consider only the fitness value of a repaired solution (Baldwinian evolution), or the fitness and also the repaired solution (genotype) itself (Lamarckian evolution). For a recent survey of repairing methods we refer to Salcedo-Sanz (2009).
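Two of these ideas are small enough to sketch directly. The code is illustrative only: `repair` is a hypothetical user-supplied function, minimization is assumed, and the sort key merely captures the ordering property that feasible individuals always rank ahead of infeasible ones; it is not the exact fitness mapping of Powell and Skolnick (1993).

```python
def evaluate_with_repair(genotype, repair, f, lamarckian):
    """Evaluate a possibly infeasible genotype through its repaired version.

    Baldwinian (lamarckian=False): keep the original genotype, use the
    fitness of the repaired one. Lamarckian (lamarckian=True): also write
    the repaired genotype back into the population.
    """
    repaired = repair(genotype)
    fitness = f(repaired)
    stored = repaired if lamarckian else genotype
    return stored, fitness


def feasibility_first_key(fitness, violation):
    """Sort key: feasible before infeasible, then by fitness or by violation."""
    if violation > 0.0:
        return (1, violation)
    return (0, fitness)
```

Sorting a population with `feasibility_first_key` orders feasible individuals by fitness and places every infeasible individual after them, ordered by how badly it violates the constraints.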

Hybrid methods. This group of constraint-handling methods contains EAs that are combined with other optimization techniques, which tend to be numerical-based. Hybrid methods include, amongst others, EAs that make use of Lagrange multipliers (see e.g. Adeli and Cheng (1994)), co-evolutionary models (Paredis, 1994), and concepts of ant colony optimization (Bilchev and Parmee, 1996).

Methods based on multi-objective optimization. Recently, the application of multi-objective optimization methods to constrained problems has gained popularity. Typically, these methods regard each constraint as an objective; these objectives are then optimized alongside the actual objective function using techniques from multi-objective optimization (Montes, 2004). A comprehensive survey of multi-objective EAs and their application to constraints has been given by Deb (2001).
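Treating constraints as objectives can be sketched with plain Pareto dominance. The code assumes minimization, constraints of the form `g(x) <= 0`, and the violation `max(0, g(x))` as the extra objective for each constraint; these choices and the function names are assumptions of the sketch.

```python
def objective_vector(x, f, constraints):
    """Objective value followed by one violation value per constraint."""
    return (f(x),) + tuple(max(0.0, g(x)) for g in constraints)


def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(ai <= bi for ai, bi in zip(a, b))
    strictly_better = any(ai < bi for ai, bi in zip(a, b))
    return no_worse and strictly_better


def nondominated(vectors):
    """Vectors that are not dominated by any other vector in the list."""
    return [v for v in vectors if not any(dominates(w, v) for w in vectors)]
```

For `f(x) = x**2` with the constraints `1 - x <= 0` and `x - 3 <= 0`, the candidates 0.0, 0.5 and 1.0 are mutually non-dominated (lower objective value is traded against higher violation), whereas the feasible candidate 2.0 is dominated by the feasible candidate 1.0.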

Considering the above constraint-handling methods, it is apparent that infeasible individuals (though unwanted) are often allowed to coexist with feasible individuals. The reason for allowing infeasible individuals to enter a population is that it may increase the probability of finding even fitter individuals, particularly on complex problems with rugged fitness landscapes and disjoint feasible regions. When dealing with problems where individuals violating a constraint cannot be evaluated, such as problems subject to ERCs, these techniques need to be modified or replaced by other techniques. Two simple techniques that do allow the evaluation of feasible individuals only are the death penalty and repairing. Alternatively, one can avoid evaluating an infeasible solution altogether and simply assign a poor fitness value to it. We will investigate the performance of these methods later in the presence of ERCs. Unfortunately, there is still little work on guidelines suggesting how to select the most appropriate constraint-handling technique and its parameters for a certain application.

Figure 3.9: Visualization of how the location of the current best solution $\vec{x}_{\text{global optimum}}$ at time $t$ (left) may shift at time $t+1$ (right) due to a change in the fitness function $f(\vec{x})$ (solid line). This change requires the population (indicated by the black dots) to track the new optimal solution.

3.4 Dynamic optimization

The field of dynamic optimization is traditionally concerned with problems that are subject to stochastically changing fitness functions (landscapes). Reasons for a change in the fitness landscape can be, for instance, that new jobs to be scheduled arrive in a dynamic job shop scheduling problem, or that new cities to be visited have to be considered in a dynamic TSP. The natural objective when dealing with dynamic problems is to track the potentially shifting optima upon a fitness landscape change (see Figure 3.9). The book of Branke (2001) provides a comprehensive overview of research on how EAs can be used to achieve this objective. Let us briefly review some seminal work in this domain.

Tracking of shifting optima is initiated by a change in the fitness landscape. In applications like dynamic job shop scheduling and dynamic TSP (Larsen, 2000), it is explicitly known when the landscape undergoes a change. However, there are also applications where the point of time of an environmental change is not known explicitly. As an example, consider deviations in the quality of raw materials or wearing out of machines, both factors being common in engineering and manufacturing applications. In this case, optimal operating conditions may shift unexpectedly, either gradually or more suddenly. Such applications require a mechanism that detects changes in the fitness landscape (and subsequently triggers the tracking of optima). A commonly-used mechanism to detect environmental changes is to watch out for deteriorations in the time-averaged best performance of the population (Cobb, 1990; Cobb and Grefenstette, 1993; Vavak et al., 1996). Alternatively, one may re-evaluate several individuals at each generation, and assume an environmental change when the fitness of at least one of these individuals has changed (Branke, 1999; Carlisle and Dozier, 2000).
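Both detection mechanisms can be sketched in a few lines. The function names, the `tolerance` parameter, and the window-based comparison are assumptions of this illustration (maximization is assumed for the performance-based detector); the cited works differ in their details.

```python
from statistics import fmean


def environment_changed(sentinels, remembered, f, tolerance=0.0):
    """Re-evaluate stored individuals; report a change if any fitness moved.

    `remembered` holds the fitness recorded for each sentinel earlier on.
    A positive tolerance is needed when evaluations are noisy.
    """
    return any(abs(f(x) - old) > tolerance for x, old in zip(sentinels, remembered))


def performance_dropped(best_history, window, drop):
    """Compare the mean best fitness of the last two windows (maximization)."""
    if len(best_history) < 2 * window:
        return False
    recent = fmean(best_history[-window:])
    previous = fmean(best_history[-2 * window:-window])
    return previous - recent > drop
```

Re-evaluation costs extra fitness evaluations in every generation, whereas the performance-based detector is free but can only react after the population has already suffered from the change.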

Various strategies have been proposed to track potentially shifting optima upon detecting an environmental change. A naïve one is to restart the optimization process from scratch (Raman and Talbot, 1993; Kidwell and Cook, 1994). In essence, this strategy turns a dynamic optimization problem into a problem where a sequence of independent static optimization problems is to be solved. Although simple, this strategy can be effective if the fitness landscape undergoes a severe change; we will demonstrate this later in Section 6.1.5. However, most of the time, a restart policy is rather impractical as reusing genetic information obtained in the past can help to track optima quickly. There are many strategies that aim to tackle exactly this drawback of the restart policy. Roughly speaking, we can divide these strategies into three groups: diversity control, memory-based and multi-population approaches. We introduce each approach in more detail in the following.

Diversity control approaches account for a changing environment either by maintaining a diverse population throughout the search, or by increasing diversity whenever a change in the landscape is detected. Techniques falling into the former category include the set of niching techniques (Mahfoud, 1995). These techniques are also often added to EAs to tackle static optimization problems because they are supportive in detecting multiple optima. Examples of niching techniques include fitness sharing (Goldberg and Richardson, 1987) and crowding (De Jong, 1975; Mahfoud, 1995). Sharing aims to discourage over-populated regions in the search space. It achieves this by reducing the fitness of individuals by a factor proportional to the number of individuals located in the same region. Crowding aims to achieve a similar objective but does this by allowing an offspring to replace one of its parents only if that parent is less fit than, and similar to, the offspring; Algorithm 3.2 shows the pseudocode of a simple but effective crowding procedure, deterministic crowding. Another approach to constantly maintaining a diverse population is to replace, at each generation, some part of the population with randomly generated individuals, also referred to as random immigrants (Grefenstette, 1992).
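Fitness sharing and random immigrants can be sketched as follows. The triangular sharing function, the niche `radius`, and the function names are assumptions of this illustration; maximization with positive fitness values is assumed for the sharing part.

```python
import random


def shared_fitness(population, fitness, distance, radius):
    """Divide each fitness by its niche count (triangular sharing function).

    The niche count is at least 1 because every individual counts itself.
    """
    shared = []
    for xi, fi in zip(population, fitness):
        niche_count = sum(
            max(0.0, 1.0 - distance(xi, xj) / radius) for xj in population
        )
        shared.append(fi / niche_count)
    return shared


def inject_random_immigrants(population, fraction, sample, rng):
    """Return a copy in which a fraction of individuals is replaced at random."""
    new_population = list(population)
    n_replace = round(fraction * len(new_population))
    for idx in rng.sample(range(len(new_population)), n_replace):
        new_population[idx] = sample(rng)
    return new_population
```

With three equally fit individuals at 0.0, 0.1 and 5.0 and a radius of 1.0, the two neighbours share a niche and see their fitness divided by 1.9, while the isolated individual keeps its full fitness.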

A common procedure for introducing diversity into the population upon detecting an environmental change is to temporarily increase the mutation rate. While the method of Cobb (1990), triggered hypermutation, increases the mutation rate drastically for a single generation, the variable local search method of Vavak et al. (1996) increases the range of mutation (e.g. the number of decision variables mutated) slowly, depending on the search progress made after an environmental change. Several researchers including Angeline (1997); Bäck (1998) have also investigated the application of self-adaptive mutation rates (see Section 3.1.2) for dynamic optimization.

Algorithm 3.2 A single generation of deterministic crowding (Mahfoud, 1995)

Require: $f$ (objective function), $\mu$ (parent population size), $Pop$ (current population)

for $i = 1$ to $\mu/2$ do
    generate two offspring $\vec{x}^{(1)}$ and $\vec{x}^{(2)}$ by selecting two parents, $\vec{p}^{(1)}$ and $\vec{p}^{(2)}$, randomly from $Pop$, and then recombining and mutating them
    if $\mathrm{distance}(\vec{p}^{(1)}, \vec{x}^{(1)}) + \mathrm{distance}(\vec{p}^{(2)}, \vec{x}^{(2)}) \le \mathrm{distance}(\vec{p}^{(1)}, \vec{x}^{(2)}) + \mathrm{distance}(\vec{p}^{(2)}, \vec{x}^{(1)})$ then
        if $f(\vec{x}^{(1)}) > f(\vec{p}^{(1)})$ then replace $\vec{p}^{(1)}$ with $\vec{x}^{(1)}$
        if $f(\vec{x}^{(2)}) > f(\vec{p}^{(2)})$ then replace $\vec{p}^{(2)}$ with $\vec{x}^{(2)}$
    else
        if $f(\vec{x}^{(2)}) > f(\vec{p}^{(1)})$ then replace $\vec{p}^{(1)}$ with $\vec{x}^{(2)}$
        if $f(\vec{x}^{(1)}) > f(\vec{p}^{(2)})$ then replace $\vec{p}^{(2)}$ with $\vec{x}^{(1)}$
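Algorithm 3.2 translates almost line by line into code. The following is a sketch under assumptions: maximization as in the pseudocode, parents paired by shuffling the population (so each individual is used at most once per generation), and user-supplied `recombine`, `mutate` and `distance` functions whose names are hypothetical.

```python
import random


def deterministic_crowding(pop, f, distance, recombine, mutate, rng):
    """One generation of deterministic crowding (maximization).

    Returns a new population list; each offspring competes only against
    the parent it is most similar to.
    """
    pop = list(pop)
    order = list(range(len(pop)))
    rng.shuffle(order)                      # random pairing of parents
    for k in range(0, len(order) - 1, 2):
        i, j = order[k], order[k + 1]
        p1, p2 = pop[i], pop[j]
        c1, c2 = recombine(p1, p2, rng)
        c1, c2 = mutate(c1, rng), mutate(c2, rng)
        # Match each offspring with its closer parent.
        if distance(p1, c1) + distance(p2, c2) <= distance(p1, c2) + distance(p2, c1):
            matches = ((i, c1), (j, c2))
        else:
            matches = ((i, c2), (j, c1))
        for idx, child in matches:
            if f(child) > f(pop[idx]):      # replace only a less-fit parent
                pop[idx] = child
    return pop
```

Because an offspring can only replace the parent it resembles, no slot of the population ever gets worse, and individuals sitting on different peaks do not compete with each other directly.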

The idea of memory-based approaches is to augment an EA with the ability to store genetic information that has been of high quality in the past, and retrieve this information at some later stage of the optimization. Such an approach can be beneficial when parts of the environment/problem can return to a formerly visited state, in which case the retrieved information may be used to speed up tracking of optima. Memory can be used in two forms (Branke, 2001): implicitly or explicitly. Implicit memory is provided through the use of a redundant representation. A popular approach is to use a diploid representation, i.e. one where each chromosome possesses two alleles at each locus (one dominant and one recessive), combined with a dominance scheme that governs the expression of each allele in the phenotype (Goldberg and Smith, 1987; Ng and Wong, 1995; Hadad and Eick, 1997). A potential drawback of implicit memory is that the information stored may not be used in an efficient way. Approaches employing explicit memory aim at overcoming this drawback. For instance, Trojanowski et al. (1997) equip each individual of the population with the ability to remember the genotype of a fixed number of its ancestors. These ancestors are then re-evaluated upon an environmental change, and the fittest of them replaces the current individual if it is fitter. Whilst this is a simple example of an EA with explicit memory, it highlights some of the challenges associated with this form of memory: it requires the algorithm designer to define what information should be stored in the memory, and what information should be retrieved under which circumstances.
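The explicit-memory idea of remembering a fixed number of ancestors can be sketched with a bounded queue. The class and method names are hypothetical, maximization is assumed, and the sketch only follows the description given above rather than the original implementation.

```python
from collections import deque


class RememberingIndividual:
    """Individual that remembers the genotypes of its most recent ancestors."""

    def __init__(self, genotype, memory_size):
        self.genotype = genotype
        self.ancestors = deque(maxlen=memory_size)

    def offspring(self, new_genotype):
        """Create a child that inherits the memory plus this genotype."""
        child = RememberingIndividual(new_genotype, self.ancestors.maxlen)
        child.ancestors.extend(self.ancestors)
        child.ancestors.append(self.genotype)   # oldest entry drops out if full
        return child

    def on_change(self, f):
        """After an environmental change, re-evaluate the remembered ancestors
        and adopt the fittest one if it beats the current genotype."""
        best = max(self.ancestors, key=f, default=self.genotype)
        if f(best) > f(self.genotype):
            self.genotype = best
```

The two design questions raised above show up directly in the code: `memory_size` and the first-in-first-out policy decide what is stored, and `on_change` decides what is retrieved and when.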

The purpose of multi-population approaches is to evolve multiple populations simultaneously and thus explore multiple promising regions in the search space at the same time. Niching approaches have a similar purpose but they do not divide the population explicitly into multiple subpopulations. One of the early multi-population approaches is the shifting balance GA of Oppacher and Wineberg (1999). Inspired by the shifting balance theory of Wright (1932) (see Section 3.1.1), this GA splits the population into one large core population and a number of small colony populations (also referred to as demes). The purpose of the demes is to facilitate genetic drift, which was postulated by Wright to be necessary in the optimization of a fitness landscape. The core population is responsible for exploiting a promising search region, and it sends (forces) the colony populations out to explore new areas in the search space. Enforcement is achieved by a mechanism that pushes a colony away from the core whenever the two populations are too close to each other. At regular intervals, the core gains information about the colonies through immigration of colony members, allowing the core to shift its search focus to potentially more promising regions. A potential downside of multi-population approaches is the need to define several properties including the number and size of the different populations, the migration scheme, and the merging/‘forking’ scheme of populations. These properties tend to interact in sophisticated ways in modern multi-population approaches, such as the self-organizing scouts approach of Branke et al. (2000).

3.4.1 Dynamically-constrained optimization

Recently, evolutionary search has also been considered for continuous problems subject to dynamic constraints (Nguyen, 2011; Richter and Dietel, 2011; Richter, 2010). The purpose of these constraints is to simulate changes in the feasible search region occurring as a function of the evaluation or generation counter. As these constraints yield a moving feasible region, the location of the global optimum can move as well. Notice that constraints that only temporarily prevent the evaluation of solutions, such as ERCs, cause a shift neither in the feasible region nor in the global optimum (ERCs can cause a shift in the set of evaluable solutions only, but this will become clearer in the next chapter).

As with static constraints (see Section 3.3), maintaining infeasible solutions in the population of an EA seems an important aspect in the presence of dynamic constraints. In fact, Singh et al. (2009) ensure that some fraction of the population consists of (fit) infeasible solutions by favouring them over feasible ones in environmental selection. The resulting mixed population can accelerate search along the constraint boundary.

An interesting approach **for** dealing with dynamic constraints has been proposed

by Nguyen and Yao (2009a). Rather than tracking the moving global optima,

it is proposed to track the moving feasible region by maintaining always

a reference set of feasible solutions. Solutions belonging to this set can then

be involved in the repairing procedure of infeasible solutions. To repair an infeasible

solution, Ngyuen and Yao borrow the repairing procedure of GENOCOP

III (Michalewicz and Nazhiyath, 1995). This procedure employs two stages: first,

a path between the infeasible solution and a feasible solution (from the reference

set) is created; then, along this path, solutions are generated at random until a

feasible one is found, which replaces the original infeasible solution. Clearly, the

drawback of this repairing approach is that it can be a time-consuming matter to

generate a feasible solution (at random). To ensure the existence of a reference

set of feasible solutions, solutions belonging to it are checked **for** feasibility every

generation and repaired (using the same approach as above) if needed. Un**for**tunately,

the authors do not specify how to proceed in the case where all solutions

of the reference set become suddenly infeasible. This situation does not occur in

GENOCOP III because the algorithm is designed **for** static constraints with the

assumption that the population contains at least one feasible solution.
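The two-stage repair described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the path is taken to be the straight line segment between the two points, candidates are drawn uniformly along it, and the names `repair_along_path`, `in_unit_disc` and the `max_tries` fallback are hypothetical rather than taken from Nguyen and Yao or from GENOCOP III.

```python
import random


def repair_along_path(infeasible, reference, is_feasible, rng, max_tries=1000):
    """Sample points between an infeasible point and a feasible reference point."""
    for _ in range(max_tries):
        a = rng.random()  # position on the path: 0 -> reference, 1 -> infeasible point
        candidate = [a * x + (1.0 - a) * r for x, r in zip(infeasible, reference)]
        if is_feasible(candidate):
            return candidate  # first feasible candidate replaces the original solution
    # Fallback (an assumption of this sketch): give up and return the reference itself.
    return list(reference)


def in_unit_disc(p):
    """Toy feasibility test: the feasible region is the unit disc."""
    return sum(v * v for v in p) <= 1.0


rng = random.Random(0)
repaired = repair_along_path([3.0, 0.0], [0.0, 0.0], in_unit_disc, rng)
```

In this toy setting only one third of the segment is feasible, so several candidates may be rejected before one is accepted, which illustrates why random repair can become time-consuming when the feasible part of the path is small.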

Clearly, our understanding of the effect of dynamic constraints on optimization

is still limited, and more research is needed in this emerging field. The recent

studies of Jin et al. (2010) and Oh et al. (2011) suggest that understanding the dynamics

associated with changing constraints can also be beneficial when dealing

with problems featuring naturally static constraints. The objective of these studies

is to efficiently tackle constrained problems; in particular, problems where the

constraints yield disconnected feasible regions. The authors propose a dynamic

approach that first builds an approximate model for each constraint function, and

then uses this model to manipulate the feasible region. At the beginning of the

optimization, the model enlarges the original feasible region to facilitate a less

constrained search. As the search proceeds, the feasible region is shrunk incrementally

to eventually match the original feasible region. The hope is that this

procedure enables an optimizer to jump more easily between the originally disconnected

feasible regions. Of course, the assumption made here is that solutions

that are infeasible to the original problem can be evaluated.
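The idea of starting with an enlarged feasible region and shrinking it over time can be illustrated with a much simpler mechanism than the one used in the cited studies. The sketch below assumes constraints of the form g(x) ≤ 0 and replaces the approximate constraint models with a plain tolerance that decays linearly to zero; the function names and the linear schedule are assumptions made for illustration only.

```python
def relaxation(t, t_max, eps0):
    """Linearly shrinking tolerance: eps0 at t = 0, zero from t_max onwards."""
    return eps0 * max(0.0, 1.0 - t / t_max)


def is_feasible_relaxed(x, constraints, t, t_max, eps0=1.0):
    """Accept x if every constraint g(x) <= 0 holds up to the current tolerance."""
    eps = relaxation(t, t_max, eps0)
    return all(g(x) <= eps for g in constraints)


# Toy problem with two disconnected feasible regions: x <= -1 or x >= 1.
constraints = [lambda x: 1.0 - abs(x)]

early = is_feasible_relaxed(0.5, constraints, t=0, t_max=10)   # inside the gap, early on
late = is_feasible_relaxed(0.5, constraints, t=10, t_max=10)   # same point, at the end
```

Early in the run the point between the two regions is accepted, so an optimizer can cross the gap; once the tolerance has shrunk to zero, only solutions that are feasible for the original problem remain.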

We will see later that ERCs are of dynamic nature too. In fact, rather than

being only changed based on some counter or periodically, ERCs may arise due

to some random event (e.g. a machine breakdown), or uncontrolled factors (e.g.

delays in the delivery of raw materials required in an experiment). Perhaps more

interesting is the scenario where ERCs arise due to previously made decisions.

Such time-linkage scenarios are common in online optimization, which we will

consider in more detail in the next section.

3.5 Online optimization

Online optimization deals with problems where, as time goes by, decisions have to

be made without any knowledge of future events (Borodin and El-Yaniv, 1998).

Generally, it is assumed that decisions cannot be revoked and thus that time-linkage

effects (Bosman, 2005) may arise (we explain this effect in more detail

below). For instance, in the online version of a TSP one needs to decide how

vehicles should be routed to serve a new customer; and, in caching problems it

needs to be decided which page should be removed from the memory (cache) if

the cache is full. An online algorithm makes these decisions based on past events

such that some cumulative score, e.g. miles driven by a vehicle or number of page

faults, is optimized.

Before we say more about online algorithms, we describe

how the performance of these algorithms is measured in this domain, as this is

different from standard optimization. It is easy to see that the quality of an

online algorithm depends on the sequence of events it is faced with during an

optimization run. The standard approach to measuring this quality is to compare

the performance of an online algorithm on each event sequence with that of

an optimal offline algorithm, which has complete knowledge of the future. An

online algorithm is said to be competitive when the ratio (also known as competitive

ratio) between its performance and that of the optimal offline algorithm

is bounded. This performance analysis method is known as competitive analysis

(Sleator and Tarjan, 1985), and it falls within the framework of worst-case

complexity (Borodin and El-Yaniv, 1998). Alternatively, competitive analysis can

be viewed as a game between an online player and a malicious adversary. The

adversary (optimal offline algorithm) has knowledge about the online algorithm

run by the online player, and it aims to construct an event sequence that is costly

to process for the online player but inexpensive for the optimal offline algorithm.

An aspect worth mentioning is that randomized online algorithms, i.e. algorithms

that involve some random component when deciding how to respond to an event,

are more difficult for an adversary to fool than deterministic online algorithms. 2
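As a small worked example of comparing an online algorithm with an optimal offline one, the Python sketch below counts page faults for the caching problem mentioned earlier. The online algorithm is least-recently-used (LRU) eviction and the offline algorithm evicts the page whose next request lies furthest in the future; both choices, the request sequence and all function names are assumptions made for illustration. Note that the ratio computed here holds for one particular sequence only, whereas the competitive ratio is a worst-case bound over all sequences.

```python
def lru_faults(requests, capacity):
    """Online: evict the least recently used page (kept at index 0)."""
    cache, faults = [], 0
    for page in requests:
        if page in cache:
            cache.remove(page)
        else:
            faults += 1
            if len(cache) == capacity:
                cache.pop(0)
        cache.append(page)  # most recently used page goes to the end
    return faults


def offline_optimal_faults(requests, capacity):
    """Offline: evict the page whose next use lies furthest in the future."""
    cache, faults = set(), 0
    for i, page in enumerate(requests):
        if page in cache:
            continue
        faults += 1
        if len(cache) == capacity:
            def next_use(p):
                for j in range(i + 1, len(requests)):
                    if requests[j] == p:
                        return j
                return float("inf")  # never requested again
            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return faults


requests = [1, 2, 3, 4] * 3  # cyclic sequence, adversarial for LRU with 3 slots
online = lru_faults(requests, capacity=3)
offline = offline_optimal_faults(requests, capacity=3)
ratio = online / offline
```

On this cyclic sequence LRU faults on every request, while the offline algorithm, knowing the future, avoids half of the faults.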

The application of EAs to online optimization is not new. In fact, dynamic

optimization problems are tackled in an online fashion if the dynamical changes

are unknown beforehand. Note that an event sequence consists here of different

fitness landscapes and/or constraints rather than a sequence of page requests to

be dealt with. A challenge that has received less attention in the field of evolutionary

online (dynamic) optimization is that of time-linkage effects (Psaraftis, 1988;

Branke and Mattfeld, 2005; Bosman, 2005; Nguyen and Yao, 2009b), as Bosman

termed them. A time-linkage effect refers to a scenario where decisions made now

may influence the environment and score that can be obtained in future; in turn

this may affect the cumulative score that ought to be optimized. A simple example

of a time-linkage effect is that visiting customers on time in a routing problem

increases customer satisfaction and thus may result in additional customers (in

certain geographical regions) in future.

A promising approach to cope with time-linkage effects seems to be to anticipate

future changes. For instance, Branke and Mattfeld (2005) realize anticipation

by searching for solutions that are not only fit but also flexible in the

sense that they can easily be adjusted to upcoming environmental changes. For a

dynamic job-shop scheduling problem where new jobs arrive stochastically during

the course of the optimization, the authors showed that the avoidance of early

idle times can support the flexibility of a solution. Consequently, to facilitate

the search for fit and flexible solutions, they extended the objective function to

additionally penalize early idle times.

2 There are subtleties associated with how competitive analysis should be carried out and

interpreted in the presence of randomized algorithms that we will not go into here.

Figure 3.10: A schematic illustration of the effect of anticipation. Anticipation may

help to make a decision that seems poor at the current time step but better in the long

run (see path of gray dots). Figure reproduced from Bosman and Poutré (2007).

[Plot of reward against time t; decision events are marked, and the current time step separates past from future.]

The approach taken by Bosman and his collaborators (Bosman, 2005; Bosman

and Poutré, 2007; Bosman, 2007) to tackle online time-linkage problems is to

predict (or anticipate) the future by learning from the past. Technically speaking,

“the value of the dynamic objective function for future time steps is estimated by

minimizing the generalization error over the time period that contains the data

to learn from” (Bosman, 2005). The estimation is then used to select the search

path with the highest objective function value (see Figure 3.10). Hutzschenreuter

et al. (2010) applied this approach successfully to hospital resource management,

where the task is to efficiently allocate resources to units of a hospital. In this

application, the hospital environment was modelled by parametrized pathways of

groups of patients (Hutzschenreuter et al., 2009). Anticipation was realized by

simulating different resource allocation policies on this model offline (optimization

of policy parameters was done using an EA) and then making allocation decisions

online based on the policy that was optimal with respect to some future time span.

Later, when we deal with ERCs, we will encounter time-linkage effects too.

However, there is a crucial difference between problems subject to ERCs and

standard online dynamic problems, such as dynamic job-shop scheduling and the

hospital resource management problem described above: while the objective in

standard online (dynamic) optimization is to optimize some cumulative score,

the objective in problems subject to ERCs is to find a single optimal solution.

Optimizing some cumulative score is also the goal in the field of reinforcement

learning (RL), which we introduce next, as a prelude to explaining how RL can

be used to help tune EAs for challenging domains, such as problems subject to

dynamic constraints.

3.6 Reinforcement learning

The field of reinforcement learning (RL) offers a variety of ideas and algorithms

that have been employed in the evolutionary computation community for tuning

and controlling EA operators and their parameters. We give a brief introduction

to RL here, and review some work investigating evolutionary search in

combination with RL in the following section. The interested reader is referred

to the book of Sutton and Barto (1998) and the survey of Kaelbling et al. (1996)

for a more complete overview of the RL field.

Figure 3.11: The agent-environment interaction in reinforcement learning: at each

time step $t$, the agent receives some representation of the environment’s state $s_t$ and

the reward $r_t$ for reaching this state, and on that basis it selects an action $a_t$.

[Diagram: the agent sends action $a_t$ to the environment, which returns state $s_t$ and reward $r_t$.]

Generally speaking, RL is a computational approach to learning from interaction

of an agent with an environment whereby the aim of an RL agent is to

learn some optimal policy $\pi^*$, a mapping from an environmental state $s_t \in S$ to

an action $a_t \in A(s_t)$ (see Figure 3.11), so as to maximize some discounted return:

$$R_t = \sum_{j=0}^{\infty} \gamma^j r_{t+j+1}, \qquad (3.2)$$

where $r_{t+1} \in \mathbb{R}$ is a numerical reward received after taking action $a_t$ in state

$s_t$, and $\gamma$ ($0 < \gamma \le 1$) is the discount rate. The discount rate determines the

value of future rewards at the present time step. As the value of γ decreases,

the importance of future rewards reduces so that in the case where γ ≈ 0 the

RL agent is myopic and aims only at maximizing the immediate reward. In

learning problems where the agent-environment interaction can be divided into

sub-sequences (known as episodes), such as single runs of an EA, an agent aims

at maximizing the return at the end of each episode. Let us define the above

mentioned concepts including the assumptions made in RL in more detail.
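For a finite episode, the discounted return of Equation (3.2) reduces to a finite sum, which the following minimal Python sketch evaluates. The function name `discounted_return` and the example reward sequence are illustrative assumptions.

```python
def discounted_return(rewards, gamma):
    """Return sum_j gamma**j * rewards[j], where rewards[j] plays the role of r_{t+j+1}."""
    total = 0.0
    for reward in reversed(rewards):
        # Horner-style accumulation: R_t = r_{t+1} + gamma * R_{t+1}
        total = reward + gamma * total
    return total


undiscounted = discounted_return([1.0, 1.0, 1.0], gamma=1.0)
discounted = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

With three unit rewards, the undiscounted return is 3, while a discount rate of 0.5 gives 1 + 0.5 + 0.25 = 1.75, showing how smaller values of the discount rate shift the weight towards the immediate reward.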

The common assumption made in RL is that environmental states satisfy the

Markov property, meaning that the state and reward at time $t+1$ depend only on

the state and action at time $t$ (rather than on the complete history of an agent),

or:

$$P(s_{t+1} = s, r_{t+1} = r \mid s_t, a_t, r_t, s_{t-1}, a_{t-1}, \ldots, r_1, s_0, a_0) = P(s_{t+1} = s, r_{t+1} = r \mid s_t, a_t). \qquad (3.3)$$

It is common to assume the Markov property even when states do not summarize

all the relevant information accurately (Sutton and Barto, 1998). A learning task

that satisfies the Markov property is referred to as a Markov decision process

(MDP). MDPs extend Markov chains (see Section 4.3.1) with the notion of actions

and rewards. Equivalently, we can think of MDPs that feature a single action in

each state, and a reward of zero, as Markov chains. To keep the number of actions

and states in an MDP manageable, especially with respect to tabular-based RL

algorithms (of which more later), it is common to quantize the state and action

space if they are continuous.

In the presence of an MDP, some policy $\pi$, and the corresponding probability

$\pi(s,a)$ of taking action $a$ when in state $s$, we can calculate the expected return

received after starting in some state $s_t = s$ and then following $\pi$ according to

$$V^{\pi}(s) = E_{\pi}\{R_t \mid s_t = s\} = E_{\pi}\Big\{\sum_{j=0}^{\infty} \gamma^j r_{t+j+1} \,\Big|\, s_t = s\Big\}, \qquad (3.4)$$

where $V^{\pi}$ is called the state-value function for policy $\pi$. In a similar fashion we can

define the action-value function for policy $\pi$, $Q^{\pi}(s,a) = E_{\pi}\{R_t \mid s_t = s, a_t = a\}$,

as the expected return received after taking action $a$ in state $s$ and following $\pi$

thereafter. We can say that a policy $\pi$ is better than or equal to another policy $\pi'$

if it has an expected return that is greater than or equal to that of $\pi'$ for each

state $s \in S$, or $V^{\pi}(s) \ge V^{\pi'}(s), \forall s \in S$. The goal of an RL agent is to find

(ideally quickly and reliably) a policy $\pi^*$ that is at least as good as any other

policy; $\pi^*$ is referred to as the optimal policy. The corresponding state-value

function for an optimal policy, also called optimal state-value function, is defined

by $V^*(s) = \max_{\pi} V^{\pi}(s), \forall s \in S$; we can define the optimal action-value function

analogously by $Q^*(s,a) = \max_{\pi} Q^{\pi}(s,a), \forall s \in S, \forall a \in A(s)$.

Many RL agents search for an optimal policy by finding the optimal value

function. If we have complete knowledge of an environment as an MDP, i.e.

we know $\pi(s,a), \forall s \in S, a \in A(s)$, and the expected rewards associated with

any state-action transition, then we can find an optimal policy using dynamic

programming methods (see Chapter 4 in the book of Sutton and Barto (1998)).

However, although many RL methods aim to achieve the same effect as dynamic

programming methods, dynamic programming itself is limited in its use because a complete

model of the environment is rarely available and because it tends to be computationally

expensive; thus we do not consider dynamic programming methods

any further here.

Algorithm 3.3 Tabular TD(λ) for estimating V π

initialize V(s) = 0, e(s) = 0, ∀s ∈ S
for each episode do
    initialize s
    repeat
        take action a = π(s), and observe reward r and next state s′
        compute the temporal difference δ = r + γV(s′) − V(s)
        update e(x) for all x ∈ S according to Equation (3.7)
        update V(x) for all x ∈ S according to V(x) = V(x) + αδe(x)
        s = s′
    until s is terminal state

In the common case where the environmental dynamics are unknown, the

agent estimates the value function from experience it gains by interacting with

the environment. A popular approach for learning from experience is the temporal

difference (TD) method of Sutton (1988). This method updates the value of state

$s_t$ immediately after visiting that state. After taking action $a_t$ in state $s_t$ and

subsequently observing a reward $r_{t+1}$ and the new state $s_{t+1}$, the TD method

updates the estimate $V(s_t)$ by

$$V_{t+1}(s_t) = V_t(s_t) + \alpha\big(r_{t+1} + \gamma V_t(s_{t+1}) - V_t(s_t)\big), \qquad (3.5)$$

where $0 < \alpha \le 1$ is the learning rate. As $\alpha$ increases the estimate $V_{t+1}(s_t)$ relies

more on $r_{t+1} + \gamma V_t(s_{t+1})$ and less on its previous estimate $V_t(s_t)$. 3

3 The term temporal difference refers to the difference between the two estimates $r_{t+1} + \gamma V_t(s_{t+1})$ and $V_t(s_t)$.

Equation (3.5) is the update rule used by a specific TD method, referred to

as TD(0). A drawback of TD(0) is that updating a state-value estimate $V(s)$

only upon visiting the state $s$ is rather inefficient, so the agent may need a

long time to estimate the correct value function. A more general version of the

algorithm, TD(λ) (see Algorithm 3.3), updates $V(s)$ not only upon visiting $s$,

but also upon visiting any subsequent state (during that episode). To realize this

procedure, TD(λ) maintains a kind of memory, called the eligibility trace $e$. The

task of the trace is to keep track of previously visited states that are eligible for

being updated based on currently obtained rewards. To account for the trace,

the update rule of $V(s)$ changes to

$$V_{t+1}(s) = V_t(s) + \alpha\big(r_{t+1} + \gamma V_t(s_{t+1}) - V_t(s_t)\big)\, e_t(s), \quad \forall s \in S. \qquad (3.6)$$

Eligibility traces can be updated in various ways. One example is replacing

traces, which update the trace according to

$$e_t(s) = \begin{cases} 1 & \text{if } s = s_t, \\ \gamma\lambda\, e_{t-1}(s) & \text{otherwise,} \end{cases} \qquad (3.7)$$

where 0 ≤ λ ≤ 1 is the decay factor of the trace. For λ = 0 the update rule

is equivalent to Equation (3.5). As λ increases, more importance is given to

the actual reward received at each time step rather than the value estimates of

subsequent states. Another popular trace option is an accumulating trace. This

trace increases the eligibilities of states by some constant upon visiting them,

rather than resetting them to a constant.
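The update rules (3.6) and (3.7) can be put together in a short Python sketch of tabular TD(λ) with replacing traces. The environment is a hypothetical three-state corridor in which a fixed policy always moves right and a reward of 1 is received on reaching the terminal state; the names `td_lambda` and `chain_step` and all parameter values are assumptions chosen for illustration.

```python
def td_lambda(step, start_state, n_states, episodes, alpha=0.5, gamma=1.0, lam=0.8):
    """Estimate V for the fixed policy that is baked into `step`."""
    V = [0.0] * n_states
    for _ in range(episodes):
        e = [0.0] * n_states  # eligibility traces are reset at the start of an episode
        s, done = start_state, False
        while not done:
            s_next, r, done = step(s)
            v_next = 0.0 if done else V[s_next]  # terminal states have value zero
            delta = r + gamma * v_next - V[s]    # temporal difference
            for x in range(n_states):
                # Replacing trace, Equation (3.7)
                e[x] = 1.0 if x == s else gamma * lam * e[x]
                # Value update, Equation (3.6)
                V[x] += alpha * delta * e[x]
            s = s_next
    return V


def chain_step(s):
    """Hypothetical corridor: states 0, 1, 2; state 3 is terminal; policy moves right."""
    s_next = s + 1
    done = s_next == 3
    return s_next, (1.0 if done else 0.0), done


V = td_lambda(chain_step, start_state=0, n_states=3, episodes=100)
V_td0 = td_lambda(chain_step, start_state=0, n_states=3, episodes=100, lam=0.0)
```

Because the corridor is deterministic and undiscounted, the true value of every state is 1, and both runs approach it. Setting `lam=0.0` leaves only the current state eligible, which recovers the TD(0) update of Equation (3.5).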

The update rules introduced above tell us how to estimate the state-value function

$V$ for an arbitrary deterministic policy $\pi$. Ideally we would like to control

(improve) a policy with the aim of eventually approximating the optimal policy $\pi^*$.

One way to improve $\pi$ is to select in each state $s$ the action $a^* = \arg\max_a Q^{\pi}(s,a)$

that is expected to yield the greatest return according to the estimates $Q^{\pi}$. Since

$Q^{\pi}(s,a^*) \ge V^{\pi}(s), \forall s \in S$ (Bellman, 2003), the new policy will be at least as

good as $\pi$. To be able to control a policy we thus must explicitly estimate the

value of an action, or more generally the action-value function $Q^{\pi}$. 4

4 Clearly, if we knew the next state $s_{t+1}$ we are moving to after taking an action $a_t$,

then we would not need to estimate $Q^{\pi}(s_t,a_t)$. We could achieve the same effect by picking the

action $a_t$ that will yield the state $s_{t+1}$ with the greatest state-value estimate $V^{\pi}(s_{t+1})$.

Algorithm 3.4 Tabular Sarsa(λ), an on-policy TD control method

initialize Q(s,a) = 0, e(s,a) = 0, ∀s ∈ S, a ∈ A(s)
for each episode do
    initialize s, a (the action a can be e.g. selected at random from A(s))
    repeat
        take action a, and observe reward r and next state s′
        choose a′ = π(s′) (e.g. ε-greedy)
            // the initial policy π can be e.g. to select a random action a from A(s) in each state s
        compute the temporal difference δ = r + γQ(s′,a′) − Q(s,a)
        update e(x,b) for all x ∈ S and b ∈ A(x)
        update Q(x,b) for all x ∈ S and b ∈ A(x) according to Q(x,b) = Q(x,b) + αδe(x,b)
        s = s′, a = a′
    until s is terminal state

Always selecting the best action seems to be a good strategy at first glance

because it ensures that we optimally exploit the current estimates. The drawback

of this strategy, however, is that by always taking the current best action we

will never be able to try new actions and thus limit our chances of finding an even

better policy. Also, our current estimates may be inaccurate due to early sampling

errors, preventing the truly best action from being selected. To overcome

these issues an agent must explore new actions. A simple approach for trading off

the exploitation of our current knowledge and the exploration of new actions is to

select the currently optimal action $a^*$ for most of the time and only occasionally,

say with probability ε, a suboptimal one; this action selection method is referred

to as ε-greedy. This way we allow each action to be selected with a non-zero

probability $\pi(s,a) > 0$ at any time. Theoretically, this property ensures that in

the limit as the number of actions taken increases, each state-action pair $(s,a)$ will be

sampled an infinite number of times, guaranteeing that the estimates of Q(s,a)

converge to the true values; in turn this means we can approximate the optimal

policy (given we have infinite time). Exploration may also be encouraged by initializing

all entries of Q(s,a) with an optimistically large value (Schmidhuber,

1991). This way we ensure that all actions are selected at least once. One way

to test the ability of an action selection method to trade off exploration with exploitation

is to run it on a multi-armed bandit (MAB) problem (see Section 3.1.2

for a description of this problem).
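A minimal Python sketch of ε-greedy action selection is given below. It follows a common variant in which the exploratory action is drawn uniformly from all actions, including the currently best one, so the greedy action is in fact chosen slightly more often than with probability 1 − ε; this variant, the random tie-breaking and the function name `epsilon_greedy` are assumptions of the sketch.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action index given the action-value estimates of the current state."""
    if rng.random() < epsilon:
        # Explore: any action, chosen uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: the best action according to the estimates, ties broken at random.
    best = max(q_values)
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


rng = random.Random(1)
greedy_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0, rng=rng)
sampled = {epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.5, rng=rng) for _ in range(1000)}
```

With ε = 0 the method is purely greedy, whereas any ε > 0 gives every action a non-zero selection probability, which is the property required by the convergence argument above.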

An example of a TD algorithm with a policy control mechanism, Sarsa(λ) (Rummery

and Niranjan, 1994), is shown in Algorithm 3.4. This RL algorithm belongs

to the class of on-policy control methods because it estimates the value of a policy

while using it for control. RL algorithms that use two different policies for these

tasks are known as off-policy control methods; Q(λ) (Watkins, 1989) is a famous

example of this algorithm class. The advantage of off-policy methods is that they

allow learning of the value function for a policy even if experience is available

only from another policy. The obvious drawback, however, is that this may reduce

the learning speed. There are many more RL algorithms, each having its own

weaknesses and strengths; for a review please refer to Sutton and Barto (1998).
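To illustrate how the pieces of Sarsa(λ) fit together, the following Python sketch runs the method with replacing traces and ε-greedy action selection on a hypothetical corridor task: four non-terminal states, actions "left" and "right", a reward of −1 per step, and termination on leaving the corridor to the right. The task, the parameter values and all names are assumptions made for illustration, not a setup taken from the literature cited above.

```python
import random

N_STATES, GOAL = 4, 4   # non-terminal states 0..3, terminal state 4
ACTIONS = (0, 1)        # 0 = left, 1 = right


def step(s, a):
    """Deterministic corridor dynamics with a cost of 1 per step."""
    s_next = s + 1 if a == 1 else max(s - 1, 0)
    return s_next, -1.0, s_next == GOAL


def epsilon_greedy(Q, s, epsilon, rng):
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[s])
    return rng.choice([a for a in ACTIONS if Q[s][a] == best])


def sarsa_lambda(episodes, alpha=0.1, gamma=1.0, lam=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        e = [[0.0, 0.0] for _ in range(N_STATES)]  # traces reset per episode
        s = 0
        a = epsilon_greedy(Q, s, epsilon, rng)
        done = False
        while not done:
            s_next, r, done = step(s, a)
            a_next = None if done else epsilon_greedy(Q, s_next, epsilon, rng)
            q_next = 0.0 if done else Q[s_next][a_next]
            delta = r + gamma * q_next - Q[s][a]  # on-policy temporal difference
            for x in range(N_STATES):
                for b in ACTIONS:
                    # Replacing trace for the visited pair, decay for all others
                    e[x][b] = 1.0 if (x, b) == (s, a) else gamma * lam * e[x][b]
                    Q[x][b] += alpha * delta * e[x][b]
            s, a = s_next, a_next
    return Q


Q = sarsa_lambda(episodes=500)
```

The action used in the update target, `a_next`, is the one the agent actually takes next, which is what makes the method on-policy. In this task the estimate for moving right from the last state approaches −1, and one would typically expect the greedy policy derived from `Q` to move right in every state.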

3.6.1 Reinforcement learning in evolutionary optimization

The application of RL in optimization goes back to the seminal work of Zhang

and Dietterich (1995, 1996); Zhang (1996), which investigated RL in the context

of job-shop scheduling. The aim of this work was to learn a repair-based scheduler

that incrementally repairs (resource) constraint violations with the goal to

obtain schedules of short durations. Zhang and Dietterich considered complete

schedules as states, and used domain-specific heuristics as the actions. The reward

was a hand-crafted function based on the number of violated constraints

and repairs performed. The RL agent learnt a policy offline on problems involving

a small number of jobs, which was then tested on larger problems. Zhang

and Dietterich demonstrated that the repair-based scheduler can outperform a

non-learning iterative repair algorithm.

Within the evolutionary computation community, RL has been used for tuning

and controlling EA operators and their parameter values. Müller et al. (2002)

were among the first to combine evolutionary search with RL. They employed RL

to learn online how to adapt the mutation step size within ESs depending on the

success rate achieved with a mutation step size. The success rate after a fixed

number of mutations represented the state, and the actions were the options to

increase, decrease, or keep the current step size. The conclusion of this study

was that the choice of the reward function as well as the lower and upper bounds

of the mutation step size is crucial to obtain optimal performance. Eiben et al.

(2007) investigated a similar approach for online adaptation of various parameters

involved in a GA, including the mutation and crossover rate, tournament size used

in the selection of parents, and population size.

Similar to the study of Zhang and Dietterich, Pettinger and Everson (2003);

Pettinger (2002) learnt a policy offline using a test environment and then tested

it in a different environment. The goal here was to learn when to switch between

different crossover and mutation operators employed by an EA during an

optimization process. Interestingly, the RL agent was also permitted to select

the type of individuals (either a random individual from the fittest 10% of the

population, or one from the remaining population) to which the operators are

applied. The (discretized) state space in this application was characterized by

the current time step, the average population fitness, and the diversity of the

population. The reward after applying an operator was the relative fitness

difference between the fittest offspring generated and the fittest parent.

Recently, reinforcement learning has gained popularity in the field of hyper-heuristics

(Cowling et al., 2000; Burke et al., 2009). A hyper-heuristic can be

seen as a method that automates the process of selecting heuristics from a given

set of (low-level) heuristics with the goal of improving the search performance. The

heuristics may represent different operators or strategies needed to construct a

solution. RL has been employed in this domain to select online between these

heuristics; see e.g. Nareyek (2003); Meignan et al. (2010); McClymont and Keedwell

(2011).

According to the classification of parameter setting methods of Eiben et al.

(1999) (see Section 3.1.2), RL-based approaches that learn some policy $\pi$ offline

belong to the branch of parameter tuning techniques, whereas approaches that learn

$\pi$ online belong to adaptive or self-adaptive parameter control techniques.

3.7 Chapter summary

We have now reviewed a variety of techniques for analyzing and tuning EAs

for the main problem classes related to closed-loop optimization problems with

resourcing issues. In particular, we looked at problems featuring uncertainty (e.g.

noise), constraints, changing landscapes, and online decision making. Understanding

the techniques reviewed is helpful when trying to tackle novel challenges

in optimization. As we will see in the remainder of this thesis, many of the strategies we

consider for coping with resourcing issues are related to and inspired by ideas

described in this chapter. The use of Markov chains for modeling EA dynamics,

for instance, is employed in the next chapter to theoretically analyze the effect of

ERCs on evolutionary search.

Chapter 4

Ephemeral Resource Constraints:

Definitions and Concepts

In the previous two chapters we have introduced the class of closed-loop optimization

problems, where solutions are evaluated by conducting experiments,

and reviewed how evolutionary algorithms (EAs) have been extended to cope

with some of the common challenges that arise in such problems and others, such

as noise, constraints, changing environmental conditions, and real-time decision

making. In this chapter we look at a related but novel challenge in closed-loop

optimization deriving from resourcing issues. We introduce the framework to describe

a particular resourcing issue: the temporary non-availability of resources

required in the evaluation process of solutions. This resourcing issue yields an optimization

problem with a static fitness function but dynamic constraints, which

we call ephemeral resource constraints (ERCs). After formally defining problems

subject to ERCs, we introduce a variety of ERC types that were encountered

by Knowles (2009) in his collaborative work and that seem to be common in

real-world applications. Finally, we present an initial theoretical investigation on

the impact of ERCs on evolutionary search. The aim of this investigation is to

gain initial insights into how ERCs can affect the performance of an EA for different

configuration choices. Our analysis looks at configuration choices related

to different selection and reproduction schemes used within a simple EA.

4.1 Ephemeral resource-constrained optimization problems (ERCOPs)

Compared with standard optimization problems, defining an ephemeral resource-constrained

optimization problem (ERCOP) is more involved and requires knowledge

of the closed-loop environment that is to be modeled. The reason is that

the dependency of the ephemeral resource constraints (ERCs) on some variables,

such as time, function evaluations, costs and previous solutions evaluated, does

not fit into the standard definition of an optimization problem. How exactly these

variables are handled or simulated thus needs careful consideration.

Consider again Figure 2.1 of Section 2.1, which illustrates the basic setup of a

closed-loop problem: Solutions are planned on a computer using an optimization

algorithm, e.g. an EA, and then submitted for evaluation to the experimental

platform. Depending on the platform, we may submit for evaluation a single solution,

or, if solutions can be evaluated in parallel, then an entire population

plan (i.e. a generation). When the experiment is completed, a generation of fitness

values is returned, and the loop is repeated. But this iterative process is

complicated by the fact that resources are needed to conduct the experiments,

and these might run out e.g. as a function of time, or previous actions taken (or

both). If the EA requests the evaluation of a solution that cannot be evaluated

then something must happen, and this something must be defined. In a real experiment,

a human might intervene to order some new resources, or allow the EA

to generate some alternative solution(s).

The purpose of an ERCOP is to simulate the above kinds of experimental

scenarios, including the way that non-evaluable solutions arise (i.e. as a function

of parameters such as time, search history, or costs), and how they are to be

handled.

4.1.1 General problem definition of an ERCOP

To define ERCOPs formally let us recall the definition of a generic optimization

problem of Section 2.1, and add to it the notion of a time-ordered search, and

the concept of a non-evaluable solution, as follows.

An optimization problem of generic form can be defined as

$$\text{maximize } y = f(\vec{x}) \quad \text{subject to } \vec{x} \in X, \qquad (4.1)$$

where $\vec{x} = (x_1, \ldots, x_l)$ is a solution vector and $X$ a feasible search space. The

static objective function (also known as fitness function) $f : X \mapsto Y$ represents a

mapping from $X$ into the objective space $Y \subset \mathbb{R}$.

A black box optimization algorithm $a$, e.g. an EA, for solving the above problem

can be represented as a mapping from previously visited solutions to a single

new solution in $X$, as suggested by Wolpert and Macready (1997). Formally,

$a : \phi(t) \mapsto \vec{x}_t$, where the search history $\phi(t) = \{(\vec{x}_1,y_1), \ldots, (\vec{x}_{t-1},y_{t-1})\}$ denotes

the time-ordered set of solutions visited until time step $t-1$, and $\vec{x}_i$ and $y_i$ indicate

the $X$ value and the corresponding $Y$ value, respectively, of the $i$th successive

element in $\phi(t)$. We augment this notion of a search algorithm with the ability

to visit a null solution, $x_{\text{null}} \notin X$, with the effect that the algorithm can ‘wait’

for a time step without evaluating a solution. An optimizer might submit null

solutions, for example, if it wishes to wait until a missing resource is again available.

In fact, this approach has been employed by Schwefel (1968) in his flashing

nozzle problem (see preface of Chapter 1).

Now we are able to define the general ERCOP. While in standard optimization

problems, the objective value $f(\vec{x}_t)$ of a feasible solution $\vec{x}_t \in X$ is $y_t = f(\vec{x}_t)$, in

an ERCOP, we have

$$y_t = \begin{cases} f(\vec{x}_t) & \text{if } \vec{x}_t \in E_t(\sigma_t) \subseteq X, \\ \text{null} & \text{otherwise,} \end{cases} \qquad (4.2)$$

where $E_t(\sigma_t)$ represents the set of evaluable solutions (or evaluable search region)

at time step $t$. In our case, $E_t(\sigma_t)$ is defined by a set of schemata into which

solutions have to fall in order to be evaluable; 1 we introduced the concept of

schemata in Section 3.1.1 in the context of the schema theorem. 2

1 Notice that if schemata represent evaluable solutions, then $E_t$ is the union of schemata.

If schemata represent non-evaluable solutions, as will be the case with random ERCs (see

Section 4.2.4), then $E_t$ is the complement of the union of schemata.

2 Notice that in the context of ERCs, schemata are not related to the Schema Theorem or

schema processing per se; here we use the concept of schemata merely as a convenient way to

identify any subspace of the solution space in order to specify regions that become non-evaluable.

Figure 4.1: A schematic diagram showing how a population consisting of the solutions

1, 2, 3, 4, 5, 6, 7, and 8 might be distributed across the feasible search space $X$ and the

evaluable search space $E_t$. At time step $t$, only the solutions 2, 5, 6, and 7 can be evaluated

while the solutions 1, 3, 4, and 8 must be repaired to be evaluable. Each evaluation

might cause a change in $E_t$.

[Two panels, for time steps $t$ and $t+1$, each showing the solutions 1 to 8 within $X$ and the evaluable regions $E_t$ and $E_{t+1}$.]

The set $E_t(\sigma_t)$ may change over time depending on a set of problem-specific and time-evolving

parameters $\sigma_t$. The availability of resources required for the evaluation of solutions

depends on these parameters. Thus, the set $\sigma_t$ may include parameters such

as various types of counters (e.g. cost, time and evaluation counters), the search

history (which may be used to encode non-availabilities of resources due to previously

made decisions), random variables (which may encode random events like

machine breakdowns), and so forth. How exactly the set $E_t$ changes depending

on the parameters σ t is modeled by the ERCs.

Note that the objective function $f$ (thus also the global optimum) is static and does not change over time in a standard ERCOP; it is just the set of solutions evaluable at each time step $t$ that may change. In this context, repairing a solution means to modify, through some repair function, the genotype of a solution that is not in $E_t$ such that it is forced into $E_t$; i.e. the outcome of a repairing step is a solution that falls into all schemata that define $E_t$ (assuming that the schemata are non-contradictory; i.e. $E_t \neq \emptyset$).

Compared to standard (dynamic) constraints, the meaning of ERCs is different: a solution $\vec{x}$ that violates an ERC at time $t$, i.e. $\vec{x} \notin E_t$, is not infeasible but non-evaluable at time step $t$. That is, the experiment that is associated with $\vec{x}$, be it involved in the genotype-phenotype mapping or the fitness assessment of the phenotype, cannot be conducted and thus the objective function is undefined (or null) for solution $\vec{x}$ at time $t$. 3 Figure 4.1 illustrates a common situation in an ERCOP with respect to the distribution of solutions across $E_t$ and the feasible

3 Notice the difference between a null solution and a solution with a fitness value of null: A

null solution is submitted with the purpose to skip an evaluation, while a solution with fitness

null is submitted with the purpose of being evaluated but then is not due to a lack of resources.

4.1. ERCOPS 87

search space $X$.

Time in an ERCOP can be seen as the simulated time defined by the real closed-loop experimental problem that is to be simulated. Hence, time may refer not only to function evaluations of single solutions, as is the case in standard optimization problems, but also, for example, to real time units (e.g. seconds) or cost units (e.g. pounds). This notion of time allows ERCs to depend, among other things, on the number of evaluated solutions, expenses, or a certain date, such as days of the week (independent of the number of evaluated solutions).

The normal assumption is that all evaluations take equal time or resources, but this need not be the case. Generally, experiments may be of different durations and have non-homogeneous costs in terms of the financial or temporal resources they require.

4.1.2 Further comments on ERCOPs

The definition of ERCOPs we gave in Section 4.1.1 represents the type of ERCOPs we consider in this thesis. There are several ways in which this definition can be extended.

We consider ERCOPs that feature a static objective function $f$. However, it is also realistic for ERCOPs to feature a dynamic function $f$ (see Section 3.4) in addition to dynamic constraints. For instance, resources in the form of raw material or instruments may be replaced with new ones during the optimization process, causing a change in $f$ and potentially a shift in the global optimum. We will consider optimization problems with a dynamic function $f$ when investigating problems subject to changing variables in Chapter 6, but this will not be done within an ERCOP framework.

Another extension to ERCOPs is to optimize not only a single objective f

but several f 1 ,...,f m (Deb, 2001). For instance, in a drug discovery problem one

may not only aim to find potent drug mixtures but also ones that are unlikely

to cause side effects, and involve a small number of drugs (as this reduces costs

when manufacturing drug mixtures in mass quantities).

Moreover, the objective function $f$ is typically noisy in closed-loop scenarios (see Section 3.2), and there may be standard constraints defining the feasible search space $X$ (see Section 3.3). All these aspects are important to account for but, for the sake of understanding the effect of ERCs on EAs, we believe it is best to start with a standard problem setup, meaning non-noisy objective values and unconstrained search spaces.

Finally, we remark that an ERCOP can be cast as a reinforcement learning problem rather than a pure optimization problem. For various reasons that will become clear later, we postpone a discussion related to this subject until Chapter 7.

4.2 Specific types of ephemeral resource constraints (ERCs)

In this section we introduce four ERC types that were encountered by Knowles (2009) in his collaborative work and that seem to be common in real-world applications: commitment relaxation ERCs, periodic ERCs, random ERCs, and commitment composite ERCs. Table 4.1 gives examples of these constraints (more detailed examples are given in the sections introducing the respective ERC types), and the Nomenclature section summarizes notation related to ERCs, which we will introduce in more detail next. Before we define each ERC type, we want to introduce three elements that are common to all ERC types considered. These elements are the constraint time frame, the activation period and the constraint schema.

4.2.1 Fundamental elements of ERCs

The activation period $k(\mathrm{ERC}_i)$ of $\mathrm{ERC}_i$, $k \in \mathbb{Z}^+$, is the number of counter units for which that constraint remains active, once it is 'switched on'. Similar to time steps, counter units may refer to function evaluations of a single solution, a set of solutions (in case experiments are conducted in parallel), real time units (e.g. seconds), a calendar period (e.g. Tuesday 2-4pm), or something else. In this thesis, they refer to function evaluations of a single solution.

The constraint time frame (ctf) of $\mathrm{ERC}_i$ is $\{t \mid t^{\mathrm{start}}_{\mathrm{ctf}}(\mathrm{ERC}_i) \le t < t^{\mathrm{end}}_{\mathrm{ctf}}(\mathrm{ERC}_i)\}$, where $t$ represents some counter unit, as above. 4 The constraint $\mathrm{ERC}_i$ may be active only during the ctf and not outside of the ctf. That is, if we assume an ERCOP to be subject to a single constraint, then we have $E_t(\sigma_t) \subseteq X, \forall t \in \mathrm{ctf}$, and $E_t(\sigma_t) = X, \forall t \notin \mathrm{ctf}$. The period of time $0 \le t < t^{\mathrm{start}}_{\mathrm{ctf}}$ and $t^{\mathrm{end}}_{\mathrm{ctf}} \le t \le T$ ($T$ is

4 We are looking at the optimization scenario where the ctf is a single continuous period of time, but it is also realistic to have a ctf that is separated by unconstrained periods.

4.2. SPECIFIC TYPES OF ERCS 89

Table 4.1: Different types of ephemeral resource constraints (ERCs) and examples.

ERC: Commitment relaxation ERC (models the temporary commitment to a certain resource upon using it)
Examples:
• Commitment to a certain instrument setting for some time once it is selected.
• Commitment to a certain drug for some time (e.g. until the drug is used up) once it is given to the robot for mixing.

ERC: Periodic ERC (models availabilities of resources at regular time intervals only)
Examples:
• Periodic availability of certain engineers required to operate certain instruments.
• Periodic availability of certain drugs due to logistic limitations.

ERC: Random ERC (models temporary non-availabilities of randomly selected resources)
Examples:
• Non-availability of certain instrument settings due to machine/instrument failures.
• Delays in the delivery of drugs.

ERC: Commitment composite ERC (models a scenario where variables of a solution define a composite, e.g. a tyre, that needs to be locally available for the solution to be evaluated. Composites need to be ordered in advance, are associated with a reuse number and shelf life, and have to be kept in capacity-limited storage.)
Examples:
• When optimizing the crash test abilities of a vehicle, the tyre setup may be subject to tuning. A tyre would represent a composite that costs money and must be ordered in advance, used up within a certain period, and involved in at most a certain number of tests; storage space for tyres may be limited too.
• Drugs that must be ordered in advance and in batches only, have shelf lives, and need to be kept in capacity-limited storage.

the total optimization time) is the preparation period, respectively, the recovery period (see Figure 4.2). The duration of these two periods has a significant effect on the performance of an EA, as we will see later.

The restriction imposed by an ERC during the activation period can be of different forms. In our case, resources are often associated directly with individual solution variables, which allows us conveniently to use the notion of schemata to describe availabilities of resources. We say that solutions have to fall into a particular constraint schema $H(\mathrm{ERC}_i)$ (associated with a constraint $\mathrm{ERC}_i$) in order to be evaluable. From Section 3.1.1, where we introduced the concept of schemata in the context of the schema theorem, we know that a schema $H$ represents a particular subset of solutions that share some common properties. To


Figure 4.2: An illustration of how the available optimization time $T$ can be divided into the preparation period $0 \le t < t^{\mathrm{start}}_{\mathrm{ctf}}$, the constraint time frame $t^{\mathrm{start}}_{\mathrm{ctf}} \le t < t^{\mathrm{end}}_{\mathrm{ctf}}$, and the recovery period $t^{\mathrm{end}}_{\mathrm{ctf}} \le t \le T$.

make it more specific, let us consider a drug discovery problem with the objective

being to identify the most potent mixtures of drugs drawn from a library of size

l = 5; a decision variable is binary indicating whether a drug is present (1) or

absent (0) in a drug mixture, which is an entire bit string. In this example,

the constraint schema H = (∗1 ∗ ∗0) would describe the set of all solutions

(drug mixtures) for which the drugs associated with the second and fifth bit

position are present and absent, respectively. The wildcard symbol ∗ means that

we do not care whether the drug associated with that bit position is present or

absent in a mixture (i.e. bit values can be either 0 or 1). Recall also the two

general properties of a schema, its order o(H) and its length l(H), representing

the number of defined bit positions and the distance between the first and last

defined bit position, respectively; for the above example we have o(H) = 2 and

l(H) = 3. In the presence of multiple constraints ERC i , solutions have to fall

into the union of the schemata ⃗x ∈ ⋃ i H i associated with the constraints. 5 The

meaning of the ERC related parameters is also summarized in the Nomenclature

section.
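The schema notation above can be made concrete with a small sketch. It assumes that solutions and schemata are written as Python strings over `0`, `1` and `*`; the helper names `matches`, `order`, `defining_length` and `evaluable` are illustrative and do not appear elsewhere in the thesis.

```python
WILDCARD = "*"


def matches(x: str, schema: str) -> bool:
    """Return True if the bit string x falls into the schema."""
    if len(x) != len(schema):
        raise ValueError("solution and schema must have equal length")
    return all(s == WILDCARD or s == b for b, s in zip(x, schema))


def order(schema: str) -> int:
    """o(H): the number of defined (non-wildcard) bit positions."""
    return sum(s != WILDCARD for s in schema)


def defining_length(schema: str) -> int:
    """l(H): distance between the first and last defined bit position."""
    defined = [i for i, s in enumerate(schema) if s != WILDCARD]
    return defined[-1] - defined[0] if defined else 0


def evaluable(x: str, schemata: list[str]) -> bool:
    """x is evaluable if it falls into the union of the given schemata."""
    return any(matches(x, h) for h in schemata)
```

For the drug-mixture schema H = (∗1∗∗0), `order("*1**0")` gives 2 and `defining_length("*1**0")` gives 3, in line with the values stated above.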

In this thesis we consider discrete search spaces, mainly of pseudo-boolean nature, i.e. $X = \{0,1\}^l$. In non-discrete ones, $H$ might restrict solution parameters to lie within or out of certain parameter value ranges rather than to take specific parameter values. This scenario may occur, for example, when optimizing the concentration of drugs involved in a drug cocktail. Here, we may have available only a limited amount of the individual drugs, causing the concentration of a drug to be upper bounded.

5 Alternatively, the schemata $H_i$ may represent resources that are not available for evaluation, and thus define the evaluable region $E_t$ negatively. In this case, solutions are evaluable if they are a member of the space spanned by the complement of the union of schemata, i.e. $\vec{x} \in \neg(\bigcup_i H_i)$.


Figure 4.3: An illustration of how a commitment relaxation ERC may partition the optimization time into epochs of length $V$, and how it may be potentially activated. The activation period $k(j)$ during the $j$th epoch is represented by the dashed part.

4.2.2 Commitment relaxation ERCs

The first type of ERCs we introduce are commitment relaxation ERCs. Generally speaking, this ERC forces or commits an optimizer to a specific variable value combination (i.e. constraint schema) for some (variable) period of time whenever it uses this particular combination. Forcing a variable or linked combination of variables to be fixed for some time models real-world problems involving change-over costs of one sort or another. In particular, if changing a variable's value would incur some (large) change-over cost, such as a cleaning step, a component replacement, or a testing phase, then such changes to the variable may be made taboo for some period. Often, the change-over is much cheaper if done at a particular time step immediately after component replacements or cleaning (which is commonly done routinely rather than reactively), and so the variable can be allowed to change at that point.

A commitment relaxation ERC simulates the above scenario, so that some variable setting (or schema) $H$ is forbidden from changing during a period of time we refer to as an epoch. We denote by $V$ the duration of the epoch, and define the activation period $k(j)$, $0 \le k(j) \le V$, to be the duration of the period of time we have to commit to a particular setting $H$ during the $j$th epoch. Note that the length of the activation period may change with each new epoch depending on when the particular setting $H$ is selected by the optimizer. To describe the setting $H$ we can conveniently use a constraint schema. For example, we would use $H = (*1{*}{*}0)$ to state that a commitment is associated with the instrument setting for which the values of bit positions 2 and 5 are set to 1 and 0, respectively.

Figure 4.3 illustrates the partition of the optimization time into epochs, and a possible distribution of activation periods. Imagine the six epochs illustrated by the figure to represent six working days, each consisting of $V = 9$ hours (assuming working hours to be from 8am to 5pm). The limitation that causes the commitment relaxation ERC to arise can be:


Algorithm 4.1 Implementation of a commitment relaxation ERC
Require: $t^{\mathrm{start}}_{\mathrm{ctf}}$ (start of ctf), $t^{\mathrm{end}}_{\mathrm{ctf}}$ (end of ctf), $V$ (duration of an epoch), $H$ (constraint schema)
 1: commitmentRelaxationERC($\vec{x}$, $t$){
 2: if $t = 0$ then
 3:     last_activation = 0; k = 0 // initialize auxiliary variables
 4: if $t \in$ ctf then
 5:     if $t -$ last_activation $\ge k$ then
 6:         if $\vec{x} \in H$ then
 7:             last_activation = $t$; $k = V - (t \bmod V)$
 8:         return true // $\vec{x}$ is evaluable
 9:     else if $\vec{x} \notin H$ then
10:         return false // $\vec{x}$ is not evaluable
11:     else
12:         return true // $\vec{x}$ is evaluable
13: else
14:     return true } // $\vec{x}$ is evaluable

“In an optimization problem involving the selection of instrument settings, the

configuration, c, once set, cannot be changed during the remainder of the working

day.”

In the above example, the constraint schema H represents the parameter

combination that corresponds to instrument configuration c. The length of an

activation period is bounded by 0 ≤ k(j) ≤ 9. For instance, imagine we select

instrument configuration c in the middle of the day, say at 1pm, as indicated by

epoch j = 1 in the figure. This will activate the ERC for a period of k(1) = 4

(=5pm-1pm) hours (indicated by the dashed part). Activating the ERC later,

earlier, or not at all during a working day changes k(j) accordingly. Clearly,

whether we actually evaluate solutions during an activation period (and thus

use the setting c), or submit null solutions during this period, depends on the

constraint-handling strategy employed; it is only that if we decide to evaluate a

solution during an activation period, then it needs to be from H.

From Figure 4.3 it is also apparent that the total number of constraint activations

during the optimization can vary between 0 ≤ j ≤ ⌈T/V⌉. That is, we might be lucky and the ERC may be never activated, e.g. if solutions belonging to H do not lie on an optimizer’s search path, but already one activation may introduce

enough solutions from H into the population such that future activations

might be more likely.


Algorithm 4.2 Illustration of how we manage ERCs and the evaluation of solutions
Require: $\mathrm{ERC}_1, \ldots, \mathrm{ERC}_r$ (set of ERCs), $f$ (objective function), $T$ (time limit)
 1: $t$ (global time variable representing the current time step)
 2: functionWrapper($\vec{x}$, $t$){
 3: $y_t$ = null
 4: if $t < T$ then
 5:     if $\vec{x}$ is a null solution then
 6:         $\vec{x}_t = \vec{x}$
 7:     else
 8:         if $\vec{x}$ satisfies the ERCs $\mathrm{ERC}_1, \ldots, \mathrm{ERC}_r$ then
 9:             $\vec{x}_t = \vec{x}$; $y_t = f(\vec{x}_t)$
10:         else
11:             $\vec{x}_t$ = repair($\vec{x}$, $t$); $y_t = f(\vec{x}_t)$
12:     $t$++
13: return $\vec{x}_t$ and $y_t$ }

The corresponding implementation of a commitment relaxation ERC is defined by Algorithm 4.1. The method commitmentRelaxationERC($\vec{x}$, $t$) is defined by the parameters $t^{\mathrm{start}}_{\mathrm{ctf}}$, $t^{\mathrm{end}}_{\mathrm{ctf}}$, $V$, and $H$, and it takes as input a candidate solution $\vec{x}$ that is to be checked for evaluability and the current (global) time step $t$. The output is a boolean value indicating whether $\vec{x}$ is evaluable or not. The method maintains two auxiliary variables, last_activation and k, required to update the internal state of the constraint: Lines 5 to 7 are responsible for the activation of the ERC and the setting of the activation period, while Line 9 ensures that solutions have to be in $H$ during an activation. In the following, we will denote a commitment relaxation ERC of this form by commRelaxERC($t^{\mathrm{start}}_{\mathrm{ctf}}$, $t^{\mathrm{end}}_{\mathrm{ctf}}$, $V$, $H$). The meaning of the parameters related to this and other ERC types is also summarized in the Nomenclature section.
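A minimal Python sketch of the logic of Algorithm 4.1 may help to make the state handling concrete. It assumes solutions and schemata are strings over `0`, `1` and `*`, and that time steps are integers; the class and method names are illustrative only and are not the implementation used for the experiments in this thesis.

```python
class CommitmentRelaxationERC:
    """Sketch of a commitment relaxation ERC (cf. Algorithm 4.1)."""

    def __init__(self, t_start: int, t_end: int, V: int, schema: str):
        self.t_start = t_start      # start of the constraint time frame
        self.t_end = t_end          # end of the constraint time frame (exclusive)
        self.V = V                  # duration of an epoch
        self.schema = schema        # constraint schema H, e.g. "*1**0"
        self.last_activation = 0    # time step of the most recent activation
        self.k = 0                  # length of the current activation period

    def _in_schema(self, x: str) -> bool:
        return all(s == "*" or s == b for b, s in zip(x, self.schema))

    def is_evaluable(self, x: str, t: int) -> bool:
        if not (self.t_start <= t < self.t_end):
            return True                          # outside the ctf: unconstrained
        if t - self.last_activation >= self.k:   # no commitment is pending
            if self._in_schema(x):
                # Using the setting H commits us until the end of the epoch.
                self.last_activation = t
                self.k = self.V - (t % self.V)
            return True
        return self._in_schema(x)                # committed: x must be from H
```

With $V = 9$ and time steps counted in hours from 8am, submitting a solution from $H$ at $t = 5$ (1pm) sets the activation period to $9 - 5 = 4$, which corresponds to the working-day example given for epoch $j = 1$.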

An extension to this simple commitment relaxation ERC is to maintain not

only one but several commitment relaxation ERCs with different constraint

schemata H i . In this case, we need to consider three aspects: (i) a solution is

non-evaluable if it violates at least one ERC, (ii) a repaired solution has to satisfy

all activated ERCs and not only the ones that were violated, and (iii) it needs

to be checked whether a repaired solution activates an ERC that was not activated

before. We will consider this extension later in Section 5.1.4. At this point,

we would like to describe how we manage an ERC method, such as the method

commitmentRelaxationERC(⃗x,t). To keep things simple, we design a function

wrapper, as shown in Algorithm 4.2, that takes care of the management of ERCs


and the evaluation of solutions. Calling this wrapper is similar to calling the objective

function f in a standard optimization problem. The wrapper is defined by

the set of ERCs, ERC 1 ,...,ERC r , an ERCOP is subject to, the objective function

f, and the time limit T. It takes as input a solution ⃗x to be evaluated, and the

current (global) time step. If the solution is a null solution (which is checked at

Line 5), then the time counter is only incremented. If the solution is not a null

solution and satisfies all ERCs (i.e. all ERC methods return true at Line 8), then

it is immediately evaluated, otherwise it is first repaired and then evaluated. The

output of the wrapper is the (repaired) solution and its fitness; the fitness is null,

if a null solution is submitted. The repairing method repair(⃗x,t) needs to be implemented

by the optimizer. The structure of the function wrapper depends on

the ERCOP to be simulated and thus needs to be specified alongside the ERCs.
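The behaviour of the function wrapper can be sketched as follows. This is an illustrative sketch only: it assumes that each ERC is a callable taking a solution and a time step and returning a boolean, that `None` plays the role of a null solution, and that the optimizer supplies `repair`; the class name `FunctionWrapper` is hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

Solution = Optional[str]  # None stands for a null solution


class FunctionWrapper:
    """Sketch of the ERC-managing wrapper (cf. Algorithm 4.2)."""

    def __init__(self,
                 ercs: Sequence[Callable[[str, int], bool]],
                 f: Callable[[str], float],
                 T: int,
                 repair: Callable[[str, int], str]):
        self.ercs = list(ercs)
        self.f = f
        self.T = T
        self.repair = repair
        self.t = 0  # global time step

    def __call__(self, x: Solution) -> Tuple[Solution, Optional[float]]:
        x_t, y_t = x, None
        if self.t < self.T:
            if x is not None:
                # Query every ERC (no short-circuit) so that stateful
                # constraints can update their internal state.
                results = [erc(x, self.t) for erc in self.ercs]
                if not all(results):
                    x_t = self.repair(x, self.t)
                y_t = self.f(x_t)
            self.t += 1  # a null solution only advances the time counter
        return x_t, y_t
```

Calling the wrapper object then plays the role of calling $f$ in a standard optimization problem, with the returned pair holding the (possibly repaired) solution and its fitness.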

4.2.3 Periodic ERCs

A periodic ERC models the availability of a specific resource, represented by a

constraint schema H, at regular time intervals. That is, the ERC is activated

every P time steps (period length) for an activation period of exactly k time steps

(see Figure 4.4). As the ERC models the availability of resources, an individual

has to be a member of H during the activation period. An example of a periodic

ERC is:

“In an optimization problem requiring skilled engineers to operate instruments,

on Mondays, only engineer eng i is available.”

In the above example, the activation period is k = 1 (assuming a time step

is a day), the period length is P = 7 (i.e. a week), and the constraint schema

H represents the parameter combination that corresponds to the instruments (or

their settings) operated by engineer eng i .

The corresponding implementation of a periodic ERC is defined by Algorithm 4.3. The method periodicERC($\vec{x}$, $t$) is defined by the parameters $t^{\mathrm{start}}_{\mathrm{ctf}}$, $t^{\mathrm{end}}_{\mathrm{ctf}}$, $k$, $P$, and $H$, and it takes as input a candidate solution $\vec{x}$ that is to be checked for evaluability and the current (global) time step $t$. The output is a boolean value indicating whether $\vec{x}$ is evaluable or not.
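Since a periodic ERC is stateless, its check reduces to a single predicate. The following sketch assumes string-encoded solutions and schemata and integer time steps; the function name and argument order are illustrative.

```python
def periodic_erc(x: str, t: int, t_start: int, t_end: int,
                 k: int, P: int, schema: str) -> bool:
    """Return True if x is evaluable at time step t under a periodic ERC."""
    in_ctf = t_start <= t < t_end
    # The constraint is active during the first k steps of every period P.
    active = in_ctf and (t - t_start) % P < k
    in_schema = all(s == "*" or s == b for b, s in zip(x, schema))
    return not (active and not in_schema)
```

For the engineer example ($k = 1$, $P = 7$, a time step being a day), a solution outside $H$ is rejected on days 0, 7, 14, and so on within the constraint time frame, and accepted on all other days.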

In the following, we will denote periodic ERCs by perERC($t^{\mathrm{start}}_{\mathrm{ctf}}$, $t^{\mathrm{end}}_{\mathrm{ctf}}$, $k$, $P$, $H$). In Algorithm 4.2, the method periodicERC($\vec{x}$, $t$) is called at Line 8. A potential


Figure 4.4: An illustration of a periodic ERC perERC($t^{\mathrm{start}}_{\mathrm{ctf}}$, $t^{\mathrm{end}}_{\mathrm{ctf}}$, $k$, $P$, $H$). The ERC is activated every $P$ time steps for an activation period of always $k$ time steps.

extension of a periodic ERC is that the period length and the activation period

refer to different counter units. For example, consider the maintenance of machines.

While maintenance might take hours (i.e. k might be measured in real

time units), machines might need to be maintained after using them a certain

number of times (i.e. P is measured in function evaluations).

4.2.4 Random ERCs

A random ERC models the temporary non-availability of randomly selected resources, whereby all resources correspond to constraint schemata of pre-determined fixed order $o(H)$. In other words, if a resource becomes unavailable, then we activate a new (plain) ERC for an activation period of $k$ time steps and associate it with a constraint schema $H$ for which the $o(H)$ order-defining bit positions and their values are selected at random. As the ERC models the non-availability of a resource, an evaluable solution cannot be a member of $H$; notice the difference to periodic and commitment relaxation ERCs, where solutions have to be from $H$ during activation periods. We allow multiple resources to be non-available at the same time, meaning an evaluable solution must not be a member of any of these constraint schemata. Let us denote the set of ERCs that is activated at time step $t$ by $\mathrm{ActERCs}_t$. We remove ERCs with an expired activation period from this set.

So far, we have not specified how a resource can become unavailable and thus

how a new ERC is activated. Obviously, there are many options but since we

want to use this ERC type to model random effects, such as unexpected machine

breakdowns, we activate a new ERC at each time step with an independent

probability of p; we call this probability the activation probability.

We want to avoid situations in which there is no evaluable solution. This

is simply achieved by removing any ERCs from the set ActERCs t that have

a constraint schema that is contradictory to the constraint schema of a newly

activated ERC. Clearly, this can be realized differently. For example, by not


Algorithm 4.3 Implementation of a periodic ERC
Require: $t^{\mathrm{start}}_{\mathrm{ctf}}$ (start of ctf), $t^{\mathrm{end}}_{\mathrm{ctf}}$ (end of ctf), $k$ (length of the activation period), $P$ (period length), $H$ (constraint schema)
 1: periodicERC($\vec{x}$, $t$){
 2: if $t \in$ ctf $\wedge$ $(t - t^{\mathrm{start}}_{\mathrm{ctf}}) \bmod P < k$ $\wedge$ $\vec{x} \notin H$ then
 3:     return false // $\vec{x}$ is not evaluable
 4: else
 5:     return true } // $\vec{x}$ is evaluable

allowing a contradictory ERC to enter ActERCs t in the first place. Alternatively,

we may allow contradictions and consequently be perhaps unable to evaluate any

solution from time to time. Figure 4.5 shows how the set ActERCs t may vary

over time and the effects of enforced ERC removals using our approach.

A simplified real-world example of a random ERC is:

“In a multi-stage production process, a manufactured product runs through a series

of stages, where, at each stage, the product is processed by a single machine

which progresses it towards its final manufactured state. We wish to optimize the

production facility/process in the following sense. For each stage there is a choice

from among several machines (or machine types) capable of advancing the product

to the next stage; we wish to select the best machine to use at each stage.

Each machine may break down at any time step with a probability of 2% in which case it needs to be repaired; standard repairing takes 5 hours including test runs. Repairing time may increase if machine failures are more severe than expected (A in Figure 4.5) but, if necessary, a machine can be used before the test runs are finished (B in Figure 4.5).”

In this example we have an activation probability of p = 0.02, an activation

period of k = 5 (assuming a time step is one hour) and a fixed order of any

constraint schema of o(H) = 1 (because a single machine might break down).

It is easy to see that a random ERC with a small $p$ and $k$ is unlikely to harm the optimization much (even if $o(H)$ is high) as machines rarely break down and, in case they do, they are quickly repaired. Clearly, the opposite is the case if either of the parameters is large; in this case, even a low order $o(H)$ can severely harm the optimization.

The corresponding implementation of a random ERC is defined by Algorithm 4.4. The method randomERC($\vec{x}$, $t$) is defined by the parameters $t^{\mathrm{start}}_{\mathrm{ctf}}$, $t^{\mathrm{end}}_{\mathrm{ctf}}$,


Figure 4.5: A random ERC with an order of $o(H) = 1$, an activation period of $k$ time steps and an activation probability of $p \approx 2/k$ ($k \ge 2$) is applied to binary strings of length $l = 2$. The $4 \left(= 2^{o(H)} \binom{l}{o(H)}\right)$ possible constraint schemata, $H = (1*)$, $H = (0*)$, $H = (*1)$ and $H = (*0)$, are represented on the left side. Each dashed block represents the activation period of an ERC in the set $\mathrm{ActERCs}_t$. One can see that multiple ERCs can coexist, including ones with identical constraint schemata (A), and that an ERC is removed from $\mathrm{ActERCs}_t$ if its activation period expires or if its constraint schema contradicts the one of a newly activated ERC (B).

$k$, $o(H)$, and $p$. The method takes as input a candidate solution $\vec{x}$ that is to be checked for evaluability and the current (global) time step $t$, and outputs a boolean value indicating whether $\vec{x}$ is evaluable or not. Inside the method, we use $H_i$, $i = 1, \ldots, |\mathrm{ActERCs}_t|$, to refer to the constraint schema of the $i$th ERC in the set $\mathrm{ActERCs}_t$ at time step $t$. The call to rand(0,1) returns a random, uniformly distributed real value between 0 and 1.

In the following, we will denote random ERCs by randERC($t^{\mathrm{start}}_{\mathrm{ctf}}$, $t^{\mathrm{end}}_{\mathrm{ctf}}$, $k$, $o(H)$, $p$). In Algorithm 4.2, the method randomERC($\vec{x}$, $t$) is called at Line 8. We want to point out here that we do not analyze the effect of random ERCs in this thesis; we included the definition of the ERC for completeness. For an analysis of the effect of random ERCs on evolutionary search we refer to a technical report (Allmendinger and Knowles, 2010).
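For illustration, the bookkeeping of a random ERC can be sketched in Python. The sketch rests on several assumptions that are not fixed by the text: solutions and schemata are strings, two schemata are treated as contradictory when they define the same bit position with different values, and the method is called once per time step. All names (`create_schema`, `contradictory`, `RandomERC`) are illustrative.

```python
import random


def matches(x: str, schema: str) -> bool:
    return all(s == "*" or s == b for b, s in zip(x, schema))


def create_schema(length: int, order: int, rng: random.Random) -> str:
    """Draw a schema with `order` randomly placed, randomly valued bits."""
    schema = ["*"] * length
    for pos in rng.sample(range(length), order):
        schema[pos] = rng.choice("01")
    return "".join(schema)


def contradictory(h1: str, h2: str) -> bool:
    """Assumed notion: some shared defined position has different values."""
    return any(a != "*" and b != "*" and a != b for a, b in zip(h1, h2))


class RandomERC:
    """Sketch of a random ERC (cf. Algorithm 4.4)."""

    def __init__(self, t_start, t_end, k, order, p, length, seed=None):
        self.t_start, self.t_end = t_start, t_end
        self.k, self.order, self.p, self.length = k, order, p, length
        self.active = []  # list of (schema, activation time step)
        self.rng = random.Random(seed)

    def is_evaluable(self, x: str, t: int) -> bool:
        if not (self.t_start <= t < self.t_end):
            return True
        # Drop ERCs whose activation period has expired.
        self.active = [(h, t0) for h, t0 in self.active if t - t0 < self.k]
        if self.rng.random() < self.p:
            new = create_schema(self.length, self.order, self.rng)
            # Enforced removal of ERCs that contradict the new schema.
            self.active = [(h, t0) for h, t0 in self.active
                           if not contradictory(h, new)]
            self.active.append((new, t))
        # Schemata describe non-available resources: x must avoid all of them.
        return not any(matches(x, h) for h, _ in self.active)
```

Under the assumed notion of contradiction, the active schemata are pairwise consistent, so the solution that takes the complementary value at every defined bit position always remains evaluable.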

4.2.5 Commitment composite ERCs

The last type of ERCs we introduce here are commitment composite ERCs; this ERC type is slightly more complex than the other types because it combines several real-world limitations. A commitment composite ERC occurs when some variables of a candidate solution define a composite that requires resources to be locally available (e.g. in a cache) in order for the solution as a whole to be realized and/or evaluated. We can conveniently use the notion of schemata to describe the resource-requiring composite part of a solution. For example, assuming a binary representation of solutions, we would use $H_\# = \{{*}{*}\#\#\#{*}{*}{*}{*}{*}\#\#\}$ to state that


Algorithm 4.4 Implementation of a random ERC
Require: $t^{\mathrm{start}}_{\mathrm{ctf}}$ (start of ctf), $t^{\mathrm{end}}_{\mathrm{ctf}}$ (end of ctf), $k$ (length of the activation period), $o(H)$ (fixed order of a newly arising constraint schema), $p$ (activation probability)
 1: $\mathrm{ActERCs}_t = \emptyset$ (set of ERCs activated at time step $t$)
 2: randomERC($\vec{x}$, $t$){
 3: if $t \in$ ctf then
 4:     remove any ERCs from $\mathrm{ActERCs}_t$ whose activation period is expired
 5:     if rand(0,1) $< p$ then
 6:         $H'$ = createH($o(H)$) // create a new ERC of order $o(H)$ at random, and time stamp the ERC with $t$ to keep track of its activation period; add the ERC to the set $\mathrm{ActERCs}_t$, and remove any ERCs from this set whose constraint schema is contradictory to $H'$
 7:     for $i = 1$ to $|\mathrm{ActERCs}_t|$ do
 8:         if $\vec{x} \in H_i$ then
 9:             return false // $\vec{x}$ is not evaluable
10:     return true // $\vec{x}$ is evaluable
11: else
12:     return true } // $\vec{x}$ is evaluable

bit positions 3, 4, 5, 11 and 12 define a composite; we refer to the bit positions denoted by # as the composite-defining bits, and the order $o(H_\#)$ to be the number of composite-defining bits in the schema (we refer to $H_\#$ as the high-level constraint schema). In the problems tackled in this thesis the composite-defining bits are static, and form a part of the ERC problem definition.

When a solution is to be evaluated, we must look at the composite-defining bits of its genotype and compare them to a local cache of composites (in Algorithm 4.5 this cache is represented by the variable SCInfo). Each composite in the local cache is indexed by a bit-string of the same length as the order of the high-level constraint schema. If there is a match, the solution can be evaluated; if not, the solution may not be evaluated at the current time step.

We define the cache to be made up of a number of storage cells, #SC. Typically, the number of storage cells is smaller than the space of possible composites, which is $2^{o(H_\#)}$ in a binary search space. A composite available in a storage cell may be used in the evaluation of more than one solution: each composite may be used up to RN (reuse number) times and has a shelf life of SL time steps, and we assume SL ≥ RN. Finally, the composites available in the cache at time $t$ are a function of previous purchase orders made, and a fixed time lag TL between a purchase being made and it arriving. When composites arrive at a particular time, they are immediately put in a storage cell (and any existing composite in that cell is discarded); which storage cell is selected is defined either at the time of purchase or at the time of arrival (see Lines 4 and 5 of Algorithm 4.5).
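The cache mechanics described above can be sketched as a small data structure. This is a simplified illustration under several assumptions of ours: the storage cell is chosen at the time of purchase, shelf life is decremented once per call to `advance`, the first matching cell is used when a composite is stored more than once, and costs and the budget are ignored. The names `Composite`, `CompositeCache` and `composite_bits` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Composite:
    bits: str         # values of the composite-defining bits
    reuses_left: int  # remaining number of uses (starts at RN)
    shelf_left: int   # remaining shelf life in time steps (starts at SL)


def composite_bits(x: str, high_level_schema: str) -> str:
    """Extract the composite-defining bits (marked '#') from a solution."""
    return "".join(b for b, s in zip(x, high_level_schema) if s == "#")


class CompositeCache:
    def __init__(self, n_cells: int, TL: int, RN: int, SL: int):
        self.cells: List[Optional[Composite]] = [None] * n_cells
        self.queue: List[Tuple[int, str, int]] = []  # (arrival, bits, cell)
        self.TL, self.RN, self.SL = TL, RN, SL

    def order(self, bits: str, cell: int, t: int) -> None:
        """Order a composite at time t; it arrives TL time steps later."""
        self.queue.append((t + self.TL, bits, cell))

    def advance(self, t: int) -> None:
        """Age stored composites, then store the orders arriving at time t."""
        for i, c in enumerate(self.cells):
            if c is not None:
                c.shelf_left -= 1
                if c.shelf_left <= 0:
                    self.cells[i] = None  # shelf life is over
        arrived = [o for o in self.queue if o[0] <= t]
        self.queue = [o for o in self.queue if o[0] > t]
        for _, bits, cell in arrived:
            # An arriving composite replaces whatever the cell held before.
            self.cells[cell] = Composite(bits, self.RN, self.SL)

    def use(self, bits: str) -> bool:
        """Consume one use of a matching composite; False if none is stored."""
        for i, c in enumerate(self.cells):
            if c is not None and c.bits == bits:
                c.reuses_left -= 1
                if c.reuses_left == 0:
                    self.cells[i] = None  # composite is used up
                return True
        return False
```

A solution would then be evaluable only if `use(composite_bits(x, schema))` succeeds; for instance, `composite_bits("10101", "###**")` yields the composite `101`.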


[Figure 4.6 panels; contents of storage cells 1 to 4 given as (composite, SL, RN). At time step $t$, before the evaluation: (001, 2, 5), (101, 3, 4), (000, 7, 1), (111, 1, 6); the EA submits $\vec{x} = (10101)$ for evaluation and orders the composites 011 and 110 ($c = c + 2 \times c_{\mathrm{order}}$). After the evaluation: (001, 1, 5), (101, 2, 3), (000, 6, 1), (111, 0, 6); the shelf life of the composite in cell 4 is over and the cell is emptied. At time step $t+1$ ($c = c + c_{\mathrm{time\_step}}$), the composites 011 and 110 have arrived and are stored in cell 4 and cell 1, respectively: (110, 20, 10), (101, 2, 3), (000, 6, 1), (011, 20, 10).]

Figure 4.6: A visual example of the commitment composite ERC commCompERC($H_\# = \{\#\#\#{*}{*}\}$, #SC = 4, TL = 1, RN = 10, SL = 20); each composite order and time step costs $c_{\mathrm{order}}$ and $c_{\mathrm{time\_step}}$ units, respectively. The evaluation step at time step $t$ reduces the reuse number of the composite in cell 2. At the same time step, the shelf life of the composite in cell 4 expires, and two new composites are ordered. One time step later, at $t+1$, the ordered composites arrive and are put into cells determined by the EA.

To make the constraint more realistic we associate costs of $c_{\mathrm{order}}$ and $c_{\mathrm{time\_step}}$ units with each submitted composite order and time step, respectively. The available budget, which cannot be exceeded, is denoted by $C$; any composite can be purchased as often as desired, as long as we are within the budget. Figure 4.6 gives a visual example of the ERC.

An example of a commitment composite ERC is:

“In an optimization problem involving the selection/design of vehicle parts least harmful to pedestrians in case of a crash, we wish amongst others to identify the most suitable configuration for the tyres of the vehicle. A tyre is made up of

several parameters, such as size, thickness, and rubber material. Upon defining

these parameters, we order the tyres, which is associated with a fixed cost of 500

100 CHAPTER 4. ERCS: DEFINITIONS AND CONCEPTS

Algorithm 4.5 Implementation of a commitment composite ERC

Require: t start

ctf

(start of ctf), t end

ctf

(end of ctf), H # (high-level constraint schema), #SC (number

of storage cells), TL (time lag), RN (reuse number), SL (shelf life), c order (cost per

submitted composite order), c time step (cost per time step), C (budget)

1: c = 0 (cost counter), SCInfo = ∅ (storage cell in**for**mation), QInfo = ∅ (composite queue

in**for**mation) // Initialize global state variables

2: commitmentCompositeERC(⃗x,t){

3: if t ∈ ctf then

4: collect composite orders if within budget, and increment c by c order **for** each submitted

order; add all ordered composites to the queue QInfo and time stamp them with t; let

the optimizer choose the storage cell indexes into which the ordered composites should

be stored upon their arrival

5: empty the storage cells in SCInfo that belong to composites whose shelf life is over;

remove any composites from QInfo whose time lag is over, and add them to the storage

by updating SCInfo; if the optimizer did not specify previously the storage cell indexes

into which arriving composites should be put, then let it choose the cell here

6: if c < C then

7: P = ∅ (auxiliary set containing the indexes of storage cells that contain a composite

that matches the composite required by the solution⃗x; this set is only important when

⃗x requires a composite that is available in storage multiple times)

8: **for** i = 1 to |SCInfo| do

9: if ⃗x ∈ H i then

10: P = P ∪i

11: if |P| = 1 then

12: increment the reuse number of the composite stored in the storage cell with the

index represented by the element of P; if the composite is used up, then empty the

corresponding storage cell

13: return true // ⃗x is evaluable

14: if |P| > 1 then

15: let the optimizer decide from which storage cell the composite should be taken, and

subsequentlyincrementthereusenumberoftheselectedcomposite; ifthecomposite

is used up, then empty the corresponding storage cell.

16: return true // ⃗x is evaluable

17: return false // ⃗x is not evaluable (because the composite needed is not available)

18: return false // ⃗x is not evaluable (because we are above the budget C)

19: else

20: return true } // ⃗x is evaluable

Pounds and a delivery period of 3 days. To allow **for** a valid assessment of tyres,

a set of tyres can be involved in at most 5 crash test trials, and can be kept in

storage **for** not more than 1 month. The storage itself is limited in size to 10

sets of tyres. Every day of crash testing involves a fixed charge of 3000 Pounds

including things like labor, rent of venue, and electricity.”

In this example a composite is a tyre and the composite-defining bits are the

variables defining a tyre. Ordering tyres is associated with a time lag of TL = 3

(assuming a time step is one day), and tyres have a reuse number of RN = 5 and

a shelf life of SL = 30 (assuming one month consists of 30 days). The number of

storage cells is #SC = 10, and the costs associated with a composite order and

time step are c order = 500 Pounds and c time step = 3000 Pounds, respectively.

The corresponding implementation of a commitment composite ERC is defined by Algorithm 4.5. The method commitmentCompositeERC(⃗x, t) is defined by the parameters $t_{ctf}^{start}$, $t_{ctf}^{end}$, $H_\#$, #SC, TL, RN, SL, $c_{\text{order}}$, $c_{\text{time step}}$, and C. It takes as input a candidate solution ⃗x that is to be checked for evaluability and the current (global) time step t, and outputs a boolean value indicating whether ⃗x is evaluable or not. The variables representing the cost counter c, the storage cell information SCInfo (contents in the form of schemata, remaining shelf lives and remaining reuse numbers), and the queue QInfo of submitted but not yet arrived composites are global state variables and visible to the optimizer.

In addition to repairing, three reactive tasks need to be defined by the optimizer

to deal with this ERC type: (i) ordering of composites (Line 4), (ii) assigning

storage cell indexes to arriving composites (Line 4 or 5), and (iii) deciding

which composite to use if multiple identical ones are available (Line 15). With

this ERC, repairing involves modifying a non-evaluable solution such that it falls

into the constraint schema of a composite that is currently in storage.

In future, we will denote a commitment composite ERC of this form by commCompERC($H_\#$, #SC, TL, RN, SL).[6] In Algorithm 4.2, the method commitmentCompositeERC(⃗x, t) is called at Line 8.
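The bookkeeping behind Algorithm 4.5 can be sketched in a few lines of Python. This is a simplified illustration and not the thesis implementation: the class and method names (`CommCompERC`, `order`, `advance`, `is_evaluable`) are hypothetical, the constraint time frame is assumed to be always active, the storage cell is fixed at the time of purchase, reuse numbers and shelf lives are stored as remaining counts (as in Figure 4.6), and the order in which ageing and arrivals happen within a time step is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    schema: str      # composite schema, e.g. "101**" ('*' marks a free bit)
    shelf_life: int  # remaining time steps before the composite expires
    reuses: int      # remaining number of evaluations


def matches(x: str, schema: str) -> bool:
    """True if the bit string x falls into the schema."""
    return len(x) == len(schema) and all(s in ("*", b) for b, s in zip(x, schema))


@dataclass
class CommCompERC:
    n_cells: int        # #SC
    time_lag: int       # TL
    reuse_number: int   # RN
    shelf_life: int     # SL
    c_order: float      # cost per submitted order
    c_time_step: float  # cost per time step
    budget: float       # C
    cost: float = 0.0                          # cost counter c
    cells: dict = field(default_factory=dict)  # SCInfo: cell index -> Cell
    queue: list = field(default_factory=list)  # QInfo: (arrival time, schema, cell index)

    def order(self, schema: str, cell: int, t: int) -> bool:
        """Line 4: submit an order if it fits the budget (cell chosen at purchase)."""
        if not 0 <= cell < self.n_cells or self.cost + self.c_order > self.budget:
            return False
        self.cost += self.c_order
        self.queue.append((t + self.time_lag, schema, cell))
        return True

    def advance(self, t: int) -> None:
        """Line 5 plus the time-step charge: age the storage, then shelve arrivals."""
        self.cost += self.c_time_step
        for cell in self.cells.values():
            cell.shelf_life -= 1
        self.cells = {i: c for i, c in self.cells.items() if c.shelf_life > 0}
        for _, schema, idx in [q for q in self.queue if q[0] <= t]:
            self.cells[idx] = Cell(schema, self.shelf_life, self.reuse_number)
        self.queue = [q for q in self.queue if q[0] > t]

    def is_evaluable(self, x: str, choose=min) -> bool:
        """Lines 6-18: check the budget, find matching cells, consume one reuse."""
        if self.cost >= self.budget:
            return False
        matching = [i for i, c in self.cells.items() if matches(x, c.schema)]
        if not matching:
            return False
        idx = choose(matching)  # the optimizer's choice if several cells match (Line 15)
        self.cells[idx].reuses -= 1
        if self.cells[idx].reuses == 0:
            del self.cells[idx]
        return True


# Hypothetical settings, chosen only to keep the example short.
erc = CommCompERC(n_cells=4, time_lag=1, reuse_number=2, shelf_life=20,
                  c_order=1.0, c_time_step=1.0, budget=100.0)
erc.order("101**", cell=1, t=0)
before = erc.is_evaluable("10101")  # the composite has not arrived yet
erc.advance(t=1)
after = erc.is_evaluable("10101")   # the composite is now in storage
```

The `choose` argument stands in for the third reactive task described above (deciding which composite to use when several identical ones are available); passing `min` is just a placeholder policy.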

4.3 Theoretical analysis

Having defined ERCOPs and several ERCs, we are now ready to analyze their effect on evolutionary search. We begin with an initial theoretical investigation of the impact of periodic ERCs on two selection and reproduction schemes commonly used within EAs; the next chapter will continue with an empirical analysis. The theoretical analysis uses the concept of Markov chains, which we first mentioned in Section 3.1.1. The next section gives a more detailed introduction to Markov chains. We then derive the Markov model (transition probabilities) that accounts for periodic ERCs in Section 4.3.2, and analyze the simulation results in Section 4.3.3. Finally, Section 4.3.4 summarizes the insights gained from the theoretical study.

[6] We leave out the variables $t_{ctf}^{start}$, $t_{ctf}^{end}$, $c_{\text{order}}$, $c_{\text{time step}}$, and C from commCompERC(...) for ease of presentation. They will be specified where appropriate.

4.3.1 Markov chains

A Markov process is a random process that has no memory of where it has been in the past, such that only the current state of the process can influence the next state. If the process can assume only a finite or countable set of states, then it is usual to refer to it as a Markov chain (Norris, 1998). In other words, assuming that $X_t$, $t = 0, 1, \ldots$, are random variables with $X_t$ depending on $X_{t-1}$ only, we can think of a Markov chain as a sequence $X_0, X_1, X_2, \ldots$ of random events occurring in time (Reeves and Rowe, 2003). Suppose $S_0, \ldots, S_\mu$ are the $\mu+1$ possible values that each of the variables $X_t$ can take. Then, a chain moves from a state $S_a$ at time $t$ to a state $S_{a'}$ at time $t+1$ with a probability of $p_{a,a'} = P(X_{t+1} = S_{a'} \mid X_t = S_a)$. The probabilities $p_{a,a'}$ ($a, a' = 0, \ldots, \mu$) are called transition probabilities and form the $(\mu+1) \times (\mu+1)$ matrix $P$, the transition matrix. Thus, the probability that the chain is in state $S_a$ at time $t$ is the $a$th entry in the probability vector

$$\vec{u}_t = \vec{u}_0 P^t, \tag{4.3}$$

where $\vec{u}_0$ is the $(\mu+1)$-dimensional probability (row) vector that represents the initial distribution over the set of states.
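Equation (4.3) amounts to repeated vector-matrix multiplication. The following minimal sketch uses plain Python lists; the function names and the three-state example chain are illustrative assumptions and do not come from the thesis.

```python
def step(u, P):
    """One application of the transition matrix: the row vector u times P."""
    n = len(P)
    return [sum(u[a] * P[a][b] for a in range(n)) for b in range(n)]


def distribution_after(u0, P, t):
    """State distribution u_t = u_0 P^t, computed by t successive steps."""
    u = list(u0)
    for _ in range(t):
        u = step(u, P)
    return u


# Hypothetical chain with states S_0, S_1, S_2; S_0 and S_2 are absorbing.
P = [[1.00, 0.00, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.00, 1.00]]
u0 = [0.0, 1.0, 0.0]          # start in S_1 with probability 1
u2 = distribution_after(u0, P, 2)
```

Multiplying step by step avoids forming $P^t$ explicitly, which is convenient later when the matrix changes from one time step to the next.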

When an EA is modeled by a Markov chain it is easy to see that the population is the natural choice for describing a state. The transition probabilities then express the likelihoods that an EA changes from a current population to any other possible population after applying the stochastic effects of selection, crossover and/or mutation. It is also possible to consider other effects such as noisy fitness functions (Nakama, 2008), niching (Horn, 1993) and elitism (He and Yao, 2002). Once the transition matrix is calculated it can be used to calculate a variety of measurements, such as the first hitting time of a particular state or the probability of hitting a state at all. An overview of tools of Markov chain analysis can be found in any general textbook on stochastic processes, such as the books of Norris (1998) and Doob (1953).

The drawback of modeling EAs with Markov chains is that the size of the required transition matrix grows exponentially in both the population size and string length. To keep Markov chain models manageable it is therefore common to use small population sizes and string lengths (Goldberg and Segrest, 1987; Horn, 1993). Other options, which allow the modeling of more realistic EAs, are to make simplifying assumptions about the state space (Mahfoud, 1991) or to use matrix notation only (Vose and Liepins, 1991; Nix and Vose, 1992; Davis and Principe, 1993).

4.3.2 Modeling ERCs with Markov models

In this section we derive the transition probabilities for EAs optimizing in the presence of periodic ERCs. Our Markov chain model is based on the model of Goldberg and Segrest (1987), which considers a simple environment composed of two individual types: type A always has a fixed objective value (or fitness) of f(A), while type B has a fitness of f(B). This limitation allows for an intuitive definition of states. For a fixed population size of µ, there are µ+1 possible states, where state $S_a$ represents a population with a type A individuals and µ−a type B individuals. Furthermore, in this simple EA model we do not apply mutation or crossover, so that an offspring is simply a copy of the selected parent.

Goldberg and Segrest (1987) used this model to investigate the effect of drift for a simple EA that used a generational reproduction scheme combined with fitness proportionate selection. They also extended the model to include mutation. Horn (1993) extended it further to include niching. We extend it to include periodic ERCs and use the resulting model to analyze the impact of the ERC on two selection strategies, fitness proportionate and binary tournament selection, and two reproduction schemes, generational and steady state reproduction, both without elitism.

4.3.2.1 Selection probabilities

Under fitness proportionate selection (FPS) we choose an individual of the current population to serve as a parent (in our environment, to be in the next population) with a probability that is proportional to its (relative) fitness. In our simple environment, the probability of choosing a type A individual for the next population while being in a state $S_a$ is simply

$$P_a(A) = \frac{a f(A)}{a f(A) + (\mu - a) f(B)}. \tag{4.4}$$

As there are only two individual types in total, the probability of choosing a type B individual is $P_a(B) = 1 - P_a(A)$. From the above equation it is apparent that once a uniform population is reached, i.e. a = 0 or a = µ, there is no chance of selecting individuals of the other type. Thus, the two corresponding states $S_0$ and $S_\mu$ are absorbing states.
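As a quick numerical check of Equation (4.4), the selection probability can be written as a one-line function. The function name is an assumption made for this sketch, and both fitness values are assumed to be strictly positive so that the denominator cannot vanish.

```python
def fps_probability(a: int, mu: int, f_a: float, f_b: float) -> float:
    """P_a(A) under fitness proportionate selection, Equation (4.4).

    a   : number of type A individuals in the current population
    mu  : population size
    f_a : fitness of type A (assumed > 0)
    f_b : fitness of type B (assumed > 0)
    """
    return a * f_a / (a * f_a + (mu - a) * f_b)


mu = 50
p_empty = fps_probability(0, mu, 1.0, 1.3)   # no type A left: never selected
p_full = fps_probability(mu, mu, 1.0, 1.3)   # only type A left: always selected
p_half = fps_probability(25, mu, 1.0, 1.3)   # 25 / (25 + 32.5)
```

The two boundary values illustrate why $S_0$ and $S_\mu$ are absorbing: the missing type can never be selected again.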

Under tournament selection we first randomly select a number of individuals from the population (with replacement) and then perform a tournament among them, with the fittest one subsequently serving as a parent. It is common to use a tournament size of two, which will also be used here; this selection strategy is known as binary tournament selection (BTS). The result of a tournament is clear: the individual with the higher fitness wins the tournament; there is a draw if an individual meets another individual with the same fitness, in which case the winner is randomly determined; and an individual will be the winner of a tournament with itself. We distinguish two cases regarding the fitness of the individual types: (i) f(A) = f(B) and (ii) f(A) > f(B). For case (i) the probability of selecting a type A individual is

$$P_a(A) = \left(\frac{a}{\mu}\right)^2 + \frac{a(\mu-a)}{\mu^2}, \tag{4.5}$$

where the first term is the probability of selecting two type A individuals, and the second term is the probability of selecting one type A and one type B individual (which is $2a(\mu-a)/\mu^2$), adjusted for the fact that the type A individual will win the tournament on average in only half of the cases (i.e. $\frac{2a(\mu-a)}{\mu^2} \cdot \frac{1}{2} = \frac{a(\mu-a)}{\mu^2}$). If f(A) > f(B), then a type A individual will win all tournaments it is involved in, resulting in the following selection probability:

$$P_a(A) = \left(\frac{a}{\mu}\right)^2 + \frac{2a(\mu-a)}{\mu^2}. \tag{4.6}$$

4.3.2.2 Transition probabilities

In our environment, the transition probabilities depend on the selected reproduction scheme and, through the selection probability $P_a(A)$, on the selected selection strategy. We first consider a generational reproduction scheme as already used in the original genetic algorithm of Holland (1975); we denote this scheme by GGA. With GGA, the entire current population is replaced by the offspring population. That is, µ selection steps are carried out per time step (with replacement). Using the selection probability $P_a(A)$ for either FPS or BTS, the transition probabilities $p_{a,a'} = P(X_{t+1} = S_{a'} \mid X_t = S_a)$ for GGA of moving at time t from a state $S_a$ with a type A individuals to a state $S_{a'}$ with a′ type A individuals at time t+1 are defined as follows:

$$
\begin{aligned}
&\text{For } a = 0: && p_{a,a} = 1, \qquad p_{a,a'} = 0, \quad a' = 1,\ldots,\mu. \\
&\text{For } 0 < a < \mu \text{ and } 0 \le a' \le \mu: && p_{a,a'} = \binom{\mu}{a'} P_a(A)^{a'} \,(1 - P_a(A))^{\mu - a'}. \\
&\text{For } a = \mu: && p_{a,a'} = 0, \quad a' = 0,\ldots,\mu-1, \qquad p_{a,a} = 1.
\end{aligned}
\tag{4.7}
$$
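The GGA transition matrix of Equation (4.7) can be assembled row by row from a binomial distribution. The sketch below uses BTS as the selection strategy; the function names are assumptions, and the branch for f(A) < f(B) is not stated in the text but follows by symmetry from Equation (4.6) (type A then only wins a tournament against another type A individual).

```python
from math import comb


def bts_probability(a: int, mu: int, f_a: float, f_b: float) -> float:
    """P_a(A) under binary tournament selection, Equations (4.5) and (4.6)."""
    both_a = (a / mu) ** 2                 # two type A individuals drawn
    mixed = 2 * a * (mu - a) / mu ** 2     # one type A and one type B drawn
    if f_a == f_b:
        return both_a + 0.5 * mixed        # Equation (4.5): draw decided at random
    if f_a > f_b:
        return both_a + mixed              # Equation (4.6): type A wins every mixed pair
    return both_a                          # assumption by symmetry: type A loses mixed pairs


def gga_matrix(mu: int, select) -> list:
    """Transition matrix for GGA, Equation (4.7); select(a) returns P_a(A)."""
    P = []
    for a in range(mu + 1):
        p = select(a)
        P.append([comb(mu, b) * p ** b * (1 - p) ** (mu - b) for b in range(mu + 1)])
    return P


mu = 4  # small hypothetical population, for readability only
P = gga_matrix(mu, lambda a: bts_probability(a, mu, 1.0, 1.0))
```

Because $P_0(A) = 0$ and $P_\mu(A) = 1$, the binomial expression reproduces the absorbing rows for a = 0 and a = µ on its own, so no special cases are needed in the code.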

With steady state reproduction, the population is updated after each selection step. Usually, an offspring individual replaces the worst individual in the population. This replacement strategy, however, is elitist and ensures that the number of individuals of the less fit type in the population does not increase. Thus, to allow for a fair comparison with GGA, an offspring does not replace the worst individual in the population but a randomly chosen one, regardless of its fitness; we denote this reproduction scheme by SSGA (rri), where ‘rri’ refers to replacing a random individual. It has been shown elsewhere (Syswerda, 1991) that GGA and SSGA (rri) yield similar performance. Bearing in mind that one time step corresponds to one selection step with SSGA (rri), we obtain the following transition probabilities:

$$
\begin{aligned}
&\text{For } a = 0: && p_{a,a} = 1, \qquad p_{a,a'} = 0, \quad a' = 1,\ldots,\mu. \\
&\text{For } 0 < a < \mu: && p_{a,a'} = 0, \quad a' = 0,\ldots,a-2, \\
& && p_{a,a-1} = (1 - P_a(A))\,\frac{a}{\mu}, \\
& && p_{a,a} = P_a(A)\,\frac{a}{\mu} + (1 - P_a(A))\,\frac{\mu-a}{\mu}, \\
& && p_{a,a+1} = P_a(A)\,\frac{\mu-a}{\mu}, \\
& && p_{a,a'} = 0, \quad a' = a+2,\ldots,\mu. \\
&\text{For } a = \mu: && p_{a,a'} = 0, \quad a' = 0,\ldots,\mu-1, \qquad p_{a,a} = 1.
\end{aligned}
\tag{4.8}
$$

The transition probabilities of either GGA or SSGA (rri) will be the entries of the transition matrix P.
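Equation (4.8) yields a tridiagonal matrix, since a single selection step can change the number of type A individuals by at most one. A minimal sketch follows; the function name and the small example population are assumptions.

```python
def ssga_rri_matrix(mu: int, select) -> list:
    """Transition matrix for SSGA (rri), Equation (4.8); select(a) returns P_a(A)."""
    P = [[0.0] * (mu + 1) for _ in range(mu + 1)]
    for a in range(mu + 1):
        p = select(a)
        if a > 0:
            # a type B offspring replaces a randomly chosen type A individual
            P[a][a - 1] = (1 - p) * a / mu
        # the offspring replaces an individual of its own type
        P[a][a] = p * a / mu + (1 - p) * (mu - a) / mu
        if a < mu:
            # a type A offspring replaces a randomly chosen type B individual
            P[a][a + 1] = p * (mu - a) / mu
    return P


mu = 4  # small hypothetical population
# Equal fitness values: P_a(A) = a / mu under both FPS and BTS.
P = ssga_rri_matrix(mu, lambda a: a / mu)
```

With equal fitness values the up and down probabilities in each row coincide, which is the drift behavior discussed in Section 4.3.3.1.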

4.3.2.3 Constrained transition probabilities **for** a periodic ERC

We have mentioned in the previous section that GGA performs µ selection steps per time step, while SSGA (rri) performs one selection step per time step. To be able to compare the effect of an ERC on the two reproduction schemes, we thus express ERCs in this section in terms of selection steps rather than time steps.

Let us now derive the transition probabilities in the presence of a periodic ERC. For this, consider the general periodic ERC, perERC(iµ, (i+1)µ, k, µ, H = (A)) (i ∈ N, 0 ≤ k ≤ µ), which is activated at selection step iµ for a period of µ selection steps, i.e. one time step (or generation) for GGA and µ time steps for SSGA (rri). During the activation period of k ≤ µ selection steps, we can only select (and evaluate) type A individuals. Let us assume that if we select a type B individual during this period, this individual is repaired by simply forcing it into the right schema; i.e. it is converted into a type A individual. This repairing procedure is a simple constraint-handling strategy for dealing with non-evaluable solutions; alternative constraint-handling strategies will be introduced in the next chapter. Before we derive the constrained transition probabilities for GGA we want to point out a few aspects:

• If we are in state $S_0$ and the ERC is activated, then $S_0$ is no longer an absorbing state and we move directly to state $S_k$.

• As a population contains at least k type A individuals after lifting the constraint, we are not able to move to a state $S_{a'}$ with a′ < k during the constrained generation (time step).

• The ERC reduces the number of freely selected offspring to µ′ = µ − k.

• Moving to a state $S_{a'}$ with a′ > k is already achieved by selecting a′′ = a′ − k (instead of a′) type A individuals from the current population.

Considering these points, we derive the following constrained transition probabilities for GGA for the time step in which the ERC is activated:

$$
\begin{aligned}
&\text{For } a = 0: && p_{a,a'} = 0, \quad a' = 0,\ldots,k-1,k+1,\ldots,\mu, \qquad p_{a,k} = 1. \\
&\text{For } 0 < a < \mu \text{ and } 0 \le a' < k: && p_{a,a'} = 0. \\
&\text{For } 0 < a < \mu \text{ and } k \le a' \le \mu: && p_{a,a'} = \binom{\mu'}{a''} P_a(A)^{a''} \,(1 - P_a(A))^{\mu' - a''}. \\
&\text{For } a = \mu: && p_{a,a'} = 0, \quad a' = 0,\ldots,\mu-1, \qquad p_{a,a} = 1.
\end{aligned}
\tag{4.9}
$$
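The constrained GGA matrix of Equation (4.9) differs from the unconstrained one only in that k offspring are fixed to type A and the remaining µ′ = µ − k are drawn freely. A possible sketch, with an assumed function name and equal fitness values chosen for the example:

```python
from math import comb


def gga_constrained_matrix(mu: int, k: int, select) -> list:
    """Constrained transition matrix for GGA, Equation (4.9).

    k offspring are forced to be type A; the other mu - k are selected freely,
    each being type A with probability select(a) = P_a(A).
    """
    free = mu - k                                   # mu' in the text
    Pc = [[0.0] * (mu + 1) for _ in range(mu + 1)]
    for a in range(mu + 1):
        p = select(a)
        for extra in range(free + 1):               # extra = a'' = a' - k
            Pc[a][k + extra] = comb(free, extra) * p ** extra * (1 - p) ** (free - extra)
    return Pc


mu, k = 50, 20
Pc = gga_constrained_matrix(mu, k, lambda a: a / mu)   # equal fitness: P_a(A) = a / mu

# Expected number of type A individuals after one constrained generation,
# starting from a = 25.
expected_a = sum(b * Pc[25][b] for b in range(mu + 1))
```

Starting from 25 type A individuals, the expected count after the constrained generation is 20 + 30 × 0.5 = 35, i.e. an expected type B proportion of 0.3.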

The above periodic ERC is set such that the activation period of k selection steps is upper bounded by the population size µ and, in the case of GGA, starts and ends within a single time step (generation). This need not be the case. In fact, a periodic ERC can feature an activation period k that is so long that it constrains selection steps within two or more successive generations, or so short that several activation periods may start during a single generation. In such scenarios, one needs to constrain all generations that are subject to constrained selection steps. The number of constrained selection steps within a generation, referred to as k in Equation (4.9), is then simply the sum of all selection steps that happen to be constrained during that particular generation. That is, depending on the ERC, the number of constrained selection steps may change between generations.

With SSGA (rri), the population is updated after each selection step, which, as noted above, is a single time step with this scheme. This means that we need to determine for each selection step (time step) separately whether it lies within the activation period and is thus constrained or not. During the activation period, the periodic ERC above prevents us from moving from a current state $S_a$ to a state $S_{a-1}$, which can only be reached if a type B individual replaces a type A individual. As above, if the constraint is active, then the state $S_0$ is no longer an absorbing state, and we move directly to state $S_1$. We obtain the following new transition probabilities for each of the k constrained time steps:

$$
\begin{aligned}
&\text{For } a = 0: && p_{a,a'} = 0, \quad a' = 0,2,3,\ldots,\mu, \qquad p_{a,1} = 1. \\
&\text{For } 0 < a < \mu: && p_{a,a'} = 0, \quad a' = 0,\ldots,a-1, \\
& && p_{a,a} = \frac{a}{\mu}, \\
& && p_{a,a+1} = \frac{\mu-a}{\mu}, \\
& && p_{a,a'} = 0, \quad a' = a+2,\ldots,\mu. \\
&\text{For } a = \mu: && p_{a,a'} = 0, \quad a' = 0,\ldots,\mu-1, \qquad p_{a,a} = 1.
\end{aligned}
\tag{4.10}
$$

We will denote the transition matrix with the constrained transition probabilities by $P_c$.
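For SSGA (rri), the constrained matrix of Equation (4.10) no longer depends on the selection strategy, because the offspring is always of type A; only the random replacement matters. A minimal sketch with an assumed function name:

```python
def ssga_rri_constrained_matrix(mu: int) -> list:
    """Constrained transition matrix P_c for SSGA (rri), Equation (4.10).

    The offspring is always type A and replaces a random individual, so the
    state either stays at a (a type A is replaced) or moves to a + 1.
    """
    Pc = [[0.0] * (mu + 1) for _ in range(mu + 1)]
    for a in range(mu + 1):
        Pc[a][a] = a / mu
        if a < mu:
            Pc[a][a + 1] = (mu - a) / mu
    return Pc


mu = 50
Pc = ssga_rri_constrained_matrix(mu)

# Expected number of type A individuals after one constrained step from a = 25.
expected_a = sum(b * Pc[25][b] for b in range(mu + 1))
```

Each constrained step therefore shrinks the expected number of type B individuals by the factor 1 − 1/µ (here from 25 to 24.5), rather than removing one type B individual with certainty.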

4.3.2.4 Calculating proportions of individual types in a population

One way to analyze the impact of an ERC on different selection and reproduction schemes is to monitor the proportion of the two individual types in a population. To do so one needs to first calculate the probability of ending up in any of the possible states $S_i$, $i = 0, \ldots, \mu$, after t time steps. In an unconstrained environment, this can be done according to Equation (4.3) (see Section 4.3.1) using the transition matrix P derived in Section 4.3.2.2; in this equation, the µ+1 state probabilities at time t are represented in the form of the probability vector $\vec{u}_t$. In a constrained environment we cannot use the transition matrix P across all t time steps but have to swap it with the constrained transition matrix $P_c$ (see Section 4.3.2.3) for time steps that consist of constrained selection steps; this dependence of the transition matrix on time makes the Markov chain non-homogeneous. Let us consider the same periodic ERC as in the previous section but this time with a constraint time frame spanning g ∈ N periods (as opposed to exactly one), i.e. perERC(iµ, (i+g)µ, k, µ, H = (A)). For this periodic ERC we can calculate the probability vector at any time step t for GGA as follows:

$$
\begin{aligned}
\vec{u}_t &= \vec{u}_0 P^{t}, && 0 \le t < i, \\
\vec{u}_t &= \vec{u}_0 P^{i} P_c^{\,t-i}, && i \le t < g+i, \\
\vec{u}_t &= \vec{u}_0 P^{i} P_c^{\,g} P^{\,t-g-i}, && g+i \le t,
\end{aligned}
$$

where the entries of the transition matrices P and $P_c$ are calculated using Equations (4.7) and (4.9), respectively. The probability vector $\vec{u}_0$ of the initial state distribution has a value of 1 at the sth entry and a value of 0 in the others, if we want to start with a population of exactly s type A individuals.

One time step with GGA corresponds to µ time steps with SSGA (rri). To compute the probability vector $\vec{u}$ for SSGA (rri) we thus need to look at the state distributions at time step tµ:

$$
\begin{aligned}
\vec{u}_{t\mu} &= \vec{u}_0 P^{t\mu}, && 0 \le t < i, \\
\vec{u}_{t\mu} &= \vec{u}_0 P^{i\mu} \left(P_c^{\,k} P^{\,\mu-k}\right)^{t-i}, && i \le t < g+i, \\
\vec{u}_{t\mu} &= \vec{u}_0 P^{i\mu} \left(P_c^{\,k} P^{\,\mu-k}\right)^{g} P^{(t-g-i)\mu}, && g+i \le t,
\end{aligned}
$$

where the transition matrices P and $P_c$ are calculated according to Equations (4.8) and (4.10), respectively.

Having obtained the probabilities of ending up in all the different states, we can calculate the expected proportions $c_t(A)$ and $c_t(B)$ of type A and B individuals in a population at time step t (or tµ in the case of SSGA (rri)) as follows (note that this calculation does not depend on whether a problem is subject to ERCs or not):

$$c_t(A) = \frac{1}{\mu} \sum_{j=0}^{\mu} j\, u_t^{j}, \qquad c_t(B) = 1 - c_t(A), \tag{4.11}$$

where $u_t^{j}$ is the jth entry of the probability vector $\vec{u}_t$.
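The non-homogeneous chain and Equation (4.11) can be combined in a short sketch: the state distribution is pushed through a time-ordered list of matrices, and the expected proportion is read off at the end. The function names and the tiny example (µ = 2, k = 1, equal fitness values, GGA) are assumptions; the two matrices were filled in by hand from Equations (4.7) and (4.9).

```python
def propagate(u0, matrices):
    """Apply a time-ordered sequence of transition matrices to the row vector u0."""
    u = list(u0)
    for M in matrices:
        n = len(M)
        u = [sum(u[a] * M[a][b] for a in range(n)) for b in range(n)]
    return u


def proportion_a(u):
    """Expected proportion c_t(A) of type A individuals, Equation (4.11)."""
    mu = len(u) - 1
    return sum(j * u[j] for j in range(mu + 1)) / mu


# Hypothetical GGA example with mu = 2 and f(A) = f(B), so P_a(A) = a / mu.
P = [[1.00, 0.00, 0.00],    # unconstrained generation, Equation (4.7)
     [0.25, 0.50, 0.25],
     [0.00, 0.00, 1.00]]
Pc = [[0.0, 1.0, 0.0],      # constrained generation with k = 1, Equation (4.9)
      [0.0, 0.5, 0.5],
      [0.0, 0.0, 1.0]]

u0 = [0.0, 1.0, 0.0]                   # start with exactly one type A individual
u3 = propagate(u0, [P, Pc, P])         # one free, one constrained, one free generation
c_a = proportion_a(u3)
c_b = 1.0 - c_a
```

In this example the expected type B proportion drops from 0.5 to 0.25 in the constrained generation and stays there afterwards, since with equal fitness values the unconstrained generations leave the expectation unchanged.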

4.3.3 Simulation results

This section uses the measure of the expected individual type proportion to analyze the impact of periodic ERCs on two selection strategies, FPS and BTS, and two reproduction schemes, GGA and SSGA (rri). We consider first the case where both individual types have equal fitness values, and then the case where they are different. If not otherwise stated, the population size is set to µ = 50.

4.3.3.1 Identical fitness values: f(A) = f(B)

In this case there is no selection pressure and thus both selection strategies behave identically. Ideally, an EA maintains an equal proportion of the two individual types in the population. However, because of genetic drift this is impossible and an EA eventually converges to a uniform population ($S_i$, $i \in \{0, \mu\}$). As the probability of ending up in one of the two states is proportional to the initial state, the expected individual type proportion is identical to the initial proportion, which is specified by $\vec{u}_0$. Thus, for a random initialization, the expected proportion is 0.5.

From Figure 4.7 we can see that an expected proportion of 0.5 is achieved until selection step 400, at which we activate the periodic ERC, perERC(400, 450, 20, 50, H = (A)), which has a unique activation period of k = 20 selection steps.[7] This ERC forces us to evaluate k = 20 type A individuals and subsequently reduces (increases) the proportion of type B (A) individuals in the population. After the ERC is lifted at selection step 420, the expected individual type proportion does not get back to the initial proportion. Although this effect can be put down to the specifics of the model (no selection pressure towards either individual type), we will see several results in the following theoretical and experimental studies that display a similar pattern. That is, a constraint can have a permanent or long-lived effect on search performance even if it was active for a short time only.

[7] Note that, in an EA performing optimization of a function, the number of performed selection steps displayed on the x-axes of Figure 4.7 would be equivalent to the number of performed function evaluations.

[Figure 4.7 (plot), panel title perERC(400,450,20,50,H=(A)): proportion of type B individuals $c_t(B)$ (y-axis, 0 to 1) against the number of selection steps (x-axis, 0 to 1400). Curves: GGA - constrained, real proportion; GGA - constrained, expected proportion; SSGA (rri) - constrained, real proportion; SSGA (rri) - constrained, expected proportion.]

Figure 4.7: A plot showing the proportion of type B individuals $c_t(B)$ for GGA and SSGA (rri) as a function of the number of selection steps for the ERC perERC(400, 450, 20, 50, H = (A)). Both individual types have equal fitness and the constraint settings used are given above the plot. The terms real and expected refer to proportions obtained by actually running the EA and by running the Markov chain, respectively. The EA results are averaged across 500 independent runs.

From the figure we can also see that the proportion is affected more severely for GGA than for SSGA (rri). The reason that SSGA (rri) is more robust is that with this reproduction scheme there is a chance that an offspring of type A replaces another type A individual that is currently in the population. Of course, if an offspring replaces a solution of the same type, then this will not affect the proportion. By contrast, with GGA, all offspring are carried over to the population of the next generation. This causes the proportion of type B individuals in the population to be a linear function of the activation period. This effect is also apparent from Figure 4.8, where the performance of both reproduction schemes is shown as a function of the activation period k. From the figure one can see that SSGA (rri) is able to maintain a proportion of around $c_t(B) = 0.2$ after an activation period of k = 50, which is equal to the population size. On the other hand, GGA cannot maintain a single type B individual in the population because of its linear dependence on k. Note that, in the case where k > 50, the constraint is activated for more than one time step when using GGA. For example, for k = 70 the constraint restricts all 50 selection steps within one time step and 20 selection steps within the subsequent one.

[Figure 4.8 (plot), panel title perERC(50,200,k,150,H=(A)): proportion of type B individuals $c_t(B)$ (y-axis, 0 to 0.5) against the activation period k (x-axis, 0 to 150). Curves: GGA; SSGA (rri).]

Figure 4.8: A plot showing the proportion of type B individuals $c_t(B)$ for GGA and SSGA (rri) at selection step 200 as a function of the activation period k for the ERC perERC(50, 200, k, 150, H = (A)). Both individual types have equal fitness.

As the Markov chain results are exact we will omit the experimentally obtained proportions in the following plots.
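The linear dependence for GGA and the value of roughly 0.2 for SSGA (rri) can be checked with a short calculation. The following is a sketch derived from Equations (4.9) and (4.10) under the assumptions f(A) = f(B) (so that $P_a(A) = a/\mu$ for both selection strategies) and k ≤ µ; it is not taken from the thesis. Writing $b = \mu - a$ for the number of type B individuals, one constrained GGA generation and one constrained SSGA (rri) step give

$$
E[b' \mid b]_{\text{GGA}} = (\mu - k)\,\frac{b}{\mu} = \left(1 - \frac{k}{\mu}\right) b,
\qquad
E[b' \mid b]_{\text{SSGA (rri)}} = b - \frac{b}{\mu} = \left(1 - \frac{1}{\mu}\right) b,
$$

while unconstrained steps leave the expectation of b unchanged (pure drift). With an initial proportion $c_0(B)$, the expected proportion after an activation period of k selection steps is therefore

$$
c(B)_{\text{GGA}} = \left(1 - \frac{k}{\mu}\right) c_0(B),
\qquad
c(B)_{\text{SSGA (rri)}} = \left(1 - \frac{1}{\mu}\right)^{k} c_0(B).
$$

For µ = 50 and $c_0(B) = 0.5$, k = 50 gives 0 for GGA and $0.5 \cdot 0.98^{50} \approx 0.18$ for SSGA (rri), and k = 20 gives 0.3 and $0.5 \cdot 0.98^{20} \approx 0.33$, respectively.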

4.3.3.2 Different fitness values: f(A) ≠ f(B)

When both individual types have different fitness values, the aim of an EA is to

converge as quickly as possible to a population state consisting only of the fitter

individual type. We focus our investigations mainly on the more interesting case

where an ERC has a negative effect on the convergence behavior. Hence, the

fitness of the individual type that we have to select during the activation period,

in our case type A, needs to be lower than the fitness of type B individuals. If

not otherwise stated, the fitness values are set to f(A) = 1.0 and f(B) = 1.3,

meaning there is little selection pressure towards selecting the fitter individual

type (which slows down convergence speed); later we will investigate the impact

of the fitness ratio between type A and B individuals in more detail.

As the basis for our analysis we use the periodic ERC perERC(50, 400, 20, 50, H = (A)). This ERC is activated after the initialization (i.e. at selection or evaluation step 50) for seven periods, each consisting of P = 50 selection steps, of which k = 20 are constrained. Figure 4.9 shows the impact of the periodic ERC on the expected proportion $c_t(B)$ for all combinations of the selection and reproduction schemes: GGA with FPS and SSGA (rri) with FPS (top plot), and GGA with BTS and SSGA (rri) with BTS (bottom plot).[8]

[Figure 4.9 (two plots), both with panel title perERC(50,400,20,50,H=(A)): proportion of type B individuals $c_t(B)$ (y-axis, 0 to 1) against the number of selection steps (x-axis, 0 to 1400). Curves in the top plot: GGA with FPS, unconstrained; SSGA (rri) with FPS, unconstrained; GGA with FPS; SSGA (rri) with FPS. Curves in the bottom plot: GGA with BTS, unconstrained; SSGA (rri) with BTS, unconstrained; GGA with BTS; SSGA (rri) with BTS.]

Figure 4.9: Plots showing the proportion of type B individuals $c_t(B)$ for FPS (top) and BTS (bottom) as a function of the number of selection steps for the ERC perERC(50, 400, 20, 50, H = (A)). The term unconstrained refers to the proportions obtained in an ERC-free environment.

We want to point out that during activation periods, SSGA (rri) with BTS and SSGA (rri) with FPS perform identically since, independently of the selection type, a type A offspring will replace an individual selected at random. But during the inactive periods, the stronger selection pressure of BTS recovers more of the B-to-A replacements, so that overall BTS maintains a higher proportion of Bs. This behavior can be seen in the zigzag shape, where there is the same steep fall-off of fitness in both methods, but a steeper recovery for BTS. Overall, the same is true for GGA (BTS is better for the same reason), but it is not possible to see this so clearly in the plots.

[8] We get the zigzag-shaped line for SSGA (rri) during the constraint time frame because $c_t(B)$ is plotted after each time step, which here consists of one selection step. For GGA the change in $c_t(B)$ is smooth because a time step consists of µ selection steps.

[Figure 4.10 (two plots): proportion of type B individuals $c_t(B)$ (y-axis, 0 to 1). Left plot, panel title perERC($t_{ctf}^{start}$, $t_{ctf}^{start}$+350, 20, 50, H=(A)): x-axis is the start of the constraint time frame $t_{ctf}^{start}$ (0 to 1000). Right plot, panel title perERC(50,400,k,50,H=(A)): x-axis is the activation period k (0 to 50). Curves in both plots: GGA with BTS; GGA with FPS; SSGA (rri) with BTS; SSGA (rri) with FPS.]

Figure 4.10: Plots showing the proportion of type B individuals $c_t(B)$ at selection step 1500 as a function of the start of the constraint time frame $t_{ctf}^{start}$ (left) and the activation period k (right) for the ERCs perERC($t_{ctf}^{start}$, $t_{ctf}^{start}$ + 350, 20, 50, H = (A)) and perERC(50, 400, k, 50, H = (A)), respectively.

Figures 4.10 to 4.12 indicate how the proportion of type B individuals is affected when altering the constraint parameters. We can observe that:

• Longer activation periods degrade the performance of all EAs (see right plot of Figure 4.10).

• Fixing the constraint time frame duration, but translating it (see left plot of Figure 4.10), yields a non-monotonic effect on performance (of all EAs, but most apparently with FPS): more preparation time gives more time to fill the population with fit individuals, whereas little recovery time is detrimental to final fitness. These two effects trade off against each other.

• Increasing the duration of the constraint time frame (see Figure 4.11) degrades performance.

• Changing the fitness ratio (see Figure 4.12) has only a switching effect on BTS (when the fitter individual changes), but for FPS the ratio smoothly affects the final proportion up to a saturation point.

• Overall, comparing GGA with SSGA we see that SSGA achieves the higher proportion of fit individuals during the constraint time frame, and it recovers more rapidly after the constraint is lifted, but its rate of recovery does not reach the rate achieved by GGA, and ultimately GGA reaches a higher proportion (see Figures 4.8 and 4.9). This can be explained by the replacement strategy of SSGA (rri): offspring may replace individuals in the population that are of the same type. During the activation period, this is beneficial as the number of poor type A individuals in the population does not increase linearly with the activation period. However, during the unconstrained selection steps, this may be disruptive in the sense that fit type B offspring may replace other type B individuals of the current population, which slows down the convergence.

[Figure 4.11 (plot), panel title perERC(0, $t_{ctf}^{end}$, 20, 50, H=(A)): proportion of type B individuals $c_t(B)$ (y-axis, 0 to 1) against the end of the constraint time frame $t_{ctf}^{end}$ (x-axis, 0 to 1400). Curves: GGA with BTS; GGA with FPS; SSGA (rri) with BTS; SSGA (rri) with FPS.]

Figure 4.11: A plot showing the proportion of type B individuals $c_t(B)$ at selection step 1500 as a function of the end of the constraint time frame $t_{ctf}^{end}$ for the ERC perERC(0, $t_{ctf}^{end}$, 20, 50, H = (A)).

4.3.4 Summary of theoretical study

We used Markov chains to analyze the impact of periodic ERCs for a simple environment and EA model. The environment was composed of only two individual types and the EA model applied only a selection operator. In the EA model we considered two selection strategies, FPS and BTS, and two reproduction schemes, GGA and SSGA (rri). We observed that, for one and the same reproduction scheme, BTS is more robust than FPS because it is independent of the fitness values of the individual types. However, FPS was able to match and even outperform BTS if the ratio of the individual type fitnesses was high, i.e. if a larger selection pressure than for BTS was obtained. The crucial difference between the two reproduction schemes we considered is that GGA carries out many selection steps before the population is updated, while SSGA (rri), or steady state reproduction in general, carries out only a single one. This enables SSGA (rri) during the activation periods to replace less fit individuals with other less fit individuals of the current population, but in the long run it also prevents SSGA (rri) from converging more quickly in the remaining periods. By contrast, the performance of GGA depends linearly on the activation period but there are no drawbacks if the ERC is not activated. This crucial difference between the reproduction schemes means that SSGA (rri) is able to outperform GGA during the activation period and in situations where the advantage over GGA gained in the activation period(s) can be maintained until the next activation period or until the end of the optimization. In terms of the constraint parameters, this occurs when there is a long activation period, a short recovery period, and the constraint time frame is set late.

[Figure 4.12: two panels titled perERC(50, 400, 20, 50, $H = (A)$) (left) and perERC(50, 550, 25, 50, $H = (A)$) (right); x-axis "Fitness ratio f(B)/f(A)" (0.5 to 5); y-axis "Proportion of type B individuals $c_t(B)$" (0 to 1); curves: GGA with BTS, GGA with FPS, SSGA (rri) with BTS, SSGA (rri) with FPS.]

Figure 4.12: Plots showing the proportion of type B individuals $c_t(B)$ at selection step 1500 as a function of the fitness ratio f(B)/f(A) for the ERCs perERC(50, 400, 20, 50, $H = (A)$) (left) and perERC(50, 550, 25, 50, $H = (A)$) (right).
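The contrast between FPS and BTS drawn above can be illustrated with a small sketch. It assumes the textbook selection probabilities for a population of only two individual types (binary tournaments drawn with replacement, ties broken at random); it is not a transcription of the Markov model used in this chapter, and the function names `p_select_b_fps` and `p_select_b_bts` are hypothetical.

```python
def p_select_b_fps(c_b: float, f_a: float, f_b: float) -> float:
    """Probability that fitness proportionate selection picks a type B individual.

    c_b is the current proportion of type B individuals; f_a and f_b are the
    (positive) fitness values of the two types.
    """
    return c_b * f_b / (c_b * f_b + (1.0 - c_b) * f_a)


def p_select_b_bts(c_b: float, f_a: float, f_b: float) -> float:
    """Probability that a binary tournament (with replacement) returns type B.

    Only the ordering of f_a and f_b matters, not their magnitudes.
    """
    if f_b > f_a:
        # B wins unless both contestants are of type A.
        return 1.0 - (1.0 - c_b) ** 2
    if f_b < f_a:
        # B wins only if both contestants are of type B.
        return c_b ** 2
    # Equal fitness: the winner is picked at random, so nothing changes.
    return c_b
```

Under these assumptions, with half of the population being of type B, BTS selects type B with probability 0.75 for any fitness ratio above 1, whereas FPS only reaches this value at a ratio of 3 and exceeds it for larger ratios. This is one way to read the statement that FPS can match or outperform BTS when the fitness ratio is high.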

4.4 Chapter summary

Ephemeral resource-constrained optimization problems (ERCOPs) are problems where feasible solutions can be temporarily non-evaluable due to a lack of resources required for their evaluation. In the first part of this chapter, we formally defined such problems, and introduced several types of real-world ephemeral resource constraints (ERCs).

In the second part of the chapter, we conducted an initial theoretical analysis of the effect of one of the ERC types, periodic ERCs, on two selection and two reproduction schemes commonly used within EAs. We employed the theory of Markov chains to model a simple optimization environment, and reached two main conclusions: (i) binary tournament selection tends to be more robust and allows for quicker convergence towards a better population state than fitness proportionate selection, and (ii) a non-elitist steady state reproduction scheme tends to be more robust than a generational reproduction scheme during periods where the ERC is active, while the opposite is the case during inactive periods of the ERC. Although a larger theoretical investigation is necessary to draw definite conclusions regarding other ERC types and more complex optimization environments, our findings indicate that the choice of algorithm setup is important to obtain optimal performance. Moreover, the analysis indicated that the optimal algorithm setup depends on the parameter settings of an ERC.⁹ While this chapter has dealt with basic EAs only, the next chapter proposes and empirically analyzes the performance of a number of constraint-handling strategies (augmented on basic EAs) for coping with the different ERC types introduced in this chapter.

⁹ In (Allmendinger and Knowles, 2010) we have empirically shown that the findings gained in our theoretical investigation can be valid also for more complex optimization environments, EA models, and other ERC types. We do not include this particular empirical analysis in this thesis but instead focus on the development and empirical analysis of different constraint-handling strategies.

Chapter 5

Strategies for Coping with Ephemeral Resource Constraints

In the previous chapter we formally defined ephemeral resource-constrained optimization problems (ERCOPs), and introduced several types of ephemeral resource constraints (ERCs). We also presented a theoretical study of the effect of one of the ERC types, periodic ERCs, on the choice of simple EA configurations, and observed that algorithm setup is crucial to obtain optimal performance. In this chapter we continue our analysis of ERCOPs using an empirical approach. Our objective is to develop and empirically analyze the performance of various strategies for coping with the ERCs introduced in the previous chapter. We first propose and investigate static constraint-handling strategies for dealing with non-evaluable solutions in Section 5.1. We then investigate whether learning online and offline when to switch between these static strategies during an optimization process is more efficient than using the static strategies themselves (Section 5.2). Finally, we develop online resource-purchasing strategies for coping with commitment composite ERCs in Section 5.3.

5.1 Static constraint-handling strategies

In this section, we introduce and analyze five static constraint-handling strategies for dealing with non-evaluable solutions in an ERCOP. We first introduce the strategies in Section 5.1.1. Then we outline the experimental setup used in the analysis of the strategies in Section 5.1.2, and conduct the analysis itself in Section 5.1.3. The analysis will consider commitment relaxation ERCs and periodic ERCs (introduced in Sections 4.2.2 and 4.2.3, respectively), but the strategies are applicable (in similar form) also to other ERC types. In Section 5.1.4 we present a case study that illustrates one way in which one might select a suitable strategy for an ERCOP with largely unknown search space properties, which is a common situation in the real world. In Section 5.1.5 we draw together the findings from the experimental analyses.

Figure 5.1: An illustration of how the targeted search direction of an optimizer (indicated by the arrow starting at $\vec{x}_t$), which might be towards an optimal solution $\vec{x}_{optimal}$, may change due to ERCs (which define the evaluable search space $E_t$). The reason is that repairing a solution causes a kind of ‘drift’ effect that alters the search direction, which, depending on the fitness landscape, may be towards a local optimum; the solutions $\vec{x}_1$ to $\vec{x}_{t-1}$ indicate the search history.

5.1.1 Specific static constraint-handling strategies

This section introduces five constraint-handling strategies for dealing with non-evaluable solutions in an ERCOP. Three of the strategies (forcing, regenerating, and the subpopulation strategy) actually apply repairing (i.e. modify the genotype of a solution) and two (waiting and penalizing) avoid it in some way in order to prevent drift-like effects in the search direction; see Figure 5.1 for a graphical illustration of how repairing can trigger a drift in the search direction.

In the description of the strategies we assume that multiple ERCs of a particular ERC type (commitment relaxation or periodic ERC) with non-overlapping (or non-contradictory) constraint schemata $H_i$, $i = 1,\ldots,r$, may be activated at a given time step. That is, there is always an evaluable solution, or $\exists \vec{x} \in \bigcup_i H_i$. We also assume that we know which resources are available and thus that the schemata $H_i$ are known to the optimizer.

1. Forcing: This strategy simply forces a non-evaluable solution $\vec{x}$ into the constraint schemata $H_i$ of all activated ERCs. In other words, all bits that do not match the order-defining bit values of the schemata $H_i$ of all activated ERCs are flipped, and the solution so obtained is returned for evaluation. We employed this strategy in the theoretical analysis presented in Section 4.3 (to force type B individuals into type A individuals during activation periods of a periodic ERC). Repairing strategies of this kind have been used previously, particularly in the optimization of constrained combinatorial problems (see Section 3.3).

A potential drawback of this strategy is that enforcing a change in the values of some solution bits may destroy a potentially good combination of bit values between the changed and the remaining bits present in the original solution. Later, we will investigate this aspect more closely.

2. Regenerating: The aim of this strategy, which is similar to the death penalty approach (see Section 3.3) often used in the evolution strategies community, is to overcome the potential drawback of forcing. As the name of the strategy suggests, upon encountering a non-evaluable solution, regenerating iteratively generates new solutions from the empirical distribution of the current offspring population (i.e. it generates new offspring from the current parent set) until it generates one that is evaluable, i.e. falls into the schemata $H_i$ of all activated ERCs, or until $L$ trials have passed without success. In the latter case, we select the solution, generated within the $L$ trials, that is closest to the schemata $H_i$ of all activated ERCs and apply forcing to it. Here, closest refers to the solution with the smallest sum of Hamming distances to the schemata $H_i$ of all activated ERCs;¹ ties between several equally close solutions are broken randomly. Thus the method always returns an evaluable solution (except in the ‘deadlock’ situation where multiple ERCs with overlapping $H_i$ are activated simultaneously, in which case no solution is evaluable).

A potential drawback of this strategy is that for large $L$ it can be computationally and/or resource expensive if checking of constraints is costly, while for small $L$ it may often reduce to the forcing strategy.

¹ Notice that the Hamming distance between a solution $\vec{x}$ and a schema $H$ is calculated based on the order-defining bits of $H$ only (as the non-order-defining bits can be either 0 or 1).

3. Subpopulation strategy: Let us assume there is only one ERC, i.e. $r = 1$. In this case, alongside the actual population, we also maintain a subpopulation SP of maximum size $J$ that contains the fittest solutions from $H_1$ evaluated so far. A non-evaluable solution is then dealt with by generating a new solution based on this subpopulation. If the maximum size $J$ of SP has not been reached, then a new solution from $H_1$ is generated at random; otherwise we apply one selection and variation step using the same algorithm as the one we augment the constraint-handling strategies on. If the new solution is non-evaluable, which may happen due to mutation, we apply forcing to it. To update the subpopulation upon evaluating a solution from $H_1$ we use a steady state or $(J + 1)$-ES reproduction scheme. We use this reproduction scheme because, depending on the ERC, the number of evaluated solutions from $H_1$ might be small, in which case a generational reproduction scheme is likely to result in slow convergence.

Generally, a drawback of the subpopulation strategy is that if we have more than one ERC, i.e. $r > 1$, then the number of subpopulations we need is upper-bounded by $2^r$, the size of the power set of the set of ERCs. With multiple ERCs, we generate a solution using the subpopulation that is defined by the (set of) schemata $H_i$ of all activated ERCs.

4. Waiting: This strategy does not repair; instead, it postpones the evaluation of a non-evaluable solution and the generation of new solutions until the activation periods of all ERCs that are violated by the solution have passed, i.e. the optimization freezes. The freezing period is bridged by submitting as many null solutions as required until the solution becomes evaluable.² The effect of this waiting strategy is identical to the way Schwefel (1968) handled unavailable conical rings in his flashing nozzle design problem (see preface of Section 1).

The advantage and disadvantage of waiting are clear: it should prevent drift-like effects in the search direction caused by ERCs, but the waiting might result in a smaller number of solutions being evaluated (this depends upon whether time is a limiting factor).

² We want to point out that in the implementation of this strategy used here we do not submit null solutions to bridge the freezing period. Instead, we bridge the freezing period by setting the (global) time counter $t$ directly to the end of the longest activation period of all currently violated ERCs. The effect of this is identical to submitting null solutions, but this approach allows for a cleaner presentation of the EA and the function wrapper in Algorithm 5.1; in this algorithm, the bridging of the freezing period is performed in Line 25.

5. Penalizing: Like waiting, this strategy does not repair. However, instead of freezing the optimization, a non-evaluable solution is penalized by assigning a poor objective value $c$ to it. The effect is that evaluated solutions coexist with non-evaluated ones in the same population. However, due to selection pressure in parental and environmental selection, non-evaluated solutions are likely to be discarded as time goes by. As we will use the strategy within an elitist EA, and because we use a $c$ that is the minimal fitness in the search space, a non-evaluated solution will never be inserted in a population (that is filled with previously evaluated solutions) in the first place.

This kind of penalizing strategy has been used by Schwefel, and Reynolds and Corne to cope with execution errors occurring in simulations (we mentioned this application in the preface of Section 1). But the approach is also generally popular in the genetic algorithm (GA) community; here, this penalizing approach can be seen as a static penalty function method (see Section 3.3).

The advantage of penalizing over waiting is that the optimization does not freeze upon encountering a non-evaluable solution; i.e. the solution generation process continues and thus solutions might actually be evaluated (without needing to penalize them) during an activation period. However, since solutions evaluated during an activation period will have to fall into the schemata $H_i$ of all currently activated ERCs, penalizing might be subject to drift-like effects, thus potentially losing the advantage of waiting.

Figure 5.2 visualizes how the strategies forcing, regenerating, and the subpopulation strategy may repair a non-evaluable solution.
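The three repairing strategies described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a schema is represented as a dictionary mapping each order-defining bit position to its required value, `make_offspring` stands in for one selection and variation step of the underlying EA, and all function names are hypothetical rather than taken from the implementation used in this thesis.

```python
import random
from typing import Callable, Dict, List, Tuple

# Assumed representation: a schema maps each order-defining bit position
# to the bit value it requires; all other positions are unconstrained.
Schema = Dict[int, int]
Solution = List[int]


def schema_distance(x: Solution, schemata: List[Schema]) -> int:
    """Sum of Hamming distances from x to each schema (order-defining bits only)."""
    return sum(x[pos] != bit for h in schemata for pos, bit in h.items())


def force(x: Solution, schemata: List[Schema]) -> Solution:
    """Forcing: overwrite every bit that conflicts with an activated schema."""
    repaired = list(x)
    for h in schemata:
        for pos, bit in h.items():
            repaired[pos] = bit
    return repaired


def regenerate(make_offspring: Callable[[], Solution], schemata: List[Schema],
               max_trials: int, rng: random.Random) -> Solution:
    """Regenerating: try up to max_trials (L >= 1) offspring, then force the closest."""
    closest: List[Solution] = []
    closest_dist = None
    for _ in range(max_trials):
        candidate = make_offspring()
        dist = schema_distance(candidate, schemata)
        if dist == 0:
            return candidate  # evaluable as generated, no repair needed
        if closest_dist is None or dist < closest_dist:
            closest, closest_dist = [candidate], dist
        elif dist == closest_dist:
            closest.append(candidate)
    # No evaluable offspring found: break ties at random, then apply forcing.
    return force(rng.choice(closest), schemata)


def update_subpopulation(sp: List[Tuple[float, Solution]], solution: Solution,
                         fitness: float, max_size: int) -> List[Tuple[float, Solution]]:
    """(J + 1)-style update of SP: add the new solution, keep the max_size fittest."""
    merged = sp + [(fitness, solution)]
    merged.sort(key=lambda entry: entry[0], reverse=True)
    return merged[:max_size]
```

The sketch assumes non-contradictory schemata, as in the text; with contradictory schemata `force` would simply let the last schema win, which corresponds to the ‘deadlock’ case in which no solution is evaluable.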

5.1.2 Experimental setup

This section describes the test functions f, the EA on which we augment the

different constraint-handling strategies, and the parameter settings as used in the

subsequent experimental analysis, which investigates the impact of commitment

relaxation and periodic ERCs.

5.1.2.1 Search algorithm

We augment the different static constraint-handling strategies on a standard EA. The EA uses a standard $(\mu + \lambda)$-ES reproduction scheme, an elitist approach, which we believe would be generally applicable in this domain. The algorithm uses binary tournament selection (with replacement) for parental selection, which was shown to be a robust operator in the theoretical study (see Section 4.3.3). The algorithm also uses uniform crossover (Syswerda, 1989) and bit flip mutation. Algorithm 5.1 shows the pseudocode of the EA including the function wrapper as employed with the different static constraint-handling strategies. Additional performance-enhancing mechanisms commonly used in EAs, such as diversity preservation techniques or adaptive parameter control (see Section 3.1.2), may affect the results, but are not considered in this particular study. Similarly, since we want to investigate the performance of a standard EA, the algorithm does not check whether a currently non-evaluable solution has been evaluated previously, i.e. identical solutions may be evaluated multiple times.

[Figure 5.2 legend: Offspring; Member of Pop; Member of SP; Member of both Pop and SP; offspring individual and the potential repaired versions of it, labelled Forcing, Regenerating, and Subpopulation strategy.]

Figure 5.2: A depiction of the current population Pop (filled circles and squares) and an offspring individual $\vec{x}_t$, which is feasible but not evaluable (because it is in $X$ but not in $E_t$). Solutions indicated by the filled squares coexist in both the actual EA population Pop and the population SP maintained by the subpopulation strategy. The three solutions $\vec{x}_{t,repaired}$ indicate repaired solutions that might have resulted after applying one of the three ‘repairing’ strategies to $\vec{x}_t$: while forcing simply flips incorrectly set bits of $\vec{x}_t$ and thus creates a repaired solution that is as close as possible to $\vec{x}_t$ but not necessarily fit, regenerating creates a new solution in $E_t$ using the genetic material available in Pop. Similarly, the subpopulation strategy also creates a new solution but it uses the genetic material available in the subpopulation SP (empty and filled squares), which contains only solutions from $E_t$.

5.1.2.2 Test functions f

Since our aim in this study is (ultimately) to understand the effect of ERCs on EA performance on real closed-loop problems, it might be considered ideal to use, for testing, some set of real-world ERCOPs, that is, real experimental problems featuring real resource constraints. That way we could see the effects of EA design choices (the constraint-handling strategy used) directly on a real-world problem of interest. But even granting this to be an ideal approach, it would be very difficult to achieve in practice due to the inherent cost of conducting closed-loop experiments and the difficulty of repeating them to obtain any statistical confidence in the results seen. For this reason, most of our study will use more familiar artificial test problems augmented with ERCs (although in the case study in Section 5.1.4 we do use data and constraints from a real closed-loop problem).

Algorithm 5.1 Generational EA with static constraint-handling strategies

Require: $ERC_1,\ldots,ERC_r$ (set of ERCs), $f$ (objective function), $T$ (time limit), $\mu$ (parent population size), $\lambda$ (offspring population size), #Strategy (number of the selected constraint-handling strategy; see Section 5.1.1)
 1: $t = 0$ (global variable representing the current time step), Pop = ∅ (current population), OffPop = ∅ (offspring population)
 2: while |Pop| < $\mu$ ∧ $t < T$ do
 3:     generate solution $\vec{x}$ at random
 4:     $\vec{x}$ = functionWrapper($\vec{x}$, $t$)
 5:     Pop = Pop ∪ {$\vec{x}$}; $t$++
 6: while $t < T$ do
 7:     OffPop = ∅
 8:     repeat
 9:         generate two offspring $\vec{x}^{(1)}$ and $\vec{x}^{(2)}$ by selecting two parents from Pop, and then recombining and mutating them
10:         OffPop = OffPop ∪ {$\vec{x}^{(1)}$} ∪ {$\vec{x}^{(2)}$}
11:     until |OffPop| = $\lambda$
12:     for $i = 1$ to |OffPop| do
13:         if $t < T$ then
14:             $\vec{x}_i$ = functionWrapper($\vec{x}_i$, $t$) // $\vec{x}_i$ represents the $i$th solution of OffPop
15:             $t$++
16:     form new Pop by selecting the best $\mu$ solutions from the union population Pop ∪ OffPop
17: functionWrapper($\vec{x}$, $t$){
18:     $y_t$ = null
19:     if $\vec{x}$ satisfies the ERCs $ERC_1,\ldots,ERC_r$ then
20:         $\vec{x}_t = \vec{x}$; $y_t = f(\vec{x}_t)$
21:     else
22:         if #Strategy = 1 ∨ #Strategy = 2 ∨ #Strategy = 3 then
23:             $\vec{x}_t$ = repair($\vec{x}$, $t$); $y_t = f(\vec{x}_t)$
24:         if #Strategy = 4 then
25:             $t = t + \delta$; $\vec{x}_t = \vec{x}$ // $\delta$ is the number of time steps we have to wait until the activation periods of all ERCs that are currently violated by $\vec{x}$ have passed
26:             if $t < T$ then
27:                 $y_t = f(\vec{x}_t)$
28:         if #Strategy = 5 then
29:             $\vec{x}_t = \vec{x}$; $y_t = c$ // $c$ is a constant, representing poor fitness
30:     return $\vec{x}_t$ and $y_t$ }

The use of test functions to understand aspects of EA performance has long been a staple of EA empirical studies. It is a good approach generally, and appropriate also here, because problem features can be controlled and experiments can be repeated many times, allowing a thorough analysis of results. Our set of selected test functions comprises: (i) OneMax, (ii) a competing optima problem, TwoMax, and (iii) a problem instance with many local optima, MAX-SAT. In the subsequent case study we will also see a variant of NK landscapes (NKα landscapes) being used to tune (offline) our method; we will introduce this test function here too. The use of these different landscape types should be sufficient to draw some tentative conclusions about the effects of ERCOPs generally, depending on results observed.
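The function wrapper of Algorithm 5.1 (Lines 17 to 30) can be sketched in Python as follows. The sketch returns the updated time step instead of modifying a global counter, and it takes the evaluability check, the repair routine, and the waiting time $\delta$ as caller-supplied functions; these parameter names (`is_evaluable`, `repair`, `wait_steps`, `penalty`) are assumptions made for illustration and do not come from the thesis.

```python
from typing import Callable, List, Optional, Tuple

Solution = List[int]


def function_wrapper(
    x: Solution,
    t: int,
    time_limit: int,
    strategy: int,
    f: Callable[[Solution], float],
    is_evaluable: Callable[[Solution, int], bool],
    repair: Callable[[Solution, int], Solution],
    wait_steps: Callable[[Solution, int], int],
    penalty: float = 0.0,
) -> Tuple[Solution, Optional[float], int]:
    """Return (solution, fitness or None, possibly advanced time step)."""
    if is_evaluable(x, t):
        return x, f(x), t
    if strategy in (1, 2, 3):
        # Forcing, regenerating, or subpopulation strategy: repair, then evaluate.
        repaired = repair(x, t)
        return repaired, f(repaired), t
    if strategy == 4:
        # Waiting: jump past the activation periods of all violated ERCs
        # (the equivalent of Line 25), and evaluate only if time remains.
        t = t + wait_steps(x, t)
        fitness = f(x) if t < time_limit else None
        return x, fitness, t
    if strategy == 5:
        # Penalizing: keep the solution but assign the poor fitness value c.
        return x, penalty, t
    raise ValueError("strategy must be an integer between 1 and 5")
```

A caller would still increment the returned time step by one after each call, as in Lines 5 and 15 of the algorithm.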

OneMax: The OneMax function is a bit counting function that is used frequently in the theoretical analysis of GAs (see e.g. Mühlenbein and Schlierkamp-Voosen (1993); He and Yao (2002)). For a binary solution vector $\vec{x} \in \{0,1\}^l$, it takes the sum over the bit values of all bit positions

$$\text{maximize } f(\vec{x}) = \sum_{i=1}^{l} x_i$$

and has its optimum $f = l$ for the bit string consisting only of 1-bits. Although the OneMax problem features a unimodal landscape and so is supposed to be relatively easy to solve for an EA, it helps us to gain detailed insights into things like how ERCs affect the convergence speed and may cause premature convergence. As we will see in the experimental study, these insights form the foundation for understanding the impact of ERCs on more complex fitness landscapes.

TwoMax: The TwoMax function is a bimodal function that can be seen as the multimodal counterpart of OneMax. It has two locally optimal solutions: the solution consisting only of 1-bits and the solution consisting only of 0-bits. In its original version, the slopes leading to these two solutions are symmetric. Like the OneMax function, this function has been intensively studied in theoretical works on the analysis of GAs (see e.g. Pelikan and Goldberg (2000); Hoyweghen et al. (2002)). To make the TwoMax function more realistic and interesting, its symmetric property has often been broken (see e.g. Pelikan and Goldberg (2000); He and Yao (2002); Friedrich et al. (2008)), turning it into a function with a local and a global optimal solution. The two common modifications are either to change the steepness of one of the slopes or simply to increase the fitness value of one of the optimal solutions. We opt for the former, making the slope leading to the solution consisting only of 1-bits steeper. Hence, if #1s denotes the number of 1-bits in a solution vector $\vec{x}$ and $b > 1$ the factor by which the global optimal solution shall be fitter than the local optimal solution, then the TwoMax function is defined by

$$\text{maximize } f(\vec{x}) = \begin{cases} l - \#1s & \text{if } \#1s \le \frac{l}{2}, \\ b\,\#1s & \text{otherwise.} \end{cases}$$

[Figure 5.3: x-axis "#1s" with ticks 0, $l/2$, $l$; y-axis "Fitness" with ticks $l/2$, $l$, $bl$.]

Figure 5.3: A TwoMax function with a local optimal solution at #1s = 0 and a global optimal solution at #1s = $l$. The parameter $b > 1$ specifies the factor by which the global optimal solution shall be fitter than the local optimal solution.

Figure 5.3 illustrates this TwoMax function. Our main reason for using an asymmetric version of the TwoMax function is to understand how much drift, coming from an ERC, is required to cause an optimizer that uses a specific strategy to climb up the suboptimal slope. Analyzing this property will help us to understand better the impact of ERCs on more challenging multimodal test problems, such as the MAX-SAT problem.
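For concreteness, the OneMax and asymmetric TwoMax definitions above translate directly into code. The following is a minimal sketch in which solutions are assumed to be Python lists of 0/1 integers; the function names are illustrative.

```python
from typing import List


def onemax(x: List[int]) -> int:
    """OneMax: the number of 1-bits in the solution."""
    return sum(x)


def twomax(x: List[int], b: float) -> float:
    """Asymmetric TwoMax with scaling factor b > 1.

    Solutions with at most l/2 one-bits lie on the slope towards the all-zeros
    local optimum (fitness l); all others lie on the steeper slope towards the
    all-ones global optimum (fitness b * l).
    """
    ones = sum(x)
    length = len(x)
    if ones <= length / 2:
        return float(length - ones)
    return b * ones
```

With $l = 30$ and $b = 1.1$, the all-zeros string has fitness 30 and the all-ones string has fitness 33 (up to floating-point rounding), so the two optima differ only slightly in quality.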

MAX-SAT: Given a boolean expression consisting of a set of clauses, which in turn are formed by binary variables $x_i$, $i = 1,\ldots,l$, the maximum satisfiability (MAX-SAT) problem (Zhang, 2001; Coppersmith et al., 2004) asks for the maximum number of clauses that can be satisfied by any assignment $\vec{x}$. A MAX-SAT problem is the optimization version of the decision problem, satisfiability (SAT), which only asks whether there is a satisfying assignment at all or not. MAX-SAT and SAT problems are of great theoretical interest, of course, due to SAT being the original problem proved NP-Complete by Cook, and still at the heart of attempts to determine the answer to the P vs NP problem. They are of high practical relevance too, as many challenging real-world optimization problems can be efficiently formulated in SAT form (De Jong and Spears, 1989), e.g. hardware and software verification problems, and routing in FPGAs.

The instance considered is a uniform random 3-SAT problem and can be downloaded online.³ The instance has $l = 50$ variables and 218 clauses and is satisfiable. We treat this 3-SAT instance as a MAX-SAT optimization problem, with fitness calculated as the proportion of satisfied clauses.
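The fitness calculation just described can be sketched as follows. The sketch assumes that clauses are given as lists of signed, 1-indexed integers, following the usual DIMACS convention (literal `k` means variable `k` is true, `-k` means it is false); how the instance was actually parsed for the experiments is not stated here, and the function name is hypothetical.

```python
from typing import List


def maxsat_fitness(x: List[int], clauses: List[List[int]]) -> float:
    """Proportion of clauses satisfied by the 0/1 assignment x.

    A clause is satisfied if at least one of its literals is true under x.
    """
    satisfied = sum(
        any((x[abs(lit) - 1] == 1) == (lit > 0) for lit in clause)
        for clause in clauses
    )
    return satisfied / len(clauses)
```

For a satisfiable instance such as the one used here, the maximum of this function is 1, so fitness values already lie in the range [0,1].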

Note that, although there are local search algorithms for SAT or MAX-SAT problems, such as greedy SAT (Selman et al., 1992; Dechter, 2003), that can solve small problem instances easily, for optimizers that do not explicitly make use of the structure of the problem, such as standard EAs, the instances are generally hard to solve (due to the presence of many local optimal solutions) (Zhang, 2001; Prügel-Bennett, 2004; Qasem and Prügel-Bennett, 2010).

Apart from the relevance of MAX-SAT problems in theoretical and practical analysis, another reason for choosing this test problem is the convenient property of a backbone (the backbone of a MAX-SAT instance is the schema into which all the optimal solutions fall). We also conducted experiments on other challenging multimodal test problems, such as NKα landscapes (Kauffman, 1989), which we introduce next.

NKα landscapes: The general idea of the NKα landscapes (Hebbron et al., 2008) is to extend Kauffman’s original NK model (Kauffman, 1989) to model epistatic network topologies that are more realistic in mapping the epistatic connectivity between genes in natural genomes. The NKα model achieves this by affecting the distribution of influences of genes in the network in terms of their connectivity, through a preferential attachment scheme. The model uses a parameter α to control the positive feedback in the preferential attachment so that larger values of α result in a more non-uniform distribution of gene connectivity. There are three tunable parameters involved in the generation of an NKα landscape: the total number of variables $N$ (in our notation this variable is denoted as $l$), the number of variables that interact epistatically at each of the $N$ loci, $K$, and the model parameter α that allows us to specify how influential some variables may be compared to others. Larger values of the parameter $K$ yield more epistatic (rugged) landscapes, while larger values of α assign more influence (on the fitness) to a minority of variables;⁴ for a value of α = 0, the NKα model reduces to Kauffman’s original NK model with neighbors being selected at random. The NK model has already been used previously to analyze certain aspects of real-world closed-loop problems (see e.g. Thompson (1996b, 1997)).

³ http://people.cs.ubc.ca/~hoos/SATLIB/benchm.html; the name of the instance is “uf50-218/uf50-01.cnf”.

⁴ We can think of the loci as nodes in a network with the number of edges going into a node and out of it representing the in-degree and out-degree of a node, respectively. Each node has the same in-degree of $K + 1$ (including the self-connection) but the out-degree approximates a power-law (with the parameter α being in the exponent) as a consequence of the preferential attachment scheme.

Table 5.1: EA parameter settings as used in the study of static constraint-handling strategies.

Parameter                        Setting
Parent population size µ         50
Offspring population size λ      50
Per-bit mutation probability     1/l
Crossover probability            0.7

Table 5.2: Parameter settings of static constraint-handling strategies.

Strategy                  Parameter                                         Setting
Regenerating              Number of regeneration trials L                   10000
Penalizing                Fitness c assigned to non-evaluable solutions     0
Subpopulation strategy    Maximal size of subpopulation SP, J               30

5.1.2.3 Parameter settings

The parameter settings of the EA and the strategies are given in Tables 5.1 and 5.2, respectively. The settings of the test functions are outlined in Table 5.3. We choose these search space sizes $l$ as they correspond to typical search space sizes we have seen (e.g. a drug combinations problem with a library of about 30 drugs (Small et al., 2011)). The reason for setting the scaling factor $b$ of the TwoMax function so small is that it makes the problem more challenging due to the low selection pressure to climb up the optimal slope. Regarding the optimization time $T$, remember that one time step corresponds to a function evaluation of a single solution or a single submitted null solution. In an ERC-free search, the optimization times as specified in Table 5.3 are usually not sufficient for an optimizer to locate a global optimal solution reliably. The reason we set $T$ this way is that it gives us the opportunity to assess both positive and negative effects of an ERC on the convergence speed and the solution quality obtained at the end of an algorithm run. In addition to the parameters above, we analyze the impact of the preparation and recovery time by considering different settings for $T$ than specified in Table 5.3, but this will be pointed out where applicable.

Table 5.3: Parameter settings of test functions f as used in the study of static constraint-handling strategies.

Test function      Parameter                     Setting
OneMax             Solution parameters l         30
                   Optimization time T           700
TwoMax             Solution parameters l         30
                   Scaling factor b              1.1
                   Optimization time T           700
MAX-SAT            Solution parameters l         50
                   Optimization time T           800
NKα landscapes     Solution parameters N = l     15
                   Neighbors K                   {2,6}
                   Model parameter α             {0,2}
                   Optimization time T           2250

Any results shown are average results across 500 independent algorithm runs.

To allow **for**paired testing ofthe strategies, we use a different seed **for** therandom

number generator **for** each EA run but the same seeds **for** all strategies.
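The paired seeding scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the experimental code: run_ea is a hypothetical placeholder for a single EA run, and the strategy labels and the base seed are chosen only for the example.

```python
import random
import statistics

# Strategy labels are illustrative; they mirror the five strategies compared here.
STRATEGIES = ["forcing", "regenerating", "waiting", "subpopulation", "penalizing"]
N_RUNS = 500  # number of independent runs per strategy


def run_ea(strategy, seed):
    """Hypothetical placeholder for one EA run; returns the best fitness found.

    A real implementation would run the EA with the given strategy. Here the
    strategy is ignored and a seeded random number stands in for the result.
    """
    rng = random.Random(seed)
    return rng.random()


def paired_experiment(base_seed=0):
    """Run every strategy on the same list of seeds (one seed per run)."""
    seeds = [base_seed + run for run in range(N_RUNS)]
    return {s: [run_ea(s, seed) for seed in seeds] for s in STRATEGIES}


def mean_and_standard_error(values):
    """Sample mean and standard error of the mean."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / len(values) ** 0.5
    return mean, sem
```

Because run r of every strategy shares its seed with run r of all other strategies, differences between strategies can be tested on matched pairs of runs rather than on independent samples.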

5.1.3 Experimental results

The performance of a strategy depends, inter alia, on the potential impact of an ERC on the population diversity and the optimization direction. To assess the impact on these two factors one has to consider: (i) what genetic material represented by a constraint schema H needs to be introduced into a population to cause a performance impact, (ii) how much of it, or, rather, how many individuals of a constraint schema need to be introduced into a population to cause a performance impact, (iii) at what stage during a run it needs to be introduced to yield a performance impact, and (iv) the effects of the preparation and recovery durations. We give detailed observations on these effects here, and summarise the key findings in Section 5.1.5.

130 CHAPTER 5. STRATEGIES FOR COPING WITH ERCS

5.1.3.1 Commitment relaxation ERCs

We first analyze the case where a constraint schema H represents poor genetic material. For OneMax and TwoMax, this means that the order-defining bits of H are set to 0. For the MAX-SAT instance, we represent a poor bit by flipping a randomly selected bit from the backbone of our instance;⁵ for ease of presentation, also on this problem, we will write 1-bits to refer to ‘good’ bits, which are randomly selected unflipped bits from the backbone, and 0-bits to refer to their complements.⁶
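As a concrete illustration of how a constraint schema with a given order and a given number of 1-bits might be drawn at random (cf. footnote 6), consider the following minimal Python sketch. The representation of a schema as a list with None as the wildcard, and the function names random_schema, matches, and schema_to_str, are assumptions made for this example.

```python
import random

WILDCARD = None  # plays the role of '*' in the schema notation


def random_schema(length, order, n_ones, rng):
    """Draw a schema with `order` defining bits, `n_ones` of which are 1-bits.

    Both the defining positions and the positions of the 1-bits among them
    are chosen uniformly at random.
    """
    if not 0 <= n_ones <= order <= length:
        raise ValueError("require 0 <= n_ones <= order <= length")
    positions = rng.sample(range(length), order)
    one_positions = set(rng.sample(positions, n_ones))
    schema = [WILDCARD] * length
    for pos in positions:
        schema[pos] = 1 if pos in one_positions else 0
    return schema


def matches(solution, schema):
    """True if the bit string `solution` is a member of `schema`."""
    return all(s is WILDCARD or s == x for x, s in zip(solution, schema))


def schema_to_str(schema):
    """Render a schema in the usual notation, e.g. '**1*0*0'."""
    return "".join("*" if s is WILDCARD else str(s) for s in schema)
```

For instance, random_schema(7, 3, 1, random.Random(1)) yields one random realization of a schema that would be denoted by H = (010∗∗∗∗) in the text.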

Figure 5.4 gives an impression of how the final average best solution fitness is affected for the different strategies on OneMax. The results obtained on TwoMax and the MAX-SAT instance are very similar and are shown in Appendix A.1. For ease of presentation we normalize the fitness values of all test functions so that they lie in the range [0,1]. We make the following observations from the figure.

• Generally, the ERCs affect search negatively, and strategy choice is important.

• The subpopulation strategy tends to perform better than forcing and regenerating (which perform similarly) for the majority of constraint parameters. The reason is that the subpopulation strategy generates fitter solutions from H and thus allows the EA to converge more quickly to a (suboptimal) population state containing many (copies of) optimal solutions from H. The subpopulation strategy performs poorly for constraint settings that can cause premature convergence towards search regions covered by H (see range 4 ≤ o(H) ≤ 8 in the top left plot).

• With respect to the order of the constraint schema o(H) (top left plot), we observe that there is a value that has the largest negative effect on the optimization; it is around 4 for the ‘repairing’ strategies (forcing, regenerating or the subpopulation strategy), and lower for penalizing and waiting.

⁵ We identified the backbone of our MAX-SAT instance from optimal solutions obtained from running a generational GA for 1000 generations, 500 times independently. The backbone is of order 44.

⁶ If not otherwise stated, the order-defining bits of a constraint schema are chosen at random for each algorithm run; if an order-defining bit is a 1-bit, then the position of this bit is also chosen at random among the order-defining bits; i.e. a constraint schema denoted by H = (010∗∗∗∗) might actually be e.g. H = (∗∗1∗0∗0) or H = (∗∗00∗∗1) in an algorithm run. Nevertheless, all strategies will be optimizing subject to the same constraint schemata.


[Figure 5.4: four panels, each plotting the average best solution fitness (y-axis) for forcing, regenerating, waiting, the subpopulation strategy, and penalizing. Top left: commRelaxERC(0,700,15,H=(0^o(H) ***...)), x-axis: order of constraint schema H, o(H). Top right: commRelaxERC(0,700,V,H=(00***...)), x-axis: epoch duration V. Bottom left: commRelaxERC(0,700,15,H=(00***...)), x-axis: optimization time T. Bottom right: commRelaxERC(t_ctf^start, t_ctf^start + 700, 15, H=(00***...)) with T = t_ctf^start + 700, x-axis: start of constraint time frame t_ctf^start.]

Figure 5.4: Plots showing the average best solution fitness found and its standard error on OneMax as a function of the order of the constraint schema o(H) (top left), the epoch duration V (top right), the optimization time T (bottom left), and the start of the constraint time frame t_ctf^start (bottom right). Note that, while the optimization time in the top plots is fixed to T = 700, as specified in Table 5.3, the parameter T varies in the bottom plots.

The non-monotonic performance impact on the repairing strategies, and partially on penalizing, is due to two competing forces: (i) the probability of activating a constraint, which decreases exponentially with o(H), and (ii) the probability that a constraint activation causes a shift in the search focus, which is greater for low orders. With respect to these two forces, an order of o(H) ≈ 4 tends to have the worst trade-off. Penalizing performs better than the repairing strategies in the range 2 < o(H) < 8 because the probability of having to penalize solutions increases exponentially with o(H). We remark that (results not shown) the order for which the worst trade-off is obtained is a function of the string length l and the population size µ. In general, the worst trade-off shifts to only a slightly higher order than 4 as l and/or µ increase; the shift is small because the probability of activating the ERC decreases exponentially with the order.


For waiting, the performance only depends on the probability of activating a constraint, causing the performance to be poorest at o(H) = 1 and improve exponentially thereafter.

• Longer epoch durations (larger V) degrade the performance of all methods because of potentially longer activation periods during epochs (see top right plot). With penalizing, forcing, regenerating, and the subpopulation strategy, a saturation point is reached beyond which further increases in the epoch duration have no effect. With waiting there is no saturation point because an increase in V results in longer waiting periods and thus poorer performance. The reason that waiting performs best for small V (see range 0 < V ≤ 14) is that the waiting periods are short in this regime, allowing an optimizer to converge quickly away from search regions covered by H and prevent future constraint activations.

• When some recovery time is provided, all strategies improve in performance (see bottom left plot). The recovery speed depends on how much time is required to first introduce diversity among the previously constrained bits before one can generate better solutions.

• Later start times of the constraint time frame, or, equivalently, longer preparation times, have a positive effect on the performance of all strategies (see bottom right plot). This is because, with a commitment relaxation ERC, the later in the optimization one is, the less likely one is to enter a poor schema (a schema not on the optimization path) and activate a constraint; also, due to elitism, repaired solutions are less likely to be inserted into the population the later a constraint is activated. Thus later constraints are less disruptive.
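The exponential decrease of the activation probability with o(H), referred to in the observations above, can be made concrete for the idealized case of bit strings drawn uniformly at random. The following Python sketch rests on that assumption, which holds for a random initial population only; once the search has started to converge, the actual probabilities depend on the population state. The function names and the population size of 50 are hypothetical and serve only as an illustration.

```python
def p_match(order):
    """Probability that a uniformly random bit string is a member of a schema
    with `order` defining bits: each defining bit matches with probability 1/2."""
    return 0.5 ** order


def p_any_match(order, mu):
    """Probability that at least one of `mu` independent, uniformly random
    bit strings is a member of the schema."""
    return 1.0 - (1.0 - p_match(order)) ** mu


if __name__ == "__main__":
    MU = 50  # hypothetical population size, used for illustration only
    for order in (1, 2, 4, 8, 12):
        print(order, p_match(order), p_any_match(order, MU))
```

Under this assumption, a schema of order 4 is matched by a random solution with probability 1/16, while a schema of order 12 is matched with probability 1/4096, which illustrates why high-order schemata representing poor genetic material rarely cause an activation.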

Let us now analyze how constraint schemata that represent genetic material of different qualities affect the performance obtained with the constraint-handling strategies. Figure 5.5 shows the effect of different constraint schemata on the average best solution fitness (left plots) and the number of times the optimal solution has been evaluated (right plots) for forcing (top plots) and waiting (bottom plots) on OneMax. The plots on the right-hand side indicate the effect of an ERC on the convergence speed towards an optimal solution.

Although ERCs can have large effects on performance, one can see from the plots of Figure 5.5 that the majority of the constraint schemata do not have an


[Figure 5.5: four heat maps with x-axis: order of constraint schema H, o(H), and y-axis: #1s in H, each with a straight line marked 'Picking schemata at random'. Top left: Forcing, average best solution fitness. Top right: Forcing, average number of times the optimal solution has been evaluated. Bottom left: Waiting, average best solution fitness. Bottom right: Waiting, average number of times the optimal solution has been evaluated.]

Figure 5.5: Plots showing the average best solution fitness obtained (left) and the average number of times the optimal solution has been evaluated during the optimization (which is an indication of the convergence speed) (right) by forcing (top) and waiting (bottom) on OneMax as a function of the order of the constraint schema o(H), and the number of order-defining bits in H with value 1 for the ERC commRelaxERC(0,700,15,H). The straight line represents the expected performance when picking a schema (i.e. the order-defining bits and their values) with a particular order at random. The performance obtained in an unconstrained environment is represented by the square at o(H) = #1s = 0.

impact on the performance at all compared to the unconstrained performance (which is represented by the square at o(H) = #1s = 0). These are schemata that are unlikely to cause an activation at all because they either do not lie on an optimizer’s search path (schemata with few 1-bits) or are associated with a generally low probability of being met by any individual (higher order schemata


[Figure 5.6: two heat maps (logarithmic shading scale) with x-axis: order of constraint schema H, o(H), and y-axis: #1s in H, each with a straight line marked 'Picking schemata at random'. Left: ratio P(f(x) > f_Penalizing)/P(f(x) > f_Waiting). Right: ratio P(f(x) > f_Penalizing)/P(f(x) > f_Subpop. strategy).]

Figure 5.6: Plots showing the ratio P(f(x) > f_Penalizing)/P(f(x) > f_Waiting) (left) and P(f(x) > f_Penalizing)/P(f(x) > f_Subpop. strategy) (right) on OneMax as a function of the order of the constraint schema o(H), and the number of order-defining bits in H with value 1 for the ERC commRelaxERC(0,700,15,H); here, x is a random variable that represents the best solution from a set of solutions drawn uniformly at random from the search space and f_∗ the average best solution fitness obtained with strategy ∗. If P(f(x) > f_∗)/P(f(x) > f_∗∗) > 1, then strategy ∗∗ is able to achieve a higher average best solution fitness than strategy ∗ and a greater advantage of ∗∗ is indicated by a darker shading in the heat maps; similarly, if P(f(x) > f_∗)/P(f(x) > f_∗∗) < 1, then ∗ is better than ∗∗ and a lighter shading indicates a greater advantage of ∗.

around the straight line). Constraint schemata that represent poor genetic material (i.e. consist of many 0-bits) have an impact only if their order is low because an optimizer is searching in a different direction. Hence, constraint schemata that have a significant effect on the performance of both strategies are either of low order or contain many 1-bits (schemata along and near the diagonal). For these constraint setting regimes, we observe a similar non-monotonic effect on the performance of both strategies as we have seen previously for schemata representing poor genetic material only (indicated here by the row of squares with #1s = 0). The difference is that the more 1-bits there are in H (i.e. as we go up the rows of squares), the less apparent this non-monotonic (negative) performance effect becomes. From Figure 5.6 it is apparent that the performance differences between the different strategies observed previously are also largely maintained (as we go up the rows of squares).

On the MAX-SAT instance, the range of constraint schemata causing a performance impact is smaller (see Figure 5.7). From the plot it is apparent that while again low-order schemata affect the performance significantly, higher-order


[Figure 5.7: heat map of the average best solution fitness obtained by the subpopulation strategy; x-axis: order of constraint schema H, o(H); y-axis: #1s in H; a straight line is marked 'Picking schemata at random'.]

Figure 5.7: A plot showing the average best solution fitness obtained by the subpopulation strategy on a MAX-SAT problem instance as a function of the order of the constraint schema o(H), and the number of order-defining bits in H set correctly for the ERC commRelaxERC(0,800,15,H). The straight line represents the expected performance when picking a schema (i.e. the order-defining bits and their values) with a particular order at random.

schemata that represent near-optimal or optimal genetic material have little or no effect; the reason is that good genetic material is difficult to detect on this challenging problem, particularly within 800 time steps.

5.1.3.2 Periodic ERCs

With the insights we gained about the strategies when applying them to commitment relaxation ERCs, we can more easily understand their behavior in the presence of a second type of ERC, periodic ERCs.

Figure 5.8 shows how the performance of the different strategies is affected by various constraint parameters of a periodic ERC on OneMax when H represents poor genetic material. Again, the results obtained on TwoMax and the MAX-SAT instance are similar and are shown in Appendix A.2. In comparison to commitment relaxation ERCs, the main difference we observe from Figure 5.8 is that waiting performs poorly and is also clearly dominated by penalizing for the majority of constraint settings. This is due to the fact that activation periods are set deterministically with periodic ERCs. In essence, waiting is likely to freeze the optimization during each activation period because of the low probability of generating solutions from H (regardless of the quality of the genetic material represented by H). With penalizing one is also unlikely to evaluate any solutions during an activation period. However, the fact that the optimization is


[Figure 5.8: four panels, each plotting the average best solution fitness (y-axis) for forcing, regenerating, waiting, the subpopulation strategy, and penalizing. Top left: perERC(0,700,20,50,H=(0^o(H) ***...)), x-axis: order of constraint schema H, o(H). Top right: perERC(0,700,k,50,H=(0000***...)), x-axis: activation period k. Bottom left: perERC(0,700,20,50,H=(0000***...)), x-axis: optimization time T. Bottom right: perERC(t_ctf^start, t_ctf^start + 700, 20, 50, H=(0000***...)) with T = t_ctf^start + 700, x-axis: start of constraint time frame t_ctf^start.]

Figure 5.8: Plots showing the average best solution fitness found and its standard error on OneMax as a function of the order of the constraint schema o(H) (top left), the activation period k (top right), the optimization time T (bottom left), and the start of the constraint time frame t_ctf^start (bottom right).

not frozen is beneficial because offspring are generated using a more up-to-date parent population during unconstrained optimization periods.
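To illustrate what it means for activation periods to be set deterministically, the following Python sketch computes whether a periodic ERC is active at a given time step. It assumes that the arguments of perERC(0,700,20,50,H) are the start and end of the constraint time frame, the activation period k, and the period length, and that the constraint is active during the first k time steps of every period; this layout and the function name are assumptions made for the example.

```python
def periodic_erc_active(t, t_start, t_end, k, period):
    """True if the periodic ERC is active at time step t.

    Assumed layout: within the constraint time frame [t_start, t_end), the
    constraint is active during the first k steps of every `period` steps.
    """
    if not (t_start <= t < t_end):
        return False
    return (t - t_start) % period < k


def active_steps(t_start, t_end, k, period, horizon):
    """Number of time steps in [0, horizon) during which the ERC is active."""
    return sum(
        periodic_erc_active(t, t_start, t_end, k, period) for t in range(horizon)
    )
```

Under these assumptions the activation pattern does not depend on which solutions are evaluated: with a time frame of 700 steps, k = 20 and a period of 50, the constraint is active for 14 × 20 = 280 time steps regardless of the search trajectory. During such steps, a waiting strategy can progress only if solutions from H are generated.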

The fact that activation periods are set deterministically with periodic ERCs also means that high-order constraint schemata have an impact on the performance, and this is the case regardless of the genetic material they represent. In fact, from Figure 5.9 we see that the average best solution fitness obtained with the subpopulation strategy on OneMax decreases rather smoothly for all orders as the quality of the represented genetic material worsens. Comparing this average best solution fitness with the fitness obtained by penalizing (left plot of Figure 5.10), we observe that repairing is particularly beneficial for high-order constraint schemata that represent very good genetic material. Waiting, in turn, is inferior to penalizing across all the different constraint schemata but in particular for low-order schemata representing very good genetic material (see right plot of Figure 5.10).


[Figure 5.9: heat map of the average best solution fitness obtained by the subpopulation strategy; x-axis: order of constraint schema H, o(H); y-axis: #1s in H; a straight line is marked 'Picking schemata at random'.]

Figure 5.9: A plot showing the average best solution fitness obtained by the subpopulation strategy on OneMax as a function of the order of the constraint schema o(H), and the number of order-defining bits in H with value 1 for the ERC perERC(0,700,20,50,H). The straight line represents the expected performance when picking a schema (i.e. the order-defining bits and their values) with a particular order at random.

On the MAX-SAT instance (results not shown here), one makes similar observations as on OneMax. However, because of the difficulty of finding good genetic material, even when setting a small number of bits correctly, a smooth decrease in the average best solution fitness, and differences between the performance of strategies, are more obvious for schemata of higher orders. Compared to OneMax, there are also small differences apparent in the results obtained on TwoMax; we show the results and discuss these differences in Appendix A.2.

5.1.4 Case study

In this section we demonstrate one way in which an appropriate constraint-handling strategy for an ERCOP may be selected in a real-world application. For this we use the same experimental setup as used in the instrument configuration application of O’Hagan et al. (2005, 2007). We mentioned this application in our review of closed-loop problems in Section 2.2, but give a more detailed description of the problem here.

Application description: The application is concerned with detecting as many metabolites in a biological sample as possible by optimizing the configuration of the instrument the samples are run through. Methods currently in use to analyze


[Figure 5.10: two heat maps (logarithmic shading scale) with x-axis: order of constraint schema H, o(H), and y-axis: #1s in H, each with a straight line marked 'Picking schemata at random'. Left: ratio P(f(x) > f_Penalizing)/P(f(x) > f_Subpop. strategy). Right: ratio P(f(x) > f_Penalizing)/P(f(x) > f_Waiting).]

Figure 5.10: Plots showing the ratio P(f(x) > f_Penalizing)/P(f(x) > f_Subpop. strategy) (left) and P(f(x) > f_Penalizing)/P(f(x) > f_Waiting) (right) on OneMax as a function of the order of the constraint schema o(H), and the number of order-defining bits in H with value 1 for the ERC perERC(0,700,20,50,H); here, x is a random variable that represents the best solution from a set of solutions drawn uniformly at random from the search space and f_∗ the average best solution fitness obtained with strategy ∗. If P(f(x) > f_∗)/P(f(x) > f_∗∗) > 1, then strategy ∗∗ is able to achieve a higher average best solution fitness than strategy ∗ and a greater advantage of ∗∗ is indicated by a darker shading in the heat maps; similarly, if P(f(x) > f_∗)/P(f(x) > f_∗∗) < 1, then ∗ is better than ∗∗ and a lighter shading indicates a greater advantage of ∗.

samples are ones that employ a separation step coupled to mass spectrometric detection. As the separation method, gas chromatography is considered a highly resolving technique and the method of choice. The challenge with this approach, however, is that many instrumental parameters are involved in the process and that small changes in these parameters may have significant effects on the chromatographic performance. Variables or instrumental parameters represent things like the sample volume, split ratio, oven temperatures, ramp rates, and other settings of a gas chromatograph; in this application, we have l = 15 integer variables or instrument parameters making up a search space of 7.32 × 10^9 different instrument configurations. The fitness of a configuration is determined by running a biological sample through it, and measuring properties of the instrument’s output signal, such as the number of peaks (or metabolites) detected in the chromatogram, the signal-to-noise ratio of the peaks, and the run time. O’Hagan et al. (2005, 2007) cast this application as a multi-objective optimization problem but here we consider only one objective, namely the number of peaks detected.

In general, when dealing with complex and sensitive instruments, such as mass spectrometer/gas chromatography equipment, the stability of the instrument and


the results can be affected when changing instrument parameters across wide ranges. In particular, if the oven temperature of the instrument is set low in one experiment, then it is expected that some metabolites remain stuck to the column (a tube in which the sample is passed through solvent in order to separate the elements within the sample) and may elute from it in a later experiment where the temperature is set higher, unless the column is cleaned or replaced between experiments (Dunn, March 2011). To avoid this, the column should ideally be cleaned or replaced whenever an instrument configuration with a low oven temperature is tested. However, cleaning or replacing instrument parts is usually done overnight because it incurs additional costs in time and money when done after every analysis run. Hence, using a low oven temperature enforces a temporary commitment to this temperature for the rest of the day, or, equivalently, activates a commitment relaxation ERC; as the application of O’Hagan et al. (2005, 2007) makes use of two ovens, the optimization is subject to two commitment relaxation ERCs. In the particular application of O’Hagan et al. (2005, 2007), however, the experimentalists avoided the ERCs by setting a smaller parameter range for the oven temperature than would have been ideal for optimizing this instrument parameter (Dunn, March 2011). We want to investigate how the performance might have been affected in the presence of the two ERCs.

ERCs: To keep things simple, we assume that the maximal number of instrument configurations that can be tested in a day is fixed at V = 15, and the total number of days available for the optimization is 150, resulting in T = 15 × 150 = 2250 available time steps or fitness evaluations. We set the first two variables to represent the oven temperatures, and the value 0 to represent the low temperatures. Hence, we have the following two commitment relaxation ERCs: commRelaxERC(0,2250,15,H = (0∗∗∗...)) and commRelaxERC(0,2250,15,H = (∗0∗∗∗...)). Little is known of the fitness landscape before optimization begins but, as in O’Hagan et al. (2007), it would be expected that there is some degree of epistasis in the problem. We would not know which of the two schemata represents good or poor instrument configurations.
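The two ERCs of the case study can be sketched in Python as follows. The sketch encodes one plausible reading of a commitment relaxation ERC, namely that evaluating a solution from H during an epoch (here, a day of V = 15 time steps) commits the optimizer to H until that epoch ends. The class CommRelaxERC, its method names, and the dictionary representation of a schema are assumptions made for this illustration.

```python
class CommRelaxERC:
    """Sketch of commRelaxERC(t_start, t_end, V, H) under assumed semantics:
    time is divided into epochs of V steps, and evaluating a member of H
    commits the optimizer to H until the current epoch ends."""

    def __init__(self, t_start, t_end, epoch_duration, schema):
        self.t_start = t_start
        self.t_end = t_end
        self.epoch_duration = epoch_duration
        self.schema = schema          # dict: variable index -> required value
        self.committed_epoch = None   # epoch in which the constraint was activated

    def _in_time_frame(self, t):
        return self.t_start <= t < self.t_end

    def _epoch(self, t):
        return (t - self.t_start) // self.epoch_duration

    def in_schema(self, x):
        return all(x[i] == v for i, v in self.schema.items())

    def is_active(self, t):
        return self._in_time_frame(t) and self.committed_epoch == self._epoch(t)

    def is_evaluable(self, x, t):
        # While the constraint is active, only members of H can be evaluated.
        return (not self.is_active(t)) or self.in_schema(x)

    def record_evaluation(self, x, t):
        # Evaluating a member of H activates the constraint for the rest of the epoch.
        if self._in_time_frame(t) and self.in_schema(x):
            self.committed_epoch = self._epoch(t)


V, DAYS = 15, 150
T = V * DAYS  # 15 x 150 = 2250 time steps

# One ERC per oven: variable 0 and variable 1, with value 0 encoding 'low'.
ercs = [CommRelaxERC(0, T, V, {0: 0}), CommRelaxERC(0, T, V, {1: 0})]


def evaluable(x, t):
    """A solution is evaluable only if it violates none of the active ERCs."""
    return all(erc.is_evaluable(x, t) for erc in ercs)
```

In this reading, a configuration with a low temperature in the first oven that is tested on time step 3 makes every configuration with a different first-oven setting non-evaluable until time step 15, when the next day begins and the commitment is relaxed.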

Offline testing: As algorithm designers, we are now faced with the challenge of selecting an optimization algorithm or constraint-handling strategy for the ERCOP described above. The common approach is to first design appropriate problem functions that simulate the problem at hand, and then to test several algorithms offline on these functions and use the best one for the real-world problem. In this


[Figure 5.11: four panels, each plotting the average best solution fitness (y-axis) against the time counter t (x-axis) for forcing, regenerating, waiting, the subpopulation strategy, and penalizing, subject to commRelaxERC(0,2250,15,H=(0***...)) and commRelaxERC(0,2250,15,H=(*0***...)). Panels: N=15, K=2, α=0.0; N=15, K=6, α=0.0; N=15, K=2, α=2.0; N=15, K=6, α=2.0.]

Figure 5.11: Plots showing the average best solution fitness obtained on NKα landscapes with N = 15 and K = 2, α = 0.0 (top left), K = 6, α = 0.0 (top right), K = 2, α = 2.0 (bottom left), and K = 6, α = 2.0 (bottom right) as a function of the time counter t; results are averaged over 500 independent runs using a different randomly generated problem instance for each run. All instances were subject to the two commitment relaxation ERCs commRelaxERC(0,2250,15,H = (0∗∗∗...)) and commRelaxERC(0,2250,15,H = (∗0∗∗∗...)).

case study, we use NKα landscapes as the test problems because they allow us to model different degrees of epistasis. We introduced this problem in Section 5.1.2.2, and also provided the settings of the problem parameters N, K, and α in Table 5.3. The (four) selected settings will give us some insights into how the landscape topology and the degree of epistasis may influence the performance. To cope with the integer representation, we need to modify the mutation operator, which shall now select a random setting from the set of possible ones for the instrument parameter that is to be modified. Otherwise, we can use the same algorithm setup as previously (see Algorithm 5.1).
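The modified mutation operator can be sketched as a random-resetting mutation. The following Python code is an illustration only: the per-variable mutation probability p_mut, the variable domains, and the choice to allow the newly drawn setting to coincide with the current one are assumptions, as the text does not fix these details.

```python
import random


def random_reset_mutation(solution, domains, p_mut, rng):
    """Return a mutated copy of an integer-valued solution.

    Each variable is selected for modification with probability p_mut; a
    selected variable is assigned a setting drawn uniformly at random from
    its domain (which may coincide with the current setting).
    """
    child = list(solution)
    for i, domain in enumerate(domains):
        if rng.random() < p_mut:
            child[i] = rng.choice(domain)
    return child


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical domains: 15 instrument parameters with 4 settings each.
    domains = [list(range(4)) for _ in range(15)]
    parent = [0] * 15
    print(random_reset_mutation(parent, domains, 1.0 / 15, rng))
```

If a mutation should always change the selected parameter, the current setting can be excluded from the domain before drawing.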

Let us now analyze how the different constraint-handling strategies perform on the four NKα landscapes. Figure 5.11 shows the average best solution fitness obtained by the different strategies on the landscape models as a function of the


time counter (we do not show the standard error as it was negligible). The plots confirm what we observed in the experimental study, namely that similar patterns are obtained for different landscapes. In fact, we observe that a repairing strategy (forcing, regenerating, or the subpopulation strategy) should be clearly favoured over a waiting or penalizing strategy. More precisely, a trend is apparent that the subpopulation strategy and regenerating perform best in the initial stages of the optimization, while forcing is slightly better in the final part of the optimization. Also, the performance advantage of forcing at T = 2250 over the other two repairing strategies tends to increase with K and/or α. The waiting strategy does not perform well because the likelihood that either or both of the ERCs are active is relatively high, causing the optimization to freeze for too long. Although penalizing performs significantly better than waiting, the probability of penalizing many solutions, and thus making little or no progress in the optimization, is too high to match the performance of the repairing strategies.

Based on these results, if a single strategy is to be chosen, then we would select forcing for the real-world instrument optimization problem as it performs best after 2250 time steps.

Testing on real-world landscape: We cannot test this choice on the real closed-loop problem itself (and certainly we would not be able to compare different policies). However, we are able to do the next best thing. Since we have available the actual fitness values collected during the real experimental trials reported by O’Hagan et al. (2005, 2007) (i.e. the number of peaks detected) for around 315 instrument configurations tested, we can use this data to construct an interpolated fitness landscape using, for example, the Kriging approach (Cressie, 1993).⁷ In comparison to the NKα landscapes considered for offline testing, the interpolated landscape is smoother and contains significantly fewer local optima;⁸ both aspects are attributed to the low number of data points. Nevertheless, as is apparent from Figure 5.12, the performance of the strategies on the interpolated landscape is largely in alignment with the findings made on the NKα landscapes. In particular, the repairing strategies tend to perform

⁷ In essence, Kriging is a technique that interpolates the fitness value of an unobserved data point from observations of values of nearby data points. To generate the fitness landscape we used the Kriging function Krig() from the fields package of the statistical software R.

⁸ To obtain this information, we followed the experimental methods employing adaptive walks already used by Hebbron et al. (2008).


[Figure 5.12: a single panel plotting the average best solution fitness (y-axis) against the time counter t (x-axis) for forcing, regenerating, waiting, the subpopulation strategy, and penalizing, subject to commRelaxERC(0,2250,15,H=(0***...)) and commRelaxERC(0,2250,15,H=(*0***...)).]

Figure 5.12: A plot showing the average best solution fitness obtained on a fitness landscape interpolated from real-world data as a function of the time counter t. The optimization was subject to the two commitment relaxation ERCs commRelaxERC(0,2250,15,H = (0∗∗∗...)) and commRelaxERC(0,2250,15,H = (∗0∗∗∗...)).

better than waiting and penalizing, and, while the subpopulation strategy performs best at the beginning of the optimization, all repairing strategies tend to perform identically at the end of the optimization run. Clearly, in reality one is usually able to perform only a single optimization run, meaning that the result might be different from the one we obtained from averaging over many runs. Nevertheless, this case study demonstrates how one can approach and solve an ERCOP, beginning with defining the ERCs, modelling the simulated environment, selecting appropriate test functions, and finally comparing different optimizers and selecting the most suitable one to be used in the real-world application.

5.1.5 Summary and conclusion

In this study, we have proposed and analyzed various static constraint-handling strategies for dealing with non-evaluable solutions arising in ERCOPs. We considered two types of ERCs, commitment relaxation ERCs and periodic ERCs, and four test functions, OneMax, TwoMax, a MAX-SAT problem instance, and NKα landscapes. In addition, a demonstration of how one may approach and solve a new ERCOP in the common case where knowledge of the fitness landscape is poor has been given in the form of a case study (see Section 5.1.4).

We made several key observations from the experimental analysis that may determine how one should proceed with an ERCOP. Generally, ERCs affect the

5.1. STATIC CONSTRAINT-HANDLING STRATEGIES 143

performance of an optimizer, and clear patterns emerge relating ERC parameters
to performance effects. The later the constraint time frame of an ERC begins,
the less disruptive is the impact on search. This could mean that investment
in resources at early stages of the optimization should be preferred, where possible.
For commitment relaxation constraints, the probability of activating the
constraint depends on the order and the quality of the genetic material represented
by the constraint schemata. Thus, to some degree, we may be able to
predict the extent of impact if information about these schemata is available.
Although periodic constraints are activated at regular time intervals (independently
of the constraint schemata defining them), their impact on search still
depends upon the order and quality of the constraint schemata in predictable
ways. We also clearly see that the impact on EA performance is modulated by
the constraint-handling strategy adopted. Which strategy is
best depends on the details of the ERC, as we have set out in our results.

Importantly, we also observed that the patterns of performance impact seen
on the same ERC type are quite similar across different search problems with
different types of fitness landscape, whereas between the two ERC types, even on the
same problem, the impact on performance is quite different. If this pattern turns
out to be more generally true, then it is good news because we usually have more
knowledge about the ERCs than about the fitness landscape. Therefore, we would
not need to be ‘right’ about the fitness landscape in order to choose the right
strategy. Nevertheless, as indicated in the case study, an a priori analysis of the
problem at hand can be beneficial when it comes to selecting a suitable strategy.

With respect to the impact of the individual ERC types, our analysis tentatively
suggests that, with commitment relaxation ERCs,
repairing strategies should not be used (i.e. the genotype of a solution should not
be modified) if non-evaluable solutions are unlikely to be encountered, resources
are unavailable for only short periods, or much recovery or optimization time
is available. With periodic ERCs, resources become unavailable at regular time
intervals. Because of this, solutions are likely to be non-evaluable during activation
periods, which seems to cause strategies that do not repair to perform quite
poorly. An exception may be the situation where the available resources are poor
because, there, repairing solutions and inserting them into the population may
cause an EA to converge prematurely to a suboptimal population state.


For both ERC types, in situations where solutions should be repaired, we can tentatively
suggest that a strategy that aims at creating fit repaired solutions should be
preferred over a naïve forcing strategy. An exception may again be the case where
the available resources are poor. In the latter situation, the fitter a repaired solution
is, the more likely it is to be inserted into the population and thus cause
further constraint activations. Ultimately, this may increase the probability of
prematurely converging to a suboptimal population state.

Although we are able to draw these conclusions, our study has of course
been very limited, and there remains much else to learn about the effects of
ERCs and how to handle them. The next section takes a step further and investigates
learning-based strategies for switching between the static constraint-handling
strategies introduced here during an optimization process.

5.2 Learning-based constraint-handling strategies

The previous section provided evidence that it is possible to select a suitable
(static) constraint-handling strategy for an ERCOP offline if the ERCs are known
in advance. Inspired by this observation, this section investigates whether an EA
that learns offline when to switch between the static constraint-handling strategies
during the optimization process is more efficient than the static strategies
themselves. Offline learning is performed by a reinforcement learning (RL) agent,
which aims at optimizing the average fitness of the final population (the reward)
by selecting the most suitable static strategy (the static strategies serve as the actions)
in a given state during the optimization. The next section introduces this offline
learning strategy in more detail. We also consider an EA that learns the same
switching task as the RL agent but online using a multi-armed bandit algorithm;
Section 5.2.2 will introduce this approach. In Section 5.2.3 we experimentally
compare the two learning-based strategies against the static constraint-handling
strategies themselves for commitment relaxation ERCs. Section 5.2.4 draws together
the findings from the experimental analyses.

5.2. LEARNING-BASED CONSTRAINT-HANDLING STRATEGIES 145

5.2.1 Offline learning-based strategy

To learn offline when to switch between static constraint-handling strategies during
an optimization run we use the tabular RL algorithm Sarsa(λ) (see Algorithm
3.4 in Section 3.6). We employ Sarsa(λ) in combination with replacing
eligibility traces (see Equation (3.7)) and the ε-greedy action selection method.
Our learning task is episodic, with each episode representing a single EA run.
We reset the eligibility trace to e = 0 at the beginning of each episode, meaning
we forget about the state-action pairs visited in the previous episode (see
Line 11 of Algorithm 5.2). This modification to Algorithm 3.4 resulted in a
slightly better performance in our case. To encourage exploration we use optimistic
initial values for the action-value estimates Q(s,a); we set all values to
Q(s,a) = 1.0, ∀s ∈ S, a ∈ A(s) (1.0 is the maximal expected return in our case).
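The ingredients named above (optimistic initial values, ε-greedy selection, and replacing eligibility traces) can be sketched in a few lines of Python. This is a minimal illustration and not the thesis's implementation: it assumes the textbook form of the Sarsa(λ) update, since Algorithm 3.4 and Equation (3.7) are not reproduced here, and the names make_q_table, epsilon_greedy and sarsa_lambda_step are hypothetical. The numeric settings are those listed in Table 5.4.

```python
import random

N_STATES, N_ACTIONS = 25, 5            # 5x5 state bins, 5 static strategies
ALPHA, GAMMA, LAMBDA, EPSILON = 0.1, 1.0, 1.0, 0.1   # settings from Table 5.4


def make_q_table(optimistic_value=1.0):
    """Optimistic initial values: every Q(s, a) starts at the maximal return."""
    return [[optimistic_value] * N_ACTIONS for _ in range(N_STATES)]


def epsilon_greedy(q, state, rng, epsilon=EPSILON):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in range(N_ACTIONS) if q[state][a] == best])


def sarsa_lambda_step(q, trace, s, a, reward, s_next, a_next, terminal=False):
    """One tabular Sarsa(lambda) update (assumed textbook form).

    `trace` is a dict mapping (state, action) to its eligibility; start each
    episode with an empty dict, which corresponds to resetting e = 0.
    """
    target = reward if terminal else reward + GAMMA * q[s_next][a_next]
    delta = target - q[s][a]
    trace[(s, a)] = 1.0                       # replacing trace: set to 1
    for (si, ai), e in list(trace.items()):
        q[si][ai] += ALPHA * delta * e        # credit every eligible pair
        trace[(si, ai)] = GAMMA * LAMBDA * e  # decay all traces
    return delta
```

With γ = λ = 1 the traces never decay within an episode, so the terminal reward is propagated to every state-action pair visited in that run.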

To characterize a state s we use the current population average fitness and the
current time step. The fitness values are normalized so that they lie in the range
[0;1], while the optimization time is limited by T. Each of the two variables
is binned into 5 equally-sized intervals, resulting in the intervals
$\left(\frac{j}{5}; \frac{j+1}{5}\right]$ and $\left(\frac{T \cdot j}{5}; \frac{T \cdot (j+1)}{5}\right]$, $j = 0,1,2,3,4$, respectively. That is, we have 25 states in total,
and 5 actions (the static constraint-handling strategies) are available in each state.
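As a small illustration of this discretization, the following Python sketch maps a (normalized fitness, time step) pair to one of the 25 states. The function names and the row-major numbering of states are assumptions made for the example, as is the choice to place the left endpoint 0 in the first bin (the intervals in the text are open on the left).

```python
import math

N_BINS = 5


def bin_index(value, upper, n_bins=N_BINS):
    """Return j such that value lies in (upper*j/n_bins, upper*(j+1)/n_bins]."""
    if value <= 0:
        return 0                      # assumption: the left endpoint joins bin 0
    j = math.ceil(value * n_bins / upper) - 1
    return min(max(j, 0), n_bins - 1)


def state_index(avg_fitness, t, time_limit):
    """Combine the fitness bin and the time bin into a state id in 0..24."""
    return bin_index(avg_fitness, 1.0) * N_BINS + bin_index(t, time_limit)
```

For T = 2000, time step t = 400 still belongs to the first time bin, while t = 401 falls into the second.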

Our learning goal is to achieve a high population or individual fitness at
the end of the search. Consequently, the only reward we provide is the average
fitness of the population at the end of an episode; i.e. $r_i = 0$ for $i = 1,\dots,T$, and
$r_{T+1} \in [0,1]$. Alternatively, the reward may be the best solution fitness found.

There are some further aspects related to the particular learning problem of
switching between constraint-handling strategies. First, note that a non-evaluable
solution may be encountered at different stages of the optimization process or
not at all in a single EA run. In other words, the number of states visited in an
episode, and the number of non-evaluable solutions encountered in a state, may
vary between episodes and states, respectively. In the case where a non-evaluable
solution is encountered, the first action selected in a particular state is applied
to all non-evaluable solutions encountered in this state (see Lines 17 and 18 of
Algorithm 5.2). This selection approach tends to perform better than allowing an
optimizer to reselect actions while remaining in the same state, because it is more
direct in terms of credit assignment.


5.2.2 Online learning-based strategy

To learn online when to switch between static constraint-handling strategies, we
use an adaptive operator selection method known as the dynamic multi-armed
bandit (D-MAB) algorithm (Hartland et al., 2006, 2007; Costa et al., 2008). 9
The idea of D-MAB is to consider the learning problem as a multi-armed bandit
problem with the static strategies serving as independent arms. D-MAB extends
the upper confidence bound 1 (UCB1) algorithm (Auer et al., 2002) with the
statistical Page-Hinkley (PH) test (Page, 1954), whose purpose is to detect
changes in the sequence of rewards obtained and then to restart the multi-armed
bandit. The D-MAB algorithm is described in detail in Appendix B.

D-MAB requires that the play of an arm is followed by a reward.
We provide a reward immediately after the play of an arm, and it is the (normalized)
raw fitness of the resulting solution. Note that if the arm associated
with the waiting strategy is played, then the reward may be available only a
few time steps later. Also, in the case of penalizing, the reward will always be
some poor fitness value c. However, this does not mean that penalizing is never
selected because (i) UCB1 always maintains a certain degree of exploration, and
(ii) a restart of the multi-armed bandit triggered by the PH test puts all arms
in the same initial position. Alternative credit assignment schemes and UCB algorithms
were tested, but the combination used in this chapter tends to perform,
on average, best and most robustly on the test problems considered.
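To make the interplay of UCB1 and the PH test more tangible, here is a minimal Python sketch of a restartable bandit. It is an illustration under stated assumptions, not the algorithm of Appendix B (which is not reproduced here): the PH bookkeeping follows one commonly published formulation of D-MAB, the class name DMAB and its methods are hypothetical, and the default parameter values are the D-MAB settings of Table 5.4.

```python
import math


class DMAB:
    """UCB1 arm selection with a Page-Hinkley triggered restart (sketch)."""

    def __init__(self, n_arms, scaling=1.0, delta=0.01, threshold=0.1):
        self.n_arms = n_arms
        self.scaling = scaling        # scaling factor C
        self.delta = delta            # PH tolerance parameter
        self.threshold = threshold    # PH threshold parameter
        self.restart()

    def restart(self):
        """Put all arms back into the same initial position."""
        self.n = [0] * self.n_arms    # number of plays per arm
        self.p = [0.0] * self.n_arms  # empirical mean reward per arm
        self.m = [0.0] * self.n_arms  # PH cumulative deviation per arm
        self.M = [0.0] * self.n_arms  # PH running maximum of m per arm

    def select(self):
        """Play each arm once, then maximize the UCB1 score."""
        for j in range(self.n_arms):
            if self.n[j] == 0:
                return j
        total = sum(self.n)
        return max(
            range(self.n_arms),
            key=lambda j: self.p[j]
            + self.scaling * math.sqrt(2.0 * math.log(total) / self.n[j]),
        )

    def update(self, j, reward):
        """Record a reward for arm j; return True if the PH test restarts."""
        self.n[j] += 1
        self.p[j] += (reward - self.p[j]) / self.n[j]
        self.m[j] += self.p[j] - reward + self.delta
        self.M[j] = max(self.M[j], self.m[j])
        if self.M[j] - self.m[j] > self.threshold:
            self.restart()
            return True
        return False
```

In the setting of this chapter there would be five arms, one per static strategy, and the reward passed to update would be the normalized fitness of the solution returned by the chosen strategy.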

We want to point out here that certain credit assignment schemes cannot be

readily applied when dealing with ERCs. For instance, a common scheme is to

use a credit based on the fitness improvement of an offspring compared to its

parent after applying a variation operator to it. In our scenario the parent would

be the individual that is to be repaired and the offspring the repaired individual

after applying a constraint-handling strategy to the parent. As we do not know

the fitness of the parent because it is non-evaluable, we cannot quantify by how

much its fitness differs from the one of the repaired individual.

9 In (Hartland et al., 2006, 2007), various versions of a D-MAB algorithm are proposed but

the version we use here is referred to as γ-restart.


Table 5.4: Parameter settings of static and learning-based constraint-handling strategies.

Strategy                Parameter                                        Setting
Regenerating            Number of regeneration trials L                  10000
Penalizing              Fitness c assigned to non-evaluable solutions    0
Subpopulation strategy  Maximal size of all subpopulations SP_h, J_h     25
RL-EA                   Decay factor λ                                   1.0
                        Discount rate γ                                  1.0
                        Learning rate α                                  0.1
                        Probability ε of selecting a random action       0.1
                        #Training episodes                               5000
                        #Testing episodes                                100
D-MAB                   Threshold parameter λ_PH                         0.1
                        Tolerance parameter δ                            0.01
                        Scaling factor C                                 1

5.2.3 Experimental analysis

Experimental setup: Here, we use a slightly different experimental setup
than in the analysis of the static constraint-handling strategies (see Section
5.1.3). The changes concern the reproduction scheme of the EA that
we augment with the constraint-handling strategies, and the maximal size J of the
subpopulation SP used within the subpopulation strategy. The reason for using
a modified setup is that we specifically tuned the EA and constraint-handling
strategies to perform well on the test problems considered.

Rather than using an elitist generational reproduction scheme, we employ an
EA with a steady-state or (µ+1)-ES reproduction scheme. Algorithm 5.2 shows
the interplay between the static and learning-based constraint-handling strategies
within this EA. The method currentState(Pop,t) (Lines 17 and 18) called by the
RL-based approach returns the current state s_t based on the state variables Pop
and t; the method giveAction(s_t,rand(0,1)) (Line 18) returns the action a_t based
on the ε-greedy method.
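The (µ+1) reproduction scheme can be sketched in Python as follows. This is a hedged illustration: the actual selection, crossover and mutation operators are those of Table 5.1, which is not reproduced here, so uniform random parent selection, uniform crossover and bit-flip mutation with rate 1/l are assumptions, and steady_state_step is a hypothetical name. The sketch also omits the ERC handling done by functionWrapper.

```python
import random


def steady_state_step(pop, fitness, rng, mutation_rate=None):
    """One (mu+1) step on bit strings: add one offspring, keep the best mu."""
    mu, length = len(pop), len(pop[0])
    rate = 1.0 / length if mutation_rate is None else mutation_rate
    # Assumption: parents are drawn uniformly at random without replacement.
    parent_a, parent_b = rng.sample(pop, 2)
    # Assumption: uniform crossover producing two complementary offspring.
    mask = [rng.random() < 0.5 for _ in range(length)]
    child_a = [a if m else b for a, b, m in zip(parent_a, parent_b, mask)]
    child_b = [b if m else a for a, b, m in zip(parent_a, parent_b, mask)]
    # Bit-flip mutation on both offspring, then keep one of them at random.
    offspring = [
        [bit ^ 1 if rng.random() < rate else bit for bit in child]
        for child in (child_a, child_b)
    ]
    child = rng.choice(offspring)
    # (mu+1) replacement: truncate the union population to the best mu.
    union = sorted(pop + [child], key=fitness, reverse=True)
    return union[:mu]
```

For brevity the sketch re-evaluates fitness when sorting; in a closed-loop setting each evaluation is an experiment, so stored fitness values would be reused instead.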

The parameter settings of the EA are identical to the settings used in the

previous study (see Table 5.1) with the difference that λ = 1. Table 5.4 shows


Algorithm 5.2 Steady-state EA with static and learning-based constraint-handling strategies

Require: ERC_1,...,ERC_r (set of ERCs), f (objective function), T (time limit), µ (parent population
    size), offlineLearning (boolean variable indicating whether offline learning is applied
    (true) or online learning (false))
1: t = 0 (global time counter), Pop = ∅ (current population), s_t = −1 (variable representing
    current state), #Strategy = −1 (variable indicating the number of the static constraint-handling
    strategy currently in use)
2: while |Pop| < µ ∧ t < T do
3:     generate solution ⃗x at random
4:     ⃗x = functionWrapper(⃗x,t)
5:     Pop = Pop ∪ {⃗x}; t++
6: while t < T do
7:     generate two offspring by selecting two parents from Pop, recombining and mutating
       them; pick one of the two offspring ⃗x at random
8:     ⃗x = functionWrapper(⃗x,t)
9:     form new Pop by selecting the best µ solutions from the union population Pop ∪ {⃗x}; t++
10: if offlineLearning then
11:     set reward r_{T+1} to the average fitness of Pop and update the function Q(s,a) (see Line 8 of Algorithm 3.4);
        clear trace e(s,a) = 0, ∀s ∈ S, a ∈ A(s), and reset s_t = −1 and #Strategy = −1
12: functionWrapper(⃗x,t){
13:     y_t = null
14:     if ⃗x satisfies the ERCs ERC_1,...,ERC_r then
15:         ⃗x_t = ⃗x; y_t = f(⃗x_t)
16:     else
17:         if offlineLearning ∧ s_t ≠ currentState(Pop,t) then
18:             s_t = currentState(Pop,t); a_t = #Strategy = giveAction(s_t,rand(0,1)); r_{t+1} = 0;
                update trace e according to Equation (3.7) and the function Q(s,a) (see Line 8 of
                Algorithm 3.4)
19:         if !offlineLearning then
20:             #Strategy = giveBestArm() // Select the best arm using the UCB1 algorithm
21:         if #Strategy = 1 ∨ #Strategy = 2 ∨ #Strategy = 3 then
22:             ⃗x_t = repair(⃗x,t); y_t = f(⃗x_t)
23:         if #Strategy = 4 then
24:             t = t+τ; ⃗x_t = ⃗x; y_t = f(⃗x_t) // τ is the number of time steps we have to wait until
                ⃗x is evaluable
25:         if #Strategy = 5 then
26:             ⃗x_t = ⃗x; y_t = c // c is a constant, representing poor fitness
27:         if !offlineLearning then
28:             normalize y_t and assign it as reward to arm j = #Strategy; update the variables
                n_j, p̄_j, m_j, and M_j, and perform the PH test; if change-point detected, then set
                n_j, p̄_j, m_j, and M_j (j = 1,...,J) to 0
29:     return ⃗x_t and y_t }

the parameter settings of the constraint-handling strategies. For the RL-based
strategy, denoted in Table 5.4 by RL-EA, we use a training and testing scheme
(similar to Pettinger and Everson (2003)). In the training phase, the RL agent
estimates the function Q(s,a), while, in the testing phase, the Q-function is frozen


and the greedy actions a∗ are always selected; i.e. during the testing phase we set
ε = 0. As specified in Table 5.4, the testing phase involves 100 episodes or EA
runs, and this is also the number of runs for which we run the other constraint-handling
strategies; i.e. any results shown are average results across 100 EA runs.
To allow for a fair comparison of the strategies, we use a different seed for the
random number generator for each EA run but the same seeds for all strategies.
We also use a different seed for each episode of the training phase of RL-EA.
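The seeding scheme just described (a different seed per run, the same seeds across strategies) is a form of paired comparison. The Python sketch below shows one way it could be arranged; the strategy labels, the base seed 2012, the population size of 4, and the helper names are all hypothetical and chosen only for illustration.

```python
import random

STRATEGIES = ["forcing", "regenerating", "subpopulation", "waiting", "penalizing"]
N_RUNS = 100


def paired_seeds(n_runs=N_RUNS, base_seed=2012):
    """One seed per run index; every strategy reuses the same list."""
    return [base_seed + run for run in range(n_runs)]


def initial_population(seed, mu=4, length=30):
    """Stand-in for an EA run: only the seeded random initialization."""
    rng = random.Random(seed)         # private generator for this run
    return [[rng.randint(0, 1) for _ in range(length)] for _ in range(mu)]


# Run index i sees identical random initial conditions under every strategy.
runs = {
    name: [initial_population(seed) for seed in paired_seeds()]
    for name in STRATEGIES
}
```

Because run i starts from the same random stream under every strategy, differences between strategies within a run are not confounded by different initial populations.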

Experimental results: Let us imagine that we are faced with a closed-loop
scenario similar to the case study of Section 5.1.4. Suppose our optimization
problem is binary and subject to the following two, a priori known, commitment
relaxation ERCs: commRelaxERC(0,2000,20,H = (10101∗∗∗...)) and
commRelaxERC(0,2000,20,H = (∗...∗∗101)). That is, one ERC constrains
the first five solution bits, while the other constrains the last three bits. Solutions shall be
represented by binary strings of length l = 30, and we set the optimization time
to T = 2000 time steps. As the test function f let us consider the standard NK
landscapes or, equivalently, NKα landscapes with α = 0 (see Section 5.1.2.2). For
the two tunable parameters of NK landscapes, N and K, we consider four different
settings for K, namely, K = 1, 2, 3, and 4, while N is fixed to N = l = 30.
These settings will allow us to cover reasonable degrees of epistasis.
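For readers unfamiliar with NK landscapes, the following Python sketch generates a random instance and returns its fitness function. It is a generic illustration rather than the generator used in the thesis: the random-neighbourhood interaction model and the name make_nk_landscape are assumptions, and Section 5.1.2.2 remains the authoritative definition.

```python
import random


def make_nk_landscape(n, k, rng):
    """Return a fitness function for a random NK landscape (values in [0, 1])."""
    # Assumption: each locus interacts with k other loci chosen at random.
    neighbours = [
        rng.sample([j for j in range(n) if j != i], k) for i in range(n)
    ]
    # One lookup table per locus with 2^(k+1) uniform random contributions.
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(bits):
        total = 0.0
        for i in range(n):
            index = bits[i]
            for j in neighbours[i]:
                index = (index << 1) | bits[j]   # build the table index
            total += tables[i][index]
        return total / n                          # mean locus contribution

    return fitness
```

A call such as make_nk_landscape(30, 2, random.Random(0)) would give one instance with N = 30 and K = 2; larger K makes each locus depend on more neighbours and so increases epistasis.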

Let us now analyze the performance of the static and learning-based constraint-handling
strategies on the different NK landscapes. Figure 5.13 shows the population
average fitness obtained with the different constraint-handling strategies
on the four NK landscapes as a function of the time counter. The average fitness
of the best solution in a population is in alignment with the population average
fitness. From the plots we can see that the ERCs affect the performance of an
EA. Also, the plots confirm what we mentioned before, namely that similar patterns are
obtained for all the test functions. In fact, with respect to the static strategies,
we observe a trend that the subpopulation strategy performs best in the initial
stages of the optimization, forcing and penalizing in the middle part of the optimization,
and waiting in the final stages of the optimization. Also, the time step
at which waiting outperforms penalizing shifts further to the right as the degree
of epistasis increases. The behavior of the static constraint-handling strategies is
similar to the behavior we observed in the case study of Section 5.1.4.

For RL-EA, the results in Figure 5.13 were obtained using a training phase

involving 5000 different NK landscapes with N = 30 and K = 2. After that

[Plots for Figure 5.13: four panels, one per NK landscape type (N=30 with K=1, K=2, K=3, and K=4), each titled commRelaxERC(0,2000,20,H=(10101***...)), commRelaxERC(0,2000,20,H=(*...**101)). x-axis: Time counter t (0 to 2000); y-axis: Population average fitness (0.5 to 0.75). Curves: Forcing, Regenerating, Waiting, Subpop. strategy, Penalizing, RL-EA, D-MAB, Unconstrained EA.]

Figure 5.13: Plots showing the population average fitness (we do not show the standard
error as it was negligible) obtained by the different constraint-handling strategies
on NK landscapes with N = 30 and K = 1 (top left), K = 2 (top right),
K = 3 (bottom left), and K = 4 (bottom right) as a function of the time counter
t; results are averaged over 100 independent runs using a different randomly generated
NK problem instance for each run. All instances were subject to the commitment
relaxation ERCs commRelaxERC(0,2000,20,H = (10101∗∗∗...)) and
commRelaxERC(0,2000,20,H = (∗...∗∗101)). The results of ‘Unconstrained EA’ were
obtained by running the EA on the same problem instances but without the ERCs.

training phase, the same frozen Q-function was used in the testing phase of all
four NK landscape types. Consequently, this allows us to assess the robustness
of the control policy learnt, as training and testing are done in environments with
different properties (at least for the cases K ≠ 2). From the figure we can see
that RL-EA is the best performing strategy on all four NK landscape types at
T = 2000. For a low level of epistasis, RL-EA is even able to almost match the
performance of an EA optimizing in an ERC-free environment. We also observe
that the performance advantage over waiting, the second best strategy, tends to
increase with the degree of epistasis; when comparing the two strategies against
each other using more than 100 algorithmic runs, then, according to the
Kruskal-Wallis test (significance level of 5%), RL-EA is also significantly better than

[Plot for Figure 5.14. Title: commRelax(0,2000,20,H = (1010∗∗∗...)), commRelax(0,2000,20,H = (∗...∗∗101)), N = 30, K = 2. x-axis: Time counter t (0 to 2000, in intervals of 400); y-axis: Population average fitness (0 to 1.0, in intervals of 0.2). Each state cell is shaded according to the greedy action: Forcing, Regenerating, Waiting, Subpop. strategy, Penalizing, or Unvisited state during training phase.]

Figure 5.14: A plot showing the greedy actions a∗ learnt by the RL agent for each state
s. Training was done on NK landscapes with N = 30 and K = 2.

waiting for K = 3 and K = 4. This aspect indicates the robustness of RL-EA.
On the other hand, although D-MAB is not able to perform as well as RL-EA, it
is able to match the performance of waiting for N = 30, K = 4. Another benefit
of offline learning is that if the optimization time were shorter, say around
T = 500, then RL-EA would learn a different control policy, while D-MAB would
not. That is, RL-EA adapts to the real-world problem at hand; we have confirmed
this experimentally (results not shown).

Figure 5.14 shows the greedy action a∗ in each state s learnt by the RL
agent. From the plot it is apparent that the agent learnt to use mainly waiting
at the beginning of the optimization process, penalizing in the middle part of the
optimization, and, depending on the population average fitness, either forcing,
waiting, or the subpopulation strategy in the final part of the optimization. From
Figure 5.13 one may conclude that a policy that uses a repairing strategy (forcing,
regenerating, or subpopulation strategy) at the beginning of the optimization,
and waiting or penalizing towards the end, should also perform well. The agent
did not learn this policy because the repairing strategies may lead quickly to a
homogeneous population containing many solutions that fall into one or both of
the constraint schemata. If the schemata are poor, then one should escape from
this population state, but this is difficult because diversity needs to be reintroduced
into the population, and this takes too long using waiting or penalizing.

Figure 5.15 illustrates what strategies have been used on average by D-MAB
during different periods of the optimization process for N = 30, K = 4. From

[Plot for Figure 5.15. Title: commRelaxERC(0,2000,20,H=(10101***...)), commRelaxERC(0,2000,20,H=(*...**101)), N=30, K=4. x-axis: Time counter interval [t_i; t_{i+1}) with bins [0;400), [400;800), [800;1200), [1200;1600), [1600;2000]; y-axis: Average number of pulls (0 to 50). Series: Regenerating, Subpop. strategy, Forcing, Waiting, Penalizing.]

Figure 5.15: A plot showing the average number of times each strategy is played by
D-MAB at different periods of the optimization process. Results are shown for N = 30
and K = 4.

the plot we can see that penalizing is selected least often, which is due to the
low fixed reward obtained when playing the associated arm. There is a clear trend
that waiting is used less often than the repairing strategies at the end of the
optimization process, which is in alignment with the policy learnt by the RL
agent. The reason that the performance of D-MAB is nevertheless poorer than
that of RL-EA is that the repairing strategies are used too often throughout
the optimization process, in particular at the beginning of the optimization.
As mentioned before, this may result in a homogeneous population state from
which it is difficult to escape; the fact that the total number of plays increases
towards the end of the optimization confirms that D-MAB actually ends up in
this poor population state. On NK instances with lower epistasis, the PH test is
triggered less often, and waiting is used almost as often as the repairing strategies
throughout the optimization.

Overall, the strong performance of the RL-EA is encouraging, but we want to
mention that in order to achieve that performance, some tuning of the agent may
be required. This is due to two aspects. First, an EA is stochastic in the
sense that (i) taking the same action in a particular state but in different episodes
may cause an EA to end up in different future states, and (ii) visiting the same
state-action pairs in different episodes may result in different rewards obtained at
the end of an episode. Secondly, since a different problem instance is used in each
training episode, the constraint schemata of the two ERCs may represent both
good and poor sets of solutions. Hence, as the quality of a schema has an impact

[Plot for Figure 5.16. Title: commRelaxERC(0,2000,20,H=(10101***...)), commRelaxERC(0,2000,20,H=(*...**101)), N=30, K=2. x-axis: #Training episodes (0 to 5000); y-axis: Population average fitness (0.695 to 0.725). Curves: λ = 1.0, γ = 1.0; λ = 0.0, γ = 1.0; λ = 0.5, γ = 0.5; λ = 0.25, γ = 0.0; λ = 0.25, γ = 1.0.]

Figure 5.16: A plot showing the population average fitness obtained by RL-EA for
different values of the decay factor λ and the discount rate γ as a function of the number
of training episodes. The population average fitness values themselves represent average
values across 30 learning trials; each trial used different randomly generated problem
instances and random number generator seeds for training and testing, but the same
instances and seeds for any combination of λ and γ values. Training and testing were
done on NK landscapes with N = 30 and K = 2.

on what strategy is most appropriate, the current optimal control policy learnt
by the RL agent may change constantly during training. The approach taken
here to deal with these two issues was to train the agent for different numbers of
training episodes, also using different settings of the parameters involved in the agent
algorithm. The parameter setting combination that performed, on average, best
and most robustly on the training or validation problems was used.

Figure 5.16 illustrates how the population average fitness may be affected
by the number of training episodes, and different settings of the decay factor λ
and the discount rate γ. From the figure it is apparent that the RL agent is
able to learn an optimal policy after around 1000 episodes or algorithmic runs.
Furthermore, while the agent is relatively robust to the setting of λ and γ, optimal
performance tends to be obtained for configurations where both parameters take
large values. The meaning of these configurations is that we should not discount
any reward obtained during an episode (large γ), nor should action-value estimates
Q(s,a) be updated based on other action-value estimates (large λ). These are
sensible configurations because the aim of discounting (small γ) is to reach a
terminal state more quickly; this is not relevant in our scenario because we always provide
a reward after the same amount of simulated time. The fact that
we obtain better performance by updating visited state-action pairs (s,a) based
on the actual rewards obtained (large values of λ) is an indication that the states

[Plot for Figure 5.17. Title: commRelaxERC(0,2000,20,H=(10101***...)), commRelaxERC(0,2000,20,H=(*...**101)), N=30, K=3. x-axis: Time counter t (0 to 2000); y-axis: Population average fitness (0.5 to 0.75). Curves: Forcing, Regenerating, Waiting, Subpop. strategy, Penalizing, RL-EA, D-MAB, Unconstrained EA.]

Figure 5.17: A plot showing the population average fitness obtained in a single run

on a randomly generated NK landscape with N = 30 and K = 3 as a function of

the time counter t; the instance was subject to the two commitment relaxation ERCs

commRelaxERC(20,H = (10101∗∗∗...)) and commRelaxERC(20,H = (∗...∗∗101)).

we visited in an episode are not significantly dependent on the actions taken; this

is true in our scenario because the EA passes through approximately the same

state trajectory in an episode. Of course, different settings of γ and λ may be

more suitable for more fine-grained state space realizations.
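The preference for large λ and γ can be made concrete with a short derivation. It is a sketch that assumes the standard Sarsa(λ) temporal-difference error (Algorithm 3.4 is not reproduced here) and ignores changes to Q within an episode; the index k over the agent's successive updates and the final update index n are introduced only for illustration.

```latex
% TD error at update k, with the value of the terminal state set to zero:
\delta_k = r_{k+1} + \gamma\, Q(s_{k+1}, a_{k+1}) - Q(s_k, a_k),
\qquad Q(s_{n+1}, \cdot) = 0 .

% With \gamma = 1, zero intermediate rewards and terminal reward r_{T+1},
% the TD errors from update t onwards telescope:
\sum_{k=t}^{n} \delta_k
  = r_{T+1} + \sum_{k=t}^{n} \bigl( Q(s_{k+1}, a_{k+1}) - Q(s_k, a_k) \bigr)
  = r_{T+1} - Q(s_t, a_t) .

% With \lambda = 1 the replacing trace of (s_t, a_t) stays at 1, so over the
% episode the pair receives approximately
\Delta Q(s_t, a_t) \approx \alpha \bigl( r_{T+1} - Q(s_t, a_t) \bigr) .
```

Under these assumptions every visited state-action pair is moved towards the reward actually obtained at the end of the run, rather than towards other action-value estimates.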

Overall, based on the results shown, we would select RL-EA for the real-world
closed-loop problem as it performs best after 2000 time steps. Let us assume that
the real-world problem is an NK landscape instance with N = l = 30 and K = 3.
Hence, to verify our selection, we perform one run with all strategies on a single
newly generated NK landscape instance with N = 30 and K = 3; the results are
shown in Figure 5.17 (note that RL-EA uses the control policy of Figure 5.14).
The plot confirms the findings made on the test functions. In particular, RL-EA,
D-MAB, and waiting perform best at the end of the run, while the static repairing
strategies perform best at the beginning of the optimization.

5.2.4 Summary and conclusion

In this study we have investigated two learning-based constraint-handling strategies
for coping with commitment relaxation ERCs. The learning-based strategies
aim at learning when to switch between static constraint-handling strategies,
which we introduced in Section 5.1.1, during the optimization process. However,
while one strategy learns this task offline using a reinforcement learning (RL)
agent, here Sarsa(λ), the other strategy performs online learning using the UCB

5.3. ONLINE RESOURCE-PURCHASING STRATEGIES 155

algorithm extended with the statistical Page-Hinkley test to detect changes in
the sequence of rewards obtained. Offline learning is possible in this optimization
scenario because the performance of an EA depends largely on the type of ERC,
with similar effects observable for different fitness landscapes; this observation
was indicated by the study conducted in Section 5.1.3.

The experimental analysis compared the learning-based constraint-handling
strategies against the static strategies themselves, and it concluded that ERCs
affect the performance of an EA but an RL-based strategy is able to get close to
the performance of an EA optimizing in an ERC-free environment, particularly
on fitness landscapes with little epistasis. The online learning algorithm did not
perform as well as the RL-based algorithm, mainly because it does not look ahead
in the optimization process and so may end up quickly in a poor population state
from which it is difficult to escape; this is a general issue of online learning
within an ERCOP scenario and it is as yet unclear how to avoid it. Static repairing
strategies are well suited if little optimization time is available, while a static
waiting or penalizing strategy is more suited for longer optimization times.

To improve the performance of learning-based constraint-handling strategies,
in particular, offline learning-based techniques, one may look at the design and
tuning of RL agents that also account for the stochasticity present in evolutionary
search (e.g. the P-Trace and Q-Trace algorithms of Pendrith (1994), or the
Kalman filtering approach of Chang et al. (2004)). Analyzing constraint-handling
and learning strategies on different and perhaps more realistic fitness landscapes
than NK landscapes is another important research avenue. Furthermore, in addition
to learning the task of switching between constraint-handling strategies, an
agent may be required to arrange its own resources required in the evaluation of
solutions. The next section investigates strategies for coping with such scenarios.
Whilst these strategies do not rely on RL algorithms, extending them with ideas
from RL is certainly an interesting and promising research direction.

5.3 Online resource-purchasing strategies

So far, this chapter has investigated strategies for dealing with non-evaluable solutions arising due to ERCs. In this section, our focus shifts to online resource-purchasing strategies to cope with commitment composite ERCs, which we introduced in Section 4.2.5. Recall that with this type of ERC, some part of the

156 CHAPTER 5. STRATEGIES FOR COPING WITH ERCS

solution vector defines a complex subpart or composite (expressed by a high-level constraint schema $H^{\#}$) that must be purchased online and pre-emptively so that it arrives in time to be used in the evaluation process of a solution. Once the composite arrives (after some lag of TL time steps) it can be stored for a limited period of time only (because composites have a shelf life of SL time steps) and/or re-used in different experiments only a certain number of times (RN denotes the reuse number); we assume SL ≥ RN. The number of composites available simultaneously is upper bounded by the number of storage cells #SC. We also assume a budget C limiting the usage of the composites and/or limiting time. In the next section we devise three resource-purchasing strategies (for use in a generational EA) to cope with this type of ERC. Before we deploy and test these strategies empirically over a number of resource-constraint settings and budget regimes in Section 5.3.3, we define the experimental setup in Section 5.3.2. Section 5.3.4 draws together the findings of the experimental study.
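The resource model recalled above can be pictured with a small data structure. The following is only an illustrative sketch: the class and method names (`Composite`, `Storage`, `tick`, `use`, `put`) are hypothetical and not taken from the thesis, and it assumes that shelf life decreases by one per time step whether or not the composite is used.

```python
from dataclasses import dataclass


@dataclass
class Composite:
    """A purchased composite held in a storage cell (illustrative model)."""
    schema: str           # composite-defining bits, e.g. "0110"
    shelf_life_left: int  # remaining time steps before expiry (starts at SL)
    reuses_left: int      # remaining evaluations it may serve (starts at RN)

    def usable(self):
        return self.shelf_life_left > 0 and self.reuses_left > 0

    def tick(self):
        """Advance one time step (assumed to reduce shelf life regardless of use)."""
        self.shelf_life_left -= 1

    def use(self):
        """Consume one reuse for the evaluation of a solution."""
        if not self.usable():
            raise ValueError("composite is expired or used up")
        self.reuses_left -= 1


class Storage:
    """Capacity-limited storage holding at most n_cells composites (#SC)."""

    def __init__(self, n_cells):
        self.n_cells = n_cells
        self.cells = []

    def put(self, composite):
        """Store a composite if a cell is free; return whether it fitted."""
        # Expired or used-up composites free their cell.
        self.cells = [c for c in self.cells if c.usable()]
        if len(self.cells) >= self.n_cells:
            return False
        self.cells.append(composite)
        return True

    def tick(self):
        for c in self.cells:
            c.tick()
```

Keeping shelf life and reuse count as two separate counters mirrors the "and/or" in the description: whichever runs out first ends the usability of a composite.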

5.3.1 Specific online purchasing strategies

This section proposes three online resource-purchasing strategies for coping with commitment composite ERCs: a just-in-time strategy, a just-in-time strategy with repairing, and a sliding window strategy. All strategies are augmented on a generational EA and designed such that the EA never has to deal with solutions that have a null fitness value (although null solutions may need to be submitted purposely to bridge waiting periods between arriving composite orders).

In general, a strategy designed to deal with commitment composite ERCs is composed of three mechanisms, which are concerned with: (i) selecting the composite that is ordered and the point of time at which it is ordered, (ii) selecting the storage cell into which an arrived composite is put, which may mean selecting an existing composite that is to be replaced by a new one, and (iii) selecting an available composite to repair a non-evaluable solution (given it is to be repaired). Each of these mechanisms is described for our strategies in the following. As before, we assume that time steps refer to function evaluations of single solutions.

5.3.1.1 Just-in-time strategy

Without repairing and without any composites in storage, the minimum number of time steps for evaluating all solutions of a population Pop (of size µ) is µ + TL (TL


time steps are needed for the first composite to arrive; this period is bridged by submitting TL null solutions). This is achieved if orders are processed in contiguous groups organized by the composites they require, e.g. ccbbbadd..., where a, b, c, and d shall represent different composites required by solutions. Thus, the first main mechanism of the just-in-time (JIT) strategy is to arrange solutions of a population Pop into these contiguous groups, and then to make purchase orders so that composites arrive just in time for the scheduled experiment time. This mechanism, which is performed by the method alignOrdersAndSolutions(Pop) (see below), prolongs the availability of resources.
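The µ + TL bound and the just-in-time ordering can be illustrated with a short sketch. It rests on several assumptions that are not spelled out in the text: time steps are counted from 0, solution i of the grouped sequence is evaluated at time step TL + i, an order placed at time t arrives at t + TL, storage capacity is ignored, and SL ≥ RN so that within a contiguous group only the reuse number limits a composite. The function name `jit_schedule` is hypothetical.

```python
from itertools import groupby


def jit_schedule(sequence, time_lag, reuse_number):
    """Return (orders, total_time_steps) for a grouped evaluation sequence.

    sequence: composites required by the solutions, already arranged in
        contiguous groups, e.g. "ccbbbadd".
    orders: list of (order_time, composite) pairs; an order placed at time t
        is assumed to arrive at time t + time_lag.
    """
    orders = []
    position = 0  # index of the first solution of the current group
    for composite, group in groupby(sequence):
        size = len(list(group))
        # One composite serves at most reuse_number evaluations, so a large
        # group needs a fresh copy every reuse_number solutions.
        for offset in range(0, size, reuse_number):
            # Solution i is evaluated at time step time_lag + i, so the
            # composite it needs must be ordered at time step i.
            orders.append((position + offset, composite))
        position += size
    total_time_steps = time_lag + len(sequence)
    return orders, total_time_steps
```

For the sequence ccbbbadd with TL = 3 and RN = 2 this yields five orders and 8 + 3 = 11 time steps in total, the first three of which would be bridged by null solutions.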

When composites are already in storage (we call them old composites), savings in purchase orders may be made if those composites are used first. For example, suppose the optimizer requires the composites ccdadcac, and composite a is available in one of the storage cells and has 3 uses and 5 time steps of its shelf life remaining. Then, by placing a first, the permutation aaccccdd will save us a purchase order since composite a is required only twice, which the stored composite can cover. Thus, the second main mechanism of the JIT strategy is to efficiently schedule the evaluation of solutions in a population Pop using old composites. This is performed by the method useUpOldComposites(Pop) (see below). Notice that, at any given time, JIT (and JIT with repairing) maintain only non-identical composites in storage.

During the optimization, the JIT strategy checks at each generation whether old composites can be used in the evaluation of solutions of the current population. If so, the method useUpOldComposites(Pop) is applied first and then the method alignOrdersAndSolutions(Pop), otherwise we proceed directly with the method alignOrdersAndSolutions(Pop).

alignOrdersAndSolutions(Pop): Recall that solutions are grouped by the composite they require; this method merely has to choose the order in which these groups of solutions are to be submitted for evaluation. To obtain a permutation of groups we select groups one by one using roulette wheel selection (without replacement) based on the number of solutions in Pop associated with them, and then reverse the order so obtained. This strategy increases the chances that left-over composites at the end of the generation will be useful in the next generation (because the solutions associated with them are over-represented in the population). After aligning composite orders with the evaluation sequence of (groups of) solutions, we know exactly when composite orders need to be submitted for them to arrive just in time for being used in the evaluation of solutions.
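The group-ordering step just described can be sketched as follows. This is a minimal illustration rather than the thesis implementation: the function name `align_group_order` and the representation of a population as a sequence of required composites are assumptions made for the example.

```python
import random
from collections import Counter


def align_group_order(required_composites, rng=random):
    """Order composite groups for evaluation.

    Groups are drawn one by one by roulette wheel selection without
    replacement, weighted by group size, and the drawn order is then
    reversed, so that large groups tend to be evaluated last.
    """
    sizes = Counter(required_composites)
    remaining = list(sizes)
    drawn = []
    while remaining:
        weights = [sizes[g] for g in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        drawn.append(pick)
        remaining.remove(pick)
    drawn.reverse()
    # Expand the group order into the full evaluation sequence.
    return [g for g in drawn for _ in range(sizes[g])]
```

Because large groups are likely to be drawn early and therefore end up late after the reversal, the composites still in storage when the generation ends tend to be those required by many solutions.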


After the arrival of a composite we need to decide the storage cell into which we put it. The default is to place it in an empty storage cell. If no cell is empty, then the arriving composite replaces the old composite that can be used in the fewest evaluations within the subsequent generation. That is, if $P$ is the current set of old composites and associated with each $p \in P$ there is the remaining shelf life $\psi_p$ and a remaining number of reuses $\rho_p$, then an arriving composite replaces the composite $p^* \in P$ given by $p^* = \arg\min_{p \in P} \min(\psi_p - (t_{\text{new generation}} - t), \rho_p)$, where $t_{\text{new generation}}$ is the time step when the subsequent generation begins, and $t$ the current time step. Remember that we are aware of $t_{\text{new generation}}$ because the entire schedule for the evaluation of Pop is done beforehand. In the case where there are multiple composites $p^*$, ties are broken by replacing the composite that has the shortest shelf life remaining; further ties are broken randomly.
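The replacement rule, including both tie-breaking stages, can be sketched as below. The function name `cell_to_replace` and the dictionary layout are hypothetical; the sketch assumes that the number of evaluations a stored composite can still serve in the next generation is the smaller of its shelf life remaining at the start of that generation and its remaining reuses.

```python
import random


def cell_to_replace(stored, t, t_new_generation, rng=random):
    """Pick the stored composite to overwrite when no cell is empty.

    stored: dict mapping composite name -> (shelf_life_left, reuses_left),
        i.e. (psi_p, rho_p) at the current time step t.
    """
    def usable_next_generation(name):
        psi, rho = stored[name]
        return min(psi - (t_new_generation - t), rho)

    fewest = min(usable_next_generation(p) for p in stored)
    candidates = [p for p in stored if usable_next_generation(p) == fewest]
    # First tie-break: shortest remaining shelf life.
    shortest = min(stored[p][0] for p in candidates)
    candidates = [p for p in candidates if stored[p][0] == shortest]
    # Second tie-break: uniformly at random.
    return rng.choice(candidates)
```

For example, with stored composites a (ψ = 12, ρ = 8), b (ψ = 4, ρ = 5) and d (ψ = 6, ρ = 6) and three time steps left until the next generation, composite b could serve only one further evaluation and would be replaced.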

useUpOldComposites(Pop): Our aim when using up old composites is to save as many composite orders as possible. To achieve this when there are several storage cells to consider, we solve the lexicographical optimization problem $\operatorname{lexmax}_{i \in \pi(Z)} (F_1, F_2)$ over the permutation set of

\begin{equation}
Z = \{\zeta \mid \zeta \in P \cap C \,\wedge\, \theta_\zeta \bmod RN \le \min(\psi_\zeta, \rho_\zeta)\}, \tag{5.1}
\end{equation}

where $P$, $\psi_p$ and $\rho_p$ have the same meaning as before, and $C$ is the set of composites required by the optimizer to evaluate Pop; associated with each $c \in C$ there is a required number $\theta_c$. The first condition, $\zeta \in P \cap C$, says that a composite needs to be currently in storage and it needs to be required by at least one solution of the population Pop. The second condition, $\theta_\zeta \bmod RN \le \min(\psi_\zeta, \rho_\zeta)$, ensures that the only left-over solutions considered for evaluation with the old composites are ones that do not require the ordering of a new composite. 10

10 Remember that we assume that SL ≥ RN, meaning that the maximum number of evaluation steps associated with a composite is limited by its reuse number. If the shelf life were the limiting factor, then the condition would be $\theta_\zeta \bmod SL \le \min(\psi_\zeta, \rho_\zeta)$.

The objective functions to be maximized are $F_1 = \sum_{j \in \{1,\dots,|Z|\}} f_1(\pi_{ij}(Z))$ and $F_2 = \sum_{j \in \{1,\dots,|Z|\}} f_2(\pi_{ij}(Z))$, where $\pi_{ij}$ is the $j$th element of permutation $i$,

\begin{equation}
f_1(\pi_{ij}(Z)) =
\begin{cases}
1 & \text{if } j = 1 \\
1 & \text{if } \psi_{\pi_{ij}} - \left( \sum_{k=1}^{j-1} f_1(\pi_{ik}(Z)) \cdot (\theta_{\pi_{ik}} \bmod RN) \right) \ge \theta_{\pi_{ij}} \bmod RN \\
0 & \text{otherwise,}
\end{cases}
\tag{5.2}
\end{equation}

and

\begin{equation}
f_2(\pi_{ij}(Z)) = f_1(\pi_{ij}(Z)) \cdot (\theta_{\pi_{ij}} \bmod RN). \tag{5.3}
\end{equation}

$F_1$ determines the number of purchase orders saved for each permutation of $Z$, while $F_2$ breaks ties among the best permutations by selecting the permutation(s) that evaluate the most solutions; further ties are broken randomly. Required composites left in storage after evaluating the last solution of the selected permutation (i.e. after $\max(F_2)$ time steps) are used up further in a random fashion; i.e. we first select a composite at random and then select a random (unscheduled) solution from Pop that requires this composite, and we repeat this until no further evaluations can be scheduled. For small sizes of $Z$, the lexicographical optimization problem $\operatorname{lexmax}_{i \in \pi(Z)} (F_1, F_2)$ can be solved to optimality by exhaustive search (see Figure 5.18); this is the approach taken here.
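The exhaustive search can be sketched in a few lines. The function below is an illustrative reading of Equations (5.1) to (5.3), not the thesis code: the name `use_up_old_composites` and the dictionary-based inputs are assumptions, and the random tie-breaking and the subsequent random use of left-over composites are left out.

```python
from itertools import permutations


def use_up_old_composites(psi, rho, theta, reuse_number):
    """Exhaustively solve lexmax (F1, F2) over all permutations of Z.

    psi, rho: remaining shelf life and reuses of the stored composites P.
    theta: number of solutions in Pop requiring each composite in C.
    Returns (Z, best), where best lists (permutation, F1, F2) for every
    permutation attaining the lexicographic maximum.
    """
    left_over = {c: theta[c] % reuse_number for c in theta}
    # Equation (5.1): stored, required, and servable without a new order.
    z = sorted(c for c in psi
               if c in theta and left_over[c] <= min(psi[c], rho[c]))
    results = []
    for perm in permutations(z):
        elapsed = 0  # time steps consumed by the groups scheduled so far
        f1_total = f2_total = 0
        for j, c in enumerate(perm):
            # Equation (5.2): the first group always fits; a later group
            # fits if enough shelf life remains after the earlier groups.
            if j == 0 or psi[c] - elapsed >= left_over[c]:
                f1_total += 1             # contribution to F1
                f2_total += left_over[c]  # Equation (5.3), contribution to F2
                elapsed += left_over[c]
        results.append((perm, f1_total, f2_total))
    top = max((f1, f2) for _, f1, f2 in results)
    best = [r for r in results if (r[1], r[2]) == top]
    return z, best
```

Called with the values of the example in Figure 5.18 (RN = 10), the sketch returns Z = {b, d, e} and the three permutations (d, b, e), (d, e, b) and (e, d, b), each with $F_1 = 2$ and $F_2 = 5$; one of them would then be chosen at random. The number of permutations grows factorially with |Z|, which is why this approach is only practical for small Z.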

Having devised a schedule for using up the old composites, and selecting the solutions to be evaluated with these composites, we can now specify at which time step the first new composite order needs to be submitted (to evaluate the remaining solutions in Pop) in order to reduce our waiting time for it to arrive. Subsequently, we know when and how many null solutions need to be submitted.

5.3.1.2 Just-in-time strategy with repairing

Although the JIT strategy avoids repairing of solutions and thus allows an optimizer to perform an unconstrained optimization, this may be associated with a waste of up to µ × (RN − 1) reuses per generation (if each solution of the population requires a different composite). To reduce wastage, the JIT with repairing (JITR) strategy extends the basic JIT strategy with a repairing procedure. The method performRepairing(Pop) (see Algorithm 5.3) performs the repairing


1) Parameters of the old composites p ∈ P and of the required composites c ∈ C:

p             a    b    d    e    f    r
ψ_p          12    4    6    5    6   19
ρ_p           8    5    6    8    4    9

c             b    d    e    f    g
θ_c           3   14   21    5    7
θ_c mod RN    3    4    1    5    7

2) Z = {b, d, e} (composite f is not a member of Z because θ_f mod RN > min(ψ_f, ρ_f))

3) Objective values for each permutation i of Z:

     π_ij                  f_1(π_ij(Z))
i    j=1   j=2   j=3       j=1   j=2   j=3       F_1   F_2
1    b     d     e         1     0     1         2     4
2    b     e     d         1     1     0         2     4
3    d     b     e         1     0     1         2     5
4    d     e     b         1     1     0         2     5
5    e     b     d         1     1     0         2     4
6    e     d     b         1     1     0         2     5

4) Choose at random between permutations i = 3, 4, and 6.

5) Use up left-over composites from the set P if they are needed in the evaluation of the current population. Regardless of the permutation selected in this example, one evaluation will be left over with composite d (because min(ψ_d, ρ_d) − F_2 = 1).

Figure 5.18: Visualization of the internal calculations made by the method useUpOldComposites(Pop). We assume a commitment composite ERC with RN = 10 and SL = 20. We need to perform five steps to process the population Pop (containing in this example ∑_c θ_c = 50 individuals): 1) determining the parameters ψ_p, ρ_p, and θ_c; 2) determining the permutation set Z; 3) computing the objective functions F_1 and F_2 for each permutation π; 4) selecting a random permutation in case there are equally good permutations; 5) matching unscheduled solutions of Pop and composites (from set P) in a random fashion if any composites are left after realizing the selected permutation.

and this method is always called before calling the method alignOrdersAndSolutions(Pop) (but after useUpOldComposites(Pop)). The idea of the repairing strategy is to repair solutions such that they use a composite that is nearly the one required. Solutions to be repaired are identified by first clustering their composites, and then trying to find an assignment of solutions to clusters that minimizes the total Hamming distance of all repairs.

performRepairing(Pop): The first step of this method is to filter out all solutions from the population Pop that use up a composite entirely (without being repaired); Pop′ shall denote the resulting set of solutions. If there are more than RN solutions requiring the same composite type, then we filter out a subset of RN solutions among them at random. The remaining solutions take part in the clustering process and are subject to being repaired. The number of clusters k into which we can partition these solutions (or their required composites) varies between


Algorithm 5.3 Just-in-time strategy with repairing
Require: w (weighting factor), #mIter (number of shifting rounds)
1: performRepairing(Pop){
2:   create Pop′ by filtering out solutions from Pop that use up composites entirely
3:   for ⌈|Pop′|/RN⌉ ≤ k ≤ #CompsInPop do
4:     cluster solutions in Pop′ using k-medoids
5:     if required, shift solutions between clusters (perform mIter rounds of shifting); retain the configuration with the smallest Hamming distance loss and record HDL_k
6:   normalize the values HDL_k and k of all configurations, and calculate the scores HDLN_k·(1−w) + kN·w, ∀k, where HDLN_k = (HDL_k − min_k(HDL_k)) / (max_k(HDL_k) − min_k(HDL_k)) and kN = (k − ⌈|Pop′|/RN⌉) / (#CompsInPop − ⌈|Pop′|/RN⌉)
7:   repair solutions in Pop′ according to the configuration associated with the smallest score, i.e. min_k(HDLN_k·(1−w) + kN·w); add the filtered-out solutions back to Pop′ to create a new population Pop′′ }

$\lceil |Pop'| / RN \rceil \le k \le \#CompsInPop$, where $\lceil \cdot \rceil$ is the ceiling function and #CompsInPop the number of non-identical composites in Pop′. Let us first describe how we perform the clustering for a given k, and then how we select a particular cluster configuration according to which we repair.

To obtain a partition with k clusters we apply k-medoids (Kaufman and Rousseeuw, 1990) on the different composites. 11 The distance between two composites is the Hamming distance between their constraint schemata. The medoid composite of a cluster is the composite that would be used to repair (using forcing) all solutions in that cluster that require a different composite. For diversity reasons, we want to avoid ordering a medoid composite more than once, or, in other words, we want to keep the number of solutions within a cluster smaller than the reuse number RN. A simple way to ensure this is to randomly shift solutions from clusters that are overly large to clusters that can accommodate further solutions. We perform mIter shifting rounds in total and select the configuration with the smallest total Hamming distance of all repairs $\mathit{HDL}_k$ to be the best configuration obtained with k clusters; formally we can define this selection step as:

\begin{equation}
\mathit{HDL}_k = \min_{i = 1,\dots,\mathit{mIter}} \left( \sum_{j=1}^{k} \left\{ \sum_{r \in CL_{j,i}} HD(H_j, H_{j,r,i}) \right\} \right), \tag{5.4}
\end{equation}

11 The k-medoids algorithm divides a set of n data points into k groups (clusters), with k being known a priori, in such a way that the sum of pairwise dissimilarities between the data points of a cluster is minimal. The medoid of a cluster is the data point whose average dissimilarity to all other points in that cluster is minimal. In other words, the medoid is the most centrally located data point in a cluster.


where $H_j$ represents the constraint schema of the medoid composite in cluster $j$, and $H_{j,r,i}$ the schema required by the $r$th solution in the set $CL_{j,i}$, which is the cluster $j$ under shifting configuration $i$; note that while the medoid remains the same for different shifting rounds, the set $CL_{j,i}$ depends on how solutions are shifted between clusters. The function HD(.,.) computes the Hamming distance between two schemata.

The cluster configuration according to which we repair is the one with the smallest score $\mathit{HDLN}_k \cdot (1-w) + \mathit{kN} \cdot w$, where $\mathit{HDLN}_k$ and $\mathit{kN}$ are the normalized values of $\mathit{HDL}_k$ and $k$ (see Line 6 of Algorithm 5.3), and $w \in [0,1]$ a predetermined weighting factor representing the degree of repairing. Roughly speaking, the larger the value w the more solutions are repaired; note, a setting of w = 0 would result in the basic JIT strategy because the smallest total Hamming distance of all repairs is achieved by not repairing at all. We can now add the solutions that were filtered out again to Pop′ to form a new population Pop′′, which is processed further as usual using the method alignOrdersAndSolutions(Pop′′).
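The selection of the cluster configuration can be illustrated with a small sketch of the scoring in Line 6 of Algorithm 5.3. The function name `select_cluster_count` is hypothetical, the Hamming distance losses are assumed to have been computed already, and mapping a degenerate normalization range to 0 is an assumption made here to avoid a division by zero.

```python
import math


def select_cluster_count(hdl, pop_size, reuse_number, w):
    """Return the k with the smallest score HDLN_k*(1-w) + kN*w.

    hdl: dict mapping each candidate k to its Hamming distance loss HDL_k;
        the keys are assumed to span ceil(pop_size / reuse_number) up to
        the number of distinct composites (#CompsInPop).
    """
    k_min = math.ceil(pop_size / reuse_number)
    k_max = max(hdl)
    hdl_min, hdl_max = min(hdl.values()), max(hdl.values())

    def normalize(value, low, high):
        # Assumption: a degenerate range maps to 0 to avoid dividing by zero.
        return 0.0 if high == low else (value - low) / (high - low)

    scores = {
        k: normalize(loss, hdl_min, hdl_max) * (1 - w)
        + normalize(k, k_min, k_max) * w
        for k, loss in hdl.items()
    }
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

With losses {2: 12, 3: 6, 4: 2, 5: 0} for a filtered population of 20 solutions and RN = 10, a weight of w = 0 selects k = 5 (no repairs), w = 1 selects k = 2 (as few composites as possible), and w = 0.4 selects the intermediate k = 4.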

5.3.1.3 Sliding window strategy

While JIT and JITR operate in a sequential mode in that they first devise a schedule of solution evaluations and purchase orders upon receiving a population from the EA, the sliding window (SW) strategy deals with the working of the algorithm and the purchasing of orders in parallel. More precisely, the strategy submits solutions for evaluation in the order they are generated, non-evaluable solutions are always repaired, and it aims for the 'most useful' composites to be kept in storage by (i) replenishing composites so that there can never be an empty storage cell and (ii) maintaining composites that were recently requested by the optimizer.

The first aspect, avoiding empty storage cells, is achieved by making purchase orders to fill all storage cells every RN time steps (because SL ≥ RN). The second aspect, maintaining recently requested composites, is achieved by ordering composites from the sliding window, which we define here as a set κ(t) containing composites that were requested most recently but were unavailable at the time of the request; κ(t) is limited in size such that |κ(t)| ≤ WS, where WS is the window size. Here, a requested composite means one that was part of a solution that the EA submitted to the function wrapper for evaluation. In this thesis, we simply order the #SC composites from κ(t) that have been added to this set most recently. That is, if we set WS = #SC, then we always order all composites in


κ(t); note, this means that all composites in the storage cells are replaced upon the arrival of new composites. If we cannot order a composite for each storage cell because |κ(t)| < #SC, then we order random composites for the remaining storage cells; this is, for example, the case at t = 0 where the set κ(0) is empty.

To repair a non-evaluable solution, we use the composite from the storage cell

that has the smallest Hamming distance to the actually required composite; ties

between equally distant composites are broken randomly. This repairing step is

encoded within the repairing method, repair(⃗x,t), of the function wrapper (see

Line 36 of Algorithm 5.4). This is different to the JITR strategy, which applies

repairing after generating an offspring population (see Line 15 of Algorithm 5.4).
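The bookkeeping of the sliding window and the nearest-composite repair can be sketched as follows. The names `SlidingWindow`, `record_unavailable`, `composites_to_order` and `repair` are hypothetical, composites are represented as equal-length bit strings, and the window is modelled as a bounded queue that may hold duplicates, which is a simplification of the set κ(t).

```python
import random
from collections import deque


def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


class SlidingWindow:
    """Most recently requested but unavailable composites, |kappa| <= WS."""

    def __init__(self, window_size):
        # A deque with maxlen drops its earliest element when full.
        self.kappa = deque(maxlen=window_size)

    def record_unavailable(self, composite):
        self.kappa.append(composite)

    def composites_to_order(self, n_cells, random_composite):
        """Return n_cells composites (n_cells >= 1 assumed): the newest
        window entries, padded with random composites if there are too few."""
        newest = list(self.kappa)[-n_cells:]
        padding = [random_composite() for _ in range(n_cells - len(newest))]
        return newest + padding


def repair(required, stored, rng=random):
    """Return the stored composite closest to the required one."""
    best = min(hamming(required, s) for s in stored)
    # Ties between equally distant composites are broken at random.
    return rng.choice([s for s in stored if hamming(required, s) == best])
```

At t = 0 the window is empty, so `composites_to_order` would return random composites only, which matches the behaviour described for κ(0).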

5.3.2 Experimental setup

We augment the three online resource-purchasing strategies on the same elitist generational EA as used in the analysis of the static constraint-handling strategies (see Section 5.1.2.1); we also use the same parameter settings (see Table 5.1). Algorithm 5.4 shows how we incorporate the purchasing strategies into this EA. 12

As the objective function f we consider again the MAX-SAT problem instance introduced in Section 5.1.2.2 (i.e. solutions are represented as binary strings of length l = 50). Any results shown are average results across 100 independent algorithm runs on this instance. Similarly to our previous studies, we choose the order-defining bits of the high-level constraint schema $H^{\#}$ at random at each run but, of course, use the same schemata across the strategies analyzed.

Table 5.5 gives the settings of the online purchasing strategies. JIT is parameter-free and thus not included in this table. For JITR, we anneal the weighting factor w stepwise as a function of the cost counter c as w = max(0, 0.1 × max(0, 4 − ⌊c/250⌋)). This means that the weighting factor is decreasing in the range [0.4, 0.1], resulting in a decreasing number of repairs as the optimization proceeds. The factor remains constant at w = 0.1 (i.e. only little repairing is applied) for c ≥ 1000. This setting performed most robustly on the test problem considered here. For SW, we found that ordering a composite from κ(t) that first undergoes mutation yields better performance. Preliminary experiments revealed a per-bit mutation rate of $p_\kappa = 0.05$, which will also be used here, to be promising. This modification encourages exploration in case the EA gets stuck at

12 The purpose of Lines 27 and 28 of Algorithm 5.4 is to permit the submission and arrival of composite orders also during time steps where null solutions are submitted.


Algorithm 5.4 Generational EA with online resource-purchasing strategies
Require: commCompERC(...) (a commitment composite ERC), f (objective function), T (time limit), C (budget), c_time_step (cost per time step), µ (parent population size), λ (offspring population size), #Strategy (selected resource-purchasing strategy)
1: t = 0, c = 0 (global variables representing the current time step and costs), Pop = ∅ (current population), OffPop = ∅ (offspring population)
2: while |Pop| < µ ∧ t < T ∧ c < C do
3:   generate solution ⃗x at random
4:   ⃗x = functionWrapper(⃗x, t)
5:   Pop = Pop ∪ {⃗x}; t++
6: while t < T ∧ c < C do
7:   OffPop = ∅
8:   repeat
9:     generate two offspring ⃗x(1) and ⃗x(2) by selecting two parents from Pop, and then recombining and mutating them
10:    OffPop = OffPop ∪ {⃗x(1)} ∪ {⃗x(2)}
11:  until |OffPop| = λ
12:  if Z ≠ ∅ ∧ (#Strategy=JIT ∨ #Strategy=JITR) then
13:    useUpOldComposites(OffPop)
14:  if #Strategy=JITR then
15:    performRepairing(OffPop)
16:  if (#Strategy=JIT ∨ #Strategy=JITR) then
17:    alignOrdersAndSolutions(OffPop)
18:  for i = 0 to |OffPop| do
19:    if t < T ∧ c < C then
20:      ⃗x_i = functionWrapper(⃗x_i, t) // ⃗x_i represents the ith solution of OffPop
21:      t++; c += c_time_step
22:  form new Pop by selecting the best µ solutions from the union population Pop ∪ OffPop
23: functionWrapper(⃗x, t){ // Notice that Lines 32 to 36 can only be entered by the SW strategy
24:   y_t = null
25:   if ⃗x is a null solution then
26:     ⃗x_t = ⃗x;
27:     do the same task as in Line 4 of Algorithm 4.5
28:     do the same task as in Line 5 of Algorithm 4.5
29:   else
30:     if ⃗x satisfies the ERC commCompERC(...) then
31:       ⃗x_t = ⃗x; y_t = f(⃗x_t)
32:     else
33:       if |κ(t)| = WS then
34:         remove the element of κ(t) that has been added earliest
35:       κ(t) = κ(t) ∪ {⃗x_comp} // ⃗x_comp denotes the composite required by ⃗x
36:       ⃗x_t = repair(⃗x, t); y_t = f(⃗x_t)
37:   return ⃗x_t and y_t }

suboptimal composites. It is likely that the appropriate mutation rate depends

on the string length l of solutions, and the number of composite-defining bits; a

more thorough investigation is needed to confirm this.


Table 5.5: Parameter settings of online resource-purchasing strategies.

Strategy   Parameter                                                      Setting
JITR       Number of shifting rounds mIter                                500
JITR       Weighting factor w                                             0.1 × max(0, 4 − ⌊c/250⌋), where c is the current cost counter
SW         Per-bit mutation probability for composites from κ(t), p_κ     0.05
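The stepwise schedule of the weighting factor in Table 5.5 can be written down directly. The sketch below is an assumption-laden illustration: the function name `weighting_factor` and its parameters are hypothetical, and the cost counter is assumed to take integer values. Evaluated literally, the tabulated expression yields w = 0 from c = 1000 onwards, so the sketch exposes an optional `floor` argument (an assumption, not part of the table) with which a lower bound such as 0.1 can be imposed.

```python
def weighting_factor(cost, step=250, levels=4, scale=0.1, floor=0.0):
    """Stepwise annealed weighting factor w for JITR.

    With the defaults this evaluates 0.1 * max(0, 4 - floor(c / 250)).
    `floor` is a hypothetical extra parameter: setting it to 0.1 keeps w
    from dropping below 0.1.
    """
    w = scale * max(0, levels - cost // step)
    return max(floor, w)
```

The schedule starts at 0.4 for c < 250 and drops by 0.1 every 250 cost units, so fewer repairs are applied as the optimization proceeds.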

5.3.3 Experimental results

Our focus in this study is to analyze the impact of commitment composite ERCs in the presence of budgetary constraints rather than time limitations. We thus set the time limit T to an arbitrarily large number and terminate the optimization upon reaching some budget C.

Figure 5.19 shows the probability of SW and JIT achieving the population average fitness of our base algorithm obtained in an ERC-free environment. For SW the performance impact depends crucially on the number of storage cells #SC and the reuse number RN. From the left plot we observe that the performance improves the more storage cells are available and the lower the reuse number is. The reason for this is that for these settings we order a larger number of potentially different composites more frequently. This in turn reduces the probability of needing to repair and, if a solution does need to be repaired, it increases the diversity of the repaired solutions among the composite-defining bits.

From the right plot of Figure 5.19 we observe that the performance of JIT improves as the time lag decreases and the reuse number increases. As this strategy does not repair and composites are gratis in this experiment, any positive effect on the performance comes from a shorter waiting period between transitions of population generations. While a shorter time lag has a direct positive effect on the waiting period, a greater reuse number allows an optimizer to evaluate more solutions from a new population using old composites and thus to compensate for a slightly larger time lag. An increase in the number of storage cells #SC (results not shown here) can have a positive performance effect because one is more likely to have a required composite in storage and thus can reduce the waiting time between population generations.


[Heat maps. Left panel: "Sliding window - Probability of achieving the population average fitness of an ERC-free search"; x-axis: number of storage cells #SC (3 to 48); y-axis: reuse number RN (4 to 64). Right panel: "Just-in-time - Probability of achieving the population average fitness of an ERC-free search"; x-axis: time lag TL (0 to 48); y-axis: reuse number RN (4 to 64). Colour scale of both panels: 0 to 0.6.]

Figure 5.19: Plots showing the probability of SW (left) and JIT (right) of achieving the population average fitness of our base algorithm obtained in an ERC-free environment given a budget and time limit of C = T = 1500. For SW this probability is shown as a function of #SC and RN for the ERC commCompERC($o(H^{\#}) = 30$, #SC, TL = 10, RN, SL = RN), and for JIT it is shown as a function of TL and RN for the ERC commCompERC($o(H^{\#}) = 10$, #SC = 10, TL, RN, SL = RN); costs were set to $c_{\text{order}} = 0$, $c_{\text{time step}} = 1$, C = 1500.

The performance of JITR obtained at large budgets is not significantly different from the performance of JIT as a function of TL and RN. This is due to the rather conservative cooling scheme of w. Let us now analyze whether there is a performance difference between the two strategies when we vary the budget and associate composite orders with costs $c_{\text{order}} > 0$. Figure 5.20 shows pairwise performance comparisons of the three strategies as a function of the cost counter c and the cost parameter $c_{\text{order}}$. From the top left plot it is apparent that JITR is able to locate fit solutions at a lower budget than required by JIT (see region 0 < c ≤ 600, 0 ≤ $c_{\text{time step}}$ ≤ 0.5). That is, repairing yields a more economical search and should be preferred in situations where the number of evaluation steps is limited due to a low budget and/or high costs.

From the top right plot of Figure 5.20 we observe that the weakness of JIT of being too expensive or wasteful at low budgets and/or high costs is also apparent when comparing it with SW. The bottom plot of Figure 5.20 indicates that repairing improves performance significantly in these constraint setting regimes, though the performance of SW cannot be matched. The reason that SW performs so well in the presence of small budgets is that it avoids waiting for composite orders to arrive. However, this is achieved at the cost of many repairs, which, ultimately, may cause a shift in the search direction towards suboptimal composites. An indication that such shifts in the search direction take place is the fact


[Heat maps. Top left panel: "Ratio $P(f(x) > f_{\text{JITR}})/P(f(x) > f_{\text{JIT}})$"; top right panel: "Ratio $P(f(x) > f_{\text{JIT}})/P(f(x) > f_{\text{SW}})$"; bottom panel: "Ratio $P(f(x) > f_{\text{JITR}})/P(f(x) > f_{\text{SW}})$". In all panels, x-axis: cost counter c (0 to 1600); y-axis: cost per time step $c_{\text{time step}}$ (0 to 1.6).]

Figure 5.20: Plots showing the ratios $P(f(x) > f_{\text{JITR}})/P(f(x) > f_{\text{JIT}})$ (top left), $P(f(x) > f_{\text{JIT}})/P(f(x) > f_{\text{SW}})$ (top right) and $P(f(x) > f_{\text{JITR}})/P(f(x) > f_{\text{SW}})$ (bottom) as a function of c and $c_{\text{time step}}$ for the ERC commCompERC($o(H^{\#}) = 10$, #SC = 5, TL = 5, RN = 30, SL = 30), $c_{\text{order}} = 1$; here, x is a random variable that represents solutions drawn uniformly at random from the search space and $f_*$ the population average fitness obtained with strategy ∗. If $P(f(x) > f_*)/P(f(x) > f_{**}) > 1$, then strategy ∗∗ is able to achieve a higher average best solution fitness than strategy ∗ and a greater advantage of ∗∗ is indicated by a darker shading in the heat maps; similarly, if $P(f(x) > f_*)/P(f(x) > f_{**}) < 1$, then ∗ is better than ∗∗ and a lighter shading indicates a greater advantage of ∗.

that JIT and JITR can match, and sometimes outperform, SW for larger budgets (see region 1000 < c ≤ 1600, 0 ≤ $c_{\text{time step}}$ ≤ 0.5 in the top right and bottom plot).

The previous experiment looked at commitment composite ERCs featuring a rather small number of storage cells #SC, which is in favour of SW. We also used a rather high order $o(H^{\#}) = 30$ in the previous experiment. For SW, a high order $o(H^{\#})$ tends to reduce the risk of having identical composites in storage and thus benefits the diversity in the population with respect to the composite-defining solution bits. Let us now investigate whether the performance advantage


[Heat maps. Left panel: "Ratio $P(f(x) > f_{\text{JIT}})/P(f(x) > f_{\text{SW}})$ at C=1500"; right panel: "Ratio $P(f(x) > f_{\text{JIT}})/P(f(x) > f_{\text{SW}})$ at C=3000". In both panels, x-axis: number of storage cells #SC (3 to 48); y-axis: order of high-level constraint schema $H^{\#}$, $o(H^{\#})$ (2 to 50).]

Figure 5.21: Plots showing the ratio $P(f(x) > f_{\text{JIT}})/P(f(x) > f_{\text{SW}})$ for a budget of C = 1500 (left) and C = 3000 (right) as a function of #SC and $o(H^{\#})$ for the ERC commCompERC($o(H^{\#})$, #SC, TL = 25, RN = 25, SL = 25), $c_{\text{order}} = c_{\text{time step}} = 1$. Here, x is a random variable that represents solutions drawn uniformly at random from the search space and $f_*$ the population average fitness obtained with strategy ∗. If $P(f(x) > f_*)/P(f(x) > f_{**}) > 1$, then strategy ∗∗ is able to achieve a higher average best solution fitness than strategy ∗ and a greater advantage of ∗∗ is indicated by a darker shading in the heat maps; similarly, if $P(f(x) > f_*)/P(f(x) > f_{**}) < 1$, then ∗ is better than ∗∗ and a lighter shading indicates a greater advantage of ∗.

of SW over the other two strategies can still be preserved when we vary #SC and $o(H^{\#})$. Figure 5.21 compares the performance between JIT and SW as a function of #SC and $o(H^{\#})$ for a budget of C = 1500 (left plot) and C = 3000 (right plot); JITR achieved a similar performance to JIT for the considered budget regimes. For the lower budget (left plot), we observe that SW outperforms JIT for rather large orders $o(H^{\#})$ and few storage cells #SC (see range $8 < o(H^{\#}) < 40$, #SC < 15); SW is too wasteful in the presence of more storage cells. As the budget increases (right plot), JIT is able to match the performance of SW and eventually to overtake it, particularly if both #SC and $o(H^{\#})$ are large.

5.3.4 Summary and conclusion

In this study we have considered an optimization scenario in which costly and expendable resources (composites) are required in the evaluation of solutions, and these composites had to be ordered in advance, kept in capacity-limited storage, and used within a certain time frame. We proposed three online resource-purchasing strategies (see Section 5.3.1) and analyzed them over a number of resource-constraint settings. A simple just-in-time (JIT) strategy, which can be seen as the minimal strategy, proved to be generally effective but too expensive at early optimization stages. This drawback has been eliminated by intelligently


repairing solutions during these stages (JITR). Besides these two reactive strategies, we also considered a sliding window (SW) strategy, which applies some form of anticipation. This (simple) strategy performs better than JITR and JIT for some constraint settings, although it is costly when much storage space is available.

We can try out various modifications to the purchasing strategies to improve their performance. For instance, for the SW strategy one may ensure that only distinct composites are ordered from the sliding window, and also avoid having all storage cells filled at any time. Furthermore, rather than ordering the composites that were added to κ(t) most recently, one may order composites that have been requested most frequently, or that have proved to be successful in the past. In essence, a sophisticated ordering strategy may trade off exploration of rarely requested composites (from the sliding window) against exploitation of composites that have proved to be successful and/or were requested frequently. Alternative repairing strategies for the JIT strategy may also prove fruitful.

5.4 Chapter summary

In this chapter we proposed and analyzed a variety of strategies for coping with ERCs. We first looked at static strategies for dealing with non-evaluable solutions arising due to ERCs. Experiments over a number of test functions suggest that the appropriate strategy depends on the constraint parameters of the ERC rather than the fitness landscape it is optimized over. If this turns out to be generally true, then we can select a strategy offline, given the common situation that the ERCs are known beforehand. Motivated by this observation, we then showed that learning offline when to switch between the static strategies during an optimization process can yield better performance than the static strategies themselves. Finally, we proposed several online resource-purchasing strategies for coping with an ERC type that leaves the arrangement of resources to the optimizer. Overall, we can tentatively say that the most suitable constraint-handling strategy depends on the constraint settings of the ERCs considered.

The next chapter focuses on the two remaining resourcing issues considered in this thesis: optimization subject to changes of variables and optimization in lethal environments.

Chapter 6

Further Resourcing Issues in

Closed-Loop Optimization

In this chapter we propose and analyze strategies for coping with two further resourcing issues in closed-loop optimization: problems subject to changes of variables and lethal environments. The first resourcing issue, optimization subject to changes of variables, is motivated by an experimental problem involving the identification of effective drug combinations drawn from a non-static drug library. The second resourcing issue, optimization in lethal environments, causes the population of an EA to decrease in size (permanently) upon evaluating a poor or lethal solution. We motivate this scenario, and devise and validate guidelines for tuning EAs to cope with it.

6.1 Optimization on problems subject to changes of variables

Optimization problems subject to changes of variables belong to the class of dynamic optimization problems, which we introduced in Section 3.4. The dynamic component is related to the fitness landscape rather than the constraints (as was the case with ephemeral resource-constrained problems). The next section motivates optimization problems subject to changes of variables, and in Section 6.1.2 we propose a method, which we call fair mutation, specifically designed for coping with such problems. The test problems we use to compare fair mutation against four standard strategies from dynamic optimization are described in Section 6.1.3.


[Figure 6.1 (diagram): the library of drugs at t = 0 contains drugs a, b, c, d, e; drugs c and e are replaced with drugs f and g, so that the library of drugs at t = ∆g contains drugs a, b, f, d, g.]

Figure 6.1: An illustration of the effect of changing variables. In the (drug discovery) scenario considered, this corresponds to replacing a number of drugs (here #v = 2) of a library with new ones (every ∆g generations during an optimization process).

The experimental setup is outlined in Section 6.1.4, and an empirical study follows in Section 6.1.5. Section 6.1.6 concludes this study.

6.1.1 Motivation

Our motivation for considering optimization problems subject to changes of variables is a completed experimental study (Small et al., 2011), in which we were involved, concerned with the identification of effective drug combinations using EAs. For a detailed description of the experimental setup please refer to Small et al. (2011); here we focus solely on the effect of changing variables. In this study, a human experimentalist needs to arrange a library (of size l) of promising or relevant drugs to be considered by an optimizer. In our case, each drug i corresponds to a single binary variable x_i indicating whether the drug is included in a drug mixture (x_i = 1) or not (x_i = 0); the drug mixture itself represents an entire solution ⃗x. In applications of this type, the experimental equipment is often set up such that multiple experiments (which are here the mixing of drugs and subsequent testing of the mixtures) can be done in parallel. In our scenario, a robot takes care of the mixing of drugs, and this robot is able to mix up to 50 combinations in parallel. During the running experimental optimization, which may take several months, the human experimentalist may decide to replace drugs from the library with new ones (as indicated in Figure 6.1) because, for instance, the old drugs have an undesired effect on the efficiency of a drug cocktail or are simply not of interest anymore.¹ Hence, whenever a change of drugs takes place, the corresponding variables and their effects on the fitness are changed as well. Note that a change of drugs does not change the effectiveness of cocktails that did not use any of the replaced drugs; i.e. solutions with a 0-bit at each of the replaced

1 The situation where drugs are only added to or removed from an existing library is realistic

too but will not be considered in this study.


variables retain their fitness. Obvious ways of dealing with this are to carry on regardless, or to restart optimization entirely. We wish to see if other strategies fare any better.

Similar to other dynamic optimization problems, problems subject to changes of variables feature a changing fitness function and thus a changing fitness landscape. An example of a conceptually similar dynamic optimization problem is the dynamic traveling salesman problem (DTSP) or vehicle routing problem (Larsen, 2000), where cities may be replaced with new ones during the optimization. The two main differences between our dynamic combinatorial problem and these existing dynamic problems are that: (i) we do not need to detect changes in the landscape because it is always known when variables are changed (sometimes the exact generation of a change can even be controlled), and (ii) the fitness value remains unchanged for a known subset of solutions (see above). The fact that in our case solutions are evaluated by conducting physical experiments, whose number is limited by resource restrictions, can be considered as another distinction from previous analyses of dynamic problems. The presence of resource limitations requires a quick tracking of new optima. Fortunately, thanks to (i) and (ii), fulfilling this task is usually easier in our scenario than, say, in the DTSP.

6.1.2 Fair mutation

In Section 3.4 we introduced three main approaches for tracking optima in a dynamic optimization environment: diversity-control, memory-based, and multi-population approaches. The strategy we propose here for dealing with changes of variables, which we call fair mutation, can be regarded as a diversity-control approach. Unlike for the majority of strategies of this type, we do not need to answer the question of how to detect environmental changes. Instead, the question is rather how best to exploit the (often expensively obtained and partially intact) information contained in the current population when generating a new population after exchanging some variables. Fair mutation introduces diversity into the population only in the initial generations following an environmental change. This is done by making effective use of the information stored in the current population and of the fact that only bits with value 1 can potentially have an effect. We explain the strategy in detail in the following.


Algorithm 6.1 Fair mutation
Require: p_r (raised mutation rate), ∆h (number of generations for which p_r is used)
1: applyFairMutation(Pop, λ, V, ∆t) {
2: if ∆t = 0 then
3:     for each variable k ∈ V do
4:         for 1 ≤ i ≤ ⌊λ/|V|⌋ do
5:             generate an offspring ⃗x_off as normal (if the application of selection and variation operators yields two offspring, then select one of them at random); mutate ⃗x_off using a rate of p_r; set variable k of ⃗x_off to 1 and add ⃗x_off to OffPop, i.e. OffPop = OffPop ∪ {⃗x_off}
6:     in the situation where |OffPop| < λ, fill OffPop in the same way as above but, instead of setting variable k to 1, set to 1 a variable selected from V at random without replacement
7: else
8:     if 0 < ∆t ≤ ∆h then
9:         generate offspring population as normal, but for any offspring containing a 1-bit at any of the variables in V, mutate the other bits using p_r
10:     else
11:         generate offspring population as normal
12: return OffPop }

6.1.2.1 Algorithm

The idea of fair mutation is to allow an optimizer both to continue optimization as normal and at the same time to rapidly explore the space of solutions that use any of the new variables. In our context the new variables represent a set of new drugs that replaces a set of existing drugs in the drug library. We employ two mechanisms to facilitate the exploration of the search space spanned by the new variables: (i) allowing an optimizer to test each new variable (i.e. its bit being set to 1) within the same number of offspring solutions, and (ii) enabling an optimizer to explore solutions with a 1-bit at any of the new variables by applying a raised mutation rate p_r for ∆h generations (p_r and ∆h are user-defined parameters). Algorithm 6.1 describes the method applyFairMutation(Pop, λ, V, ∆t), which we call to generate the offspring population OffPop at each generation. The method takes as input the current population Pop, the number of offspring solutions λ to be generated, the set of changed variables V, and the number of generations ∆t passed since the last environmental change (or variable replacement step). We assume that fair mutation is built on top of a generational EA with a (µ + λ)-ES reproduction scheme.²

From Lines 8 and 9 of Algorithm 6.1 one can see that, in the first ∆h generations after an environmental change, we apply a raised mutation rate (from which

2 For a steady state EA or λ = 1, we would call the method applyFairMutation(Pop,1,V,∆t)

multiple times thereby ensuring that each variable is tested an equal number of times.


1-bits among the new variables are protected) if an offspring has one or more 1-bits among the new variables. Although this step allows an optimizer to perform exploration among solutions that use one or more of the new variables, it also runs the risk that an offspring may be perturbed so much that it turns into a poor one. However, when we embed this strategy within an (elitist) population update scheme, it achieves both the continued normal optimization of strings that do not contain any of the new variables, and the fair and rapid exploration of each of the subspaces (schemata) that contain a new variable. Alternatively, one may decrease the mutation rate p_r during the period of ∆h generations, or look at other parameter control techniques (see Section 3.1.2) for tuning p_r.
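The steps of Algorithm 6.1 can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the implementation used in the experiments: solutions are assumed to be lists of 0/1 values, `make_offspring` is a hypothetical callable that produces one new offspring from the population "as normal" (selection plus variation), and in the later generations only the 1-bits among the new variables are protected, which is one reading of the description above.

```python
import random

def flip_bits(x, rate, protected=frozenset(), rng=random):
    """Flip each unprotected bit of x independently with probability `rate`."""
    return [b if (i in protected or rng.random() >= rate) else 1 - b
            for i, b in enumerate(x)]

def apply_fair_mutation(pop, lam, changed, dt, p_r, dh, make_offspring, rng=random):
    """Sketch of Algorithm 6.1.

    changed: set V of changed variable indices; dt: generations since the
    last change; make_offspring: hypothetical helper returning a NEW list.
    """
    changed = sorted(changed)
    off_pop = []
    if dt == 0 and changed:
        # Lines 3-5: devote the same number of offspring to every new variable.
        for k in changed:
            for _ in range(lam // len(changed)):
                child = flip_bits(make_offspring(pop), p_r, rng=rng)
                child[k] = 1
                off_pop.append(child)
        # Line 6: fill up with variables drawn from V without replacement.
        for k in rng.sample(changed, lam - len(off_pop)):
            child = flip_bits(make_offspring(pop), p_r, rng=rng)
            child[k] = 1
            off_pop.append(child)
    else:
        for _ in range(lam):
            child = make_offspring(pop)
            if 0 < dt <= dh:
                # Lines 8-9: if the offspring uses a new variable, mutate the
                # other bits with the raised rate (assumption: only the 1-bits
                # among the new variables are protected).
                ones = frozenset(k for k in changed if child[k] == 1)
                if ones:
                    child = flip_bits(child, p_r, protected=ones, rng=rng)
            off_pop.append(child)
    return off_pop
```

With ∆h = 0 the second branch reduces to generating offspring as normal, so the only special treatment happens in the generation of the change itself.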

6.1.3 Testing Environment

To validate the performance of fair mutation we use a main-and-joint effects (MJ) problem, which models the real drug mixture problem we are interested in. We present the MJ model in the following section. We also consider variable changes on NK landscapes as a comparison.

6.1.3.1 MJ model

In the application described in Small et al. (2011), the experience of the human experimentalist and a previously performed pilot study suggested that only some drugs (bits) of the library have a positive effect while others have a negative side effect or no effect at all. Moreover, it is believed that interactions among the considered drugs can be described by main and joint effects only, with higher-order interaction effects being negligible.

Let M denote the number of randomly selected bits x_{h(q)}, q = 1, ..., M ≤ l (l is the total number of bits), which are assigned a main effect; h(q) is the variable index of the qth main effect. And let J denote the number of randomly selected distinct bit pairs (x_{i(r)}, x_{j(r)}), r = 1, ..., J, i(r) ≠ j(r), which are assigned a joint effect; i(r) and j(r) are the indexes of the two variables that have the rth joint effect. In our case, the strengths of all main effects m_{h(q)} and joint effects g_{i(r),j(r)} are drawn uniformly from the interval [−3, 1]. That is, on average, a quarter of all main and joint effects will be positive; negative effects can model undesired side effects associated with (pairs of) drugs. As the purpose of this


model is to model the real drug mixture problem at hand, we assume that variables or drugs only have an effect if they are included in a drug mixture. Hence, the strength of a main effect m_{h(q)} or joint effect g_{i(r),j(r)} is only considered if the variable x_{h(q)} is a 1-bit or, respectively, both variables x_{i(r)} and x_{j(r)} are 1-bits. To evaluate a candidate solution, we look up all the main and joint effects which are on, sum these values, and divide the sum by the number of bits l. In other words, the fitness function f to be maximized is defined as

\[
f(\vec{x}) = \frac{1}{l}\left(\sum_{q=1}^{M} m_{h(q)} \cdot x_{h(q)} + \sum_{r=1}^{J} g_{i(r),j(r)} \cdot x_{i(r)} \cdot x_{j(r)}\right). \tag{6.1}
\]
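As an illustration of Equation (6.1), the following Python sketch builds a random MJ model and evaluates a bit string on it. The function names and the dictionary-based representation (index to strength for main effects, index pair to strength for joint effects) are assumptions made for this example only.

```python
import random

def make_mj_model(l, M, J, rng):
    """Draw M main effects and J distinct joint effects, uniform on [-3, 1]."""
    if J > l * (l - 1) // 2:
        raise ValueError("more joint effects requested than distinct bit pairs")
    # rng.sample raises ValueError if M > l, matching the requirement M <= l.
    main = {h: rng.uniform(-3.0, 1.0) for h in rng.sample(range(l), M)}
    joint = {}
    while len(joint) < J:
        i, j = sorted(rng.sample(range(l), 2))   # i != j by construction
        if (i, j) not in joint:                  # keep the bit pairs distinct
            joint[(i, j)] = rng.uniform(-3.0, 1.0)
    return main, joint

def mj_fitness(x, main, joint):
    """Equation (6.1): sum all effects that are switched on, divide by l."""
    total = sum(m for h, m in main.items() if x[h] == 1)
    total += sum(g for (i, j), g in joint.items() if x[i] == 1 and x[j] == 1)
    return total / len(x)
```

For instance, with main effects {0: 1.0, 2: -2.0}, a single joint effect {(0, 1): 0.5} and l = 4, the string 1100 switches on the main effect of bit 0 and the joint effect, giving (1.0 + 0.5)/4 = 0.375, while the all-zero string has fitness 0.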

6.1.3.2 NK landscapes

The second test problem we consider is the NK landscape or, equivalently, the NKα landscape with the setting α = 0. Please refer to Section 5.1.2.2 for a description of the test problem. Compared to an MJ model, an NK landscape assigns a positive effect (drawn here from [0, 1]) to each of the 2^(K+1) possible bit-wise neighborhood configurations; i.e. 0-bits and 1-bits are treated equally. For fair mutation this means that (i) among the offspring solutions devoted to a new variable, that variable needs to be set to 1 and to 0 in equal numbers (this concerns the for-loop in Line 4 of Algorithm 6.1), and (ii) new variables are always protected from mutation (this concerns the if-clause in Line 8 of Algorithm 6.1).

6.1.3.3 Realizing a change of variables

On an NK landscape, a change of a variable involves reselecting its K neighbors and reinitializing the corresponding 2^(K+1) effects (of the neighborhood configurations). Similarly, on an MJ model, if the replaced variables had a main and/or joint effect, then these effects are reinitialized and the joint bits reselected. Recall that we select the joint bits such that we end up with J distinct bit pairs only. In this study, a change of variables is performed every ∆g generations, whereby #v randomly selected variables are changed each time. That is, while ∆g controls the frequency of environmental changes, #v controls the severity of a change.
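A change of variables on an MJ model can be sketched in Python as follows. This is a hedged illustration: the dictionary representation and the function names are assumptions of the example, and "reselecting the joint bits" is interpreted here as drawing a new partner for the replaced variable in each affected pair. The sketch also makes the invariant from Section 6.1.1 visible: a solution with 0-bits at all replaced variables keeps its fitness.

```python
import random

def mj_fitness(x, main, joint):
    """Equation (6.1) on a dictionary-based MJ model."""
    total = sum(m for h, m in main.items() if x[h] == 1)
    total += sum(g for (i, j), g in joint.items() if x[i] == 1 and x[j] == 1)
    return total / len(x)

def change_variables(main, joint, l, num_changed, rng):
    """Replace `num_changed` random variables in place; return their indices."""
    changed = set(rng.sample(range(l), num_changed))
    # Reinitialize the main effects of the replaced variables.
    for k in changed:
        if k in main:
            main[k] = rng.uniform(-3.0, 1.0)
    # Reinitialize joint effects involving a replaced variable and reselect
    # the partner bit, keeping all J pairs distinct.
    for pair in [p for p in joint if changed.intersection(p)]:
        del joint[pair]
        k = pair[0] if pair[0] in changed else pair[1]
        while True:  # terminates: the slot of the deleted pair is free again
            partner = rng.randrange(l)
            new_pair = (min(k, partner), max(k, partner))
            if partner != k and new_pair not in joint:
                break
        joint[new_pair] = rng.uniform(-3.0, 1.0)
    return changed
```

Because every reinitialized effect still involves a replaced variable, only solutions that use at least one of the new variables need their fitness to be determined anew.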


6.1.4 Experimental setup

We compare the performance of fair mutation (denoted as #Strategy=1 in Algorithm 6.2) on dynamic NK landscapes and MJ models against four standard strategies from dynamic optimization: triggered hypermutation (#Strategy=2), restart of the optimization (#Strategy=3), re-evaluating the current population after changing variables and then carrying on as normal (#Strategy=4), and a niching algorithm (#Strategy=5). We will also show partial results of a strategy called random immigrants (#Strategy=6).

We introduced the idea of triggered hypermutation (Cobb, 1990) and random immigrants (Grefenstette, 1992) in Section 3.4. Here, we outline the working procedures of these two diversity-control approaches in more detail. The triggered hypermutation approach is typically employed within a generational EA with crossover and a small per-bit mutation probability p_{m,b} (also referred to as the base mutation rate). The method monitors the fitness of the best individual in the population over time, and when this measure declines, it is assumed that the environment has changed and the mutation rate is increased drastically (by some factor HR) for a single generation. The obvious drawback of this approach is that changes in the landscape occurring in regions that are not populated by an EA will not trigger the hypermutation. The random immigrants approach circumvents this drawback by promoting exploration of the search space continuously: every generation, it replaces a fraction p_RI of the population with randomly generated individuals.
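The two mechanisms just described can be sketched in Python as below. The class and function names are hypothetical, and the trigger rule (compare the current moving average of the best-of-generation fitness with the previous one) is an assumption about how a "decline" might be detected; the default values are the ones listed in Table 6.1.

```python
import random
from collections import deque

class HypermutationTrigger:
    """Raise the mutation rate for one generation when the moving average
    of the best-of-generation fitness declines."""

    def __init__(self, base_rate=0.001, factor=500, window=5):
        self.base_rate = base_rate            # p_{m,b}
        self.factor = factor                  # HR
        self.history = deque(maxlen=window)   # last `window` best fitness values
        self.prev_avg = None

    def mutation_rate(self, best_fitness):
        """Record this generation's best fitness and return the rate to use."""
        self.history.append(best_fitness)
        avg = sum(self.history) / len(self.history)
        declined = self.prev_avg is not None and avg < self.prev_avg
        self.prev_avg = avg
        return self.base_rate * self.factor if declined else self.base_rate

def random_immigrants(pop, p_ri, rng):
    """Return a copy of pop with a fraction p_ri replaced by random bit strings."""
    n_bits = len(pop[0])
    new_pop = list(pop)
    for idx in rng.sample(range(len(pop)), round(p_ri * len(pop))):
        new_pop[idx] = [rng.randint(0, 1) for _ in range(n_bits)]
    return new_pop
```

The contrast between the two is visible in the interfaces: the trigger needs a fitness signal and acts only when that signal drops, whereas `random_immigrants` is applied unconditionally in every generation.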

As the niching algorithm, we use deterministic niching, whose procedure we

already described in Algorithm 3.2 of Section 3.4.

Parameter settings: We build all dynamic optimization strategies (except the niching algorithm) on top of an EA with a (µ + λ)-ES reproduction scheme; the niching algorithm is elitist in its own way as offspring solutions are allowed to replace their parents only if they are fitter. The EA is identical to the one described in Section 5.1.2.1 and also uses the same setup (see Table 5.1). Algorithm 6.2 shows how we integrate the different dynamic optimization strategies as well as variable changes (performed in Line 9) into this EA.

The parameter settings of the different dynamic optimization strategies are given in Table 6.1. Note from the table that the triggered hypermutation is set up such that, once triggered, we use a mutation rate of p_{m,b} × HR = 0.5 for a single generation. The effect of mutating a solution with this probability is


Algorithm 6.2 Generational EA with dynamic optimization strategies for handling problems subject to changes of variables
Require: f (objective function), G (maximal number of generations), µ (parent population size), λ (offspring population size), #Strategy (number of selected dynamic optimization technique)
1: g = 0 (generation counter), ∆t = 0 (number of generations passed since the last variable change), MA = −1 (moving average of the best-of-generation fitness), Pop = ∅ (current population), OffPop = ∅ (offspring population)
2: while |Pop| < µ ∧ g < G do
3:     generate solution ⃗x at random
4:     evaluate ⃗x using f
5:     Pop = Pop ∪ {⃗x}; g++; update MA
6: while g < G do
7:     OffPop = ∅
8:     if g mod ∆g = 0 then
9:         select #v variables to be replaced at random, and add their indices to the set of changed variables, V; reinitialize the fitness effects of the new variables in case the replaced variables were previously associated with any (i.e. generate a new objective function f); set ∆t = 0
10:         if #Strategy=3 then
11:             reinitialize Pop with random solutions
12:         else
13:             re-evaluate all solutions in Pop using (the new objective function) f
14:     if #Strategy=1 then
15:         OffPop = applyFairMutation(Pop, λ, V, ∆t)
16:     else
17:         if #Strategy=2 ∧ MA declined then
18:             generate λ solutions at random to form the population OffPop
19:         if #Strategy=5 then
20:             create new population Pop according to Algorithm 3.2 of Section 3.4
21:         else
22:             for i = 1 to λ/2 do
23:                 generate two offspring ⃗x^(1) and ⃗x^(2) by selecting two parents from Pop, and then recombining and mutating them
24:                 OffPop = OffPop ∪ {⃗x^(1)} ∪ {⃗x^(2)}
25:             if #Strategy=6 then
26:                 replace p_RI × λ solutions of OffPop with random solutions
27:     if #Strategy≠5 then
28:         evaluate all solutions of OffPop using f
29:         update MA based on the best solution fitness in OffPop
30:         form new Pop by selecting the best µ solutions from the union population Pop ∪ OffPop
31:     g++; ∆t++

identical to generating a solution at random. Hypermutation is triggered when there is a decrease in the moving average of the best-of-generation fitness over five generations. These are standard parameter settings for this strategy, and so is the setting of the replacement rate p_RI for the random immigrants


Table 6.1: Parameter settings of dynamic optimization techniques.

Strategy                  Parameter                                             Setting
Fair mutation             Raised mutation rate p_r                              1/l
                          Number of generations for which p_r is applied, ∆h    0
Triggered hypermutation   Base mutation rate p_{m,b}                            0.001
                          Hypermutation rate HR                                 500
Random immigrants         Replacement rate p_RI                                 0.3

approach. Note that triggered hypermutation is typically employed within a non-elitist EA. We integrate it into an elitist EA because the performance obtained with that EA proved to be better on the test problems considered.

We are mainly interested in the dynamics resulting from changing a small number of variables frequently, as this is the scenario we are faced with in Small et al. (2011). On this class of problems, the parameter setting for which fair mutation performed best is ∆h = 0 and p_r = 1/l, which will also be used here (see Table 6.1). The interpretation of this parameter setting is that new variables are protected from mutation only at the generation at which an environmental change occurs, and that the raised mutation rate is the same as the standard mutation rate (i.e. 1/l). Higher values of ∆h and p_r are required for more severe environmental changes and different interval ranges of main and joint effects. At the generation of an environmental change, all strategies (apart from the restart strategy, see Lines 10 and 11 of Algorithm 6.2) re-evaluate the current population before they generate an offspring population.

Our aim is to understand what effect changes of variables have on evolutionary search for different settings of #v and ∆g. For ease of analysis we therefore fix the problem parameters to N = l = 30 and K = 2 for the NK landscape, and to M = 30 and J = 30 for the MJ model. That is, in the MJ model all bits have a main effect and 30 bit pairs have a joint effect. The number of variables l = 30 also corresponds to the size of the drug library considered in our experimental study (see Small et al. (2011)). Table 6.2 summarizes the test problem settings. Any results shown are average results across 100 independent algorithm runs.

Performance measurement: To measure the performance of the investigated


Table 6.2: Parameter settings of test functions f modelling problems subject to changes of variables.

Test function    Parameter                    Setting
MJ model         Solution parameters l        30
                 Number of main effects M     30
                 Number of joint effects J    30
NK landscapes    Solution parameters N = l    30
                 Neighborhood size K          2

strategies we use three measurements: accuracy, adaptability, and average best-of-generation fitness. Accuracy and adaptability have been proposed by Trojanowski and Michalewicz (1999) and rate an algorithm's overall performance with a single number. Both metrics are offline performance measures (De Jong, 1975) and rely on the assumption that the time steps at which the environment changes are known beforehand. The accuracy value represents the fitness difference between the best solution in the population at the generation just before a change and the optimal fitness in that environment, averaged over the entire run. Let S denote the total number of environmental changes during an optimization process and, as before, ∆g the number of generations between successive environmental changes. We can define the accuracy metric as

\[
\mathrm{Acc} = \frac{1}{S} \sum_{i=1}^{S} \Delta e_{i,\Delta g - 1}, \tag{6.2}
\]

where ∆e_{i,j} is the fitness difference between the best solution at the jth generation after the last environmental change (j ∈ [0, ∆g − 1]) and the optimal fitness of the landscape after the ith environmental change (i ∈ [1, S]).

The adaptability value represents the fitness difference between the best fitness of each generation and the optimal fitness in the current environment, averaged over the entire run. We can formally define this metric as

\[
\mathrm{Ada} = \frac{1}{S} \sum_{i=1}^{S} \left( \frac{1}{\Delta g} \sum_{j=0}^{\Delta g - 1} \Delta e_{i,j} \right). \tag{6.3}
\]

For both metrics the optimal solution in an environment is approximated by using the best solution found from running an elitist generational EA for 100


generations, 100 times independently. It is easy to see that the smaller an algorithm's accuracy and adaptability measurements are, the better the algorithm is. As Acc decreases, the performance of an algorithm in tracking the optima within the ∆g generations improves; a value of Acc = 0 means that the algorithm was always able to find the optimum within ∆g generations. A small value of Ada indicates that the best solution fitness found during an optimization run was always close to the optimal fitness; a value of Ada = 0 means that the optimal fitness has never been lost. The average best-of-generation fitness is a more standard per-generation measure, suitable for plotting.
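Both metrics are simple averages over a table of error values, as the following Python sketch shows. It assumes (for this example only) that the errors are stored as a list `errors` in which `errors[i][j]` holds the difference between the optimal fitness and the best fitness at the jth generation of the ith environment.

```python
def accuracy(errors):
    """Equation (6.2): mean error at the last generation before each change."""
    return sum(period[-1] for period in errors) / len(errors)

def adaptability(errors):
    """Equation (6.3): mean over environments of the mean per-generation error."""
    return sum(sum(period) / len(period) for period in errors) / len(errors)
```

For example, with two environments of three generations each and errors [[0.4, 0.2, 0.0], [0.6, 0.3, 0.3]], the accuracy is (0.0 + 0.3)/2 = 0.15 and the adaptability is (0.2 + 0.4)/2 = 0.3: the first environment is tracked to the optimum by its final generation, while the larger adaptability value reflects the error accumulated on the way there.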

6.1.5 Experimental results

Figures 6.2 to 6.7 show the best-of-generation fitness of all dynamic optimization strategies (except random immigrants) on MJ models and NK landscapes for three different settings of #v and ∆g: #v = 2 and ∆g = 10 (Figures 6.2 and 6.3), #v = 8 and ∆g = 25 (Figures 6.4 and 6.5), and #v = 15 and ∆g = 40 (Figures 6.6 and 6.7). The accuracy and adaptability measurements of all strategies (including random immigrants) on these problem instances are shown in Table 6.3. For comparison, the table also includes the results obtained by triggered hypermutation using a base mutation rate of p_{m,b} = 1/l = 1/30 and a hypermutation rate of HR = 15. This base mutation rate is identical to the per-bit mutation rate p_m employed by the other dynamic optimization strategies; the hypermutation rate is again set such that hypermutated solutions are equivalent to random solutions.

For each of the three problem instances it is apparent that all strategies perform better on the MJ model than on the NK landscape. This is due to the higher degree of variable interactions featured by the NK landscape, which tends to make the problem more difficult to solve. Let us analyze the performance of the strategies on each of the problem instances in turn.

Figures 6.2 and 6.3 investigate the situation where the environment changes only a little but frequently. This is the most realistic and, for us, most interesting case. Here we observe that:

• Environmental changes are easier to detect on NK landscapes (Figure 6.3)

than on the MJ model (Figure 6.2) because all variables have an effect

regardless of whether they are 0 or 1-bits (as opposed to 1-bits only on the


[Figure 6.2: two line plots for the MJ model, #v=2, ∆g=10 (top: full run; bottom: "Zoom on MJ model"). Horizontal axis: #Generations; vertical axis: average best-of-generation fitness. Legend: Restart, Reevaluate, Niching, Trig. hypermutation, Fair mutation, Optimal fitness.]

Figure 6.2: Plots showing the average best-of-generation fitness obtained by various dynamic optimization strategies on an MJ model with #v = 2, ∆g = 10. The bottom plot zooms into the area indicated by the box in the top plot.

MJ model).

• Using a restart approach or triggered hypermutation results in poor performance, especially on NK landscapes (Figure 6.3). In the case of the restart strategy, the reasons are the high frequency and low severity of environmental changes; there is simply insufficient time for the EA to converge to a good population state before the environment changes again. The poor performance of triggered hypermutation is due to a similar reason: while environmental changes are reliably detected on this problem, the low base mutation rate does not allow for a sufficiently quick convergence to fit regions in the search space. From the accuracy and adaptability measures in Table 6.3 we observe that a higher base mutation rate improves performance significantly.


[Figure 6.3: two line plots for the NK landscape, #v=2, ∆g=10 (top: full run; bottom: "Zoom on NK landscape"). Horizontal axis: #Generations; vertical axis: average best-of-generation fitness. Legend: Restart, Reevaluate, Niching, Trig. hypermutation, Fair mutation, Optimal fitness.]

Figure 6.3: Plots showing the average best-of-generation fitness obtained by various dynamic optimization strategies on NK landscapes with #v = 2, ∆g = 10. The bottom plot zooms into the area indicated by the box in the top plot.

• The other three strategies yield similar performance, though a slight advantage is apparent for fair mutation, which is closely followed by the re-evaluation strategy; this is also confirmed by the accuracy and adaptability measurements. Fair mutation seems to recover significantly faster than the other strategies when there is a severe change in the landscape; see generations 90 and 100 on the NK landscape (Figure 6.3) and MJ model (Figure 6.2), respectively.

• The niching algorithm maintains a relatively high population diversity throughout the search as it searches within several promising niches simultaneously. This slows down the convergence, as can be seen at the beginning of the optimization, but it also increases the probability of finding new optima more quickly after an environmental change; see e.g. generations 50 and 80 on the NK landscape (Figure 6.3) and MJ model (Figure 6.2), respectively.


[Figure 6.4: two line plots for the MJ model, #v=8, ∆g=25 (top: full run; bottom: "Zoom on MJ model"). Horizontal axis: #Generations; vertical axis: average best-of-generation fitness. Legend: Restart, Reevaluate, Niching, Trig. hypermutation, Fair mutation, Optimal fitness.]

Figure 6.4: Plots showing the average best-of-generation fitness obtained by various

dynamic **optimization** strategies on a MJ model with #v = 8,∆g = 25. The bottom

plot zooms into the area indicated by the box in the top plot.

From Figures 6.4 and 6.5, where more variables are changed less frequently, we observe that:

• Changing a larger number of variables alters the fitness landscape more severely and thus simplifies detecting environmental changes.

• The restart approach can sometimes outperform the other strategies on the NK landscape (Figure 6.5). This is, for example, the case at generation 50, where the landscape seems to undergo a relatively severe change.

• Fair mutation again slightly outperforms the other strategies. However, the fact that only a small number of offspring solutions can be devoted to each new variable reduces its ability to explore the search region spanned by the new variables effectively.

• The deficit in the convergence speed of the niching algorithm is now more apparent. This is because a more severe landscape change reduces the probability that any of the currently occupied niches is close to the new optimal search region; i.e. the population needs to shift its search focus more severely and this takes time when trying to maintain several niches simultaneously.

[Figure 6.5, two line plots. Top: NK landscape, #v=8, ∆g=25, generations 0 to 250. Bottom: zoom on NK landscape, generations 180 to 260. x-axis: #Generations; y-axis: average best-of-generation fitness. Curves: Restart, Reevaluate, Niching, Trig. hypermutation, Fair mutation, Optimal fitness.]

Figure 6.5: Plots showing the average best-of-generation fitness obtained by various dynamic optimization strategies on NK landscapes with #v = 8, ∆g = 25. The bottom plot zooms into the area indicated by the box in the top plot.

Figures 6.6 and 6.7 consider the case where half of all solution variables are replaced with new ones every ∆g = 40 generations. This setting may simulate the case where a human experimentalist wants to test many different drugs without being able (e.g. due to storage or budget limitations) or wanting to change the drug library size. From Figure 6.7 and the accuracy and adaptability measurements in Table 6.3 it is apparent that a restart approach significantly outperforms the other strategies on the NK landscape. On the MJ model (Figure 6.6), a restart approach yields the best results in terms of the accuracy metric, with fair mutation being best in terms of the adaptability metric. That is, a restart approach is able to find, on average, a better solution fitness than fair mutation within the ∆g generations available between successive environmental changes. Fair mutation, on the other hand, tends to be on average closer to the optimal fitness at all times than the restart approach; this is due to the quicker convergence of fair mutation in the initial phase after an environmental change (see bottom plot of Figure 6.6). The very small number of offspring solutions devoted to each new variable reduces the local search abilities of fair mutation considerably, and this translates into long-term performance effects. The niching algorithm performs worst on the NK landscape. The reason is again its deficit in convergence speed combined with the severity of the environmental changes. On the NK landscape, triggered hypermutation sometimes performs quite well, often losing only to the restart approach (see e.g. generations 200, 250, and 440 in Figure 6.7).

[Figure 6.6, two line plots. Top: MJ model, #v=15, ∆g=40, generations 0 to 400. Bottom: zoom on MJ model, generations 280 to 440. x-axis: #Generations; y-axis: average best-of-generation fitness. Curves: Restart, Reevaluate, Niching, Trig. hypermutation, Fair mutation, Optimal fitness.]

Figure 6.6: Plots showing the average best-of-generation fitness obtained by various dynamic optimization strategies on a MJ model with #v = 15, ∆g = 40. The bottom plot zooms into the area indicated by the box in the top plot.

6.1.6 Summary and conclusion

In this study we have compared different dynamic strategies for enabling a generational EA to cope with problems that are subject to changes of variables, motivated by a real problem in drug discovery. We proposed a strategy, which we call fair mutation, specifically designed for dealing with such problems, and compared it against five standard strategies applied in dynamic optimization.

[Figure 6.7, two line plots. Top: NK landscape, #v=15, ∆g=40, generations 0 to 400. Bottom: zoom on NK landscape, generations 280 to 440. x-axis: #Generations; y-axis: average best-of-generation fitness. Curves: Restart, Reevaluate, Niching, Trig. hypermutation, Fair mutation, Optimal fitness.]

Figure 6.7: Plots showing the average best-of-generation fitness obtained by various dynamic optimization strategies on NK landscapes with #v = 15, ∆g = 40. The bottom plot zooms into the area indicated by the box in the top plot.

The results have shown that little, if any, additional diversity needs to be introduced into the population when changing a few variables frequently, or when optimizing a landscape with a low degree of variable interactions. Here, very good results were also obtained using a niching algorithm or a standard elitist EA that simply re-evaluates the population after a variable change and then carries on as normal. When changing many variables (around 10 or more), particularly on a quite epistatic fitness landscape, restarting the optimization from scratch has often proven to be the best choice.

Future work should investigate how to exploit further the correlation existing between the changed variables and the parts of the landscape undergoing a change, a feature of optimization problems subject to changes of variables. If the optimizer were allowed to decide the point of time (generation) at which variables are exchanged, then optimizing this aspect might be worth considering in the design of strategies too; this kind of scenario has been considered by Olhofer et al. (2001) for a shape design optimization problem but more work is needed to fully

6.2. LETHAL ENVIRONMENTS 187

Table 6.3: Accuracy and adaptability results of various dynamic optimization strategies on different problem instances of dynamic NK landscapes and MJ models. For each problem instance and metric, we highlighted all strategies in bold face that are not significantly worse than any other strategy. The statistical test applied here is the non-parametric Kruskal-Wallis test with a significance level of 5% (2-sided).

Dynamic optimization strategy   Metric  #v = 2, ∆g = 10   #v = 8, ∆g = 25   #v = 15, ∆g = 40
                                        NK       MJ       NK       MJ       NK       MJ
Fair mutation                   Acc     0.0163   0.0155   0.0150   0.0002   0.0193   0.0011
                                Ada     0.0242   0.0344   0.0298   0.0210   0.0480   0.0279
Restart                         Acc     0.0520   0.0815   0.0151   0.0022   0.0076   0.0003
                                Ada     0.0942   0.2133   0.0552   0.1232   0.0395   0.0609
Re-evaluation                   Acc     0.0170   0.0163   0.0202   0.0003   0.0202   0.0012
                                Ada     0.0252   0.0350   0.0302   0.0220   0.0516   0.0282
Niching                         Acc     0.0242   0.0290   0.0188   0.0032   0.0264   0.0012
                                Ada     0.0321   0.0471   0.0314   0.0271   0.0560   0.0296
Triggered hyperm.               Acc     0.0387   0.0313   0.0376   0.0334   0.0241   0.0290
$p_{m,b} = 0.001$, HR = 500     Ada     0.0492   0.0554   0.0586   0.0778   0.0552   0.0833
Triggered hyperm.               Acc     0.0194   0.0195   0.0163   0.0004   0.0146   0.0015
$p_{m,b} = 1/l$, HR = 15        Ada     0.0299   0.0442   0.0360   0.0354   0.0487   0.0444
Random immigrants               Acc     0.0214   0.0228   0.0209   0.0011   0.0231   0.0019
                                Ada     0.0297   0.0434   0.0357   0.0266   0.0537   0.0337

understand how algorithmic choices affect performance. Furthermore, in closed-loop applications like our scenario, variables might not only be associated with resources (e.g. drugs) but their use may also be subject to resource constraints: e.g. the use of drugs might be associated with costs, or drugs might have to be bought in batches. Here, a good strategy for dealing with changing variables has to account for both fitness gradients and variable experimental costs. For instance, while a restart approach is likely to be quite expensive under these circumstances, a strategy that aims at wasting very few resources might take too long to find fit solutions. Thus, future research into the development of strategies that perform well in the presence of resource constraints is needed too.

6.2 Optimization in lethal environments

In this section we consider a resourcing issue related to what we term ‘lethal environments’. Unlike problems subject to ephemeral resource constraints and changes of variables, this resourcing issue yields an optimization problem featuring a static feasible region and fitness landscape. However, the population size may reduce over the course of the optimization as a function of poor (lethal) solutions evaluated. We define a lethal solution as one with a fitness below a certain threshold. This setup models certain closed-loop evolution scenarios that may be encountered, for example, when evolving nano-technologies or autonomous robots. The next section motivates this optimization scenario with a specific example. We then look at the tuning of some of the basic EA configuration parameters for coping with lethal environments: degree of elitism, population size, selection pressure, mutation mode and strength, and crossover rate. We also vary aspects of the fitness landscape we are optimizing to observe which landscape topologies pose a particular challenge when optimizing in a lethal environment. Section 6.2.2 will describe the setup of this study, and the study itself will be conducted in Section 6.2.3. Finally, Section 6.2.4 concludes this study.

6.2.1 Motivation

We consider the use of EAs for optimization in a setting where the notions of an individual and of a population size are slightly different from the standard ones, which forces us to reconsider how to configure an EA appropriately. The general setting is closed-loop optimization but our concern is for a particular sort of closed-loop setting where the hardware on which individuals are tested is reconfigurable, destructible and non-replaceable.

Consider the following scenario. We are developing, by evolution, the control software for an autonomous endoscopic robot (similar to Moglia et al. (2007); Ciuti et al. (2010)), which is ultimately intended to be swallowed by a patient in capsule form, and then used within the body for screening, diagnosis and therapeutic procedures. Before robotic capsules are used on humans, however, their reliability and effectiveness typically need to be validated first through in vivo animal trials. Imagine we have available $\mu$ prototypes of a robotic capsule, and we can “radio in” new control software to each of these individual capsules and evaluate them. However, our control software may cause a capsule to malfunction in a lethal (for it) way, in which case it is no longer available (and falls out of the evolving population). In such a scenario, our population size for the remainder of the evolution is now at most $\mu - 1$, in terms of how many pieces of software can be tested each generation. If we continue to explore too aggressively, we may upload other pieces of software that cause the loss of individuals, which will cost evolution in terms of the extent of parallelism (and hence efficiency in time) we enjoy, and also perhaps in the loss of expensive hardware. So whilst we wish to innovate by testing new control policies, this must be balanced carefully against considerations of maintaining a reasonable population size.

This setting may seem far-fetched to some readers. However, as we see evolution being applied more and more at the boundary between a full embodied type of evolution (where the hardware individuals may reproduce or exchange information)³ and a fully in silico simulation-only approach, especially as EAs are taken up in the experimental sciences more and more, it is likely that variants of the above scenario will emerge. It need not be robotic capsules; it could equally be nano-machines or drone planes we are dealing with; it could be robot swarms, or software released on the Internet, or anything that is reconfigurable remotely in some way, and also destructible.

Clearly, the principle of “survival of the fittest” is going to be somewhat dampened by the constraint that we wish all designs to survive. Indeed, no innovation at all will be possible if we enforce the evaluation of only safe designs, given the assumption that any unseen design could be lethal. So, as is usual with evolution, we should expect that some rate of loss of individuals is going to be desirable, but set against this is the fact that they cannot be replaced, and hence population size and parallelism will be compromised. So, the question is: how should one control the exploration/exploitation trade-off in this optimization setup?

In more natural settings than we consider here, the notion of robustness and the tendency of evolution to build in mutational robustness for free has been previously studied, especially in the context of evolution on neutral plateaux (see e.g. Bullock (2003); Schonfeld and Ashlock (2004); Schonfeld (2007)). Bullock’s work in particular shows that certain in-built biases toward mutational robustness can retard innovation (our primary concern here). He proposed methods that avoid the bias and hence explore neutral plateaux more rapidly. Although related, this work has a different purpose to ours, and also involves different assumptions.

6.2.2 Experimental setup

This section describes the search algorithms for which we investigate the impact of a lethal optimization scenario, and the parameter settings as used in the subsequent experimental analysis.

³ For concrete examples of embodied evolution applications please refer to Section 2.2.

6.2.2.1 Search Algorithms

We consider three types of search algorithms in this study: a tournament selection based genetic algorithm (TGA), a modified version of it, which we call RBS (standing for reproduction of best solutions), and a population of stochastic hill-climbers (PHC). Similar types of algorithms have been considered in the previously mentioned studies related to mutational robustness.

Before we describe each algorithm, we first set out the procedures common to all of them. These are the generation process of the initial population of candidate solutions, the setting of the fitness threshold below which solutions are deemed lethal, the mutation operators, and the handling of duplicate solutions. Regarding the initialization, our aim is to simulate the scenario where the initial population, whose size we denote by $\mu_0$ (the reason for adding a subscript to $\mu$ is described below), consists of evolving entities (e.g. robotic capsules, robot swarms or software) that are of a certain (high or state-of-the-art) quality. We achieve this by first generating a sample set of $S$ random and non-identical candidate solutions, and then selecting the fittest $\mu_0$ solutions from this set to form the initial population. The lethal fitness threshold $f_{\mathrm{LFT}}$ is set to the fitness value of the $q$th fittest solution of the sample set. This threshold is kept constant throughout an algorithmic run.
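The initialization and threshold rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `init_population`, the tuple-of-bits encoding and the `onemax` stand-in fitness function are assumptions made for the example (the study itself uses NKα landscapes), while the values of $\mu_0$, $S$ and $q$ in the usage lines are the defaults from Table 6.4.

```python
import random

def init_population(fitness, n_bits, mu0, sample_size, q, rng):
    """Return (initial population, lethal fitness threshold f_LFT)."""
    if sample_size > 2 ** n_bits or not (mu0 <= sample_size and 1 <= q <= sample_size):
        raise ValueError("sample cannot be built from these settings")
    # Draw S random, non-identical bit strings.
    sample = set()
    while len(sample) < sample_size:
        sample.add(tuple(rng.randint(0, 1) for _ in range(n_bits)))
    # Rank from fittest to least fit (inner sort makes tie order reproducible).
    ranked = sorted(sorted(sample), key=fitness, reverse=True)
    population = ranked[:mu0]            # fittest mu_0 solutions
    threshold = fitness(ranked[q - 1])   # fitness of the q-th fittest sample
    return population, threshold

def onemax(bits):
    """Hypothetical stand-in fitness: fraction of bits set to one."""
    return sum(bits) / len(bits)

pop, f_lft = init_population(onemax, n_bits=50, mu0=30, sample_size=1000,
                             q=250, rng=random.Random(0))
```

Because $q > \mu_0$ in the default setup, every member of the initial population is at least as fit as the threshold, so the run starts without lethal individuals.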

For mutation, we will investigate two modes (similar to Barnett (2001)): (i) Poisson mutation, where each bit is flipped independently with a mutation rate of $p_m$, and (ii) constant mutation, where exactly $d$ randomly selected bits are flipped. Poisson mutation is the more commonly used mutation mode of the two and also the mode we have used in this thesis so far. Furthermore, all search algorithms ensure that a solution is not evaluated multiple times. That is, all evaluated solutions are cached and compared against a new solution (offspring) before performing an evaluation; preliminary experimentation showed that avoiding duplication improves the performance significantly in the presence of lethal environments. If a solution has been evaluated previously, then the algorithm iteratively generates new solutions (offspring solutions) until it generates one that is evaluable, i.e. has not been evaluated previously, or until $L$ trials have passed without success. In the latter case, we reset the mutation rate to $p_m = p_m + 0.5/l$ ($l$ is the total number of bits) in the case of Poisson mutation, and to $d = d + 1$ in the case of constant mutation; note that, due to its deterministic nature, it is more likely that this reset step is applied with constant mutation. For TGA and RBS we set the mutation rates back to their initial values at the beginning of each new generation. For PHC it makes more sense to set the mutation rates back whenever a new hill-climber of the current population is considered for mutation because of the independent search of the hill-climbers. The idea of increasing the mutation strength temporarily is to increase an EA’s exploration abilities to allow it to create solutions that have not been evaluated in the past but are close to the search region currently under investigation.

Algorithm 6.3 Tournament selection based GA (TGA)
Require: $f$ (objective function), $G$ (maximal number of generations), $\mu_0$ (initial parent population size), $\lambda_0$ (initial offspring population size), $L$ (maximal number of regeneration trials)
$g = 0$ (generation counter), Pop = ∅ (current population), OffPop = ∅ (offspring population), AllEvalSols = ∅ (set of solutions evaluated so far), trials = 0 (regeneration trials counter), counter = 0 (auxiliary variable indicating the number of solutions evaluated during a generation)
initialize Pop and set lethal fitness threshold $f_{\mathrm{LFT}}$; copy all solutions of Pop also to AllEvalSols, and set $\mu_g = \mu_0$, $\lambda_g = \lambda_0$
while $g < G$ ∧ $\mu_g > 0$ do
    OffPop = ∅; trials = 0; counter = 0
    set mutation rate to initial value
    while counter < $\lambda_g$ do
        generate two offspring $\vec{x}^{(1)}$ and $\vec{x}^{(2)}$ by selecting two parents from Pop, and then recombining and mutating them
        for $i = 1$ to 2 do
            if $\vec{x}^{(i)}$ ∉ AllEvalSols ∧ counter < $\lambda_g$ then
                evaluate $\vec{x}^{(i)}$ using $f$; counter++; trials = 0
                AllEvalSols = AllEvalSols ∪ $\vec{x}^{(i)}$
                if $f(\vec{x}^{(i)}) \geq f_{\mathrm{LFT}}$ then
                    OffPop = OffPop ∪ $\vec{x}^{(i)}$   // only non-lethal solutions are added to the offspring population
            else trials++
            if trials = $L$ then
                depending on the mutation operator, reset mutation rate to $p_m = p_m + 0.5/l$ (Poisson mutation) or $d = d + 1$ (constant mutation)
    reset new population sizes to $\mu_g = \lambda_g$ = |OffPop|, and form new population Pop by selecting the best $\mu_g$ solutions from the union population Pop ∪ OffPop
    g++
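The two mutation modes and the duplicate-avoidance rule described above can be illustrated as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, solutions are encoded as tuples of bits, and `unseen_mutant` only covers constant mutation. It also resets the trial counter after each increase of $d$, which is one possible reading of the rule and guarantees that the mutation strength keeps growing while only duplicates are produced; the loop assumes that at least one unevaluated solution exists.

```python
import random

def poisson_mutation(bits, p_m, rng):
    """Flip each bit independently with probability p_m."""
    return tuple(b ^ 1 if rng.random() < p_m else b for b in bits)

def constant_mutation(bits, d, rng):
    """Flip exactly d distinct, randomly selected bits."""
    flips = set(rng.sample(range(len(bits)), d))
    return tuple(b ^ 1 if i in flips else b for i, b in enumerate(bits))

def unseen_mutant(parent, evaluated, rng, d=1, max_trials=1000):
    """Mutate until a not-yet-evaluated solution appears.

    After max_trials consecutive duplicates the mutation strength d is
    increased by one (assumption: the trial counter restarts afterwards).
    Returns the new solution and the mutation strength that produced it.
    """
    trials = 0
    while True:
        child = constant_mutation(parent, min(d, len(parent)), rng)
        if child not in evaluated:
            return child, d
        trials += 1
        if trials == max_trials:
            d += 1
            trials = 0
```

With $d = 1$ a parent has only $l$ distinct mutants, so the cache can exhaust them quickly, whereas Poisson mutation can always reach further solutions; this is why the increase step is expected to trigger more often under constant mutation.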

Tournament Selection Based GA (TGA): The algorithm is an EA with a ($\mu+\lambda$)-ES reproduction scheme (see Algorithm 6.3), as we have used in previous studies in this thesis. This elitist scheme performed significantly better than a standard generational reproduction scheme in lethal environments. We equip the EA with tournament selection with replacement for parental selection ($TS$ shall denote the tournament size) and uniform crossover ($p_c$ shall denote the crossover rate). We explained above the two mutation modes to be investigated later. We set the parent and offspring population size to be identical, i.e. $\mu = \lambda$. As the population size may decrease during the optimization process (due to evaluated lethal solutions), in the following we denote the population size at generation $g$ ($0 \leq g \leq G$) by $\mu_g = \lambda_g$, with $G$ being the maximum number of generations. Clearly, the maximum number of generations can only be reached if the population does not “die out” beforehand, i.e. if $\mu_g = 0$ does not occur for some $g < G$.
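One generation of the scheme just described might look as follows in Python. This is a sketch, not the thesis implementation: all function names are hypothetical, solutions are tuples of bits, only constant mutation is shown, and `cache` is an assumed dictionary that maps every evaluated solution (including the current population) to its fitness so that each solution is evaluated at most once.

```python
import random

def tournament(pop, cache, ts, rng):
    """Tournament selection with replacement: fittest of ts random draws."""
    return max((rng.choice(pop) for _ in range(ts)), key=cache.get)

def uniform_crossover(a, b, rng):
    """Exchange each bit between the two parents with probability 0.5."""
    pairs = [(y, x) if rng.random() < 0.5 else (x, y) for x, y in zip(a, b)]
    return tuple(p[0] for p in pairs), tuple(p[1] for p in pairs)

def flip_bits(bits, d, rng):
    """Constant mutation: flip exactly d distinct random bits."""
    flips = set(rng.sample(range(len(bits)), d))
    return tuple(b ^ 1 if i in flips else b for i, b in enumerate(bits))

def tga_generation(pop, cache, fitness, f_lft, rng,
                   ts=4, p_c=0.0, d=1, max_trials=1000):
    """Run one (mu + lambda) generation and return the new population."""
    lam = len(pop)                        # lambda_g = mu_g
    offspring, counter, trials = [], 0, 0
    while counter < lam:
        p1 = tournament(pop, cache, ts, rng)
        p2 = tournament(pop, cache, ts, rng)
        if rng.random() < p_c:
            p1, p2 = uniform_crossover(p1, p2, rng)
        for child in (flip_bits(p1, d, rng), flip_bits(p2, d, rng)):
            if child not in cache and counter < lam:
                cache[child] = fitness(child)   # one (possibly lethal) test
                counter += 1
                trials = 0
                if cache[child] >= f_lft:       # non-lethal offspring survive
                    offspring.append(child)
            else:
                trials += 1
                if trials == max_trials:        # temporarily mutate harder
                    d = min(d + 1, len(child))
    mu = len(offspring)                   # new mu_g = |OffPop|
    return sorted(pop + offspring, key=cache.get, reverse=True)[:mu]
```

The returned list has as many entries as non-lethal offspring were produced, so every lethal evaluation shrinks the population by one while elitism keeps the best solutions found so far.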

Reproduction of best solutions (RBS): This algorithm is based on TGA and motivated by the netcrawler of Barnett (2001). The netcrawler is designed to efficiently solve fitness landscapes featuring neutral networks. It reproduces only a single best solution at each generation using mutation only, and the mutation strength adapts to the size of each neutral network of the landscape. Similar to the netcrawler, RBS reproduces only the best solutions. More precisely, it selects parents for reproduction exclusively and at random from the set of solutions of the current population with the highest fitness. Hence, it can be seen as a tournament selection based GA with $TS = \mu_g$, and where the tournament participants are selected without replacement. Apart from the selection procedure, RBS will use the same algorithmic setup as TGA.
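The selection rule of RBS can be written very compactly. The sketch below assumes a hypothetical helper `rbs_select_parent` and a caller-supplied `fitness_of` lookup; it is meant only to show that a tournament over the whole population without replacement reduces to a uniform draw among the solutions that share the highest fitness.

```python
import random

def rbs_select_parent(pop, fitness_of, rng):
    """Pick a parent uniformly at random among the fittest solutions."""
    best = max(fitness_of(x) for x in pop)
    elite = [x for x in pop if fitness_of(x) == best]
    return rng.choice(elite)
```

On a neutral network many solutions tie for the best fitness, so the random draw spreads reproduction across the whole tied set rather than favouring one member.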

Population of Hill-Climbers (PHC): This search algorithm maintains a population of stochastic hill-climbers, which explore the landscape independently of each other. Each hill-climber undergoes mutation, and, if the mutant or offspring is not lethal, it replaces the original solution if it is at least as fit. Algorithm 6.4 shows the pseudocode of PHC. In essence, PHC with constant mutation is identical to running a set of random-mutation hill-climbers (Forrest and Mitchell, 1993) in parallel with the difference that, as in the case of TGA, the mutation strength is temporarily increased if a previously unseen solution cannot be generated within $L$ trials. PHC has similarities to Barnett’s netcrawler, and also the plateau crawler of Bullock (2003). The working procedure of the plateau crawler closely resembles a population of independent hill-climbers. Similar to the netcrawler, it has been designed to be immune to selection for mutational robustness when optimizing fitness landscapes featuring neutral networks. The property common to all three approaches (netcrawler, plateau crawler and PHC) is that they guarantee that each parent potentially leaves a copy or a slightly modified version of itself in the population. This ensures that offspring, parents, or other ancestors, are equally likely to be selected as future parents. For fitness landscapes with neutral networks, this feature can cause a population to be unaffected by mutational robustness as it allows the population to spend at each point on a neutral network an equal amount of time (Hughes, 1995; Bullock, 2003).

Algorithm 6.4 Population of hill-climbers (PHC)
Require: $f$ (objective function), $G$ (maximal number of generations), $\mu_0$ (initial parent population size), $L$ (maximal number of regeneration trials)
$g = 0$ (generation counter), Pop = ∅ (current population), OffPop = ∅ (offspring population), AllEvalSols = ∅ (set of solutions evaluated so far), trials = 0 (regeneration trials counter), lethal = false (auxiliary boolean variable indicating whether a hill-climber is lethal)
initialize Pop and set lethal fitness threshold $f_{\mathrm{LFT}}$; copy all solutions of Pop also to AllEvalSols, and set $\mu_g = \mu_0$
while $g < G$ ∧ $\mu_g > 0$ do
    OffPop = ∅; trials = 0
    for $i = 1$ to $\mu_g$ do
        set mutation rate to initial value and lethal = false
        repeat
            mutate solution $\vec{x}_i$ of Pop to obtain mutant $\vec{x}'_i$
            if $\vec{x}'_i$ ∉ AllEvalSols then
                evaluate $\vec{x}'_i$ using $f$; trials = 0
                AllEvalSols = AllEvalSols ∪ $\vec{x}'_i$
                if $f(\vec{x}'_i) \geq f_{\mathrm{LFT}}$ then
                    if $f(\vec{x}'_i) \geq f(\vec{x}_i)$ then
                        OffPop = OffPop ∪ $\vec{x}'_i$
                    else OffPop = OffPop ∪ $\vec{x}_i$
            else trials++
            if trials = $L$ then
                depending on the mutation operator, reset mutation rate to $p_m = p_m + 0.5/l$ (Poisson mutation) or $d = d + 1$ (constant mutation)
        until $\vec{x}'_i$ is evaluable
    reset new population size to $\mu_g$ = |OffPop|, and form new population Pop by copying all solutions from OffPop to Pop
    g++
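A single PHC generation can be sketched in Python as below. The names `flip_bits` and `phc_generation`, the tuple-of-bits encoding and the `cache` dictionary (solution to fitness, pre-filled with the current population) are assumptions made for illustration, and only constant mutation is shown.

```python
import random

def flip_bits(bits, d, rng):
    """Constant mutation: flip exactly d distinct random bits."""
    flips = set(rng.sample(range(len(bits)), d))
    return tuple(b ^ 1 if i in flips else b for i, b in enumerate(bits))

def phc_generation(pop, cache, fitness, f_lft, rng, d0=1, max_trials=1000):
    """Advance every hill-climber by one evaluation; return the survivors."""
    survivors = []
    for x in pop:
        d, trials = d0, 0                 # mutation strength reset per climber
        while True:
            child = flip_bits(x, d, rng)
            if child in cache:            # duplicate: try again
                trials += 1
                if trials == max_trials:
                    d = min(d + 1, len(x))
                continue
            cache[child] = fitness(child)
            if cache[child] >= f_lft:     # non-lethal: the climber survives
                survivors.append(child if cache[child] >= cache[x] else x)
            break                         # lethal mutant: the climber is lost
    return survivors
```

Each hill-climber performs exactly one new evaluation per generation, and a lethal mutant removes that climber, so no surviving climber is ever replaced by a worse solution.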

6.2.2.2 Parameter Settings

The experimental study will investigate different settings of the parameters involved in the three search algorithms (TGA, RBS and PHC). However, if not otherwise stated, we use the default settings as given in Table 6.4 with constant mutation being the default mutation operator. Notice that this setup, especially the setting of $p_c$, $TS$, and the mutation mode, is significantly different from previous algorithm setups employed in this thesis. Recall that RBS uses the same default parameter settings as TGA with the difference that reproductive selection is done with a tournament size of $TS = \mu_g$, $0 \leq g \leq G$, and tournament participants being chosen without replacement.

Table 6.4: Default parameter settings of search algorithms.

Algorithm        Parameter                                                          Setting
TGA, RBS, PHC    Initial parent population size $\mu_0$                             30
                 Constant mutation rate $d$                                         1
                 Regeneration trials $L$                                            1000
                 Number of generations $G$                                          200
                 Sampling size $S$                                                  1000
                 Solution rank $q$ for lethal fitness threshold setting             250
TGA              Initial offspring population size $\lambda_0$                      30
                 Tournament size $TS$ (selection with replacement is applied)       4
                 Crossover rate $p_c$                                               0.0
RBS              Initial offspring population size $\lambda_0$                      30
                 Tournament size $TS$ (selection without replacement is applied)    $\mu_g$, $0 \leq g \leq G$
                 Crossover rate $p_c$                                               0.0

To investigate the effect of lethal environments on the performance of the algorithms introduced above we consider NKα landscapes (see Section 5.1.2.2 for a description of this problem). We analyze the impact of lethal solutions using different settings of the neighborhood size $K$ and the model parameter $\alpha$. Recall that the parameter $K$ allows us to control the degree of epistasis or ruggedness of the landscape (larger values of $K$ represent more epistasis), while the model parameter $\alpha$ allows us to specify how influential some variables may be compared to others (larger values of $\alpha$ assign more influence to a minority of variables). We fix the search space size to $N = l = 50$, a setting we have used and justified previously.

Any results shown are average results across 100 independent algorithm runs. As before, we will use a different randomly generated problem instance for each run but, of course, the same instances for all algorithms. To allow for a fair comparison, we also use the same initial population and (hence) lethal fitness threshold for all algorithms in any particular run.
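The comparison protocol described above (a fresh instance per run, shared across algorithms together with the initial population and threshold) can be expressed as a small harness. The sketch below is hypothetical throughout: `paired_comparison`, `make_instance`, `make_start` and the signature of the algorithm callables are names and conventions assumed for the example, not part of the study's code.

```python
import random
from statistics import mean

def paired_comparison(algorithms, make_instance, make_start, runs=100, seed=0):
    """Average each algorithm's result over `runs` shared problem instances.

    algorithms:    dict name -> callable(fitness, population, f_lft, rng)
    make_instance: callable(rng) -> fitness function (new instance per run)
    make_start:    callable(fitness, rng) -> (initial population, f_lft)
    """
    scores = {name: [] for name in algorithms}
    for r in range(runs):
        setup_rng = random.Random(seed + r)
        fitness = make_instance(setup_rng)           # one instance per run
        pop, f_lft = make_start(fitness, setup_rng)  # shared by all algorithms
        for name, run in algorithms.items():
            # Copy the population so no algorithm can alter another's start.
            result = run(fitness, list(pop), f_lft, random.Random(seed + r))
            scores[name].append(result)
    return {name: mean(values) for name, values in scores.items()}
```

Sharing the instance and the starting point removes one source of variance from the comparison, so differences in the averages reflect the algorithms rather than the luck of the draw.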

[Figure 6.8, four heat maps for TGA over #Neighbors K (x-axis, 0 to 10) and model parameter α (y-axis, 0 to 2). Top left: average best solution fitness obtained in lethal environment. Top right: average absolute difference between the best solution fitness found in a lethal and non-lethal environment. Bottom left: survived proportion of the population at generation G=200, or $\mu_G/\mu_0$. Bottom right: average absolute difference between the best solution fitness found in a non-lethal environment and the lethal fitness threshold.]

Figure 6.8: Plots showing the average best solution fitness found by TGA (using $TS = 2$, $p_c = 0.0$ and Poisson mutation with $p_m = 1/N$) in a lethal environment (top left), the average absolute difference between this fitness and the fitness achieved in a non-lethal environment, i.e. $f_{\mathrm{LFT}} = 0$ (top right), the average proportion of the population that survived at generation $G = 200$, or $\mu_G/\mu_0$ (bottom left), and the average absolute difference between the best solution fitness found and the lethal fitness threshold in a non-lethal environment (bottom right), on NKα landscapes as a function of the neighborhood size $K$ and the model parameter $\alpha$.

6.2.3 Experimental results

Let us first analyze the effect of lethal environments on evolutionary search for different topologies of the NKα landscapes. Figure 6.8 shows the effect on the best solution fitness found (top plots) and the (remaining) population size (bottom left plot) for TGA with a rather traditional setup ($TS = 2$, $p_c = 0.0$ and Poisson mutation with $p_m = 1/l$) as a function of the parameters $K$ and $\alpha$; the bottom right plot indicates the scope available for optimization after the initialization of the population. From the top plots we can see that the performance is most affected for fitness landscapes that are rugged (large $K$) but contain some structure in the sense that some solution bits are more important than others (large $\alpha$). As is apparent from the bottom left plot, the reason for this pattern is that the population size at the end of the optimization decreases as $K$ and/or $\alpha$ increase, i.e. lethal solutions are here more likely to be encountered. In fact, for large $K$ and $\alpha$, the optimization terminates on average after only about 25 generations (out of 200); this result is not shown here.

The reason that the population size reduces for large $K$ (bottom left plot of Figure 6.8) is that the fitness landscape becomes more rugged, which increases the risk of evaluating lethal solutions despite them being located close to high-quality solutions. Although an increase in $\alpha$ introduces more structure into the fitness landscape, it also has the effect that solutions which have some of these important bits set incorrectly are more likely to be poor. A plausible alternative explanation for the poor performance at large $\alpha$ may be that the lethal fitness threshold is very close to the best solution fitness that can be found, thus reducing the scope of optimization possible and causing many solutions to be lethal; however, the bottom right plot rules this explanation out. Regarding the (remaining) population size, a greater selection pressure, i.e. a larger tournament size, can help to maintain a large population size for longer, particularly for large $K$ and small $\alpha$. On the other hand, being more random in the solution generation process by using, for example, a larger mutation and/or crossover rate, has the opposite effect, i.e. the population size decreases more quickly.

The effect on the (remaining) population size translates into an impact on the average best solution fitness. From Figure 6.9, we can see that, for rugged landscapes with structure (i.e. in the range $K > 5$, $\alpha > 1$), TGA is outperformed by a GA with a rather traditional setup and a PHC in a non-lethal optimization scenario (right plots) but performs better than both algorithms in a lethal optimization scenario (left plots). In fact, as we will also see later, in the range $K > 5$, $\alpha > 1$, PHC performs best among all the search algorithms considered in the non-lethal environment but worst in the lethal environment. The poor performance of PHC in this range is due to the fact that the hill-climbers in the population are searching the fitness landscape independently of each other. This may be an advantage in a non-lethal environment because it can help to maintain

Figure 6.9: Plots showing the ratio P(f(x) > f_TGA) / P(f(x) > f_{TGA, TS=2, p_c=0.0, p_m=1/l})
(top plots) and P(f(x) > f_TGA) / P(f(x) > f_PHC) (bottom plots) in a lethal (left plots)
and non-lethal environment (right plots) on NKα landscapes as a function of the number of
neighbors K (horizontal axis) and the model parameter α (vertical axis). Here, x is a random
variable that represents the best solution from a set of solutions drawn uniformly at random
from the search space and f_∗ the average best solution fitness obtained with strategy ∗.
If P(f(x) > f_∗) / P(f(x) > f_∗∗) > 1, then strategy ∗∗ is able to achieve a higher average
best solution fitness than strategy ∗ and a greater advantage of ∗∗ is indicated by a darker
shading in the heat maps; similarly, if P(f(x) > f_∗) / P(f(x) > f_∗∗) < 1, then ∗ is better
than ∗∗ and a lighter shading indicates a greater advantage of ∗.

diversity in the population and prevent premature convergence. However, in a

lethal environment, uncontrolled diversity is rather a drawback as it increases the

probability of generating solutions with genotypes that are significantly different

from the ones in the population, which in turn increases the probability of generating

lethal solutions. The less diverse and random **optimization** of TGA is also

the reason that it outper**for**ms a GA with a more traditional setup on rugged

landscapes with structure.
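
The ratio reported in Figure 6.9 can be estimated by simple Monte Carlo sampling. The Python sketch below is illustrative only: the stand-in fitness function `onemax`, the set size, the number of trials, and all function names are assumptions made for this example and are not the settings used in the experiments.

```python
import random

def onemax(bits):
    """Hypothetical stand-in fitness in [0, 1]; the study itself uses NK-alpha landscapes."""
    return sum(bits) / len(bits)

def prob_random_beats(f_avg, n_bits=20, set_size=10, trials=500, seed=0):
    """Estimate P(f(x) > f_avg), where x is the best of `set_size`
    solutions drawn uniformly at random from the search space."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        best = max(onemax([rng.randint(0, 1) for _ in range(n_bits)])
                   for _ in range(set_size))
        hits += best > f_avg
    return hits / trials

def ratio(f_a, f_b, **kwargs):
    """P(f(x) > f_a) / P(f(x) > f_b). A value above 1 means that strategy b
    reached the higher average best fitness. Returns None if the
    denominator estimate is zero (ratio undefined at this sample size)."""
    p_b = prob_random_beats(f_b, **kwargs)
    if p_b == 0:
        return None
    return prob_random_beats(f_a, **kwargs) / p_b
```

Because both probabilities are estimated from the same seeded sample, a strategy with the higher average best fitness always receives the smaller (or equal) probability, so the sign of the comparison does not depend on sampling noise.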

Let us now analyze how the **search** algorithms fare with different parameter

settings. Table 6.5 shows the average best solution fitness obtained by different

algorithm setups in a lethal and non-lethal environment (values in parentheses)

on NKα landscapes with K = 4, α = 2.0; Table 6.6 shows the same information

for NKα landscapes with K = 10, α = 2.0. Note that due to the nature of NKα

landscapes (particularly due to the absence of plateaus or neutral networks in

the landscapes), there was only a single best solution in the population at each

generation. For RBS, this has the effect that the per**for**mance is independent

of the crossover rate because crossover is applied to identical solutions. The

two tables largely confirm the observation made above: while a relatively high

degree of population diversity and randomness in the solution generation process

may be beneficial in a non-lethal environment (which is particularly true for

rugged landscapes, as can be seen from Table 6.6), it is rather a drawback in

a lethal environment as it may increase the risk of generating lethal solutions

and subsequently limit the effectiveness of **evolutionary** **search**. More precisely,

**for** a lethal environment, the tables indicate that one should use a GA instead

of a PHC, and reduce randomness in the solution generation process by avoiding

crossover (generally), and using constant mutation as well as a relatively large

tournament size. The poor performance of Poisson mutation, which is a typical

mutation mode for GAs, is related to situations where an above-average number

of solution bits is flipped at once. Such variation steps lead to offspring that are

located in the **search** space far away from their non-lethal parents, which can in

turn increase the probability of generating lethal solutions.
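
To make the difference between the two mutation modes concrete, the following Python sketch contrasts them. It is a minimal illustration under stated assumptions: the function names are hypothetical, constant mutation is taken to flip exactly d distinct bits, and Poisson mutation is taken to flip each bit independently with probability p_m.

```python
import random

def constant_mutation(bits, d=1, rng=random):
    """Flip exactly d distinct, randomly chosen bits of the parent."""
    child = list(bits)
    for i in rng.sample(range(len(child)), d):
        child[i] ^= 1
    return child

def poisson_mutation(bits, p_m, rng=random):
    """Flip each bit independently with probability p_m, so the number of
    flipped bits varies from offspring to offspring."""
    return [(b ^ 1) if rng.random() < p_m else b for b in bits]

def hamming(a, b):
    """Number of positions in which two bit strings differ."""
    return sum(x != y for x, y in zip(a, b))
```

With p_m = 1/N the expected number of flipped bits is one, as with constant mutation and d = 1, but individual offspring may differ from their parent in several bits at once. These are the large variation steps that the text associates with an increased risk of generating lethal solutions.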

Finally, Figure 6.10 analyzes whether evolving a large set of entities for a small

number of generations yields better performance than evolving only a few entities

for many generations, i.e. it examines the trade-off between the initial population size

µ_0 and the maximum number of generations G. For a non-lethal environment

(right plots), we make two observations: (i) best performance is achieved with

PHC using a rather small population size of about µ_0 = 30, and (ii) for population

sizes µ_0 > 50 one should use a GA whereby the applied tournament size or

selection pressure should increase with the population size. As the population

size increases, the maximum number of generations G available **for** **optimization**

is simply insufficient **for** PHC or a GA with low selection pressure and/or much

Figure 6.10: Plots showing the average best solution fitness obtained by different search
algorithms in a lethal environment (left) and non-lethal environment (right) on NKα
landscapes with K = 4, α = 2.0 (top) and K = 10, α = 2.0 (bottom) as a function of
the initial population size µ_0; the maximum number of fitness evaluations available for
optimization was fixed at 6000 (corresponding to the default setting G = 200, µ_0 = 30),
i.e. G = ⌊6000/µ_0⌋. If not otherwise stated, an algorithm used a constant mutation
mode with d = 1 and no crossover, i.e. p_c = 0.0. Algorithms shown: TGA with TS = 1,
TS = 2, and TS = 4; RBS; PHC; and TGA with TS = 2, p_c = 0.0, p_m = 1/N.

randomness in the solution generation process to converge quickly to a high-quality

part of the **search** space. However, on the other hand, a GA with too

much selection pressure may cause the population to get stuck at local optima,

especially on rugged fitness landscapes (see result of RBS in the bottom right

plot). In a lethal optimization scenario (left plots), a GA clearly outperforms

PHC (regardless of the population size) due to the reasons mentioned above.

Furthermore, unlike in the non-lethal case, we observe that a GA should use an

above-average tournament size already for small population sizes of µ_0 > 30.

Similar to the non-lethal case, however, there is a saturation point with RBS,

beyond which a further increase in the population size has no effect (see e.g.

bottom left plot).
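
The fixed-budget relation G = ⌊6000/µ_0⌋ used in Figure 6.10 is simple to tabulate. The short Python sketch below does so; the function name and the particular population sizes listed are chosen for illustration only.

```python
def max_generations(mu0, budget=6000):
    """G = floor(budget / mu0) for a fixed budget of fitness evaluations."""
    if mu0 <= 0:
        raise ValueError("mu0 must be positive")
    return budget // mu0

for mu0 in (10, 30, 50, 100, 250):
    print(mu0, max_generations(mu0))
# prints:
# 10 600
# 30 200
# 50 120
# 100 60
# 250 24
```

At the largest population size considered in the figure, fewer than 25 generations remain, which is consistent with the observation that slowly converging setups run out of generations as µ_0 grows.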

6.2.4 Summary and conclusion

In this study we have conducted an initial investigation of the impact of lethal

solutions (here modeled as candidate solutions below a certain fitness threshold)

on evolutionary search. In essence, the effect of evaluating a lethal solution is that

the solution is immediately removed from the population and the population size

is reduced by one. This kind of scenario can be found in natural evolution, where

the presence of lethal mutants may promote robustness over evolvability, but it is

also a characteristic of certain closed-loop evolution applications, for example, in

autonomous robots or nano-technologies. When faced with this kind of scenario in

optimization, the challenge is to discover innovative and fit candidate solutions

without reducing the population too rapidly; i.e. trading off evolvability with

robustness.
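
The lethal-solution rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the TGA used in the experiments: the `onemax` fitness function is a stand-in for the NKα landscapes, the lethal rule is assumed to apply to the initial population as well, the elite is assumed to be carried over without re-evaluation, and all names and default values are hypothetical.

```python
import random

def onemax(bits):
    """Hypothetical stand-in fitness in [0, 1]."""
    return sum(bits) / len(bits)

def tournament(pop, fits, ts, rng):
    """Return the fittest of ts individuals drawn with replacement."""
    picks = [rng.randrange(len(pop)) for _ in range(ts)]
    return pop[max(picks, key=lambda i: fits[i])]

def lethal_ea(n_bits=30, mu0=30, generations=200, ts=4,
              lethal_threshold=0.4, seed=0):
    """Elitist GA with constant mutation (d = 1) and no crossover.
    Every evaluated solution below lethal_threshold is discarded and
    its population slot is lost permanently."""
    rng = random.Random(seed)
    pop, fits = [], []
    for _ in range(mu0):
        x = [rng.randint(0, 1) for _ in range(n_bits)]
        f = onemax(x)
        if f >= lethal_threshold:          # lethal individuals never enter
            pop.append(x)
            fits.append(f)
    sizes = [len(pop)]                     # remaining population size per generation
    for _ in range(generations):
        if not pop:
            break
        elite = max(range(len(pop)), key=lambda i: fits[i])
        new_pop, new_fits = [pop[elite]], [fits[elite]]
        for _ in range(len(pop) - 1):
            child = list(tournament(pop, fits, ts, rng))
            child[rng.randrange(n_bits)] ^= 1   # flip exactly one bit
            f = onemax(child)
            if f < lethal_threshold:
                continue                        # lethal: slot is lost for good
            new_pop.append(child)
            new_fits.append(f)
        pop, fits = new_pop, new_fits
        sizes.append(len(pop))
    return (max(fits) if fits else None), sizes
```

Calling `lethal_ea()` returns the best fitness found together with the trajectory of the remaining population size, which can only stay constant or shrink. Setting `lethal_threshold=0.0` recovers the non-lethal case, in which the population size remains at µ_0 throughout.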

Our analysis has been focused on the following three main aspects: (i) analyzing

the impact of lethal solutions on a standard **evolutionary** algorithm (EA)

framework, (ii) determining challenging fitness landscape topologies in a lethal

environment, and (iii) tuning EAs to perform well within a lethal optimization

scenario. Generally, the presence of lethal solutions can affect the performance

of EAs, but the largest (negative) impact was observed for fitness landscapes that

are rugged and possess some structure in the sense that some solution bits are

more important than others; in terms of our test functions, which were NKα

landscapes, this type of landscape corresponds to large values of K and α. For

this fitness landscape topology, we obtained best per**for**mance in a non-lethal

environment using a small population of stochastic hill-climbers. In a lethal environment,

however, significantly better results were obtained using an EA that

limits randomness in the solution generation process by employing elitism, a relatively

large selection pressure, a constant mutation mode (i.e. flipping exactly

one solution bit as opposed to flipping each bit independently with some low

probability), and no or a small crossover rate. Also, in a lethal **optimization** scenario

and on the test problems considered, we observed that an EA that evolves

a large set of entities (e.g. autonomous robots or software) **for** a small number of

generations per**for**ms better than one that evolves a few entities **for** many generations.

The practical implication of this is that, if possible, a larger budget should

be allocated **for** the production of the entities in the first place rather than **for**

attempting to prolong the testing or **optimization** phase.

As with the other resourcing issues considered in this thesis, there remains

much else to learn about **optimization** problems subject to lethal environments.

The next step could be to design strategies that specifically account **for** lethal

environments (rather than to tune EA operators).

6.3 Chapter summary

In this chapter we have investigated the impact of two resourcing issues on **evolutionary**

**search** applied to **closed**-**loop** **optimization**: changes of variables, and

lethal environments. Problems subject to changes of variables feature a dynamically

changing fitness landscape (but a static feasible region) and thus belong

to the class of dynamic **optimization** problems. However, unlike standard problems

in this domain, it is known when changes are applied to the landscape

(the point of time may even be controlled), and which part of the **search** space

undergoes a change. We proposed and analyzed a strategy, which we call fair mutation,

that specifically exploits these two properties. Lethal environments yield

**optimization** problems that limit the number of solutions evaluable below a certain

fitness threshold; we refer to these poor solutions as lethal solutions. This

setup models certain closed-loop evolution scenarios that may be encountered,

for example, when hardware (e.g. autonomous robots) on which solutions (e.g.

control software) are tested is available in limited quantity only. To cope with

lethal environments, we looked at the **tuning** of some of the basic EA configuration

parameters: degree-of-elitism, population size, selection pressure, mutation

mode and strength, and crossover rate. The main finding of our study was that

lethal environments can be dealt with effectively, even when tuning only some of

the basic parameters of EAs; however, the optimal EA configuration depends on

the fitness landscape to be optimized.

In the following chapter we will conclude the thesis. We will draw together

the findings and their implications, and discuss directions **for** further re**search**.

Table 6.5: The table shows the average best solution fitness found in a lethal environment after G = 200 generations, and in parenthesis,
the fitness found in a non-lethal environment, for different algorithm setups on K = 4, α = 2.0; the results of random sampling were
obtained by generating G × µ_0 = 200 × 30 = 6000 solutions per run at random and averaging over the best solution fitness values found.
The numbers in the subscript indicate the rank of the top five algorithm setups within the respective environment. We highlighted
all algorithm setups (among the top five) in bold face that are not significantly worse than any other setup. A Friedman test revealed
a significant difference between the search algorithm setups in general, but differences among the individual setups were tested for in
a post-hoc analysis using (paired) Wilcoxon tests (significance level of 5%) with Bonferroni correction.

Search algorithm          Constant mutation                       Poisson mutation
                          d = 1               d = 2               p_m = 0.5/N         p_m = 1.0/N
TGA, TS = 1, p_c = 0.0    0.722_4 (0.7297_4)  0.6998 (0.7286)     0.7174 (0.7272)     0.7141 (0.7285)
TGA, TS = 1, p_c = 0.25   0.7188 (0.7272)     0.6969 (0.7296_5)   0.7128 (0.7275)     0.7029 (0.7296)
TGA, TS = 1, p_c = 0.5    0.713 (0.7279)      0.6901 (0.7287)     0.7091 (0.7276)     0.7078 (0.7277)
TGA, TS = 2, p_c = 0.0    0.7223_3 (0.7269)   0.7053 (0.7287)     0.7198 (0.7271)     0.7175 (0.7275)
TGA, TS = 2, p_c = 0.25   0.7217 (0.7263)     0.7039 (0.7284)     0.7172 (0.7263)     0.7119 (0.7275)
TGA, TS = 2, p_c = 0.5    0.7178 (0.7261)     0.7003 (0.728)      0.7176 (0.7268)     0.7135 (0.727)
TGA, TS = 4, p_c = 0.0    0.7218_5 (0.7266)   0.7107 (0.726)      0.7203 (0.7269)     0.7184 (0.7269)
TGA, TS = 4, p_c = 0.25   0.7238_1 (0.7257)   0.7085 (0.7254)     0.7196 (0.7273)     0.7169 (0.7278)
TGA, TS = 4, p_c = 0.5    0.7224_2 (0.7251)   0.7101 (0.7256)     0.7172 (0.7263)     0.7167 (0.726)
RBS                       0.7179 (0.7226)     0.7129 (0.723)      0.7177 (0.7229)     0.7172 (0.7223)
PHC                       0.7117 (0.7334_2)   0.6843 (0.7271)     0.7033 (0.7343_1)   0.6912 (0.7329_3)
Random sampling           0.6197 (0.6358)

Table 6.6: The table shows the average best solution fitness found in a lethal environment after G = 200 generations, and in parenthesis,
the fitness found in a non-lethal environment, for different algorithm setups on K = 10, α = 2.0; the results of random sampling were
obtained by generating G × µ_0 = 200 × 30 = 6000 solutions per run at random and averaging over the best solution fitness values found.
The numbers in the subscript indicate the rank of the top five algorithm setups within the respective environment. We highlighted
all algorithm setups (among the top five) in bold face that are not significantly worse than any other setup. A Friedman test revealed
a significant difference between the search algorithm setups in general, but differences among the individual setups were tested for in
a post-hoc analysis using (paired) Wilcoxon tests (significance level of 5%) with Bonferroni correction.

Search algorithm          Constant mutation                       Poisson mutation
                          d = 1               d = 2               p_m = 0.5/N         p_m = 1/N
TGA, TS = 1, p_c = 0.0    0.6918 (0.7317)     0.6637 (0.7383_4)   0.6812 (0.7338)     0.6767 (0.7356_5)
TGA, TS = 1, p_c = 0.25   0.6752 (0.7324)     0.6546 (0.7332)     0.6623 (0.7299)     0.6578 (0.7324)
TGA, TS = 1, p_c = 0.5    0.6653 (0.7298)     0.6472 (0.7331)     0.6478 (0.7264)     0.6472 (0.7287)
TGA, TS = 2, p_c = 0.0    0.702_4 (0.7299)    0.6697 (0.7318)     0.6938 (0.7307)     0.6821 (0.7305)
TGA, TS = 2, p_c = 0.25   0.6897 (0.7277)     0.662 (0.732)       0.6717 (0.7281)     0.6643 (0.7302)
TGA, TS = 2, p_c = 0.5    0.6768 (0.7288)     0.6537 (0.7296)     0.6607 (0.7263)     0.659 (0.7274)
TGA, TS = 4, p_c = 0.0    0.7072_2 (0.7275)   0.677 (0.7294)      0.6981_5 (0.7302)   0.6902 (0.7272)
TGA, TS = 4, p_c = 0.25   0.6979 (0.7294)     0.6701 (0.7298)     0.6834 (0.7279)     0.6805 (0.7263)
TGA, TS = 4, p_c = 0.5    0.6936 (0.7265)     0.6635 (0.7296)     0.6715 (0.7258)     0.67 (0.7255)
RBS                       0.7102_1 (0.7207)   0.6871 (0.7219)     0.7056_3 (0.7217)   0.6987 (0.7215)
PHC                       0.6651 (0.7434_2)   0.6476 (0.7345)     0.6568 (0.7438_1)   0.6517 (0.7421_3)
Random sampling           0.632 (0.6524)

Chapter 7

Conclusion

In this thesis we have established an understanding of why and how three resourcing

issues — ephemeral resource constraints, changing variables, and lethal

environments — affect **evolutionary** **search** when applied to **closed**-**loop** **optimization**,

and devised **search** strategies **for** combating these issues. Generally, we observed

that all three resourcing issues may negatively impact the per**for**mance

of an optimizer, but that the strategies that we devised can be effective against

these issues. We will now discuss what we have learnt about each resourcing issue

and the practical implications surrounding it.

Optimization subject to ephemeral resource constraints (ERCs): The

main resourcing issue we focused on in this thesis (Chapters 4 and 5) was optimization

subject to ephemeral resource constraints (ERCs). These are a

type of dynamic constraint that can cause certain solutions to be temporarily non-evaluable

during the optimization process. We define solutions as non-evaluable

when the resources required in their evaluation are temporarily not available.

Clearly, ERCs are less likely to arise if we ensure that all the resources needed

to cover all possible **search** paths are available at any point during an **optimization**

process. However, this strategy is rather costly and inefficient. Instead, we

need to develop strategies that interplay with the workings of an **evolutionary**

algorithm (EA) to ensure that promising and relevant resources are available; if a

required resource is not available, then a strategy should also be able to efficiently

utilize those resources that are available.

Generally, we observed that ERCs may affect the per**for**mance of an optimizer

in some (but by no means all) cases, and clear patterns emerge that relate ERC

parameters to per**for**mance effects. For instance, we observed that the later the

constraint time frame of an ERC begins, the less disruptive is the impact on

search. This could mean that investment in resources at early stages of the optimization

should be preferred, where possible. We also observed that the impact of

an ERC on the per**for**mance of **evolutionary** **search** depends on the order and the

quality of the genetic material represented by the constraint schemata of an ERC.

Thus, to some degree, we may be able to predict the extent of impact if in**for**mation

about these schemata is available. Consequently, this prediction can serve as

the basis **for** investment decisions related to resources. For an ERC that leaves

the arrangement of resources to the optimizer (see Section 5.3), we observed that,

at the beginning of an **optimization** process, it is necessary to limit the number

of different resources purchased in the presence of budgetary constraints. This

policy is cost-effective and does not impact the per**for**mance significantly in the

long run.

We have also clearly seen (theoretically and empirically) that the impact on

the per**for**mance of an **evolutionary** algorithm (EA) is modulated by the choice of

constraint-handling strategy adopted and EA parameters. Which choice of strategy

is best is dependent on the details of the ERC, and the time and budgetary

constraints present. For summaries of (theoretical and empirical) results relating

ERC parameters to suitability of EA configurations and **search** strategies please

refer to Sections 4.3.4, 5.1.5, 5.2.4, and 5.3.4.

Finally, we also observed that knowing about the type of ERC may be sufficient

to select an effective **search** strategy **for** dealing with it a priori, even when

knowledge of the fitness landscape is limited. If this observation turns out to be

more generally true, then it is good news because we usually have more knowledge

about the ERCs than about the fitness landscape. There**for**e we would not

need to be ‘right’ about the fitness landscape in order to choose the right **search**

strategy. Nevertheless, as we indicated in the case study of Section 5.1.4, a comprehensive

a priori analysis of the ERCs at hand can be beneficial when it comes

to selecting a suitable policy.

Optimization subject to changes of variables: Motivated by a real closed-loop

problem in drug discovery, the second resourcing issue we considered in

this thesis (Chapter 6) was concerned with optimizing subject to changes of

variables (drugs) and thus a changing fitness landscape. The general objective

when tackling such dynamic problems is to track the global optimum, which may

shift upon a change in the landscape. Quick tracking of global optima is essential

in a **closed**-**loop** **optimization** scenario because evaluations are expensive so their

number is typically limited. Clearly, we will not encounter this resourcing issue in

the first place if the experimentalist knows in advance all the drugs she wants to

test during the **optimization** process. However, knowing that **evolutionary** **search**

can cope with changing variables gives an experimentalist the opportunity to

interfere in the **optimization** process if necessary.

Compared to standard dynamic problems, problems subject to changes of

variables feature two important advantages **for** tracking shifting optima quickly:

(i) changes in the fitness landscape do not need to be detected explicitly because

an optimizer knows and may even be allowed to control when and which variables

(drugs) are replaced, and (ii) tracking of optima is simplified because there is a

correlation between the variables changed and the parts of the landscape undergoing

a change. We proposed a strategy (see Section 6.1.2), which we call fair

mutation, that specifically exploits these two aspects **for** tracking shifting optima.

Our analyses showed that the difficulty of tracking optima depends on the

number of variables changed and the frequency with which they are changed.

Fair mutation has two user-defined parameters that allow the experimentalist

to account **for** these two factors. We have shown in Section 6.1.5 that, when

tuned appropriately, fair mutation can track shifting optima more quickly than

standard strategies from the dynamic **optimization** literature. However, as more

variables are changed less frequently, restarting the **optimization** from scratch

upon a variable change has often turned out to be the best choice on the test

problems considered (see Section 6.1.6).

Optimization in lethal environments: The final resourcing issue we considered

in this thesis (Chapter 6) was related to lethal **optimization** environments.

This resourcing issue has an effect on the population size of an optimizer as poor

solutions ‘die at birth’ and are permanently lost from the evolving population.

This issue arises because the hardware on which individuals (representing, e.g.

control software) are tested is reconfigurable, destructible and non-replaceable.

The hardware may be, **for** instance, prototypes of autonomous robots, drone

planes, or nano-machines, all of which are expensive pieces and commonly only

available in limited numbers for experimentation. Clearly, the hardware is expensive

and we would like to avoid damaging it, but set against this is that we want

to improve the control software that runs on the hardware. This conflict requires

an optimizer to trade off exploration and exploitation rather carefully.

To cope with lethal environments, we looked (in Section 6.2.3) at the **tuning**

of some of the basic EA configuration parameters: degree-of-elitism, population

size, selection pressure, mutation mode and strength, and crossover rate. We also

varied aspects of the fitness landscape we are optimizing to observe which landscape

topologies pose a particular challenge when optimizing in a lethal environment.

The main finding of our investigation was that randomness in the solution

generation process should be limited in a lethal **optimization** environment. That

is, elitism and a relatively high selection pressure should be employed, together

with a more deterministic mutation operator; the mutation and crossover

rates should be low too. We have also observed that an EA that evolves many

individuals (e.g. control software) **for** a small number of generations per**for**ms

better than one that evolves a few entities **for** many generations. The practical

implication of this is that, if possible, a larger budget should be allocated **for** the

production of the entities in the first place rather than **for** attempting to prolong

the testing or **optimization** phase.

7.1 Future work

Although this work allows us to draw valuable conclusions, our study has of

course been very limited. There remains much else to learn about the effects of

resourcing issues in **closed**-**loop** **optimization** and how to handle them. We now

discuss several directions **for** future re**search** towards achieving this goal.

Gaining a more robust understanding of the search strategies developed.

To gain a more robust understanding of the behavior of the **search** strategies

developed, it would be beneficial to consider further and perhaps more realistic

fitness landscapes than the ones we employed here. Of course, it would be

ideal to validate the search strategies on real-world closed-loop problems featuring

real resource constraints. However, this approach is generally not realistic due

to the time and/or budgetary requirements. The next best thing we can do is to

simulate a fitness landscape based on data obtained from real-world experiments.

This is the approach we have taken in the case study of Section 5.1.4, and more

studies of this kind are needed.

The test problems considered in this thesis were mainly of pseudo-boolean

nature; it would also be important to investigate how the **search** strategies behave

**for** **search** spaces with real or mixed integer variables. Also, our analysis

has looked at the impact of different resourcing issues isolated from each other,

which was necessary to gain an initial understanding of their impact on **optimization**.

However, in real applications it is possible to encounter **closed**-**loop**

problems that are subject to multiple resourcing issues simultaneously. Hence,

further work should investigate how to trade off conflicting properties of different

**search** strategies to cope efficiently with multiple resourcing issues.

Further theoretical analysis of resourcing issues. In Chapter 4 we have

used Markov chains to analyze theoretically the effect of a particular ERC type

on simple EAs. Although our analysis used a simplified **optimization** environment

(two solution types only), valuable observations were made with respect to

the applicability of different selection and reproduction schemes. We also gained

some understanding about the impact of ERCs on **evolutionary** **search**, which,

ultimately, may help us in the design of effective and efficient **search** strategies

**for** **closed**-**loop** **optimization**. However, our theoretical results were limited in the

sense that we did not derive mathematical equations relating, **for** instance, ERC

configurations to optimal EA parameter settings. It remains to be seen whether

it is possible to derive such expressions, and how applicable they would be in

practice. A number of recent advances in EA theory might present the possibility

of understanding ERCs (and other resourcing issues) more deeply, including

drift analysis (Auger and Doerr, 2011) and the fitness level method (Chen et al.,

2009a; Lehre, 2011).

Understanding the effects of non-homogeneous experimental costs in

**closed**-**loop** **optimization**. So far, we have made the assumption that all solution

evaluations take equal time or resources. This need not be the case. For

instance, when dealing with commitment composite ERCs, it is a very realistic

scenario that the composites to be ordered vary in their prices and delivery periods.

Under a limited budget, this scenario might cause an optimizer not only to

follow fitness gradients but also to account **for** variable experimental costs. Hence,

further work should investigate how to trade off these two aspects effectively. For

inspiration, we may look at strategies employed in the Robot Scientist study

of King et al. (2004), where, as noted earlier, this scenario has been encountered

within an inference problem rather than an **optimization** problem. Alternatively,

we may also cast the problem of trading off fitness with costs as a multi-objective

**optimization** problem and employ multi-objective EAs to tackle this problem.

Once we have gained an understanding of the effect of non-homogeneous costs

on **optimization**, the next step would be to identify challenging problem instances

to investigate whether or not it may be worth developing sophisticated **search**

strategies **for** dealing with non-homogeneous costs, or whether the same level of

per**for**mance can be obtained using more straight**for**ward naïve **search** strategies.
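
As an illustration of the multi-objective view mentioned above, the following Python sketch filters a set of evaluated solutions down to those that are Pareto-optimal with respect to fitness (to be maximized) and experimental cost (to be minimized). The representation of a solution as a `(fitness, cost)` pair and the function names are assumptions made for this example; a multi-objective EA would build on such a dominance relation rather than consist of it.

```python
def dominates(a, b):
    """True if a is at least as good as b in both objectives and differs from b.
    Each argument is a (fitness, cost) pair; higher fitness and lower cost are better."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b

def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, among the pairs (0.9, 10), (0.8, 5), (0.7, 7) and (0.9, 12), only (0.9, 10) and (0.8, 5) are non-dominated: the third is both less fit and more costly than the second, and the fourth is as fit as the first but more costly.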

Developing strategies **for** coping with lethal environments. Our investigation

of the impact of lethal environments (see Section 6.2) on **evolutionary**

**search** was limited to the **tuning** of simple EA configuration parameters, such as

population size, variation operators and their settings. Clearly, it is possible that

a better per**for**mance may be achieved using strategies specifically designed to

cope with lethal environments. From preliminary experiments, we know already

that a learning approach, similar to the one proposed in Section 5.2.1, which

learns a switching policy offline can yield better per**for**mance. Alternatively, we

may augment an EA with a strategy that uses assumptions of local fitness correlation

to pre-screen the designs and **for**bid the upload of potential lethals. Such

a strategy is similar to brood selection with repair and/or some fitness approximation

schemes used in EAs to filter solutions be**for**e evaluation (Walters, 1998;

Jin and Sendhoff, 2003; Jin, 2005).
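
One way such a pre-screening step might look is sketched below in Python. It is a hypothetical illustration only, not a strategy evaluated in this thesis: the fitness of a candidate is predicted as the mean fitness of its k nearest previously evaluated solutions (in Hamming distance), and the candidate is withheld from evaluation if the prediction falls below the lethal threshold plus a safety margin. The function names, the archive format, and the parameters `k` and `margin` are all assumptions.

```python
def hamming(a, b):
    """Number of positions in which two bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def predicted_fitness(candidate, archive, k=3):
    """Mean fitness of the k archive entries closest to the candidate.
    archive is a non-empty list of (bits, fitness) pairs of evaluated solutions."""
    nearest = sorted(archive, key=lambda entry: hamming(candidate, entry[0]))[:k]
    return sum(f for _, f in nearest) / len(nearest)

def is_safe_to_evaluate(candidate, archive, lethal_threshold, margin=0.05, k=3):
    """Allow evaluation only if the predicted fitness clears the lethal
    threshold by at least margin. With an empty archive there is no
    information to screen on, so the candidate is allowed (an assumption)."""
    if not archive:
        return True
    return predicted_fitness(candidate, archive, k) >= lethal_threshold + margin
```

The margin controls the trade-off raised in the text: a larger margin protects the population more strongly but also forbids more candidates that would in fact have been viable, which restricts exploration.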

Improving and broadening the application of machine learning techniques

in **closed**-**loop** **optimization**. We have shown (in Section 5.2) that

**evolutionary** **search** augmented with machine learning techniques, such as rein**for**cement

learning (RL), can be a powerful **optimization** tool to cope with

ERCs. Similar learning approaches can be applied to other resourcing issues.

As mentioned above, we employed a learning-based approach to cope with lethal

environments, and the results obtained are promising. However, as mentioned in

Section 5.2.3, some **tuning** of the rein**for**cement learning agent may be required

to achieve this strong per**for**mance. The **tuning** refers to things like parameter

settings involved in the RL agent, the state space partitioning, and the reward

function. To increase the applicability of learning-based optimizers to different

types of **optimization** problems, one could also try combining offline learning with

online learning. For instance, RL can be used to learn offline a policy until some

distant point in time, and this policy can then be refined or slightly modified

online using the anticipation approach of Bosman (2005). Another avenue worth

pursuing is to extend **search** methods that use surrogate models (see Section 2.6),

which are commonly used in real-world problems, to cope with the resourcing

issues considered in this thesis.

Casting ERCOPs as RL problems. We indicated in Section 4.1.2 that an

ERCOP can be cast as an RL problem rather than a pure **optimization** problem

(perhaps this perspective is more intuitive now that we have seen how RL can be

used to learn when to switch between different constraint-handling strategies).

When cast as an RL problem, we could use a standard RL technique (such as

Sarsa(λ)) to generate solutions to an ERCOP, rather than augment an EA with

constraint-handling strategies under RL control. As we know, to make this approach

work, we need to define the state space, actions, and the reward function.

To define a state, we may now need to consider also the set of evaluable solutions

at the current time step (in addition to other indicators, such as the current time

step and repairs per**for**med so far). The acti