
TUNING EVOLUTIONARY SEARCH

FOR CLOSED-LOOP OPTIMIZATION

A thesis submitted to the University of Manchester

for the degree of Doctor of Philosophy

in the Faculty of Engineering and Physical Sciences

2012

By

Richard Allmendinger

School of Computer Science


Contents

Abstract 12

Declaration 13

Copyright 14

Acknowledgements 15

Nomenclature 16

1 Introduction 17

1.1 Objectives and contributions . . . . . . . . . . . . . . . . . . . . . 21

1.2 Outline of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . 23

1.2.1 Publications resulting from the thesis . . . . . . . . . . . . 25

2 Closed-Loop Optimization 26

2.1 Concepts of closed-loop optimization . . . . . . . . . . . . . . . . 26

2.2 Applications of closed-loop optimization . . . . . . . . . . . . . . 29

2.3 Challenges of closed-loop optimization . . . . . . . . . . . . . . . 38

2.4 Design of experiments . . . . . . . . . . . . . . . . . . . . . . . . 39

2.5 Response surface methods . . . . . . . . . . . . . . . . . . . . . . 41

2.6 Simulated evolution in closed-loop optimization . . . . . . . . . . 43

2.7 Chapter summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3 Evolutionary Optimization 46

3.1 Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . . . . 46

3.1.1 Analyzing and explaining evolutionary algorithms . . . . . 49

3.1.2 Tuning evolutionary algorithms . . . . . . . . . . . . . . . 53

3.2 Optimization in uncertain environments . . . . . . . . . . . . . . 59



3.2.1 Optimization subject to noise . . . . . . . . . . . . . . . . 59

3.2.2 Searching for robust and reliable solutions . . . . . . . . . 62

3.2.3 Further forms of uncertainties in optimization . . . . . . . 63

3.3 Constrained optimization . . . . . . . . . . . . . . . . . . . . . . . 64

3.4 Dynamic optimization . . . . . . . . . . . . . . . . . . . . . . . . 68

3.4.1 Dynamically-constrained optimization . . . . . . . . . . . 71

3.5 Online optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 73

3.6 Reinforcement learning . . . . . . . . . . . . . . . . . . . . . . . . 76

3.6.1 Reinforcement learning in evolutionary optimization . . . . 81

3.7 Chapter summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

4 ERCs: Definitions and Concepts 83

4.1 ERCOPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

4.1.1 General problem definition of an ERCOP . . . . . . . . . . 84

4.1.2 Further comments on ERCOPs . . . . . . . . . . . . . . . 87

4.2 Specific types of ERCs . . . . . . . . . . . . . . . . . . . . . . . . 88

4.2.1 Fundamental elements of ERCs . . . . . . . . . . . . . . . 88

4.2.2 Commitment relaxation ERCs . . . . . . . . . . . . . . . . 91

4.2.3 Periodic ERCs . . . . . . . . . . . . . . . . . . . . . . . . 94

4.2.4 Random ERCs . . . . . . . . . . . . . . . . . . . . . . . . 95

4.2.5 Commitment composite ERCs . . . . . . . . . . . . . . . . 97

4.3 Theoretical analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 101

4.3.1 Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . 102

4.3.2 Modeling ERCs with Markov models . . . . . . . . . . . . 103

4.3.3 Simulation results . . . . . . . . . . . . . . . . . . . . . . . 110

4.3.4 Summary of theoretical study . . . . . . . . . . . . . . . . 115

4.4 Chapter summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

5 Strategies for Coping with ERCs 118

5.1 Static constraint-handling strategies . . . . . . . . . . . . . . . . . 118

5.1.1 Specific static constraint-handling strategies . . . . . . . . 119

5.1.2 Experimental setup . . . . . . . . . . . . . . . . . . . . . . 122

5.1.3 Experimental results . . . . . . . . . . . . . . . . . . . . . 129

5.1.4 Case study . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

5.1.5 Summary and conclusion . . . . . . . . . . . . . . . . . . . 142

5.2 Learning-based constraint-handling strategies . . . . . . . . . . . 144



5.2.1 Offline learning-based strategy . . . . . . . . . . . . . . . . 145

5.2.2 Online learning-based strategy . . . . . . . . . . . . . . . . 146

5.2.3 Experimental analysis . . . . . . . . . . . . . . . . . . . . 147

5.2.4 Summary and conclusion . . . . . . . . . . . . . . . . . . . 154

5.3 Online resource-purchasing strategies . . . . . . . . . . . . . . . . 155

5.3.1 Specific online purchasing strategies . . . . . . . . . . . . . 156

5.3.2 Experimental setup . . . . . . . . . . . . . . . . . . . . . . 163

5.3.3 Experimental results . . . . . . . . . . . . . . . . . . . . . 165

5.3.4 Summary and conclusion . . . . . . . . . . . . . . . . . . . 168

5.4 Chapter summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

6 Further Resourcing Issues 170

6.1 Changes of variables . . . . . . . . . . . . . . . . . . . . . . . . . 170

6.1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

6.1.2 Fair mutation . . . . . . . . . . . . . . . . . . . . . . . . . 172

6.1.3 Testing Environment . . . . . . . . . . . . . . . . . . . . . 174

6.1.4 Experimental setup . . . . . . . . . . . . . . . . . . . . . . 176

6.1.5 Experimental results . . . . . . . . . . . . . . . . . . . . . 180

6.1.6 Summary and conclusion . . . . . . . . . . . . . . . . . . . 185

6.2 Lethal environments . . . . . . . . . . . . . . . . . . . . . . . . . 187

6.2.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . 188

6.2.2 Experimental setup . . . . . . . . . . . . . . . . . . . . . . 189

6.2.3 Experimental results . . . . . . . . . . . . . . . . . . . . . 195

6.2.4 Summary and conclusion . . . . . . . . . . . . . . . . . . . 200

6.3 Chapter summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

7 Conclusion 204

7.1 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207

A Further Results of Strategies Applied to ERCOPs 211

A.1 Commitment composite ERCs . . . . . . . . . . . . . . . . . . . . 211

A.2 Periodic ERCs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212

B The D-MAB Algorithm 217

Word Count: 56459



List of Tables

2.1 Domains of closed-loop applications. . . . . . . . . . . . . . . . . 29

3.1 Nomenclature used in evolutionary algorithms . . . . . . . . . . . 47

4.1 Different types of ephemeral resource constraints (ERCs) and examples.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

5.1 EA parameter settings as used in the study of static constraint-handling
strategies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

5.2 Parameter settings of static constraint-handling strategies. . . . . 128

5.3 Parameter settings of test functions f as used in the study of static

constraint-handling strategies. . . . . . . . . . . . . . . . . . . . . 129

5.4 Parameter settings of static and learning-based constraint-handling
strategies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

5.5 Parameter settings of online resource-purchasing strategies. . . . . 165

6.1 Parameter settings of dynamic optimization techniques. . . . . . . 178

6.2 Parameter settings of test functions f modelling problems subject

to changes of variables. . . . . . . . . . . . . . . . . . . . . . . . . 179

6.3 Accuracy and adaptability results of various dynamic optimization

strategies on problems subject to changes of variables . . . . . . . 187

6.4 Default parameter settings of search algorithms. . . . . . . . . . . 194

6.5 Average best solution fitness obtained with various parameter settings
of search algorithms optimizing problems subject to lethal
environments (N = 50, K = 4, α = 2.0) . . . . . . . . . . . . . . . 202

6.6 Average best solution fitness obtained with various parameter settings
of search algorithms optimizing problems subject to lethal
environments (N = 50, K = 10, α = 2.0) . . . . . . . . . . . . . . 203



List of Figures

1.1 Schematic of the experimental setup as used by Schwefel (1968) in

the shape optimization of a flashing nozzle . . . . . . . . . . . . . 18

2.1 Schematic of closed-loop optimization . . . . . . . . . . . . . . . . 28

2.2 Experimental setup as used in the configuration optimization of

an FPGA chip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.3 Experimental setup as used in an application for quantum control 33

2.4 Experimental setup as used in the robot scientist study . . . . . . 35

2.5 Schematic of the closed loop as employed with EVOP . . . . . . . 37

2.6 Box-Behnken experimental design . . . . . . . . . . . . . . . . . . 40

2.7 A response surface plot . . . . . . . . . . . . . . . . . . . . . . . . 42

3.1 Visualization of one-point crossover and bit flip mutation . . . . . 49

3.2 Modeling genetic algorithms with a dynamical systems approach . 51

3.3 Classification of parameter setting methods for EAs . . . . . . . . 54

3.4 Comparison between a brute-force and a statistical racing method 55

3.5 The general scheme of an EA employing an adaptive operator selection

scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

3.6 Visualization of the effect of a noisy fitness function . . . . . . . . 60

3.7 Evolution strategy with rescaled mutation . . . . . . . . . . . . . 62

3.8 Visualization of the effect of approximation error in meta-models . 64

3.9 A dynamically changing fitness landscape . . . . . . . . . . . . . . 68

3.10 A schematic illustration of the effect of anticipation . . . . . . . . 75

3.11 The agent-environment interaction in reinforcement learning . . . 76

4.1 Distribution of evaluable search space E_t and feasible region X . . 86

4.2 Schematic of the constraint time frame, preparation and recovery

period . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

4.3 An illustration of a commitment relaxation ERC . . . . . . . . . . 91



4.4 An illustration of a periodic ERC . . . . . . . . . . . . . . . . . . 95

4.5 An illustration of a random ERC . . . . . . . . . . . . . . . . . . 97

4.6 A visual example of a commitment composite ERC . . . . . . . . 99

4.7 Theoretical simulation result of periodic ERCs as a function of

selection steps (f(A) = f(B)) . . . . . . . . . . . . . . . . . . . . 111

4.8 Theoretical simulation result of periodic ERCs as a function of the

activation period (f(A) = f(B)) . . . . . . . . . . . . . . . . . . . 112

4.9 Theoretical simulation result of periodic ERCs as a function of

selection steps (f(A) ≠ f(B)) . . . . . . . . . . . . . . . . . . . . 113

4.10 Theoretical simulation result of periodic ERCs as a function of the

preparation time and activation period (f(A) ≠ f(B)) . . . . . . 114

4.11 Theoretical simulation result of periodic ERCs as a function of the

recovery time (f(A) ≠ f(B)) . . . . . . . . . . . . . . . . . . . . . 115

4.12 Theoretical simulation result of periodic ERCs as a function of the

fitness ratio f(A)/f(B)) . . . . . . . . . . . . . . . . . . . . . . . 116

5.1 Drift-effect caused by repairing . . . . . . . . . . . . . . . . . . . 119

5.2 Illustration of repairing strategies . . . . . . . . . . . . . . . . . . 123

5.3 A TwoMax function . . . . . . . . . . . . . . . . . . . . . . . . . 126

5.4 Empirical results of commitment relaxation ERCs on OneMax . . 131

5.5 Empirical results of commitment relaxation ERCs on OneMax

(heat map I) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

5.6 Empirical results of commitment relaxation ERCs on OneMax

(heat map II) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

5.7 Empirical results of commitment relaxation ERCs on MAX-SAT

(heat map) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

5.8 Empirical results of periodic ERCs on OneMax . . . . . . . . . . 136

5.9 Empirical results of periodic ERCs on OneMax (heat map I) . . . 137

5.10 Empirical results of periodic ERCs on OneMax (heat map II) . . 138

5.11 Empirical results of commitment relaxation ERCs on NKα landscapes

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

5.12 Empirical results of commitment relaxation ERCs on a fitness landscape
interpolated from real-world data . . . . . . . . . . . . . . . 142

5.13 Empirical results of learning-based constraint-handling strategies . 150

5.14 Optimal actions learnt by a reinforcement learning agent . . . . . 151

5.15 Arms played by the multi-armed bandit algorithm . . . . . . . . . 152



5.16 Performance of reinforcement learning agent as a function of different

parameter settings . . . . . . . . . . . . . . . . . . . . . . . 153

5.17 Learning-based strategies performing on an unknown fitness landscape

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

5.18 Visualization of an online resource-purchasing strategy . . . . . . 160

5.19 Empirical results of commitment composite ERCs on MAX-SAT

(heat map) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

5.20 Empirical results of commitment composite ERCs on MAX-SAT

as a function of available budget . . . . . . . . . . . . . . . . . . . 167

5.21 Empirical results of commitment composite ERCs on MAX-SAT

(heat map-comparison) . . . . . . . . . . . . . . . . . . . . . . . . 168

6.1 An illustration of the effect of changing variables . . . . . . . . . . 171

6.2 Empirical results of dynamic optimization strategies optimizing

MJ models subject to frequent changes of few variables . . . . . . 181

6.3 Empirical results of dynamic optimization strategies optimizing

NK landscapes subject to frequent changes of few variables . . . . 182

6.4 Empirical results of dynamic optimization strategies optimizing

MJ models subject to frequent changes of few variables . . . . . . 183

6.5 Empirical results of dynamic optimization strategies optimizing

NK landscapes subject to frequent changes of few variables . . . . 184

6.6 Empirical results of dynamic optimization strategies optimizing

MJ models subject to frequent changes of few variables . . . . . . 185

6.7 Empirical results of dynamic optimization strategies optimizing

NK landscapes subject to frequent changes of few variables . . . . 186

6.8 Empirical results of search algorithms on problems subject to lethal
environments (heat map) . . . . . . . . . . . . . . . . . . . . . . . 195

6.9 Empirical results of search algorithms on problems subject to lethal
environments (heat map-comparison) . . . . . . . . . . . . . . . . 197

6.10 Effect of population size on the average best solution fitness obtained

with various search algorithms optimizing problems subject

to lethal environments . . . . . . . . . . . . . . . . . . . . . . . . 199

A.1 Empirical results of commitment relaxation ERCs on MAX-SAT . 212

A.2 Empirical results of commitment relaxation ERCs on TwoMax . . 213

A.3 Empirical results of periodic ERCs on MAX-SAT . . . . . . . . . 214



A.4 Empirical results of periodic ERCs on TwoMax . . . . . . . . . . 215

A.5 Empirical results of periodic ERCs on TwoMax (heat map I) . . . 216

A.6 Empirical results of periodic ERCs on TwoMax (heat map II) . . 216



List of Algorithms

3.1 A basic generational evolutionary algorithm . . . . . . . . . . . . 48

3.2 A single generation of deterministic crowding (Mahfoud, 1995) . . 70

3.3 Tabular TD(λ) for estimating V^π . . . . . . . . . . . . . . . . . . 78

3.4 Tabular Sarsa(λ), an on-policy TD control method . . . . . . . . 80

4.1 Implementation of a commitment relaxation ERC . . . . . . . . . 92

4.2 Illustration of how we manage ERCs and the evaluation of solutions 93

4.3 Implementation of a periodic ERC . . . . . . . . . . . . . . . . . 96

4.4 Implementation of a random ERC . . . . . . . . . . . . . . . . . . 98

4.5 Implementation of a commitment composite ERC . . . . . . . . . 100

5.1 Generational EA with static constraint-handling strategies . . . . 124

5.2 Steady-state EA with static and learning-based constraint-handling
strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

5.3 Just-in-time strategy with repairing . . . . . . . . . . . . . . . . . 161

5.4 Generational EA with online resource-purchasing strategies . . . . 164

6.1 Fair mutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

6.2 Generational EA with dynamic optimization strategies for handling

problems subject to changes of variables . . . . . . . . . . . 177

6.3 Tournament selection based GA (TGA) . . . . . . . . . . . . . . . 191

6.4 Population of hill-climbers (PHC) . . . . . . . . . . . . . . . . . . 193



Abstract

Closed-loop optimization deals with problems in which candidate solutions are

evaluated by conducting experiments, e.g. physical or biochemical experiments.

Although this form of optimization is becoming more popular across the sciences,

it may be subject to rather unexplored resourcing issues, as any experiment may

require resources in order to be conducted. In this thesis we are concerned with

understanding how evolutionary search is affected by three particular resourcing

issues — ephemeral resource constraints (ERCs), changes of variables, and lethal

environments — and the development of search strategies to combat these issues.

The thesis makes three broad contributions. First, we motivate and formally

define the resourcing issues considered. Here, concrete examples in a range of

applications are given. Secondly, we theoretically and empirically investigate the

effect of the resourcing issues considered on evolutionary search. This investigation

reveals that resourcing issues affect optimization in general, and that clear

patterns emerge relating specific properties of the different resourcing issues to

performance effects. Thirdly, we develop and analyze various search strategies

augmented on an evolutionary algorithm (EA) for coping with resourcing issues.

To cope specifically with ERCs, we develop several static constraint-handling

strategies, and investigate the application of reinforcement learning techniques

to learn when to switch between these static strategies during an optimization

process. We also develop several online resource-purchasing strategies to cope

with ERCs that leave the arrangement of resources to the hands of the optimizer.

For problems subject to changes of variables relating to the resources, we find

that knowing which variables are changed provides an optimizer with valuable

information, which we exploit using a novel dynamic strategy. Finally, for lethal

environments, where visiting parts of the search space can cause the permanent

loss of resources, we observe that a standard EA’s population may be reduced in

size rapidly, complicating the search for innovative solutions. To cope with such

scenarios, we consider some non-standard EA setups that are able to innovate

genetically whilst simultaneously mitigating risks to the evolving population.



Declaration

No portion of the work referred to in this thesis has been

submitted in support of an application for another degree

or qualification of this or any other university or other

institute of learning.



Copyright

i. The author of this thesis (including any appendices and/or schedules to

this thesis) owns certain copyright or related rights in it (the “Copyright”)

and s/he has given The University of Manchester certain rights to use such

Copyright, including for administrative purposes.

ii. Copies of this thesis, either in full or in extracts and whether in hard or

electronic copy, may be made only in accordance with the Copyright, Designs

and Patents Act 1988 (as amended) and regulations issued under it

or, where appropriate, in accordance with licensing agreements which the

University has from time to time. This page must form part of any such

copies made.

iii. The ownership of certain Copyright, patents, designs, trademarks and other

intellectual property (the “Intellectual Property”) and any reproductions of

copyright works in the thesis, for example graphs and tables (“Reproductions”),

which may be described in this thesis, may not be owned by the

author and may be owned by third parties. Such Intellectual Property and

Reproductions cannot and must not be made available for use without the

prior written permission of the owner(s) of the relevant Intellectual Property

and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication

and commercialisation of this thesis, the Copyright and any Intellectual

Property and/or Reproductions described in it may take place is available

in the University IP Policy (see http://documents.manchester.ac.uk/

DocuInfo.aspx?DocID=487), in any relevant Thesis restriction declarations

deposited in the University Library, The University Library’s regulations

(see http://www.manchester.ac.uk/library/aboutus/regulations) and

in The University’s policy on presentation of Theses.



Acknowledgements

First and foremost I would like to thank my PhD supervisor, Joshua Knowles, for

his valuable moral support, constructive criticism, help and feedback throughout

the progress of this work, and for being a good friend too.

I am also very grateful to Hans-Paul Schwefel, Alan Reynolds, David Corne,

and Warwick Dunn for answering many questions about their experience with

resource constraints.

Thanks to Arjun Chandra and Julia Handl for reading portions of my thesis

and making useful suggestions.

Thanks also to Mauricio, Michalis, Kevin, Adam, Ahmad, Peter, Ruofei, Nicolo,

Manuela, Stefan, Freddy, and Richard for making the office more fun and

enjoyable!

To Neil Lawrence for being a fair opponent in internal swimming and cycling

competitions, interesting conversations about sports, and the BBQs.

Warm thanks to all the people I have met and lived with, and who made my time

in Manchester more enjoyable, particularly Eyad, Liz, Claudia, Hagen, Abdul,

Chris, Dan, Steve, Zoe, Lotti, Freddy, and Karla.

Finally, many thanks to my family, without whom I would never have been

able to do what I did.



Nomenclature

σ_t    Set of problem-specific and time-evolving parameters
E_t    Set of evaluable solutions at time step t
k    Length of activation period
t_ctf^start    Time step at which the constraint time frame of an ERC starts
t_ctf^end    Time step at which the constraint time frame of an ERC is finished
T    Total optimization time
H    Constraint schema
l(H)    Length of a constraint schema
o(H)    Order of a constraint schema
V    Duration of an epoch
commRelaxERC(t_ctf^start, t_ctf^end, V, H)    Representation of a commitment relaxation ERC
P    Period length
perERC(t_ctf^start, t_ctf^end, k, P, H)    Representation of a periodic ERC
p    Activation probability
randERC(t_ctf^start, t_ctf^end, k, o(H), p)    Representation of a random ERC
H^#    High-level constraint schema
#SC    Number of storage cells
TL    Time lag
RN    Reuse number
SL    Shelf life
c_order    Cost per submitted composite order
c_time step    Cost per time step
C    Budget
commCompERC(H^#, #SC, TL, RN, SL)    Representation of a commitment composite ERC



Chapter 1

Introduction

In the late 60s, Hans-Paul Schwefel reported an ingenious set of experiments designed

to optimize the shape of a flashing nozzle (Schwefel, 1968; Klockgether

and Schwefel, 1970). Figure 1.1 illustrates the setup employed. The aim was to

set aside traditional design principles — and the equations of fluid dynamics —

and turn the design process instead over to a series of trial and error experiments

akin to natural evolution. Schwefel was using an early form of evolutionary algorithm

(EA) and evaluating designs, not through simulation, but by conducting

real (physical) experiments. Although expensive (i.e. time consuming and/or resource

expensive), this setup is effective because experiments replace the need

for having available, or designing, sufficient mathematical models of the problem

being solved.

This thesis will consider problems featuring experimental setups of very similar

character to that of Schwefel’s, which are nowadays commonly referred to as

closed-loop optimization problems. The distinct difference between these problems
and standard optimization problems is that while candidate solutions (genotypes)
to a problem (e.g. a set of parameter values specifying nozzle shapes) are planned on
a computer, their phenotypes (e.g. an actual flashing nozzle) are realized or prototyped
ex-silico (e.g. relying on a physical experiment of some sort). In general, the
process of measuring the fitness (or quality) of the phenotype involves conducting
a physical experiment too, as was also the case in Schwefel's setup. Over

the years, many scientific and technological problems — including in areas like

quantum control, analytical biochemistry, robotics, electronics design, medicine,

food science, neuroscience, and other sciences (an overview of closed-loop applications

will be given in Section 2.2) — have been tackled using an experimental





Figure 1.1: Schematic of the experimental setup as used by Schwefel (1968) in the

shape optimization of a flashing nozzle. The nozzle consisted of a series of conical brass

rings, each having its own diameter. The quality or fitness of the nozzle was measured

by injecting superheated water into one side of the nozzle using a pressurized boiler,

and measuring the velocity of the fluid steam at the other side of the nozzle. Based on

this quality measure, a computer then selected the next nozzle shape for testing.

procedure similar to that of Schwefel’s.

However, the reliance on physical experiments comes along with a variety of

challenges to be faced by an experimentalist/optimizer. Schwefel's application features
several challenges that are also common to other problems in this domain,
such as noisy or imprecise measurements, uncontrolled environmental factors, and
expensive fitness evaluations; an overview of common challenges in closed-loop
optimization will be provided in Section 2.3. To cope with these challenges, Schwefel
and other pioneering researchers in closed-loop optimization (Box, 1957; Schwefel,
1968; Klockgether and Schwefel, 1970; Rechenberg, 1973; Schwefel, 1975; Rechenberg,
2000) have opted for methods based on simulated evolution, such as EAs. 1
EAs have several advantages over other optimizers when it comes to closed-loop
optimization (see Section 2.6 for examples) but one important practical reason for

their usage is that they employ a set of solutions rather than a single one. This

property is convenient when experimental equipment is capable of doing many

experiments in parallel.

There seems to be an increasing interest in applying EAs to closed-loop optimization

(Calzolari et al., 2008; Wong et al., 2008; Chen et al., 2009b; Knowles,

2009; Shir et al., 2009; Caschera et al., 2010), due perhaps to the increasing popularity

of EAs as optimization methods across the sciences, the greater availability

1 When an EA is used, closed-loop optimization is sometimes referred to as evolutionary

experimentation or experimental evolution.



of automation of experiments in areas like physics, chemistry and the biosciences,

and a push towards more interdisciplinary working. The tutorials by Shir and

Bäck (2009) and by Bäck et al. (2010), and recent keynote talks at international

conferences such as GECCO (Bedau, 2010) and PPSN (Michalewicz, 2010) are

further evidence of a burgeoning interest.

Whilst there is growing evidence that closed-loop optimization works, there

are various unexplored resourcing issues an experimentalist may face during an

iterative experimental loop. The aim of this thesis is to understand how evolutionary

search is affected by three particular resourcing issues in a closed-loop

optimization scenario — ephemeral resource constraints (ERCs), changes

of variables, and lethal environments — and the development of search strategies

to combat these issues. The resourcing issue around ERCs was encountered by

Knowles in several closed-loop optimization settings (see O’Hagan et al. (2005,

2007); Knowles (2009); Jarvis et al. (2010)), while changing variables were encountered

in our own collaborative work (Small et al., 2011). 2 The third issue,

lethal environments, is a challenge that may occur when equipment is potentially

destructible as in software testing of autonomous robots or nanotechnology

testing (we have not encountered this particular issue ourselves but believe that

similar scenarios will arise in future).

The primary resourcing issue we are focusing on is related to the temporary

non-availability of resources required in the evaluation process of solutions. This

situation may cause solutions that are perfectly feasible candidate solutions to

the problem to be temporarily non-realizable and thus not available for fitness

assessment; we will refer to such solutions as (temporarily) non-evaluable. We

refer to the constraints specifying which solutions are not evaluable at a given

time as ephemeral resource constraints (ERCs), and any optimization problem

that involves ERCs as an ephemeral resource-constrained optimization problem

(ERCOP).
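To make this notion slightly more concrete before the formal treatment in Chapter 4, the following is only a rough sketch (not the thesis's formal ERCOP statement), written with the symbols listed in the Nomenclature, where E_t denotes the set of evaluable solutions at time step t:

    % Rough sketch of an ERCOP, assuming the notation of the Nomenclature;
    % the precise definition is developed in Chapter 4. The fitness landscape f
    % and the feasible region X are static; only the evaluable set E_t changes.
    \begin{align*}
      \text{maximize}   \quad & y = f(\vec{x})\\
      \text{subject to} \quad & \vec{x} \in X,\\
      \text{and, at time step } t, \quad & \vec{x} \text{ can be submitted for evaluation only if } \vec{x} \in E_t.
    \end{align*}

Solutions outside E_t remain perfectly feasible; they are merely non-evaluable until the responsible resource becomes available again.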

Although the resourcing issue around ERCs has not been raised much in the

literature to date, at least in terms similar to those we use in this thesis, precedents
do exist. We have spoken to several people who do, or have done, closed-loop

work, and they have recounted how they dealt with the issue. For example, from

discussions with Schwefel (April 2009 - January 2010), we know that he faced

2 Note that ERCs are not discussed specifically in the work of O’Hagan et al. (2005, 2007);

Jarvis et al. (2010), but they existed and were circumvented using waiting or other
resource-wasteful strategies.



ERCs in his flashing nozzle work. In fact, not knowing in advance which shape

would be realized, Schwefel sometimes ran out of brass rings of certain diameters.

In that case, additional brass rings were ordered, produced and delivered,

causing the optimization process to wait until these resources arrived; that is,

here ERCs prevented the genotype-phenotype mapping. In modern closed-loop

problems, such as combinatorial drug discovery and optimization of analytical

instrument configurations (Weuster-Botz and Wandrey, 1995; Davies et al., 2000;

King et al., 2004; O’Hagan et al., 2007; Knowles, 2009), there are several aspects

that may further complicate the process of ensuring resource availability. For instance,

the storage space for resources may be limited, there may be time lags

between ordering resources and receiving them, and resources may be associated

with shelf-lives or may be viable for usage only a limited number of times.

In another problem encountered by Schwefel, the fitness of a single solution

was measured by running a time-consuming simulation on a computer (while there
were no resources required in the genotype-phenotype mapping). During some
simulations, the process ended prematurely (i.e. an execution error or exception
occurred) and no fitness was returned. From a discussion with Reynolds and Corne

(September 2010) we know that they faced a similar problem recently; execution

errors were also encountered by other researchers including Booker et al. (1999);

Büche et al. (2002).

Motivated by a closed-loop problem involving the identification of effective

drug combinations drawn from a non-static drug library, the next resourcing issue

considered in this thesis is related to problems featuring dynamically changing

variables. The effect of changing variables is that we introduce a change in the

fitness landscape we are optimizing over, which may cause the global optimum to

shift; notice the difference to ERCOPs, which feature a static fitness landscape

but dynamic constraints. Problems that are subject to changing variables can be

seen as traditional dynamic optimization problems (Branke, 2001). However, in

the scenario considered in this thesis an optimizer knows and may even be allowed

to control when and which variables (drugs) are replaced. Compared to standard

dynamic optimization problems, this property yields two important advantages:

(i) changes in the fitness landscape do not need to be detected explicitly before

a potentially shifting optimum can be tracked, and (ii) tracking of optima is

simplified because there is a correlation between the changed variables and the

parts of the landscape that undergo a change. While these two aspects provide



clearly some useful extra knowledge, the remaining challenge is how to exploit

this knowledge efficiently to accelerate tracking of optima.
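As a purely illustrative sketch of this setting (the class DrugLibrary and its methods are hypothetical and not part of the thesis's test problems), the snippet below models a non-static drug library in which the optimizer is explicitly told which variable has been replaced:

    import random

    class DrugLibrary:
        """Toy model of a non-static drug library: each decision variable
        corresponds to one drug, and replacing a drug changes the part of the
        fitness landscape that depends on that variable."""

        def __init__(self, num_drugs, seed=0):
            self.rng = random.Random(seed)
            # Hypothetical per-drug effect weights; in the real closed-loop
            # problem the interactions are non-linear and unknown.
            self.effects = [self.rng.random() for _ in range(num_drugs)]

        def replace_drug(self, index):
            """Swap one drug; the optimizer is told which variable changed."""
            self.effects[index] = self.rng.random()
            return index  # the known change location is the extra information

        def fitness(self, x):
            # Placeholder additive fitness; in practice this value would come
            # from a physical (e.g. in vivo) experiment.
            return sum(e for e, bit in zip(self.effects, x) if bit)

    lib = DrugLibrary(num_drugs=8)
    x = [1, 0, 1, 1, 0, 0, 1, 0]
    changed = lib.replace_drug(3)
    # Because the changed index is known, an optimizer can react locally,
    # e.g. by re-sampling only the affected variable in existing solutions.
    x[changed] = random.choice([0, 1])
    print(lib.fitness(x))

The point of the sketch is advantage (ii) above: no change detection is needed, and the reaction can be restricted to the changed variable.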

The final resourcing issue we are considering is related to optimization in

lethal environments. The general setup is that we wish to evolve (i.e. improve)

a finite population of entities, but when a lethal solution (i.e. one that damages

equipment) is evaluated, it is immediately removed from the population and

the population size is reduced by one. This models certain closed-loop evolution

scenarios that may be encountered when the hardware on which individuals are

tested is reconfigurable, destructible and non-replaceable. A motivational example

of an optimization problem subject to lethal environments will be given in

Section 6.2.1. In contrast to the other two resourcing issues (ERCs and changes

of variables), lethal environments feature a static fitness landscape and feasible

region; the dynamic component is associated with the fact that the population

size may reduce over the course of the optimization.
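The mechanism can be sketched as follows; this is an illustrative toy (the predicate is_lethal and the lethal_region set are invented for the example), not the experimental setup studied in Chapter 6:

    import random

    def is_lethal(x, lethal_region):
        """Hypothetical predicate: in a real closed-loop setting, evaluating a
        lethal solution would damage the non-replaceable test hardware."""
        return tuple(x) in lethal_region

    def evaluate_population(population, fitness, lethal_region):
        """Evaluate individuals; lethal ones are lost, permanently shrinking
        the population."""
        survivors, fitnesses = [], []
        for x in population:
            if is_lethal(x, lethal_region):
                continue  # individual (and its hardware) is permanently lost
            survivors.append(x)
            fitnesses.append(fitness(x))
        return survivors, fitnesses

    # Toy usage with a OneMax-style fitness and an arbitrary lethal region.
    rng = random.Random(1)
    population = [[rng.randint(0, 1) for _ in range(10)] for _ in range(20)]
    lethal_region = {tuple(rng.randint(0, 1) for _ in range(10)) for _ in range(50)}
    population, fitnesses = evaluate_population(population, sum, lethal_region)
    print(len(population), "individuals survive the first generation")

Because lost individuals are never replaced, exploration has to be balanced against the risk of shrinking the population too far; this trade-off motivates the tuning study in Section 6.2.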

1.1 Objectives and contributions

Our goal is to gain a better understanding of why and how the three resourcing

issues mentioned above affect the application of evolutionary search for closed-loop

optimization, and consequently devise search strategies for combating these

issues.

We claim the thesis makes the following contributions:

• A general problem definition of ERCOPs, including the development of

various types of ERCs. As our primary focus is on the investigation of the

effects of ERCs, a large body of the thesis is devoted to advancing this

subject (Chapter 4).

• An initial theoretical analysis of the effect of ERCs on simple EAs using

Markov chains. The major aim of this analysis is to understand whether and

how ERCs should affect configuration choices of an EA; the configuration

choices considered relate to different selection and reproduction schemes

commonly used within EAs (Chapter 4).

• Various static constraint-handling strategies, including repairing, waiting,

and penalty strategies, for coping with ERCs. The empirical study of these

strategies provides evidence that knowing about the type of ERC may be



sufficient to select an effective constraint-handling strategy for efficiently

coping with it a priori, even when knowledge of the fitness landscape is

limited (Chapter 5).

• Empirical investigation of two learning-based approaches for coping with

ERCs. Both approaches aim at learning when to switch between various

static constraint-handling strategies during the optimization process. However,

while one approach learns this task offline using a reinforcement learning
method, the other learns it online using a multi-armed bandit algorithm

(Chapter 5).

• Various online resource-purchasing strategies to cope with ERCs that leave

the arrangement of resources to the hands of the optimizer. We consider a

specific scenario motivated by a real-world ERC (Chapter 5).

• A novel technique, which we call fair mutation, for dealing with problems

featuring changing variables. To track potentially shifting optima upon a

variable change, fair mutation exploits the correlation existing between

the changed variables and the parts of the landscape undergoing a change

(Chapter 6).

• Guidelines for tuning simple EA configuration parameters (degree of elitism,

population size, selection pressure, mutation mode and strength, and crossover

rate) for coping with lethal environments. The guidelines are validated

on different fitness landscape topologies, helping us understand the underlying

challenging features of landscapes in lethal environments (Chapter 6).

The thesis contributions in a wider scope are:

• A review of the field of closed-loop optimization including general concepts,

applications, challenges, and optimization algorithms employed to address

these challenges (Chapter 2).

• A taxonomy for describing ERCOPs and the different ERC types we introduce

(Chapter 4).

• A variety of detailed suggestions on directions of future research that we

believe are necessary and fruitful in combating resourcing issues in closed-loop

optimization (Chapter 7).



1.2 Outline of the thesis

In Chapter 2, we review literature specifically relating to closed-loop optimization.

After defining general concepts of closed-loop optimization, we review a

variety of closed-loop applications we found in the literature across the sciences

and engineering. Following this review we summarize the main challenges common

to closed-loop applications, and finally give an overview of methods used to

tackle closed-loop problems. In particular, we introduce the design of experiments

discipline, response surface methods, and methods based on simulated evolution

and EAs. This thesis favours the application of EAs to closed-loop problems.

The reasons for this choice will be described in detail in Section 2.6.

In Chapter 3, we review the application of EAs to problem classes that are related

to closed-loop problems with resourcing issues, including noisy, constrained,

dynamic, and online optimization problems. Prior to this review we give a more

detailed description of the basic principles and concepts of evolutionary search,

and outline techniques employed for: (i) theoretically analyzing and explaining

the working of EAs, and (ii) tuning the performance of EAs. The final part of

this chapter is devoted to the field of reinforcement learning (RL), which offers a

variety of tools for tuning EAs. We introduce the fundamental ideas of RL and

review several studies that looked at extending EAs with RL concepts to improve

performance. This concludes the review of literature; the other chapters present

original research.

In Chapter 4, we give a general problem definition of an ERCOP and define

several types of ERCs derived from the experimental work of Knowles (2009).

The second part of this chapter presents an initial theoretical analysis, using

Markov chains, of the effect of one of the ERC types on common selection and

reproduction schemes used within an EA. The theoretical analysis concludes that

an order relation-based selection operator, such as tournament selection, is more

robust to simple ERCs than a fitness proportionate-based selection operator. A

perhaps more surprising result is that, while an EA with a non-elitist generational

reproduction scheme converges more quickly to some optimal population state

than with a non-elitist steady state scheme during unconstrained optimization

periods, the opposite is the case during activation periods of an ERC. This result

implies that ERCs should be accounted for when tuning EAs for ERCOPs. An

empirical study of ERCOPs follows in the subsequent chapter.

In Chapter 5, we develop and empirically analyze various constraint-handling



strategies for coping with the ERCs introduced in Chapter 4. First we develop

several general static constraint-handling strategies including repairing, penalizing

and waiting strategies. Some of these strategies are inspired by existing

strategies used for coping with standard (static) constraints, and some others are

realizations of the tested approaches of Schwefel, Reynolds and Corne, described

in the preface of this chapter. The empirical analysis reveals that different strategies
should be favoured depending on the properties of an ERC, and gives evidence

that it is possible to select a suitable strategy for an ERCOP offline, if the ERCs

are known in advance. Inspired by this observation, we investigate whether an

EA that learns offline (using a reinforcement learning approach) when to switch

between the static strategies during the optimization process can do better than

the static strategies themselves. We consider also a strategy that learns the same

switching task online using a multi-armed bandit algorithm. Finally, we investigate
strategies for coping with an ERC type that requires an optimizer to purchase

resources online and pre-emptively so that they arrive in time to be used in the

evaluation process of solutions. This ERC imposes three additional limitations

related to the storage space, the shelf life and reuse number of the resources purchased.

We show that this challenging ERC affects EA performance significantly

but that certain online purchasing strategies that we propose and investigate here

can be effective against it.

In Chapter 6, we consider the resourcing issues related to changing variables

in the first part of the chapter, and lethal environments in the latter. A change

in the variables causes parts of the fitness landscape to change in unpredictable

ways; the fact that we know which variables are changed means that tracking

of a potentially shifting optimum is slightly easier than in a standard dynamic

optimization scenario. Based on this, we propose a novel strategy, which we call

fair mutation, and compare it against standard techniques in dynamic optimization.

For coping with lethal environments, we do not propose novel strategies

per se but look at how this issue can be dealt with best by tuning some of the

basic EA configuration parameters: degree of elitism, population size, selection

pressure, mutation mode and strength, and crossover rate. We also vary aspects

of the fitness landscape we are optimizing to observe which landscape topologies

pose a particular challenge when optimizing in a lethal environment.

In Chapter 7, we conclude the thesis, summarizing what we have learnt about



the effects of resourcing issues on evolutionary search when applied within closed-loop

optimization, and how EAs can be extended to cope with these issues. In

addition we summarize directions for further research.

1.2.1 Publications resulting from the thesis

Refereed journal papers

B. G. Small, B. W. McColl, R. Allmendinger, J. Pahle, G. López-Castejón, N. J.

Rothwell, J. Knowles, P. Mendes, D. Brough, and D. B. Kell (2011). Efficient

discovery of anti-inflammatory small molecule combinations using evolutionary

computing. Nature Chemical Biology, 7:902–908, 2011.

Refereed conference papers

R. Allmendinger and J. Knowles (2011). Evolutionary search in lethal environments.

In Proceedings of the Evolutionary Computation Theory and Applications

Conference, to appear.

R. Allmendinger and J. Knowles (2011). Policy learning in resource-constrained

optimization. In Proceedings of the Genetic and Evolutionary Computation Conference.

ACM Press, pages 1971-1978.

R. Allmendinger and J. Knowles (2010). Evolutionary optimization on problems

subject to changes of variables. In Proceedings of Parallel Problem Solving from

Nature – PPSN XI. Springer LNCS 6239, pages 151-160.

R. Allmendinger and J. Knowles (2010). On-line purchasing strategies for an evolutionary
algorithm performing resource-constrained optimization. In Proceedings

of Parallel Problem Solving from Nature – PPSN XI. Springer LNCS 6239, pages

161-170.

Under review

R. Allmendinger and J. Knowles (2011). Policies for dealing with ephemeral resource

constraints in closed-loop optimization. IEEE Transactions on Evolutionary

Computation, under review.


Chapter 2

Closed-Loop Optimization

In the previous chapter, we introduced the class of closed-loop optimization problems

in which solutions are evaluated through physical experimentation rather

than simulation. The objective of this chapter is to give a more comprehensive

introduction to the field of closed-loop optimization and its historic development.

We begin this introduction with a more detailed description of the general concepts

of closed-loop optimization in Section 2.1. An overview of applications of

closed-loop optimization from various fields of science and engineering is provided

in Section 2.2, followed by a summary of common challenges faced by experimentalists

and algorithmic designers in these applications (Section 2.3). Finally, we

give a historical review of approaches used to cope with closed-loop optimization

problems. These include approaches based on design of experiments (Section 2.4),
response surface fitting (Section 2.5), and simulated evolution (Section 2.6).

2.1 Concepts of closed-loop optimization

Optimization is concerned with finding an answer ⃗x from a set of alternatives X

that is best with respect to some criterion f. A more technical definition of an

optimization problem of generic form is

maximize y = f(⃗x) (2.1)

subject to ⃗x ∈ X,

where ⃗x = (x_1 , ..., x_l ) is a solution vector (also referred to as solution or candidate

solution) and its l components for which values must be found are called




decision variables. The variables may be integer, real, or a mixture. The feasible

search space X is the set of solutions over which the search is performed. In

a standard constrained optimization problem, the set X is defined by a set of

equality and inequality constraints. The static objective function (also known
as fitness function) f : X → Y represents a mapping from X into the objective

space Y ⊂ R.

In many standard optimization problems, the function f is available in algebraic

form allowing an optimizer to evaluate solutions by simply computing f.

However, for some problems the mapping between the decision variables and their

effects on the fitness of a solution is cost prohibitive to define or simply unknown;

in this case we also say that the function f is black box. For instance, when optimizing

combinations of drugs, the drugs interact in non-linear and complex ways

that are difficult to predict and express in terms of a model. The approach taken

to deal with such problems is to evaluate solutions by a process of physical experimentation

or alternatively expensive computer simulations (Schütz and Schwefel,

2000). This is the type of optimization problem we consider in this thesis, and

they are commonly referred to as closed-loop or experimental problems. The term

closed-loop, first used by Box (1957), suggests here that the setup in such problems

establishes an interactive loop between an optimizer and the experimental

platform, potentially including a (human) experimentalist.

Figure 2.1 illustrates a closed-loop setup commonly used in experimental optimization;

Schwefel employed a very similar setup in his flashing nozzle problem

(see Figure 1.1). To make the description of this loop clearer we will distinguish

between two concepts: the genotype or structure of a solution (e.g. nozzle

shape), and the phenotype or actual solution that can be evaluated (e.g. an actual

nozzle). The genotype of a solution ⃗x is planned on a computer using an

optimization algorithm. Then, the process of evaluating this genotype involves

two steps: (i) realizing or prototyping the phenotype of ⃗x, and (ii) measuring the

(inherently noisy) fitness f(⃗x) of the phenotype. Both steps may require experiments

to be conducted (although a strict separation between the two steps is not

always possible). Upon obtaining the fitness value of a solution, this value is fed

back to the computer and considered in the planning of the next solution. This

closed-loop is repeated until some termination criterion is met. The real-world

nature of experimental problems means that the termination criterion may be as

simple as reaching a pre-determined fixed number of experiments conducted, but



[Figure 2.1 (schematic): an optimization algorithm running on a computer (plan new solution ⃗x; ranking, selection, variation; prepare performance statistics) exchanges information with the physical experimentation or expensive computer simulation side (prototype ⃗x, e.g. mix drugs, adjust instrument, run simulation; measure fitness of the phenotype of ⃗x, e.g. try drug combination in vivo, run sample through instrument, aggregate simulated data); the decision variables (genotype) of ⃗x are sent out and noisy measurements of the quality f(⃗x) are returned.]

Figure 2.1: Schematic of closed-loop optimization. The genotype of a candidate solution

⃗x is generated on the computer but its phenotype is experimentally prototyped. The

quality or fitness f(⃗x) of a solution may be obtained experimentally too and thus may

be subject to measurement errors (noise).

time and budgetary limitations may enforce premature termination too.
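As a minimal, purely illustrative sketch of this loop (the stub run_experiment stands in for the physical prototyping and measurement step and is an assumption made for the example, not part of the thesis):

    import random

    def run_experiment(genotype):
        """Stub for the ex-silico step: prototype the phenotype and measure its
        (inherently noisy) fitness. In a real closed-loop application this is a
        physical or biochemical experiment rather than a function call."""
        true_quality = sum(genotype)                  # placeholder ground truth
        return true_quality + random.gauss(0, 0.5)    # noisy measurement

    def closed_loop_optimization(num_variables=10, max_experiments=50):
        rng = random.Random(0)
        best_x = [rng.randint(0, 1) for _ in range(num_variables)]
        best_f = run_experiment(best_x)
        for _ in range(max_experiments - 1):          # fixed experimental budget
            # Plan the next genotype on the computer (here: a simple mutation).
            candidate = best_x[:]
            i = rng.randrange(num_variables)
            candidate[i] = 1 - candidate[i]
            # Conduct the experiment and feed the measurement back to the planner.
            f = run_experiment(candidate)
            if f > best_f:
                best_x, best_f = candidate, f
        return best_x, best_f

    print(closed_loop_optimization())

Everything inside closed_loop_optimization corresponds to the optimization-algorithm side of Figure 2.1, while each call to run_experiment corresponds to the experimentation side; in practice the budget max_experiments is dictated by the time and cost of the experiments.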

It is common that an experimental loop involves human intervention. For

example, biologists may be involved in the fitness measurement of drug cocktails,

while optimizing the configuration of an instrument may rely on engineers

for replacing and adjusting instrument parts. Typically, human intervention is
needed when dealing with large-scale optimization problems, such as optimization

of industrial processes, or problems involving a complex and meticulous setup,

such as applications in biochemistry. An alternative way for experimenting is to

employ a fully automated closed-loop setup. Here, a host computer is connected

to all the hardware possibly needed in the experimental loop, allowing it to autonomously

plan, prepare and launch experiments, process their outcomes and

subsequently use them to plan new experiments if needed. 1 Whilst this is a convenient

way of experimenting, it may incur additional costs and effort for setting

up the loop. The next section takes a closer look at experimental setups employed

in applications from different areas of science and engineering.

1 Other researchers, such as Hornung et al. (2001), refer to a fully automated closed-loop as

a self-learning closed-loop.



Table 2.1: Domains of closed-loop applications.

Application domain: Literature

Shape design optimization: Rechenberg (1964, 1973, 2000); Schwefel (1968, 1975); Klockgether and Schwefel (1970); Olhofer et al. (2011)

Evolvable hardware: Thompson (1997, 1996a,b); Thompson et al. (1996); Thompson and Layzell (1999)

Evolving controllers for robots: Nolfi et al. (1994); Floreano and Mondada (1994); Matarić and Cliff (1996); Harvey et al. (1996); Husbands and Meyer (1998); Nolfi and Floreano (2004); Nelson et al. (2009)

Embodied evolution: Ficici et al. (1999); Watson et al. (1999, 2002); Pollack et al. (1999, 2001); Lipson and Pollack (2000); Zykov et al. (2005)

Experimental quantum control: Judson and Rabitz (1992); Baumert et al. (1997); Bardeen et al. (1997); Hornung et al. (2001); Zeidler et al. (2001); Pearson et al. (2001); Shir et al. (2007, 2009); Shir (2008); Roslund et al. (2009)

Adaptive optics: Sherman et al. (2002); Lubeigt et al. (2002); Marsh et al. (2003); Wright et al. (2005b,a)

Fermentation optimization: Rincon et al. (1993); Montserrat et al. (1993); Poorna and Kulkarni (1995); Weuster-Botz and Wandrey (1995); Weuster-Botz et al. (1995); Weuster-Botz (2000); Davies et al. (2000)

Functional genomics: Zytkow et al. (1990); King et al. (2004); Byrne (2007)

Optimization of analytical instrument configurations: Vaidyanathan et al. (2003); O’Hagan et al. (2005, 2007); Wallace et al. (2007); Jarvis et al. (2010)

Drug discovery: Singh et al. (1996); Calzolari et al. (2008); Wong et al. (2008); Caschera et al. (2010); Small et al. (2011)

Optimizing taste and aroma of food: Herdy (1997); Ferrier and Block (2001); Yang and Vickers (2004); Ferreira et al. (2007); Knowles (2009); Veeramachaneni et al. (2010)

Optimization of running industrial processes: Box (1957); Hunter and Kittrell (1966)

2.2 Applications of closed-loop optimization

This section gives a historical overview of applications in closed-loop optimization.

The main applications are also summarized in Table 2.1.

In the 60s and 70s, closed-loop optimization problems were mainly concerned

with shape design problems for fluid dynamics. Schwefel’s work on shape

optimization of a flashing nozzle, which we described in the previous chapter,

is an example of this type. Another example is the work of Rechenberg (1973)

on optimizing the shape of a pipe bend so as to obtain minimal flow losses. The

bending was held by 6 manually adjustable shafts, and an optimization algorithm

running on a computer provided the shaft setting (genotype) to be tested. The

shafts were then adjusted accordingly (i.e. the phenotype was constructed) by



an experimentalist. In a more modern version of the experiment, this task was

carried out by an industrial robot. Similar to Schwefel’s application, the form of

a bending was assessed by injecting pressurized air into one side of the pipe using a

kettle, and measuring the pressure of the air at the other side of the pipe. There

are many further examples of shape design problems; for detailed descriptions

please refer to (Rechenberg, 1964, 1973; Schwefel, 1975; Rechenberg, 2000).

Due to the increasing computational power and the constant development of

innovative simulation software, problems related to fluid dynamics can nowadays

often be optimized in silico. This technological progress affected the prominence

of closed-loop optimization. However, in the last 20 years or so, closed-loop optimization

regained its popularity thanks to applications in areas like evolvable

hardware, evolution of controllers for robots, and applications in biochemistry.

The general objective in the field of evolvable hardware (Greenwood and

Tyrrell, 2006) is to improve the setup of an electric circuit using an approach based

on simulated evolution. The work of Thompson and his collaborators (Thompson,

1996a,b; Thompson et al., 1996; Thompson and Layzell, 1999) was seminal to

this field. One of his studies (Thompson, 1996a) was related to the configuration

of an FPGA consisting of 10×10 reconfigurable cells. The genotype of a particular
configuration was encoded onto a bit string of length 1800, with 18 configurable

bits allocated to each cell (the bits are related to the Boolean function a cell can

perform). A computer, with an optimizer implemented on it, was connected to
the chip, allowing for automatic reading in and updating of the current configuration

(see Figure 2.2). The experimental setup was such that no configuration

could damage the chip, nor did legality constraints have to be checked. The aim was

to design a configuration that is able to discriminate between two power levels of

square waves presented at the input using a tone generator by accordingly changing

the output voltage (which was visually inspected by an oscilloscope). The

quality of a configuration was measured by calculating the mismatch between

outputs observed over a random sequence of bursts of the two input waves. As

an example where such a circuit could be used, Thompson (1996a) mentions the

demodulation of frequency-modulated binary data received over a telephone line.

Later, Thompson (1997) extended this work by considering difficult implementation

constraints such as fault-tolerance. The interested reader is referred to Yao

and Higuchi (1999) for a general review of challenges in the interesting and still

very active field of evolvable hardware.



[Figure 2.2 (block diagram): the FPGA with its configuration input, a tone generator driving the circuit, and an analogue integrator passing the output to an oscilloscope.]

Figure 2.2: Experimental arrangement as used by Thompson (1996a) in the configuration

optimization of an FPGA chip. A computer, with an optimizer implemented on it,
was connected to the chip, allowing for automatic reading in and updating of the current

configuration. A tone generator drove the circuit’s input, and an oscilloscope was

employed to visually inspect the voltage of the circuit’s output.

Compared to the pioneering design problems tackled by Schwefel and Rechenberg,

we notice that the experimental loop employed in Thompson’s FPGA

work is fully automated and works without human intervention. A fully automated

closed-loop (although sometimes supported by simulation experiments

conducted beforehand) has also been adopted in the evolution of controllers

for robots (Nolfi et al., 1994; Floreano and Mondada, 1994; Matarić and Cliff,

1996; Harvey et al., 1996; Husbands and Meyer, 1998; Nolfi and Floreano, 2004;

Nelson et al., 2009). The general objective in this application is to improve the

structure (e.g. weights and thresholds of activation functions) of a neural network

(Bishop, 1995). The neural network is then used to transform the sensory

inputs describing the environment into motor outputs of the robot. The quality

of a controller is measured by assessing the behavior of the robot when applied

to a particular task. For example, when the task is to reach the centre of a room

from a fixed starting point, then a control system can be considered better the

smaller the average distance between the robot and the centre is. As the fitness

is expressed in terms of a robot’s behaviour, the design of an appropriate fitness

function becomes increasingly more challenging with the complexity of tasks to

be performed by the robot (Harvey et al., 1996).

Some researchers believe that the future in evolving controllers for robots

lies in a fully automated closed-loop procedure referred to as embodied evolution

(Ficici et al., 1999; Watson et al., 2002): In this procedure, a population



of physical robots autonomously evolves in the task environment by reproducing

with one another. A robot evaluates its own performance and broadcasts it along

with information related to its controller setup to other robots in the population.

These can then decide whether they want to use this information to update their

own controller setup. Despite the difficulty of realizing embodied evolution in

the real world (e.g. due to power and communication constraints), preliminary and

promising results have been published e.g. by Pollack et al. (1999); Watson et al.

(1999); Lipson and Pollack (2000); Pollack et al. (2001); Zykov et al. (2005).

A fully automated closed-loop that is rather inexpensive to execute is employed

in experimental quantum control. The aim in this application is to

tailor laser light fields by means of spectral amplitude and phase variables in order

to control things like the position, orientation, velocity, and quantum states

of the sample of interest (Zeidler et al., 2001; Pearson et al., 2001; Roslund et al.,

2009). The ultimate objective can be, for example, establishing a laser light field

that yields maximal molecular alignment (Shir et al., 2007, 2009). The convenient

property of quantum control experiments is their short duration, which is of the order of 1 second (Shir et al., 2009). This property allows different optimization algorithms and setups to be tested at rather low cost; Figure 2.3 shows a setup

commonly employed in experimental quantum control. Additional information

related to applications in experimental quantum control including physical background

can be found in (Judson and Rabitz, 1992; Baumert et al., 1997; Bardeen

et al., 1997; Hornung et al., 2001; Shir, 2008).

Inexpensive experiments within a fully automated closed-loop setup can also

be conducted in adaptive optics. The general aim in this field is to design microscopy techniques that achieve optically sectioned images with high resolution of the biological sample under investigation. However, when imaging at depth within

a biological sample, the resolution may suffer due to optical aberrations. Active

optical elements, such as deformable mirrors, aim at counteracting this effect by

altering the wavefront of the light source. Using a fully automated closed-loop

approach, Wright et al. (2005a) showed that the resolution, or, more precisely,

the brightness of the image of a sample can be optimized by appropriately configuring

the shape of the deformable mirrors. In this particular application, the

shape of the mirrors was controlled by 37 electrostatic actuators resulting in a

continuous 37-dimensional search space. To test a mirror shape, a test sample

was used instead of a real one in order to allow for accurate control of the



Figure 2.3: Experimental arrangement of an application in quantum control. A dynamic pulse shaper is used to generate modulated laser pulses. Experimental feedback signals are collected from liquid-phase emission spectroscopy and the Cs gas cell, filtered optically, and detected by the amplifier. The computer is used for reading in the processed signals and for updating the modulator. Figure taken from Meshulach and Silberberg (1998).

degree of aberration. Wright et al. report that an entire optimization run takes

on average between 1 and 12 minutes, depending on the algorithm. Researchers

in adaptive optics relying on a similar closed-loop setup include e.g. Sherman

et al. (2002); Lubeigt et al. (2002); Marsh et al. (2003); Wright et al. (2005b).

When dealing with large-scale applications where experimentation involves

complicated and/or extensive preparation and testing steps, it can be difficult

and costly to set up a fully automated closed-loop. Applications in fermentation

optimization as considered by Davies et al. (2000) belong to this class of

closed-loop problems. The goal in the work of Davies et al. was to design silage

additive combinations that yield herbage with high nutritional values and good

storage properties. Before the herbage could be analyzed, it had to be grown for

4 weeks, cut, additively treated and packed at the start of the ensilage process,

and unpacked again after 2 days. The evaluation of an entire generation, which

was limited by logistical constraints to 50 additive treatments, took two weeks



(as two plots of grass were available); and, the number of generations was limited

due to the constraints of herbage production over the normal growth season.

To account for the apparent risk of measurement errors coming from biological,

experimental and machine variability, each generation included replicates of control

silage (the same herbage but without any additives) and replicate testing of

silage additive combinations was applied too. An interesting aspect of the work

is that costs were associated with additive concentrations in order to discourage

the search for silages containing additives of high concentrations (which would

be difficult to realize on a farm scale). Similar closed-loop approaches to fermentation

optimization have been considered by Weuster-Botz and Wandrey (1995);

Weuster-Botz et al. (1995); Weuster-Botz (2000); while these studies use optimization

algorithms based on simulated evolution, the work of Rincon et al.

(1993); Montserrat et al. (1993); Poorna and Kulkarni (1995) employs response

surface methods (see Section 2.5).

Closed-loop optimization is often the only way of experimenting in biochemistry

applications, such as combinatorial drug discovery and optimization of analytical

instrument configurations. A seminal closed-loop study in this field is

the robot scientist work of King et al. (2004). In this work a system has

been developed (see Figure 2.4) to autonomously solve an inference problem

in the field of functional genomics by a process of originating hypotheses to

explain experimental data, devising and running experiments to test these hypotheses

using a laboratory robot, interpreting the results to falsify hypotheses

inconsistent with the data, and originating new hypotheses. A fully automated

closed-loop for tackling inference problems has already been employed by Zytkow

et al. (1990), though on a limited scale. In addition to achieving a high inference

accuracy, the robot scientist accounts also for variable experimental costs. This

gives rise to the problem of how to select an informative batch of experiments that

can be tested with little expense; Byrne (2007) has opted for a multi-objective

optimization approach to tackle this problem.

Modern analytical instruments such as mass spectrometers involve a large

number of configuration parameters, and testing them systematically in order to

optimize some objective is often impossible because of the large number of possible

combinations. Consider the instrumentation optimization work of O’Hagan

et al. (2005, 2007) aimed at detecting as many metabolites in a biological sample

as possible using mass spectrometry. This application involved 15 integer



[Figure 2.4 image: closed loop linking background knowledge, machine learning, analysis, consistent hypotheses, experiment selection, experiment(s), the robot, results, and the final hypothesis.]

Figure 2.4: Experimental arrangement as used in the robot scientist study of King et al. (2004). An inference problem is solved using a computer-controlled closed-loop setup: hypotheses are originated to explain experimental data; experiments are devised and executed by a robot to investigate these hypotheses; results are interpreted to falsify hypotheses inconsistent with the data, and new hypotheses are originated.

variables or instrument parameters that made up a search space of 7.32 × 10^9 different instrument configurations. A typical closed-loop cycle in analytical experimentation involves setting up the instrument (phenotype) according to some configuration (genotype), and then evaluating the quality of the configuration

by running a biological sample through the instrument and measuring properties

of the instrument’s output signal, such as the number of peaks detected,

signal-to-noise ratio of the peaks, and run time. O’Hagan et al. (2005, 2007) fully

automated this cycle, while the studies presented by Vaidyanathan et al. (2003);

Wallace et al. (2007); Jarvis et al. (2010) involved human intervention, with the

human being involved in tasks like changing instrumentation settings.

Combinations of pharmaceuticals (also referred to as drugs, reagents, or compounds) interact in complex ways that typically cannot be predicted. Hence, when

optimizing combinations of drugs and/or concentrations of these drugs according

to some criterion, such as the expression of pro-inflammatory cytokines (Small

et al., 2011), an experimental approach needs to be adopted. In a common drug

discovery application, a human experimentalist arranges a library of promising

or relevant drugs to be considered by an optimizer running on a computer.

The library size may range from 10 drugs (Wong et al., 2008) to a few dozen

drugs (Small et al., 2011), and sometimes be as large as 100 drugs (Calzolari

et al., 2008). Drug mixtures (genotypes) devised by the optimizer are then mixed

together (phenotyped) commonly using a laboratory robot, and their potency is



measured using analytical tests. However, this process may involve several complex

and manually performed steps related to the storage and maintenance of

individual drugs, setup of analytical tests, and the actual testing. A glance at

the method section of the above cited studies including the work of Singh et al.

(1996); Caschera et al. (2010) will confirm the sophistication of experimental

setups in drug discovery applications.

Recent studies indicate that closed-loop optimization has also been used by

the food industry for improving taste and aroma of food. Knowles (2009)

reports on the optimization of a cocoa roasting process employed by a chocolate

manufacturer. The precise objective in this application is to optimize the design

of a roast curve so as to create cocoa of a specific desired flavour. A genotype

defines the properties of a roast curve including the temperature with which

cocoa nibs (the center of the beans) are roasted, and the placement and duration

of water injections during the roasting. Upon physically realizing a roast curve,

the quality of the resulting cocoa is subjectively assessed by a human taster.

Obvious challenges in this application are the variability in the quality of the

raw cocoa nibs, and the subjective fitness assessment, which greatly depends on

the experience of the human taster. Unfortunately, the optimization of taste and

aroma is often a business secret, limiting the number of studies publicly available.

Nevertheless, we have found applications related to the closed-loop optimization

of blends of coffee (Herdy, 1997), the blending of wine (Ferrier and Block, 2001),

and the taste of cheddar (Yang and Vickers, 2004). More accessible is research

related to the optimization of procedures for the isolation of taste in a food

sample (Ferreira et al., 2007). Of interest is the recent study of Veeramachaneni

et al. (2010), which mentions the application of closed-loop optimization in the

design of food flavors. More precisely, a small set of mixtures of ingredients is

subjectively evaluated by a panel of consumers using a closed-loop approach. A

model (surrogate) of the consumer data is then generated and used to optimally

determine a set of flavors liked by some target group of panellists.

Further and perhaps more isolated applications of closed-loop optimization

can be found, for example, in the field of combinatorial chemistry (Schlögl,

1998; Gobin et al., 2007; Wolf et al., 2000), ceramic ink-jet printing (Evans

et al., 2001), combustion process optimization (Büche et al., 2002; Paschereit

et al., 2003; Hansen et al., 2009), directed evolution (Arnold, 1998; Knight

et al., 2009; Platt et al., 2009), and deep brain stimulation therapy (Feng



[Figure 2.5 image: loop linking the plant, its responses, the information board, the EVOP committee, the plant manager, and the plant configuration.]

Figure 2.5: Schematic of the closed loop as employed with evolutionary operation (EVOP) (Box, 1957) to optimize a running manufacturing process. The EVOP committee contains specialists on EVOP that serve as advisors, but the ultimate responsibility for running the scheme rests with the plant manager. She sets up the plant configuration to be tested. The resulting responses are then used to update the information board on which future decisions are made.

et al., 2007). 2 Detailed case studies of selected applications in instrument optimization,

drug discovery, directed evolution, and cocoa roasting have also been

presented by Knowles (2009).

The above closed-loop applications have in common that the search for an

optimal solution is terminated after some fixed period of time (e.g. due to time

and/or budgetary limitations) or once a solution of high quality has been found.

Typically, the aim is then to manufacture the solution found (e.g. a drug combination,

nozzle, software, electric circuit, etc.) many times and sell it to the public

or other companies; perhaps improving it further at some future point in time

when the quality of raw material and technological equipment has advanced.

However, closed-loop setups have also been adopted in the continuous improvement

of a running industrial process, similar to the cocoa roasting process

mentioned above but on a larger scale. The goal here is to optimize some cumulative

performance of the process over an infinite or finite time horizon, rather than

finding a single optimal (and ultimate) solution, and this needs to be done in a

controlled and restricted fashion to reduce the risk of producing off-quality material.

To achieve this goal, Box (1957) suggested an approach termed evolutionary operation (EVOP) (see Figure 2.5). The objective of EVOP is to continuously

improve a running industrial process by applying small changes (mutations) to

2 Although the work of Feng et al. (2007) describes a closed-loop optimization framework

(which the authors hope to realize in future) for achieving deep brain stimulation therapy of

Parkinsonian motor symptoms, for validation purposes a computational model is used.



it. Rather than taking random mutation steps, EVOP is guided by a committee

including human experimentalists and specialists, such as statisticians and

experts on EVOP. Since its introduction in 1957, EVOP has been applied in numerous

sectors including chemical manufacturing, the automotive industry, natural gas production, the food industry, and the canning industry. Thanks to EVOP, savings

were made, for example, due to reduced usage of raw material, reduced time cycles

in batch operations, and equipment modifications. In addition to savings and

the resulting profits, EVOP contributed also to the understanding of certain processes,

and pointed engineers to the importance of specific control variables. An

extensive review of EVOP applications and benefits of EVOP has been published

by Hunter and Kittrell (1966).

Nowadays, the practice of continuously improving a process, including products and services, with the aim of reducing waste and time is referred to as lean

manufacturing or lean management (Shah and Ward, 2003). Tools for realizing

lean manufacturing include EVOP but also techniques such as just-in-time production

(Sugimori et al., 1977), design of experiments (Montgomery, 1976) and

Taguchi methods (Taguchi, 1989; Peace, 1993).

2.3 Challenges of closed-loop optimization

From the description of some of the applications listed above it is apparent that

closed-loop problems feature challenges that may not be prevalent in standard

problems tackled by computer simulation. We will briefly recall the main challenges

here; for a more detailed description please refer to Knowles (2009).

Arguably the most obvious challenge of closed-loop optimization is that evaluations

of solutions are generally expensive. This concerns both the time and

costs needed to conduct experiments. In the presence of time and/or budgetary

limitations this factor can restrict the number of evaluations available to several

tens or hundreds only.

Experiments may be of different durations and associated with non-homogeneous

experimental costs. This scenario has been encountered e.g. in the robot

scientist work (King et al., 2004) and in fermentation optimization (Davies et al.,

2000). In the presence of a limited budget, this challenge may require an optimizer to account for both fitness gradients and differential costs of evaluations.

Designing a suitable fitness function may be difficult. This challenge exists



in both computational and physical experimentation, but the complexity of real-world systems often means that fitness function design is more delicate and laborious

in practice (Mondada and Floreano, 1995; Matarić and Cliff, 1996; Knowles,

2009). As an example, consider the design of a fitness function that evaluates

the behavior of a robot in achieving a certain task. The difficulty of designing a

suitable fitness function tends to increase with the complexity of the task to be

fulfilled (Harvey et al., 1996).

Experiments are inherently noisy, and may be subject to uncertainty and uncontrolled

factors. Noise is a challenge that is much studied, and several methods

have been proposed to account for it. In Section 3.2 we will introduce some of

these methods in the context of evolutionary optimization.

Experimental setup and environmental factors may dictate settings of the optimization

algorithm. Recall that this situation was present, for instance, in

the fermentation optimization work of Davies et al. (2000). A population-based

optimizer was employed in this application. Its population size was limited by

logistical constraints to 50 additive treatments, and the number of generations

the optimizer could be employed for was also limited due to the constraints of

herbage production over the normal growth season. In biochemistry applications,

such as in drug discovery (Small et al., 2011), the use of standard laboratory tools

like microwell plates may dictate the population size too.

Experimentation may be subject to encoding issues and constraints. It is common

for closed-loop problems to have an encoding with both discrete (or binary) and real-valued components. Constraints may restrict the values certain

variables may take but also complicate the optimization process in other ways

(as investigated in this thesis in Chapters 4 to 6). For example, in the evolution

of robot controllers, the lifetime of the robot battery and the robot itself may

slow down the optimization procedure due to recharging, maintenance and repair

duties (Matarić and Cliff, 1996). Constraints may also be challenging to identify

and define as reported by Knowles (2009).

2.4 Design of experiments

Traditionally, closed-loop problems have been dealt with using methods from the

design of experiments (DoE) (also known as experimental design (ED)) (Montgomery,

1976; Mason et al., 1989; Box et al., 2005) discipline. These are statistical



[Figure 2.6 image: design points plotted on a cube with axes x_1, x_2, x_3.]

Figure 2.6: Plot showing factor combinations to be tested with a Box-Behnken experimental

design for a system with l = 3 factors with each factor having three levels.

The factor combinations are specifically chosen to allow the efficient estimation of the

coefficients of a quadratic model as defined in Equation (2.2).

methods traditionally concerned with explaining a system or process in terms of

a model using as few informative experiments as possible. 3 This model can then

be used to study different properties of a system, such as the most important

factors of the system or the performance of the system in the presence of noise.

Figure 2.6 shows an example of a particular experimental design. For exhaustive

reviews of experimental designs, including conditions when their application

is recommended, and instructions to construct them, please refer to the books

(ibid.).
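To make the construction of such designs concrete, a Box-Behnken design for l = 3 factors (levels coded as -1, 0 and +1) can be generated with a few lines of Python; this is a minimal illustrative sketch, and the function name and the single centre point are assumptions rather than part of any standard library:

from itertools import combinations, product

def box_behnken(l, centre_points=1):
    # all (+1/-1) combinations for every pair of factors; remaining factors stay at the centre level 0
    points = []
    for i, j in combinations(range(l), 2):
        for a, b in product((-1, 1), repeat=2):
            p = [0] * l
            p[i], p[j] = a, b
            points.append(p)
    # one or more replicates of the centre point complete the design
    points.extend([[0] * l] * centre_points)
    return points

for p in box_behnken(3):
    print(p)   # 12 edge-midpoint combinations plus the centre point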

The three basic principles employed by many DoE techniques are replication,

randomization, and blocking (also known as local control) (Preece, 1990; Box

et al., 2005). Replication (i.e. repeating the entire experiment or a portion of it under two or more sets of conditions) is a way of estimating and accounting for

noise, while randomization and blocking are used to deal with variabilities caused

by unavoidable sources, or nuisance factors (e.g. ambient temperature or the time

of the day the experiment was conducted). Randomization refers to the process

by which factor combinations for testing are allocated to individual experiments

in a random or unbiased manner. Blocking is the process by which the total set

of trials is partitioned into subsets (blocks) that are as homogeneous as possible.
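As a minimal illustration of randomization and blocking (the four treatments and two blocks below are hypothetical, e.g. two plots of grass), a randomized complete block plan can be produced as follows: every treatment appears once in every block, and the run order within each block is randomized:

import random

treatments = ["A", "B", "C", "D"]   # factor combinations (trials) to be tested
blocks = ["plot 1", "plot 2"]       # levels of a nuisance factor used as blocks

random.seed(1)
plan = {}
for block in blocks:
    order = treatments[:]           # complete block: every treatment appears in every block
    random.shuffle(order)           # randomization of the run order within the block
    plan[block] = order

for block, order in plan.items():
    print(block, order)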

The DoE discipline has also produced methods for seeking optimal factor combinations: factorial, univariate, steepest-ascent, and random methods (Brooks, 1958, 1959). The univariate and steepest-ascent methods are sequential procedures,

while the other two methods generate an entire set of trials for testing

3 The terminology in the field of DoE differs slightly from the one in optimization. Most

notably, decision variables are referred to as factors, their values as levels, fitness as response,

and an experiment testing a particular factor combination (candidate solution) is called a trial.



beforehand. The random method tests factor combinations selected at random

in the factor space. Compared to factorial and univariate methods, which test

levels of all factors in some systematic approach, random methods can be applied

to systems involving a larger number of factors. Compared to steepest-ascent

methods, random methods are also less likely to find only locally optimal factor

combinations. In any of these methods, there are two options by which an optimal

factor combination can be estimated. The first option is to simply select

the factor combination with the highest response from the set of all trials made.

The other option is to construct a response surface by fitting a polynomial surface

to the responses of the trials made, and then estimating the optimal factor

combination by finding the highest point of this surface. We will describe this

methodology in more detail in the next section.

2.5 Response surface methods

Traditionally, a response surface is fitted using polynomial models (in the factors)

for the responses y (Box and Wilson, 1951). As an example, consider the

quadratic model

y = b_0 + \sum_{i=1}^{l} b_i x_i + \sum_{i=1}^{l-1} \sum_{j=i+1}^{l} b_{ij} x_i x_j + \sum_{i=1}^{l} b_{ii} x_i^2 + \epsilon .    (2.2)

This model is expressed in terms of main effects of factor levels (x_1, ..., x_l), their interactions or joint effects with other factor levels (x_1 x_2, ..., x_{l-1} x_l), and their quadratic components (x_1^2, ..., x_l^2). To account for noise and experimental errors, a random error \epsilon (assumed to be uncorrelated and distributed with mean 0 and constant variance) may be considered in the model as well. The normal procedure

is to fit a surface to the available data using the least squares method; Figure 2.7

shows an example of a response surface. Following this, the validity of the fitted

surface may be investigated using an analysis of variance (ANOVA), whereby the

observed variability of the responses (measured by sums of squared deviations

from mean response) is partitioned into independent, or orthogonal, components

from identifiable sources. A comprehensive introduction to least squares, and

the application of ANOVA in the context of response surfaces can be found in

Chapter 10 of (Box et al., 2005).
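As an illustrative sketch of this fitting step (using numpy on synthetic data, not an example from the cited literature), the quadratic model of Equation (2.2) can be estimated by ordinary least squares as follows:

import numpy as np

def quadratic_design_matrix(X):
    # columns: intercept, main effects, pairwise interactions, quadratic terms
    n, l = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(l)]                                      # b_i x_i
    cols += [X[:, i] * X[:, j] for i in range(l) for j in range(i + 1, l)]   # b_ij x_i x_j
    cols += [X[:, i] ** 2 for i in range(l)]                                 # b_ii x_i^2
    return np.column_stack(cols)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(9, 2))        # nine trials with l = 2 factors
y = 3 - (X[:, 0] - 0.2) ** 2 - 2 * (X[:, 1] + 0.1) ** 2 + 0.05 * rng.normal(size=9)

A = quadratic_design_matrix(X)
b, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares estimates of b_0, b_i, b_ij, b_ii
y_hat = A @ b                              # fitted responses at the trial points
print(b)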

Response surface methods (RSMs) employing the above described approach



[Figure 2.7 image: fitted response surface over the (x_1, x_2) factor space.]

Figure 2.7: A response surface plot for a system with l = 2 factors. The surface is

fitted to the nine data points representing different experiments.

generate a response surface that tends to be non-interpolating because they minimize

only the sum of squared deviations from some pre-determined (polynomial)

function. Such a surface may be misleading for seeking maxima as it may not

sufficiently approximate the shape of the actual underlying model (Jones, 2001).

This drawback encouraged many researchers to develop RSMs that approximate

the actual model more accurately, ideally, also in fewer trials. Modern RSMs

tend to generate a surface that interpolates the observed data points using a linear

combination of ‘basis functions’ (one ‘basis function’ term is centred around

one data point), such as thin-plate splines and kriging (Matheron, 1963; Sacks

et al., 1989).
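The interpolation idea can be sketched in plain numpy using Gaussian basis functions (a simplifying assumption; thin-plate splines and kriging use different basis functions and, in the case of kriging, estimated correlation parameters): one basis function is centred on each observed data point, and the weights are chosen so that the surface passes exactly through the observed responses:

import numpy as np

def rbf_fit(X, y, width=1.0):
    # pairwise distances between data points define the basis function matrix
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    Phi = np.exp(-(d / width) ** 2)
    return np.linalg.solve(Phi, y)           # interpolation condition Phi w = y

def rbf_predict(X_train, w, x, width=1.0):
    d = np.linalg.norm(X_train - x, axis=1)
    return np.exp(-(d / width) ** 2) @ w

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
y = np.array([1.0, 0.2, 0.3, 0.9, 1.5])
w = rbf_fit(X, y)
print(rbf_predict(X, w, np.array([0.5, 0.5])))   # reproduces the observed response 1.5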

RSMs can be further classified into two-stage and one-stage (or sequential)

DoE methods (Robbins, 1952b). Two-stage approaches fit a surface to observed

data in stage one, estimating any required parameters (related to the surface). In

the second stage, these parameters are assumed to be correct and the surface is

used to select the next data point for evaluation. The drawback of this approach

is that the search can be misguided if the estimated surface from stage one is

incorrect. Sequential DoE methods aim at overcoming this issue in that they

skip the first stage of fitting a surface to observed data. Instead, hypotheses

about the location of the optimum (its variables \vec{x} and fitness f(\vec{x})) are made and evaluated by determining properties of the best-fitting response surface that passes through the observed data and the point (\vec{x}, f(\vec{x})) (Jones, 2001). However,

this approach comes at a cost of high computational demands, particularly when

kriging is used (because a matrix of estimated correlations between responses

needs to be inverted). For a more comprehensive review of one- and two-stage



RSMs please refer to Jones (2001); Myers et al. (2009).

An alternative approach to RSMs is provided by methods based on simulated evolution,

which we will discuss in the next section.

2.6 Simulated evolution in closed-loop optimization

Proposals to consider ideas from nature, in particular, the concept of natural selection

and variation, for closed-loop optimization were already made in the late

1950s. Brooks (1958) proposed a creeping random method that extends the DoE-based random method introduced earlier (see Section 2.4) with the concept

of natural selection. The experimental design technique of Box (1957), EVOP

(see Section 2.2), employed the concepts of natural selection and variation too,

although the variation steps are proposed by some committee rather than being

random. Remember, however, that the objective of EVOP is to optimize a cumulative

score over an infinite or finite time horizon rather than to find a single

optimal (and ultimate) solution.

The work of Bremermann (1962); Fraser (1957); Friedberg (1958); Friedberg

et al. (1959) and others has further popularized approaches based on simulated

evolution and initiated the field of evolutionary computation. In particular,

this pioneering work influenced the development of the three related approaches,

genetic algorithms (GAs) (Holland, 1975; De Jong, 1975), evolution

strategies (ESs) (Rechenberg, 1973; Schwefel, 1975), and evolutionary programming

(EP) (Fogel, 1962, 1964), nowadays commonly referred to as evolutionary

algorithms (EAs). All three approaches are population-based search techniques

that generate solutions based on a process of simulated evolution. For a more

comprehensive historical overview of the development of EAs please refer to (Fogel,

1994; Bäck et al., 1997; Fogel, 1998).

Whilst certain DoE methods, such as the creeping random method and EVOP, and EAs may share similar working principles, there are differences in the objectives pursued by the two algorithm types: DoE is typically applied to low-dimensional

search spaces with the aim of obtaining statistically robust results

using as few experiments as possible. EAs, on the other hand, are approaches



commonly used for finding a single optimal solution, typically using many evaluation steps, to problems featuring high-dimensional search spaces, e.g. combinatorial

optimization problems such as the TSP. Closed-loop optimization problems

often fall in the intersection of the two scenarios and thus require ideas from

both areas. In particular, we believe that closed-loop evolution methods should

draw on DoE in areas like noise-handling, replication, and blocking (e.g. the pipe

bending setup of Rechenberg (2000) used blocking). To reduce the number of

expensive evaluations, ideas from sequential DoE and RSMs can also be incorporated

into evolutionary search, where they are known variously as surrogate

models, meta-models or fitness approximation methods; this step has already

been realized in recent years (see e.g. Jin et al. (2002); Ong et al. (2004); Jin

(2005) for surveys on this subject). To our knowledge, however, the DoE field has

not so far considered resourcing issues as a perturbing influence on conducting

the most informative experiments, as we do here.

There are also several aspects that make EAs suitable candidates for closed-loop

optimization: Evolutionary search relies solely on the fitness of previously

evaluated solutions and not on information related to the problem instance being

solved, making EAs suitable for problems featuring a black-box function, such as

closed-loop problems. In fact, evolution does not even need to be given specific

fitness values but searching for behavioral novelty may be sufficient to solve a

problem (Lehman and Stanley, 2011). EAs can also be hybridized easily if prior

knowledge about the problem at hand, gradient information, or other methods are

available (Blum and Roli, 2003). The ability to employ a population of solutions

allows EAs to conveniently adjust the population size to meet a given experimental

setup, and, moreover, to cope with problems potentially featuring multiple objectives

(Deb, 2001), complex (non-linear and dynamic) constraints (Michalewicz

and Schoenauer, 1996; Coello, 2002; Nguyen and Yao, 2009a), and a dynamically

changing fitness landscape (Branke, 2001).

2.7 Chapter summary

In this chapter we have reviewed the field of closed-loop optimization, which is

characterized by the property that solutions are evaluated by conducting experiments

(as opposed to evaluating a closed-form mathematical expression). We first

introduced the general concepts of closed-loop optimization. Then, we reviewed a



variety of closed-loop applications across the sciences and engineering, and summarized

the main challenges of closed-loop optimization. Finally, we provided an

overview of techniques employed in closed-loop optimization including approaches

based on design of experiments, response surfaces and simulated evolution. We

favour the application of simulated evolution approaches and, in particular, EAs,

to closed-loop problems and focus on them in this thesis. The reasons for this

choice have been stated in the previous section. The next chapter will introduce

EAs in more detail, and review techniques employed within EAs for coping with

many of the challenges common to closed-loop optimization, such as noise, constraints,

and dynamically changing environments. These challenges are related

to the resourcing issues considered in this thesis, and understanding how they are

dealt with is important and beneficial for the development of our own strategies.


Chapter 3

Evolutionary Optimization

In the previous chapter we introduced the field of closed-loop optimization and

motivated the choice of using evolutionary algorithms (EAs) as optimizers in this

domain. The next section introduces some of the basic principles of evolutionary

search and reviews various approaches for tuning and controlling parameters of an

EA. The application domain of EAs includes several problem categories related

to closed-loop problems and the particular resourcing issues (ephemeral resource

constraints, changing variables, and lethal environments) considered in this thesis:

the main categories are noisy, constrained, dynamic, and online optimization

problems. Sections 3.2 to 3.5 review how EAs have been tuned to tackle these

problem categories (these sections will also highlight the relationship between the

resourcing issues considered in this thesis and the different problem categories, but

for a more detailed description of the resourcing issues themselves please refer to

the subsequent chapters). Tuning of EAs has also been realized with techniques

from machine learning, such as reinforcement learning (RL) techniques. Section 3.6

gives a brief introduction to the RL field, and reviews methods for extending

EAs with RL to improve performance.

3.1 Evolutionary Algorithms

From Section 2.6 we know that evolutionary algorithms (EAs) (Fogel, 1962;

Rechenberg, 1973; Schwefel, 1975; Holland, 1975; Goldberg, 1989) are population-based

techniques that tackle optimization problems based on models of biological

evolution. We can think of biological evolution as a process where fit individuals

of a population are passed on from one generation to the next upon undergoing




Table 3.1: Nomenclature used in evolutionary algorithms

Individual: Solution.
Population: Set of individuals.
Fitness: In the context of EAs, the term fitness is used to refer to an individual's quality, while in biology it refers to the reproductive success of an individual in passing its genetic information to the next generation.
Objective function: Function that computes the fitness of an individual.
Chromosome/Genome: Encoded individual; often represented as a string of parameters.
Gene: Basic unit of an individual; in optimization, a gene can be seen as a decision variable.
Allele: Value of a gene.
Locus: Position of a gene in a string; this term is often used interchangeably with the term gene.
Genotype: Genetic structure that represents an individual.
Phenotype: Individual that can be evaluated.
Crossover/Recombination: Process by which two or more individuals exchange some of their genetic information; alternatively, the term(s) refer to the operator that performs this process.
Mutation: Process by which an individual's genetic information undergoes random changes; alternatively, the term refers to the operator that performs this process.
Variation: Crossover and/or mutation.
Parent: Individual used to generate new individuals (offspring).
Offspring/Child: Individual resulting after applying crossover and/or mutation to parents.
(Parental) selection: Process by which parents of the current population are selected for crossover and/or mutation.
Generation: One iteration through parental selection, variation and environmental selection, to arrive at an updated population.
Environmental selection: Process by which individuals from the combined pool of offspring and/or current population are selected to form the new population.

variation in the form of mutation and/or crossover. Due to the relationship to biology,

many biological terms have been adopted to describe EAs; these terms and

their meanings are summarized in Table 3.1.

Let us describe the working scheme of a basic EA (see Algorithm 3.1) in more

detail; modifications of this scheme will be introduced in the remainder of this

section. This thesis considers EAs featuring a very similar structure to that shown in Algorithm 3.1. In the algorithm, µ denotes the size of the current population of solutions Pop, also referred to as the parent population. Upon initializing Pop with,



Algorithm 3.1 A basic generational evolutionary algorithm

Require: µ (parent population size), λ (offspring population size), f (objective function)

1: g = 0 (generation counter), Pop = ∅ (parent population), OffPop = ∅ (offspring population)

2: initialize Pop

3: evaluate all solutions in Pop using f

4: while termination condition not satisfied do

5: repeat

6: select parents from Pop

7: generate offspring by applying variation operators to selected parents

8: evaluate offspring using f and add them to OffPop

9: until λ offspring are created

10: form new population Pop by selecting solutions from Pop and OffPop

11: g++

12: return best solution of Pop

for example, µ randomly generated individuals, an offspring population OffPop

of size λ is incrementally generated. This is done by first selecting parents for

reproduction from Pop, and then applying variation operators to them in order

to produce offspring solutions; commonly used variation operators are crossover

and/or mutation. Before offspring individuals are added to OffPop, they are evaluated

using the objective function f. Once the offspring population is complete,

a new population Pop is formed by selecting fit solutions from the combined pool

of solutions from Pop and OffPop. This step is referred to as environmental selection,

and may involve selecting the fittest µ solutions from Pop and OffPop. This

variant is referred to as (µ+λ) reproduction, and the special case where λ = 1

is also referred to as steady-state reproduction. The concept where the fittest

individuals found so far are allowed to survive to the next generation is known

as elitism (De Jong, 1975). Alternatively, if λ ≥ µ, a new population Pop can be

formed by selecting the fittest µ offspring individuals from OffPop; this scheme is referred to as (µ,λ) reproduction, and the special case where λ = µ is also referred

to as generational reproduction. The notations (µ+λ) and (µ,λ) were originally

used within evolution strategies (ESs) (Rechenberg, 1973; Schwefel, 1975).
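A compact Python sketch of the generational scheme of Algorithm 3.1 on binary strings is given below; the onemax objective, the (µ+λ) environmental selection, the tournament size and all numerical settings are illustrative assumptions rather than recommendations made in this thesis:

import random

def onemax(x):                       # toy objective: count the number of 1-bits
    return sum(x)

def tournament(pop, f, ts=2):        # parental selection via a small tournament
    return max(random.sample(pop, ts), key=f)

def evolve(f, l=20, mu=20, lam=20, generations=50, p_m=None):
    p_m = p_m if p_m is not None else 1.0 / l
    pop = [[random.randint(0, 1) for _ in range(l)] for _ in range(mu)]
    for g in range(generations):
        offspring = []
        while len(offspring) < lam:
            p1, p2 = tournament(pop, f), tournament(pop, f)
            cut = random.randint(1, l - 1)                        # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ (random.random() < p_m) for b in child]  # bit-flip mutation
            offspring.append(child)
        pop = sorted(pop + offspring, key=f, reverse=True)[:mu]   # (mu + lambda) selection
    return max(pop, key=f)

random.seed(0)
print(onemax(evolve(onemax)))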

The variety of EAs is very large. In addition to different reproduction schemes,

there are different schemes for performing parental selection, recombination, and

mutation. We will give examples of common EA setups. Two commonly used

selection operators are fitness-proportional selection (also known as roulette-wheel

selection) and tournament selection. The former operator selects a parent with a

probability that is proportional to its fitness. The latter operator first chooses a

set of TS solutions (TS is referred to as the tournament size) at random from the



[Figure 3.1 image: one-point crossover (crossover point e = 2) applied to Parent 1 = 101101 and Parent 2 = 000110, and bit flip mutation applied to the resulting offspring.]

Figure 3.1: Visualization of one-point crossover (left) and bit flip mutation (right).

One-point crossover selects a single crossover point e (here e = 2) at random, and

then swaps all genes beyond that point in either parent’s chromosome between the two

parents. Bit flip mutation flips each gene in an offspring’s chromosome independently

with some probability p_m.

current population, compares these solutions, and then selects the best one as a

parent. Note, in a standard EA, this selection step is carried out twice to obtain

two parents. Typically, with probability p_c the two parents are then recombined

by, for example, swapping each of their genes with a probability of 0.5; this

form of crossover is known as uniform crossover (Syswerda, 1989). Alternatively,

parents can be recombined using one-point crossover, in which case a random

crossover point e ∈ {1,...,l − 1} is chosen and all genes of the chromosome

after that point are swapped; analogously, one can define a two-point or, more

generally, a multi-point crossover operator. The idea of mutation is to introduce

new genetic material into the population by altering an individual’s chromosome

in some random fashion. Commonly used mutation operators are ones that flip

each gene in a chromosome independently with some probability p_m, or flip a fixed

number of randomly selected genes; Figure 3.1 visualizes the interplay between

(one-point) crossover and (bit flip) mutation. Different variation operators may

be applied depending on the encoding of solutions. For a more comprehensive

introduction to EAs including an overview of different selection and variation

operators please refer to e.g. Bäck (1996); Bäck et al. (1997); Eiben and Smith

(2003); Reeves and Rowe (2003).
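Minimal implementations of the operators just described are sketched below (binary encoding and maximization with non-negative fitness values are assumed, and the example parents are those of Figure 3.1); this is illustrative code rather than an operator library used later in the thesis:

import random

def roulette_wheel(pop, fitnesses):            # fitness-proportional selection
    r = random.uniform(0, sum(fitnesses))
    acc = 0.0
    for ind, fit in zip(pop, fitnesses):
        acc += fit
        if acc >= r:
            return ind
    return pop[-1]

def tournament(pop, fitnesses, ts=2):          # tournament selection with tournament size TS
    idx = random.sample(range(len(pop)), ts)
    return pop[max(idx, key=lambda i: fitnesses[i])]

def one_point_crossover(p1, p2):
    e = random.randint(1, len(p1) - 1)         # crossover point e in {1,...,l-1}
    return p1[:e] + p2[e:], p2[:e] + p1[e:]

def uniform_crossover(p1, p2):
    c1, c2 = [], []
    for a, b in zip(p1, p2):
        if random.random() < 0.5:              # swap each gene with probability 0.5
            a, b = b, a
        c1.append(a)
        c2.append(b)
    return c1, c2

def bit_flip_mutation(ind, p_m):
    return [b ^ (random.random() < p_m) for b in ind]

random.seed(0)
parent1, parent2 = [1, 0, 1, 1, 0, 1], [0, 0, 0, 1, 1, 0]   # the parents of Figure 3.1
print(one_point_crossover(parent1, parent2))
print(bit_flip_mutation(parent1, p_m=1 / len(parent1)))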

3.1.1 Analyzing and explaining evolutionary algorithms

In the above, we have taken a rather operational view to introduce EAs. However,

we did not answer the question of how EAs work under the hood or how new, on



average, fitter individuals are evolved. There are several alternative viewpoints

on this subject and we introduce some of them in the following; for a more

comprehensive overview please refer to e.g. Reeves and Rowe (2003).

The schema theorem of Holland (1975) was the first attempt to explain EAs,

in particular genetic algorithms (GAs), from a theoretical point of view. This

theorem is based on the concept of schemata, which are subsets of solutions

that share some common properties. For instance, consider solution vectors to be

binary strings of length l = 5. Then, the schema H = (∗1∗∗0) describes the set of all solutions with value x_2 = 1 at gene 2 and value x_5 = 0 at gene 5; the

∗ is a wildcard symbol, meaning that a gene can have any possible value (thus

in the binary case either value 0 or 1). The two properties of a schema are its

order o(H) and its length l(H), representing the number of defined genes and

the distance between the first and last defined gene position, respectively (Reeves

and Rowe, 2003). Thus, for the above example we have o(H) = 2 and l(H) = 3.

The schema theorem relies on the assumption of implicit (or intrinsic) parallelism

, that is, the idea that information on many schemata can be processed in parallel.

Based on this assumption, Holland argues that there are two competing forces in

the GA: the force of selection increases the representation of fit schemata in line

with their fitness, while the destructive force of crossover and mutation tends to

break schemata up, in particular schemata of long and high order. Technically,

we can define the theorem as

E[N(H,t+1)] \geq \left\{ 1 - p_c \frac{l(H)}{l-1} - p_m o(H) \right\} r(H,t) N(H,t),    (3.1)

where E[.] is the expectation, N(H,t) is the number of instances of schema H

at time t, p_c and p_m are the probabilities of applying crossover and mutation, respectively,

and r(H,t) the fitness ratio expressing the average fitness of schema H

relative to the average fitness of the population at time t. The theorem says that

the expected number of instances of a schema H at time t+1 is greater than at

time t if the schema is fitter than average, and its order and length are short. The

third idea related to the theorem is that the surviving schemata, which tend to be

fit schemata of short and low order, are recombined into larger schemata (building

blocks) of potentially higher fitness; Goldberg (1989) calls this idea the building

block hypothesis. Although Holland’s schema theorem has been sometimes criticized,

often because it was overinterpreted (Radcliffe, 1997), fundamental ideas



Figure 3.2: The simplex represents the space of populations containing three individuals.

The flow indicates the movement of populations under the effect of selection using

a dynamical systems approach. The trajectory represents the movement of a single

population. Figure taken from Reeves and Rowe (2003).

have been proved to be valid. Some clarification of the schema theorem has been

given by the work of Altenberg (1995). He uses Price’s theorem (Price, 1970) to

show that macroscopic properties of the population, such as the population average

fitness, can be derived by accounting for properties related to the variation

and selection operators employed, and the encoding of solutions.
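The bound of Equation (3.1) can be transcribed directly into a few lines of Python; the numbers used in the example call are purely illustrative and correspond to the schema H = (∗1∗∗0) discussed above:

def schema_bound(N_H, r_H, o_H, len_H, l, p_c, p_m):
    # lower bound on E[N(H, t+1)] as given by the schema theorem
    survival = 1 - p_c * len_H / (l - 1) - p_m * o_H
    return survival * r_H * N_H

# H = (*1**0) on strings of length l = 5: o(H) = 2, l(H) = 3
print(schema_bound(N_H=10, r_H=1.2, o_H=2, len_H=3, l=5, p_c=0.7, p_m=0.01))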

An alternative perspective on EAs is to view transitions between populations

as a stochastic process (Nix and Vose, 1992; Davis and Principe, 1993). In the

simplest case, the population at time t + 1 depends only on the population at

time t. This kind of stochastic process is referred to as Markov process, and the

sequence of population transitions forms a Markov chain. With this approach,

a transition matrix needs to be calculated to model the likelihood of an EA

moving from a population at time t to any other possible population at time

t+1 after applying the stochastic effects of selection, crossover and/or mutation.

Markov chains are a convenient tool for analyzing EAs, which we will consider in

Section 4.3 to investigate the impact of ephemeral resource constraints (ERCs)

on evolutionary search; this section will also give a more detailed introduction to

Markov chains.
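As a minimal illustration of this viewpoint (the three states and transition probabilities below are toy values, not the EA transition matrices constructed in Section 4.3), the distribution over population states at time t+1 is obtained by multiplying the distribution at time t with a transition matrix:

import numpy as np

P = np.array([[0.9, 0.1, 0.0],      # transition probabilities between three
              [0.2, 0.7, 0.1],      # hypothetical population states
              [0.0, 0.3, 0.7]])
dist = np.array([1.0, 0.0, 0.0])    # start with certainty in state 0

for t in range(50):
    dist = dist @ P                 # one step of the chain (one generation)

print(dist)                         # approaches the chain's stationary distribution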

A perspective on EAs related to Markov chains is to model EAs in terms

of a dynamical system (Whitley, 1993; Vose, 1995, 1999). In essence, dynamical

systems model the population as a state (expressed in terms of a probability

vector) whose trajectory through the state space (see Figure 3.2) evolves according

to deterministic, mathematical equations (defined by the effects of selection,

crossover and mutation). The trajectory of the population is exact in the infinite

population limit. However, as the number of equations to be solved depends on



the population size µ, in practice, this approach becomes impractical for large

µ. Techniques that approximate the working of an EA in terms of large-scale

or average properties of the system modelled, such as a statistical mechanics approach

(Prügel-Bennett and Shapiro, 1994; Rattray, 1996), can help circumvent

this complexity problem. Although, strictly speaking, this approach yields

approximate results only (Reeves and Rowe, 2003), predictions of EA dynamics

tend to be accurate, and this is also true for complex problems (Shapiro, 2001).

Various other perspectives on the working principles of EAs can be found in the

literature. For instance, underlying working principles of an EA can be viewed

from an experimental design perspective (Reeves and Wright, 1995, 1996). The

perspective taken by Mühlenbein (1991) is that the schema theorem cannot be

used to explain the search strategy of a GA. Instead, he emphasizes the importance of constraining selection and mutation by a spatial population structure, and of allowing some or all solutions in the population to improve their fitness by local hill-climbing

(Mühlenbein, 1992); these concepts have been realized by Mühlenbein

et al. (1991) in the parallel GA. A perhaps more applied perspective is related

to how EAs balance exploration and exploitation necessary for effective optimization.

It is assumed that conventional EAs perform a highly explorative search at

the beginning of the optimization process, which, as the optimization progresses,

becomes a more exploitative one focused on the best solutions found so far. When

balancing exploration against exploitation it is crucial to maintain a diverse population,

i.e. solutions with different properties, when performing an explorative

search (please refer to Section 3.4 for an overview of diversity-maintaining techniques).

However, there are several aspects that may complicate the process of

maintaining diversity in the population. Two common aspects are hitchhiking

and genetic drift, which we describe next.

Hitchhiking (Mitchell, 1996) describes the scenario where newly discovered

high-order schemata spread quickly through the population, slowing down the

discovery of schemata in other positions; especially those that are close to the

defined positions of highly fit schemata. Genetic drift (De Jong, 1975), on the

other hand, refers to the statistical drift of gene frequencies in a population due

to random sampling effects. It occurs because of the finite size of a population

and the stochastic choices made in selection, population updates and genetic

operations, whereby, due to accidents of sampling among the surviving individuals,

highly fit individuals may be lost from the population or the population may



suffer from a loss of variety concerning some alleles. One effect of drift is that a

population might be pushed down a hill and find itself in a less fit valley. Drift

in combination with selection enables a population to move uphill (through the

effect of selection) as well as downhill (through the effect of drift), and, of course,

whether the population will find its way back to the same hill is not guaranteed.

These concepts have been suggested by Wright (1932) and form part of his shifting

balance theory of evolution. Geographic isolation of individuals and speciation

are other parts of his model.
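A small selection-free simulation (purely illustrative) makes the effect of drift tangible: in a finite population whose next generation is obtained by random sampling alone, the frequency of a neutral allele wanders until it is either lost or fixed, i.e. variety at that gene disappears:

import random

def drift(pop_size=20, freq=0.5, max_gens=10000):
    count = int(pop_size * freq)    # number of individuals carrying allele 1
    gens = 0
    while 0 < count < pop_size and gens < max_gens:
        # resample the next generation from the current allele frequency
        count = sum(random.random() < count / pop_size for _ in range(pop_size))
        gens += 1
    return count, gens

random.seed(2)
print(drift())   # returns (0, ...) or (pop_size, ...): the allele was lost or fixed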

The choice of optimal parameter settings of EAs has also been approached

from several theoretical viewpoints. For example, Goldberg et al. (1992) suggest

that a linear dependence of the population size on the string length is appropriate.

Empirical results of e.g. Grefenstette (1986), on the other hand, have indicated

that a population size of 30 is sufficient for many applications. The choice of the

mutation rate p_m has also been the focus of many investigations. The theoretical

analysis of Mühlenbein (1992) suggests that a setting of p_m = 1/l (with l being the

genotype length) is generally adequate. Similar theoretical analyses have been

conducted for the choice of the recombination rate and the selection pressure,

which refers to the bias towards selecting fit individuals. The general conclusion

is that the mutation and recombination rate should increase with the selection

pressure (Bäck, 1996; Ochoa et al., 2000).

Recent theoretical studies on EAs focus rather on estimating the running

time complexity of an algorithm as a function of the input space size l of a

problem (see e.g. the work of Doerr et al. (2011); Auger and Doerr (2011); Oliveto

and Witt (2011)). Such analyses can be performed using, for example, a drift

analysis (Hajek, 1982; He and Yao, 2001).

Despite the drawbacks that some analytical methods might have, theoretical

investigations of EAs contribute to the understanding of evolutionary optimization

techniques. This, in turn, helps us with the tasks of selecting and designing

good strategy variants and operators. In other words, an improved understanding

of evolutionary search allows for more effective tuning of EAs, a subject we

consider next.

3.1.2 Tuning evolutionary algorithms

The objective of tuning an EA is to enhance its search performance. This process

involves selecting things like suitable operators and their parameter settings, a



[Figure 3.3 image: parameter setting is divided into parameter tuning (before the run, offline) and parameter control (during the run, online); parameter control is further divided into deterministic, adaptive, and self-adaptive approaches.]

Figure 3.3: A classification of different parameter setting methodologies for EAs according

to Eiben et al. (1999). A similar classification can be made for other design

choices involved in EAs, such as operator selection, population size, etc.

representation of individuals, the population size and other EA parameters. We

will now review a number of techniques that facilitate this tuning process.

Before tuning of an EA can be commenced, however, the goal of the experiment

needs to be specified, such as designing a more robust, faster and/or cheaper

algorithm. This helps to identify the hypotheses to test, appropriate test problems,

and the performance measures to use. Let us assume that all these aspects

have been determined, and we are now faced with the design and tuning of an

efficient EA. We can adopt two fundamental methodologies to realize this tuning

step: (i) tuning offline before running an algorithm on the actual problem at

hand (this method is referred to as parameter tuning), and (ii) adapting parameter

values online during an algorithmic run (this method is referred to as parameter

control). Parameter control is sometimes further divided into deterministic,

adaptive, and self-adaptive approaches (Eiben et al., 1999); this classification is

also shown in Figure 3.3. In the following we will introduce commonly-used techniques

for each method, but a more complete overview has been given e.g. by Lobo

et al. (2007); Fialho (2010); Eiben and Smit (2011).

A naïve technique for parameter tuning is to try out many different parameter

settings by hand and then select the best setup (De Jong, 1975). Such an ad hoc

approach, however, can be time-consuming and requires an a priori definition

of certain aspects, such as the number of different configurations to be tested

and the number of runs to be performed for each configuration. Statistical racing

methods (Maron and Moore, 1997; Birattari, 2005) aim at eliminating some

of these drawbacks. The idea here is to empirically evaluate a set of candidate

configurations and successively discard bad ones as soon as sufficient statistical



[Figure 3.4 image: computational effort per parameter configuration for brute-force testing versus statistical racing.]

Figure 3.4: Comparison of the computational effort needed by a statistical racing

method (dashed region) and a brute-force approach (rectangular area) to find an optimal

parameter setting. While brute-force spends an equal amount of computation for

all parameter configurations, a statistical racing method discards poor configurations

as soon as sufficient statistical evidence is gathered against them. Figure reproduced

from Birattari (2005).

evidence is gathered against them (Birattari et al., 2002); a visualization of this

procedure is given in Figure 3.4. Another branch of techniques for parameter

tuning is inspired by the experimental design discipline introduced in Section 2.4.

Popular techniques adopted from this discipline include space-filling designs (Myers

and Hancock, 2001) and response surface methods (Coy et al., 2001; François

and Lavergne, 2001; Czarn et al., 2004; Bartz-Beielstein et al., 2004); in this context,

a response surface represents the performance of an algorithm as a function

of different parameter configurations. An approach that, similar to response surface

methods, aims at tackling the optimization of expensive functions has been

proposed by Corne et al. (2003); Rowe et al. (2006). The authors suggest modeling a landscape as a finite state machine, inferred from preliminary sampling runs, and then running different EAs offline on this model to choose the most efficient

one. An early drawback of this approach was its limitation to algorithms

employing mutation only, but more recent versions of the technique also model

crossover (Rowe et al., 2010), allowing both mutation and crossover to be tuned

(in addition to population size and selection pressure). However, before (offline)

tuning can be started, one needs to conduct a certain number of (expensive)

evaluations on the real-world landscape to establish a sufficiently accurate model

of the landscape. Finally, some researchers, such as Grefenstette (1986); Nannen

and Eiben (2006); Branke and Elomari (2011), have formulated parameter tuning

as an optimization problem and employed an EA as a meta-algorithm to tackle



this problem. The use of multi-objective EAs is particularly convenient when the

aim is to find configurations that are optimal with respect to several performance

measures (Dréo, 2009; Smit et al., 2010).

A general property of methods for parameter tuning is that the parameter

configuration found (offline) is kept fixed during the optimization process of the

actual problem at hand. This property is a drawback if the dynamics of an evolutionary search procedure are such that different parameter settings are more suitable at different stages of the search. As an example, one would anticipate that a large mutation rate is beneficial at the beginning of the search to facilitate exploration, while a small mutation rate might be more suitable towards the end of a search procedure to promote exploitation of promising search regions.

Methods that control parameters (online) during a run account for such dynamic

scenarios. For instance, in the motivating example given above, the mutation rate

may be changed according to some deterministic rule based on the evaluation or

generation counter (Holland, 1975; Fogarty, 1989) (deterministic parameter control).

More sophisticated techniques account for the state in which an algorithm

is, for example, by monitoring the population diversity or the fitness improvement

between generations, and then use these measurements to trigger changes

in the parameter settings according to some heuristic rule (Rechenberg, 1973;

Bäck, 1996) (adaptive parameter control); an example of an adaptive scheme is

the famous “1/5 success rule” of Rechenberg (1973). 1 A different approach, which

has its origin in the evolution strategy community, is to consider parameters like

the mutation rate as additional genes in the chromosome and evolve them alongside

the actual decision variables of a problem (Schwefel, 1977; Hansen and

Ostermeier, 2001; Beyer and Schwefel, 2002). This form of parameter control

can be seen as a self-adaptive scheme; the obvious advantage of this scheme is

that it does not require the algorithm designer to come up with a heuristic rule

according to which parameters are changed.

Another example of self-adaptive parameter control is the dynamic selection of

parameter configurations or operators based on their recent performance. Davis

(1989) developed the first approach of this kind. He assigned each operator a

probability of being selected in a reproduction process, and this probability was

proportional to the fitness of individuals created with a particular operator. That

1 This rule states that the ratio of successful mutations to all mutations should be 1/5; a

successful mutation is one where the offspring is fitter than its parent. If the ratio is greater

than 1/5, then the mutation step size should be increased, otherwise decreased.




Figure 3.5: The general scheme of an EA employing an adaptive operator selection scheme: the EA asks for an operator and the operator selection mechanism returns one based on the empirical quality estimates of the operators; after applying the operator, its impact is assessed (e.g. by the fitness difference between the generated individual and its parent); the credit assignment scheme transforms the impact into a credit, which in turn is used to update the quality estimates of operators. Figure reproduced from Fialho (2010).

is, operators that are more successful in creating fitter solutions are assigned a

greater probability of being selected in the future. There is a variety of self-adaptive

parameter control approaches of this form, and, sometimes, they are also referred

to as adaptive operator selection (AOS) approaches (Fialho, 2010). AOS

approaches are defined by two concepts (see Figure 3.5): (i) the way credit is assigned

to operators upon generating an offspring (credit assignment mechanism),

and (ii) the way this credit is considered in the selection of an operator (operator

selection rule). The credit assigned to an operator can, for instance, be the fitness

difference between the generated offspring and its parents (Tuson and Ross,

1998; Lobo and Goldberg, 1997) or the current best individual (Davis, 1989). To

account for delayed credit effects, it has also been proposed to pass credit back

to operators involved in the generation of ancestors of the offspring (Davis, 1989;

Julstrom, 1995; Pearson et al., 2001). The operator selection rule first updates

the empirical quality estimates of operators based on the credits obtained (e.g.

by taking the average of credits obtained so far), and then uses these estimates to

select the next operator. Operators can be selected, for instance, with a probability

that is proportional to the quality estimates. Two popular operator selection

rules that follow this procedure are probability matching (Goldberg, 1990) and

adaptive pursuit (Thierens, 2005).

Other AOS approaches transform the operator selection rule into a multi-armed

bandit (MAB) problem (Mahajan and Teneketzis, 2008). A MAB problem



is a non-associative, purely selectional learning problem originally designed to test

the ability of an algorithm to trade off exploration against exploitation (Robbins,

1952a); MAB problems are related to problems in the field of reinforcement learning,

which we introduce in Section 3.6. In a MAB problem one is faced with a

number of actions, each action being associated with an unknown reward distribution

from which a reward is drawn upon taking the action. The goal of the

algorithm is to maximize the sum of rewards received over a number of actions

taken. When considering operator selection as an MAB problem, the available

actions would correspond to the operators, and the reward to some credit associated

with the application of an operator. Fialho (2010) provides a comprehensive

review of AOS approaches that consider operator selection as an MAB problem.

In general, these approaches employ the same credit assignment schemes as other

AOS approaches. However, their operator selection scheme employs a bandit algorithm,

such as the upper confidence bound (UCB) algorithm (Auer et al., 2002).

We will introduce this algorithm in detail in Section 5.2.2, where we investigate

an AOS approach to deal with one of the resourcing issues (ephemeral resource

constraints) considered in this thesis.
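To make the bandit view of operator selection more concrete, the following Python sketch (not taken from any of the cited works) shows a minimal UCB1-style operator selection loop on a toy one-dimensional problem; the two operators, the credit definition (raw fitness improvement), and the fitness function are illustrative assumptions only.

import math
import random

def ucb_select(counts, credits, c=1.0):
    # Return the operator index maximizing the UCB1 score; every operator
    # is tried once before the confidence bound is used.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    scores = [credits[i] / counts[i] + c * math.sqrt(2.0 * math.log(total) / counts[i])
              for i in range(len(counts))]
    return scores.index(max(scores))

# hypothetical ingredients for illustration only
fitness = lambda x: -(x - 3.0) ** 2                     # toy fitness function
operators = [lambda x: x + random.gauss(0.0, 0.1),      # small mutation
             lambda x: x + random.gauss(0.0, 1.0)]      # large mutation

counts, credits = [0] * len(operators), [0.0] * len(operators)
parent = 0.0
parent_fitness = fitness(parent)

for step in range(1000):
    op = ucb_select(counts, credits)
    child = operators[op](parent)
    child_fitness = fitness(child)
    credit = max(0.0, child_fitness - parent_fitness)   # fitness-improvement credit
    counts[op] += 1
    credits[op] += credit
    if child_fitness > parent_fitness:
        parent, parent_fitness = child, child_fitness

The average credit per application plays the role of the operator's quality estimate, while the square-root term implements the exploration bonus of UCB1.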

Note that parameter tuning and control has not only been considered for EAs

but also for other metaheuristics, in particular local search algorithms. However,

although these metaheuristics share many common features with EAs, parameter tuning and control for them is not as deeply researched as it is for EAs. In general, the field of self-tuning search heuristics has become increasingly popular in recent years. The recent self-* search workshop (Ochoa et al., 2010) and track (Ochoa and Schoenauer, 2011) held at the

international conferences PPSN and GECCO, respectively, are further evidence

of the burgeoning interest in this subject within the evolutionary computation

community.

The pure task of tuning an EA can be accompanied by the task of actually

trying to understand how different parameter configurations might affect search.

For instance, to investigate the effect of different representations one may look at

measures characterizing the problem difficulty (Rothlauf, 2006); examples of such

measures include the fitness-distance correlation (Jones and Forrest, 1995) and

the negative slope coefficient (Vanneschi et al., 2006). Distance measures that

account for the choice of operators have been analyzed e.g. by Altenberg (1997).

The concept of heritability (Mühlenbein and Paaß, 1996) is another measure

that has been employed in the analysis of operators. For detailed descriptions of



these concepts please refer to the references provided. Of course, an understanding

of the performance effect of operators and their parameter settings can also

be gained from a theoretical point of view; the previous section has introduced

various approaches that can be employed for this purpose.
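As an illustration of one such measure, the fitness-distance correlation of Jones and Forrest (1995) is simply the sample correlation between the fitness of a set of sampled solutions and their distance to a known global optimum, so it is only computable on benchmark problems. A minimal Python sketch, assuming a binary encoding, Hamming distance, and ONEMAX as a toy problem (all assumptions for illustration; statistics.correlation requires Python 3.10 or later):

import random
import statistics

def fitness_distance_correlation(samples, fitnesses, optimum):
    # Pearson correlation between fitness values and Hamming distances to the
    # known optimum; for a maximization problem, values near -1 indicate that
    # fitness increases as solutions approach the optimum.
    distances = [sum(a != b for a, b in zip(x, optimum)) for x in samples]
    return statistics.correlation(fitnesses, distances)

n = 20
optimum = [1] * n                                         # ONEMAX optimum: all ones
samples = [[random.randint(0, 1) for _ in range(n)] for _ in range(200)]
fitnesses = [sum(x) for x in samples]                     # ONEMAX fitness
print(fitness_distance_correlation(samples, fitnesses, optimum))   # close to -1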

Finally we want to remark that when tuning and comparing algorithms,

one must not forget that according to the no free lunch theorem (Wolpert and

Macready, 1997), all search algorithms have identical performance when their

performance is averaged over all possible search spaces. Although this implies

that there is no single general-purpose search algorithm, it also implies that,

over a restricted set or class of problems (one which is not closed under permutation)

(Schumacher et al., 2001; Igel and Toussaint, 2004), performance differences

must exist and therefore a best algorithm or set of algorithms must also exist.

In the following sections we will look at different problem classes, and successful

strategies employed to tackle these problems; we begin with problems subject to

some form of uncertainty.

3.2 Optimization in uncertain environments

Uncertainty in optimization can be due to a variety of sources. In this section, we

focus on sources related to noise, and the search for robust and reliable solutions.

3.2.1 Optimization subject to noise

Uncertainty in the form of noisy (inaccurate) fitness values is a common side

effect in problems where solutions are evaluated through physical experimentation

or computer simulations (e.g. Monte Carlo simulations). For instance, when

trying to find potent combinations of drugs, replicating the experiment of a drug

combination will yield variations in the fitness measurements even if identical

drugs and concentrations of drugs are used.

Noise can be attributed to a number of sources: Inaccuracies in realizing the

decision variables of a genotype, e.g. concentrations of drugs, can introduce noise

during the genotype-phenotype mapping, while measurement errors, human errors,

and sampling errors can introduce noise during the fitness measurement

process. Additional perturbations in the fitness may be due to uncontrolled environmental

factors, such as ambient temperature and deviations in the performance

of experimental equipment (e.g. instruments and robots).




Figure 3.6: Visualization of the effect of a noisy fitness function f. The error bars

indicate the level of noise for different solutions ⃗x. There does not need to be a correlation

between the noise level and the fitness. The large noise level at the local minimum ⃗x_{local minimum} may mislead the search towards this region and thus away from the region containing the true optimal solution ⃗x_{global minimum}.

Arguably the most common noise model considered is additive white Gaussian

noise (Beyer, 2000), where noise is added to the true fitness value according

to a Gaussian distribution N(0,σ) with zero mean and standard deviation σ.

Alternative noise models include uniform and Laplacian noise but these tend to

be less common in real-world systems.

The challenge associated with noise is that one cannot be sure whether an

improvement obtained by a decision variable change is a real improvement or the

effect of noise (see Figure 3.6). In the extreme case of large noise, there is no

useful selection information (because fitness improvements can be equally due to

parameter changes and noise), causing the parameters to be optimized to perform a random walk (Beyer, 2000). However, noise is not always bad. A certain amount

of it might be helpful during the initial phases of the search (Rana et al., 1996)

as well as improve the reliability of convergence to the global optimum (Bäck

and Hammel, 1994; Levitan and Kauffman, 1995) (because noise can allow an

optimizer to escape from local optima).

The standard approach for coping with noisy fitness functions within EAs is to

repeat the evaluation of a solution several times (referred to as the sampling rate)

and then take the average over the fitness values to approximate its true fitness.

This approach is referred to as resampling (Fitzpatrick and Grefenstette, 1988)

in the evolutionary computation community; recall that the experimental design

discipline refers to the same approach as replication (see Section 2.4). Obviously,

an increase in the sampling rate results in a more accurate fitness approximation

but this comes at the cost of a larger number of (potentially expensive)



evaluations. Suggestions on how to set or adapt the sampling rate, such as increasing

the sample rate for individuals exhibiting high variances in their fitness

value, and more sophisticated techniques, have been proposed e.g. by Aizawa and

Wah (1994); Stagge (1998); Sano and Kita (2000); Branke and Schmidt (2004).

The alternative to obtaining highly accurate fitness values is to perform ‘rough’

evaluations but allow an EA to consider many more candidate solutions per generation;

this effect is achieved by an increase of the population size (Fitzpatrick

and Grefenstette, 1988). It is unclear which of the two alternatives, accurate or

rough evaluations, is more efficient, given a fixed number of fitness evaluations.

Theoretical and empirical studies analyzing the trade-off between the sampling

rate and the population size have been conducted e.g. by Fitzpatrick and Grefenstette

(1988); Beyer (1993); Bäck and Hammel (1994); Hammel and Bäck (1994);

Rattray (1996); Miller (1997); Beyer (2000).
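The trade-off can be made concrete with a small sketch. The following Python snippet (illustrative only; the underlying fitness function and noise level are assumptions) shows explicit resampling: the sample mean over several noisy evaluations is used as the fitness estimate, and its standard error shrinks with the square root of the sampling rate.

import random

def noisy_fitness(x, sigma=0.5):
    # true fitness perturbed by additive Gaussian noise N(0, sigma)
    true_fitness = -(x - 3.0) ** 2        # toy underlying function (an assumption)
    return true_fitness + random.gauss(0.0, sigma)

def resampled_fitness(x, sampling_rate=10):
    # average several noisy evaluations to approximate the true fitness
    samples = [noisy_fitness(x) for _ in range(sampling_rate)]
    return sum(samples) / len(samples)

print(resampled_fitness(2.5, sampling_rate=1))    # rough estimate, 1 evaluation
print(resampled_fitness(2.5, sampling_rate=50))   # accurate estimate, 50 evaluations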

The degree of the selection pressure (i.e. the degree to which fitter solutions

are favoured in parental selection) has a significant performance impact in the

presence of noise too. Several researchers, including Miller (1997); Beyer (2000),

have shown that fitness proportionate selection is less sensitive to the level of

noise than tournament selection. On the other hand, selection schemes with high

selection pressure, such as tournament selection with relatively large tournament

sizes (Beyer (2000) considers a tournament size of 5), can yield better final results.

Some researchers suggest to diverge from common parental and environmental

selection schemes in the presence of noise. Markon et al. (2001), for instance,

suggest replacing a parent by its offspring only if the offspring is fitter than its parent by at least some pre-defined threshold. Hughes (2001), on the other hand, suggests selecting not the individual with the higher mean fitness but the one with the lower probability of being worse than its competitor. In other words, Hughes recommends minimizing the probability of making

a wrong decision. He analyzed this approach in a multi-objective optimization

setup. Unfortunately, the number of studies considering noise and other forms

of uncertainty within a multi-objective optimization setting is limited, and so is our

understanding in this domain.

The use of elitism to keep track of the best solutions found does not seem

to improve the convergence behavior of an EA in the presence of noise; at least

for some versions of evolution strategies (ESs) (Beyer, 1993, 2000). The reason

is the risk of overemphasising elite solutions, which may misguide the search.



Figure 3.7: A (1,λ)-ES with λ = 5 using rescaled mutations. The five (gray) offspring

candidate solutions are generated by mutating the (black) candidate solution in the

center. All offspring are inferior to their parent. The parent of the next generation

is constructed by taking a reduced mutation step in the direction of the best offspring

(circled). Figure reproduced from Arnold (2005).

To circumvent this issue, Büche et al. (2002) looked at various re-evaluation

techniques and strategies that limit an individual’s lifetime.

A noise-handling approach that has its origin in the ES community is the

method of rescaled mutations (Rechenberg, 1994). The idea of this approach

is to make large mutation steps when generating offspring in order to cancel

out the effect of noise and thus increase the chance of selecting the single, truly

best offspring (see Figure 3.7). Since the mutation steps are large, the offspring

are likely to have a poorer fitness than the parent. The hope is that a reduced

(rescaled) step towards the best offspring will yield a solution that is fit enough

to replace the current parent and be carried over to the next generation. The

obvious challenge with this approach is to appropriately set the rescaling factor;

theoretical and empirical work addressing this challenge includes e.g. (Beyer, 1998, 2000; Arnold, 2005).
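The principle can be sketched in a few lines of Python (an illustrative toy, not the referenced algorithms): λ offspring are generated with a deliberately large step size σ, and the parent then moves only a reduced step, governed by a rescaling factor κ with 0 < κ < 1, in the direction of the best offspring. The fitness function and parameter values below are assumptions.

import random

def rescaled_mutation_generation(parent, fitness, lam=5, sigma=1.0, kappa=0.2):
    # one (1,lambda) generation with rescaled mutations: sample lambda offspring
    # with a large step size sigma, then move only a fraction kappa of the way
    # towards the best of them
    offspring = [[x + random.gauss(0.0, sigma) for x in parent] for _ in range(lam)]
    best = max(offspring, key=fitness)
    return [p + kappa * (b - p) for p, b in zip(parent, best)]

# toy noisy sphere problem (an assumption, for illustration only)
fitness = lambda x: -sum(xi ** 2 for xi in x) + random.gauss(0.0, 0.5)
parent = [5.0, -4.0]
for _ in range(200):
    parent = rescaled_mutation_generation(parent, fitness)
print(parent)   # should have drifted towards the origin despite the noise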

Several other noise-handling strategies have been proposed over the years.

The interested reader is referred to Jin and Branke (2005) for a review.

3.2.2 Searching for robust and reliable solutions

Let us now have a closer look at uncertainties associated with the search for robust

and reliable solutions. We can define robust solutions as ones whose fitness is insensitive to small perturbations in the decision variables or environmental conditions

(e.g. manufacturing tolerances), and reliable solutions as ones that remain

feasible or safe under such variation (Branke, 2001; Knowles, 2009). Perturbations

in the optimization environment may result in a changing fitness landscape

and thus cause a shift in the optimal solution. It is the general focus of the field



of dynamic optimization (see Section 3.4) to track these shifting optima. However,

if shifts in global optima are random and small, one may prefer to seek a

fit robust solution instead of tracking an optimal solution that may be located

on a shaky peak (Branke, 2001). To search for robust solutions one can apply

similar strategies as for noisy optimization, due to the similarities between the

two optimization tasks (Jin and Branke, 2005): Taguchi’s robust design methods

are a popular approach originating from the experimental design discipline, while

averaging techniques, population up-sizing, and other approximation techniques

are more common within EAs. For an overview of approaches to robust optimization

and robustness measurement techniques please refer to Beyer and Sendhoff

(2007).

The search for reliable solutions is a task that has been considered only recently

by the evolutionary computation community. In fact, this task may arise

only when the optimal solution is located close to the constraint boundary. The

commonapproachtakeninthisdomainistoassumethattheuncertaintyofdesign

variables and problem parameters is known in form of some probability distribution.

This assumption allows then to transform the constraints into probabilistic

constraints, and consider a solution as feasible if its probability of satisfying each

constraint is greater than some pre-defined threshold. The advantage of this approach

is that the constraints can be considered as deterministic, and standard

techniques for coping with constraints (see Section 3.3) can be applied to find

reliable solutions. However, commonly one is interested in the overall reliability

(joint probability) of a solution in terms of satisfying all constraints rather than

each single constraint on its own. It remains a challenge to compute this joint

probability, and several approximation methods have been proposed to do this

calculation. The interested reader is referred to Deb et al. (2009) for an overview

of these techniques and the application of EAs to reliability-based optimization.
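A minimal sketch of the probabilistic-constraint view described above (Python, illustrative only; the perturbation model, the constraint, and the threshold are assumptions): a solution is treated as reliable if the Monte Carlo estimate of its probability of satisfying the constraint exceeds a pre-defined threshold.

import random

def is_reliable(x, constraint, n_samples=1000, threshold=0.95, sigma=0.05):
    # estimate by Monte Carlo sampling the probability that the constraint is
    # satisfied under Gaussian perturbations of the decision variables, and
    # declare the solution reliable if this probability exceeds the threshold
    satisfied = 0
    for _ in range(n_samples):
        perturbed = [xi + random.gauss(0.0, sigma) for xi in x]
        if constraint(perturbed):
            satisfied += 1
    return satisfied / n_samples >= threshold

# hypothetical constraint: the solution must lie inside the unit disc
constraint = lambda x: x[0] ** 2 + x[1] ** 2 <= 1.0
print(is_reliable([0.2, 0.3], constraint))   # well inside the feasible region: True
print(is_reliable([0.7, 0.7], constraint))   # close to the boundary: False

The joint reliability over several constraints mentioned above could be estimated in the same loop by requiring all constraints to hold for each perturbed sample.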

3.2.3 Further forms of uncertainties in optimization

There are two more main sources that can introduce uncertainty during an optimization

process. The first is related to the problem of tracking shifting optima,

mentioned above in the context of robust solutions. Here, uncertainty is due to

the fact that it is often unknown when the environment changes, and in which direction

the optimal solution shifts upon an environmental change. It is the focus

of the dynamic optimization field to deal with these issues, and we will introduce




Figure 3.8: Visualization showing how an approximation error in a meta-model may

lead to a false minimum. The solid line represents the true fitness function, the dashed

line the approximated function, and the six dots the sample points. Figure reproduced

from Jin (2005).

this field in more detail in Section 3.4.

Another common source of uncertainty is associated with the application of

response surface techniques (also known as surrogates or meta-models in the

context of evolutionary search). Remember that the main purpose of meta-models is to approximate the fitness function when fitness evaluations are expensive (see Sections 2.5 and 2.6). However, whilst evaluating the real fitness function returns

the true fitness of a solution, albeit sometimes with noise, fitness values taken

from the meta-model may be subject to approximation errors (see Figure 3.8).

For a review of techniques for coping with such approximation errors please refer

to e.g. Jin et al. (2002); Ong et al. (2004).

Having gained an overview of different forms of uncertainty that may be encountered

in optimization, we can generally conclude that in the presence of

limited evaluation numbers one has to accept a lower solution quality. The question

that arises is whether it is worth risking expensive evaluations to find the

best solution, or whether it is rather advisable to search for a robust or reliable but perhaps lower-quality solution; work addressing this question includes e.g. (Jin

and Sendhoff, 2003). Although many real-world problems are subject to different

forms of uncertainty simultaneously, there is unfortunately little work in

the literature considering approaches for such problems.

3.3 Constrained optimization

So far, we have not paid much attention to how constraints on the decision variables

are dealt with during an evolutionary search. Roughly speaking, an optimization

problem may be subject to two types of (static) constraints: box constraints

(also known as parametric constraints or domain constraints) on individual decision

variables, which are almost always present, and a set of linear and/or



non-linear constraints that further restrict the values of the decision variables.

The issue when applying EAs to constrained problems is that (simple) search operators

(and encodings) are blind to constraints and thus do not guarantee that

feasible parent individuals will provide feasible offspring. This issue led to the

development of many different constraint handling methods of which we briefly

discuss some, but for a more detailed description we refer e.g. to Michalewicz

and Schoenauer (1996); Coello (2002); Mezura-Montes (2009). Although dynamic

constraints, such as the ERCs considered in this thesis, are materially different from

standard (linear and non-linear) constraints, as we shall see later, it is nevertheless

informative to consider how constraints in EAs are usually handled.

Methods based on preserving feasibility of solutions. Some methods

falling into this category preserve feasibility by making use of special representation

schemes (Bean, 1994) and/or special operators (suitable for these schemes)

(Michalewicz and Nazhiyath, 1995; Davis, 1991). For instance, Davidor (1991)

used an EA with a variable-length representation to generate trajectories of a

robot; to cope with this representation, he designed an analogous crossover,

which defines crossover points in parent chromosomes based on their phenotypic

similarities. Other methods in this category utilize decoders (Koziel and

Michalewicz, 1999; Kime and Husbands, 1997), which aim at mapping the original

feasible region into another, easier-to-optimize space, or scan the boundary

of the feasible region using some strategy that coordinates jumps between the

feasible and infeasible region (Glover, 1977; Schoenauer and Michalewicz, 1996).

Note that the latter approach relies on the (often true) assumption that the global

optimal solution is located at the boundary of the feasible region. The drawback

of some of the strategies falling into this category is that they require an initial

feasible solution. Michalewicz and Nazhiyath (1995), for instance, find this solution

by iteratively generating random solutions in the search space until a feasible

one is found. It is easy to see that this method can be time-consuming for applications

where the feasible region is relatively small compared to the infeasible

region.

Methods based on penalty functions. Although preserving feasibility is

sometimes preferred, most evolutionary computation techniques use constraint-handling

techniques that are based on the concept of (exterior) penalty functions,

which in some way penalize infeasible individuals. The degree of the penalty



often depends on either the distance of an infeasible individual from the feasible

region or on the effort to repair an infeasible individual. Different designs

have been proposed for maintaining an appropriate balance between objective

and penalty function. The simplest designs always maintain a constant penalty

strength (static penalties) (see e.g. Homaifar et al. (1994); Michalewicz (1995)),

or simply reject an infeasible individual and then generate new individuals until

a feasible one is found (death penalties) (Rechenberg, 1973; Schwefel, 1975;

Bäck et al., 1991). Other designs use a more sophisticated approach and increase

the penalty strength gradually over time (dynamic penalties) (Joines and Houck,

1994; Kazarlis and Petridis, 1998), use concepts inspired by simulated annealing

(Kirkpatrick et al., 1983) (annealing penalties) (Michalewicz and Attia, 1994;

Carlson and Shonkwiler, 1998), or change the penalty strength depending on the

quality of individuals in the population (adaptive penalties) (see e.g. Bean and

Hadj-Alouane (1992)). And again another interesting approach, which is known

as segregated genetic algorithm (Riche et al., 1995), aims at approaching the feasible

optimal solution from both sides of the boundary of the feasible region by

evolving a population that maintains roughly two subpopulations, one containing

(more likely) infeasible individuals and the other one (more likely) feasible individuals.

Recently, a (static) penalty function based method has been proposed

that ranks individuals stochastically according only to their penalty function or

to their fitness values (Runarsson and Yao, 2000). To reduce the number of expensive

evaluations, Runarsson (2004) extended the stochastic ranking method

further by fitting a response surface to both the penalty and the fitness function.
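As a concrete (and deliberately simple) illustration of the penalty idea, the sketch below (Python, not taken from any of the cited methods) contrasts a static penalty with a death penalty for constraints expressed as g_i(x) ≤ 0; the toy objective, constraint, and penalty strength are assumptions.

def static_penalty_fitness(x, objective, constraints, strength=100.0):
    # static penalty: subtract a fixed multiple of the total constraint
    # violation from the objective (maximization assumed)
    violation = sum(max(0.0, g(x)) for g in constraints)
    return objective(x) - strength * violation

def death_penalty_fitness(x, objective, constraints):
    # death penalty: infeasible individuals receive the worst possible fitness
    # (in practice they would simply be rejected and regenerated)
    if any(g(x) > 0.0 for g in constraints):
        return float('-inf')
    return objective(x)

# toy problem (assumptions): maximize -(x-2)^2 subject to x <= 1
objective = lambda x: -(x - 2.0) ** 2
constraints = [lambda x: x - 1.0]                 # g(x) = x - 1 <= 0
print(static_penalty_fitness(1.5, objective, constraints))   # -0.25 - 50.0 = -50.25
print(death_penalty_fitness(1.5, objective, constraints))    # -inf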

Methods based on a search for feasible solutions. There are three main

techniques falling into this category. One technique, known as behavioral memory

approach (Schoenauer and Xanthakis, 1993), tackles the constraints of a problem

one after another, whereby a new constraint is only tackled once there is a sufficient number of individuals in the population satisfying

all previously tackled constraints. A second group of techniques (see e.g. Powell

and Skolnick (1993)) are similar to penalty functions and also penalize infeasible

individuals but at the same time strictly favor feasible individuals over infeasible

ones. The third main technique in this category incorporates the concept of repairing

infeasible individuals. To date, many different repairing schemes as well

as replacement strategies for dealing with the repaired (feasible) individuals have

been proposed (see e.g. Liepins and Vose (1990); Nakano and Yamada (1991);



Liepins and Potter (1991); Nakano (1991); Orvosh and Davis (1994)); we will describe one repairing procedure, which is employed by the GENOCOP III approach

of Michalewicz and Nazhiyath (1995), in more detail in Section 3.4.1. Typically,

a replacement strategy is concerned with the question of whether evolution should consider only the fitness value of a repaired solution (Baldwinian evolution), or the fitness and also the repaired solution (genotype) itself (Lamarckian evolution).

For a recent survey of repairing methods we refer to Salcedo-Sanz (2009).

Hybrid methods. This group of constraint-handling methods contains EAs that are combined with other optimization techniques, which tend to be numerical in nature.

Hybrid methods include, amongst others, EAs that make use of Lagrange

multipliers (see e.g. Adeli and Cheng (1994)), co-evolutionary models (Paredis,

1994), and concepts of ant colony optimization (Bilchev and Parmee, 1996).

Methods based on multi-objective optimization. Recently, the application

of multi-objective optimization methods to constrained problems has gained popularity.

Typically, these methods regard each constraint as an additional objective, which is then optimized alongside the actual objective function using techniques from multi-objective optimization (Montes, 2004). A comprehensive survey of multi-objective

EAs and their application to constraints has been given by Deb (2001).

Considering the above constraint-handling methods, it is apparent that infeasible

individuals (though unwanted) are often allowed to coexist with feasible

individuals. The reason for allowing infeasible individuals to enter a population is

that it may increase the probability of finding even fitter individuals, particularly

on complex problems with rugged fitness landscapes and disjoint feasible regions.

When dealing with problems where individuals violating a constraint cannot be

evaluated, such as problems subject to ERCs, these techniques need to be modified

or replaced by other techniques. Two simple techniques that require the

evaluation of feasible individuals only are the death penalty and repairing. Alternatively,

one can avoid evaluating an infeasible solution altogether and simply

assign a poor fitness value to it. We will investigate the performance of these

methods later in the presence of ERCs. Unfortunately, there is still little work

on guidelines suggesting how to select the most appropriate constraint-handling

technique and its parameters for a certain application.




Figure 3.9: Visualization of how the location of the current best solution ⃗x_{global optimum}

at time t (left) may shift at time t+1 (right) due to a change in the fitness function

(solid line). This change requires the population (indicated by the black dots) to track

the new optimal solution.

3.4 Dynamic optimization

The field of dynamic optimization is traditionally concerned with problems that

are subject to stochastically changing fitness functions (landscapes). Reasons for

a change in the fitness landscape can be, for instance, that new jobs to be scheduled

arrive in a dynamic job shop scheduling problem, or that new cities to be

visited have to be considered in a dynamic TSP. The natural objective when

dealing with dynamic problems is to track the potentially shifting optima upon

a fitness landscape change (see Figure 3.9). The book of Branke (2001) provides

a comprehensive overview of research on how EAs can be used to achieve this

objective. Let us briefly review some seminal work in this domain.

Tracking of shifting optima is initiated by a change in the fitness landscape. In

applications like dynamic job shop scheduling and dynamic TSP (Larsen, 2000),

it is explicitly known when the landscape undergoes a change. However, there

are also applications where the point of time of an environmental change is not

known explicitly. As an example, consider deviations in the quality of raw materials

or wearing out of machines, both factors being common in engineering

and manufacturing applications. Gradually or more suddenly, optimal operating

conditions may shift unexpectedly in this case. Such applications require a mechanism

that detects changes in the fitness landscape (and subsequently triggers

the tracking of optima). A commonly-used mechanism to detect environmental

changes is to watch out for deteriorations in the time-averaged best performance

of the population (Cobb, 1990; Cobb and Grefenstette, 1993; Vavak et al., 1996).

Alternatively, one may re-evaluate several individuals at each generation, and assume

an environmental change when the fitness of at least one of these individuals



has changed (Branke, 1999; Carlisle and Dozier, 2000).

Various strategies have been proposed to track potentially shifting optima

upon detecting an environmental change. A naïve one is to restart the optimization

process from scratch (Raman and Talbot, 1993; Kidwell and Cook, 1994). In

essence, this strategy turns a dynamic optimization problem into a problem where

a sequence of independent static optimization problems is to be solved. Although

simple, this strategy can be effective if the fitness landscape undergoes a

severe change; we will demonstrate this later in Section 6.1.5. However, most

of the time, a restart policy is rather impractical as reusing genetic information

obtained in the past can help to track optima quickly. There are many strategies

that aim to tackle exactly this drawback of the restart policy. Roughly speaking,

we can divide the working procedure of these strategies into three groups: diversity

control, memory-based and multi-population approaches. We introduce each

approach in more detail in the following.

Diversity control approaches account for a changing environment either by maintaining a diverse population throughout the search, or by increasing diversity whenever a change in the landscape is detected. Techniques falling into the former

category include the set of niching techniques (Mahfoud, 1995). These techniques

are also often augmented on EAs to tackle static optimization problems because

they are supportive in detecting multiple optima. Examples of niching techniques

include fitness sharing (Goldberg and Richardson, 1987) and crowding (De Jong,

1975; Mahfoud, 1995). Sharing aims to discourage over-populated regions in the

search space. It achieves this by reducing the fitness of individuals by a factor proportional

to the number of individuals located in the same region. Crowding aims

to achieve a similar objective but does this by allowing an offspring to replace only the parent that is most similar to it, and only if that parent is less fit; Algorithm 3.2

shows the pseudocode of a simple but effective crowding procedure, deterministic

crowding. Another approach to constantly maintaining a diverse population is to

replace, at each generation, some part of the population with randomly generated

individuals, also referred to as random immigrants (Grefenstette, 1992).

A common procedure for introducing diversity into the population upon detecting

an environmental change is to temporarily increase the mutation rate.

While the method of Cobb (1990), triggered hypermutation, increases the mutation

rate drastically for a single generation, the variable local search method



Algorithm 3.2 A single generation of deterministic crowding (Mahfoud, 1995)
Require: f (objective function), µ (parent population size), Pop (current population)
1: for i = 1 to µ/2 do
2:   generate two offspring ⃗x^(1) and ⃗x^(2) by selecting two parents, ⃗p^(1) and ⃗p^(2), randomly from Pop, and then recombining and mutating them
3:   if distance(⃗p^(1), ⃗x^(1)) + distance(⃗p^(2), ⃗x^(2)) ≤ distance(⃗p^(1), ⃗x^(2)) + distance(⃗p^(2), ⃗x^(1)) then
4:     if f(⃗x^(1)) > f(⃗p^(1)) then replace ⃗p^(1) with ⃗x^(1)
5:     if f(⃗x^(2)) > f(⃗p^(2)) then replace ⃗p^(2) with ⃗x^(2)
6:   else
7:     if f(⃗x^(2)) > f(⃗p^(1)) then replace ⃗p^(1) with ⃗x^(2)
8:     if f(⃗x^(1)) > f(⃗p^(2)) then replace ⃗p^(2) with ⃗x^(1)

of Vavak et al. (1996) increases the range of mutation (e.g. the number of decision

variables mutated) slowly, depending on the search progress made after

an environmental change. Several researchers including Angeline (1997); Bäck

(1998) have also investigated the application of self-adaptive mutation rates (see

Section 3.1.2) for dynamic optimization.
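For readers who prefer executable code over pseudocode, a direct (and slightly simplified) Python transcription of Algorithm 3.2 for real-valued vectors might look as follows; the fitness function, distance measure, and variation operator at the bottom are hypothetical placeholders used only to make the sketch self-contained.

import random

def deterministic_crowding(pop, fitness, variation, distance):
    # one generation of deterministic crowding (cf. Algorithm 3.2): each
    # offspring competes only against the more similar of its two parents and
    # replaces it only if the offspring is fitter (maximization assumed)
    random.shuffle(pop)                       # pair parents at random
    for i in range(0, len(pop) - 1, 2):
        p1, p2 = pop[i], pop[i + 1]
        c1, c2 = variation(p1, p2)
        if distance(p1, c1) + distance(p2, c2) <= distance(p1, c2) + distance(p2, c1):
            pairs = [(i, c1), (i + 1, c2)]
        else:
            pairs = [(i, c2), (i + 1, c1)]
        for idx, child in pairs:
            if fitness(child) > fitness(pop[idx]):
                pop[idx] = child
    return pop

# hypothetical problem ingredients for a quick test
fitness = lambda x: -sum(xi ** 2 for xi in x)
distance = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
def variation(a, b):
    mid = [(ai + bi) / 2.0 for ai, bi in zip(a, b)]                  # crossover
    mutate = lambda x: [xi + random.gauss(0.0, 0.1) for xi in x]     # mutation
    return mutate(mid), mutate(mid)

pop = [[random.uniform(-5.0, 5.0) for _ in range(2)] for _ in range(20)]
for _ in range(100):
    pop = deterministic_crowding(pop, fitness, variation, distance)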

The idea of memory-based approaches is to augment an EA with the

ability to store genetic information, which has been of high-quality in the past,

and retrieve this information at some later stage of the optimization. Such an

approach can be beneficial when parts of the environment/problem can return to

a formerly visited state in which case the retrieved information may be used to

speed up tracking of optima. Memory can be used in two forms (Branke, 2001):

implicitly or explicitly. Implicit memory is provided through the use of a redundant

representation. A popular approach is to use a diploid representation, i.e.

one where each chromosome possesses two or more genes at each locus (one dominant

and one recessive), combined with a dominance scheme that governs the

expression of each allele in the phenotype (Goldberg and Smith, 1987; Ng and

Wong, 1995; Hadad and Eick, 1997). A potential drawback of implicit memory

is that the information stored may not be used in an efficient way. Approaches

employing explicit memory aim at overcoming this drawback. For instance, Trojanowski

et al. (1997) equips each individual of the population with the ability to

remember the genotype of a fixed number of its ancestors. These ancestors are

then re-evaluated upon an environmental change, and the fittest of them replaces

the current individual if it is fitter. Whilst this is a simple example of an EA with

explicit memory, it highlights some of the challenges associated with this form



of memory: it requires the algorithm designer to define what information should

be stored in the memory, and what information should be retrieved under which

circumstances.

The purpose of multi-population approaches is to evolve multiple populations

simultaneously and thus explore multiple promising regions in the search

space at the same time. Niching approaches have a similar purpose but they

do not divide the population explicitly into multiple subpopulations. One of the

early multi-population approaches is the shifting balance GA of Oppacher and

Wineberg (1999). Inspired by the shifting balance theory of Wright (1932) (see

Section 3.1.1), this GA splits the population into one large core population and

a number of small colony populations (also referred to as demes). The purpose

of the demes is to facilitate genetic drift, which was postulated by Wright to

be necessary in the optimization of a fitness landscape. The core population is

responsible for exploiting a promising search region, and it sends (forces) the

colony populations out to explore new areas in the search space. Enforcement

is achieved by a mechanism that pushes a colony away from the core whenever

the two populations are too close to each other. At regular intervals, the core

gains information about the colonies through immigration of colony members,

allowing the core to shift its search focus to potentially more promising regions.

A potential downside of multi-population approaches is the need to define several properties, including the number and size of the different populations, the migration

scheme, and the merging/‘forking’ scheme of populations. These properties

tend to interact in sophisticated ways in modern multi-population approaches,

such as the self-organizing scouts approach of Branke et al. (2000).

3.4.1 Dynamically-constrained optimization

Recently, evolutionary search has also been considered for continuous problems

subject to dynamic constraints (Nguyen, 2011; Richter and Dietel, 2011; Richter,

2010). The purpose of these constraints is to simulate changes in the feasible

search region occurring as a function of the evaluation or generation counter.

As these constraints yield a moving feasible region, the location of the global

optimum can move as well. Notice that constraints that only temporarily prevent the evaluation of solutions, such as ERCs, do not cause a shift in the feasible

region nor in the global optimum; (ERCs can cause a shift in the set of evaluable



solutions only but this will become clearer in the next chapter).

As with static constraints (see Section 3.3), maintaining infeasible solutions

in the population of an EA seems an important aspect in the presence of

dynamic constraints. In fact, Singh et al. (2009) ensure that some fraction of the

population consists of (fit) infeasible solutions by favouring them over feasible

ones in environmental selection. The resulting mixed population can accelerate

search along the constraint boundary.

An interesting approach for dealing with dynamic constraints has been proposed

by Nguyen and Yao (2009a). Rather than tracking the moving global optima,

it is proposed to track the moving feasible region by always maintaining

a reference set of feasible solutions. Solutions belonging to this set can then

be involved in the repairing procedure of infeasible solutions. To repair an infeasible

solution, Nguyen and Yao borrow the repairing procedure of GENOCOP

III (Michalewicz and Nazhiyath, 1995). This procedure employs two stages: first,

a path between the infeasible solution and a feasible solution (from the reference

set) is created; then, along this path, solutions are generated at random until a

feasible one is found, which replaces the original infeasible solution. Clearly, the

drawback of this repairing approach is that it can be a time-consuming matter to

generate a feasible solution (at random). To ensure the existence of a reference

set of feasible solutions, solutions belonging to it are checked for feasibility every

generation and repaired (using the same approach as above) if needed. Unfortunately,

the authors do not specify how to proceed in the case where all solutions

of the reference set become suddenly infeasible. This situation does not occur in

GENOCOP III because the algorithm is designed for static constraints with the

assumption that the population contains at least one feasible solution.
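A minimal Python sketch of this two-stage repair procedure (illustrative only; the feasibility test, the reference solution, and the maximum number of attempts are assumptions):

import random

def repair(infeasible, reference, feasible, max_tries=1000):
    # GENOCOP III-style repair: sample random points on the straight segment
    # between the infeasible solution and a feasible reference solution until
    # a feasible point is found, which then replaces the infeasible solution;
    # None is returned if no feasible point is found within max_tries
    for _ in range(max_tries):
        a = random.random()                  # random position along the path
        candidate = [a * r + (1.0 - a) * x for x, r in zip(infeasible, reference)]
        if feasible(candidate):
            return candidate
    return None

# hypothetical feasible region: the unit disc around the origin
feasible = lambda x: x[0] ** 2 + x[1] ** 2 <= 1.0
print(repair([2.0, 2.0], [0.0, 0.0], feasible))

The potentially large number of feasibility tests per repair is exactly the time-consuming aspect criticized above.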

Clearly, our understanding of the effect of dynamic constraints on optimization

is still limited, and more research is needed in this emerging field. The recent

studies of Jin et al. (2010); Oh et al. (2011) suggest that understanding the dynamics

associated with changing constraints can also be beneficial when dealing

with problems featuring naturally static constraints. The objective of these studies

is to efficiently tackle constrained problems; in particular, problems where the

constraints yield disconnected feasible regions. The authors propose a dynamic

approach that first builds an approximate model for each constraint function, and

then uses this model to manipulate the feasible region. At the beginning of the

optimization, the model enlarges the original feasible region to facilitate a less



constrained search. As the search proceeds, the feasible region is shrunk incrementally

to eventually match the original feasible region. The hope is that this

procedure enables an optimizer to jump more easily between the originally disconnected

feasible regions. Of course, the assumption made here is that solutions

that are infeasible to the original problem can be evaluated.

We will see later that ERCs are of dynamic nature too. In fact, rather than

changing only according to some counter or periodically, ERCs may arise due

to some random event (e.g. a machine breakdown), or uncontrolled factors (e.g.

delays in the delivery of raw materials required in an experiment). Perhaps more

interesting is the scenario where ERCs arise due to previously made decisions.

Such time-linkage scenarios are common in online optimization, which we will

consider in more detail in the next section.

3.5 Online optimization

Online optimization deals with problems where, as time goes by, decisions have to

be made without any knowledge of future events (Borodin and El-Yaniv, 1998).

Generally, it is assumed that decisions cannot be revoked and thus that time-linkage

effects (Bosman, 2005) may arise (we explain this effect in more detail

below). For instance, in the online version of a TSP one needs to decide how

vehicles should be routed to serve a new customer; and, in caching problems it

needs to be decided which page should be removed from the memory (cache) if

the cache is full. An online algorithm makes these decisions based on past events

such that some cumulative score, e.g. miles driven by a vehicle or number of page

faults, is optimized.

Before we say something more about online algorithms, we want to describe

how the performance of these algorithms is measured in this domain as this is

different from standard optimization. It is easy to see that the quality of an

online algorithm depends on the sequence of events it is faced with during an

optimization run. The standard approach to measure this quality is to compare

the performance of an online algorithm on each event sequence with that of

an optimal offline algorithm, which has complete knowledge of the future. An

online algorithm is said to be competitive when the ratio (also known as competitive

ratio) between its performance and that of the optimal offline algorithm



is bounded. This performance analysis method is known as competitive analysis

(Sleator and Tarjan, 1985), and it falls within the framework of worst-case

complexity (Borodin and El-Yaniv, 1998). Alternatively, competitive analysis can

be viewed as a game between an online player and a malicious adversary. The

adversary (optimal offline algorithm) has knowledge about the online algorithm

run by the online player, and it aims to construct an event sequence that is costly

to process for the online player but inexpensive for the optimal offline algorithm.

An aspect worth mentioning is that randomized online algorithms, i.e. algorithms

that involve some random component in the decision making process to an event,

are more difficult to fool by an adversary than deterministic online algorithms. 2
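Stated formally (a standard textbook definition, not a quotation from the cited works), a deterministic online algorithm ALG for a minimization problem is called c-competitive if there exists a constant b such that, for every event sequence σ,

\mathrm{cost}_{\mathrm{ALG}}(\sigma) \;\leq\; c \cdot \mathrm{cost}_{\mathrm{OPT}}(\sigma) + b,

where OPT denotes the optimal offline algorithm; the smallest such c is the competitive ratio.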

The application of EAs to online optimization is not new. In fact, dynamic

optimization problems are tackled in an online fashion if the dynamical changes

are unknown beforehand. Note that an event sequence consists here of different

fitness landscapes and/or constraints rather than a sequence of page requests to

be dealt with. A challenge that has received less attention in the field of evolutionary

online (dynamic) optimization is that of time-linkage effects (Psaraftis, 1988;

Branke and Mattfeld, 2005; Bosman, 2005; Nguyen and Yao, 2009b), as Bosman

termed it. A time-linkage effect refers to a scenario where decisions made now

may influence the environment and score that can be obtained in future; in turn

this may affect the cumulative score that ought to be optimized. A simple example

of a time-linkage effect is that visiting customers on time in a routing problem

increases customer satisfaction and thus may result in additional customers (in

certain geographical regions) in future.

A promising approach to cope with time-linkage effects seems to be to anticipate

future changes. For instance, Branke and Mattfeld (2005) realize anticipation

by searching for solutions that are not only fit but also flexible in the

sense that they can easily be adjusted to upcoming environmental changes. For a

dynamic job-shop scheduling problem where new jobs arrive stochastically during

the course of the optimization, the authors showed that the avoidance of early

idle times can support the flexibility of a solution. Consequently, to facilitate

the search for fit and flexible solutions, they extended the objective function to

additionally penalize early idle times.

The approach taken by Bosman and his collaborators (Bosman, 2005; Bosman

2 There are subtleties associated with how competitive analysis should be carried out and

interpreted in the presence of randomized algorithms that we will not go into here.




Figure 3.10: A schematic illustration of the effect of anticipation. Anticipation may

help to make a decision that seems poor at the current time step but better in the long

run (see path of gray dots). Figure reproduced from Bosman and Poutré (2007).

and Poutré, 2007; Bosman, 2007) to tackle online time-linkage problems is to

predict (or anticipate) the future by learning from the past. Technically speaking,

“the value of the dynamic objective function for future time steps is estimated by

minimizing the generalization error over the time period that contains the data

to learn from” (Bosman, 2005). The estimation is then used to select the search

path with the highest objective function value (see Figure 3.10). Hutzschenreuter

et al. (2010) applied this approach successfully to hospital resource management,

where the task is to efficiently allocate resources to units of a hospital. In this

application, the hospital environment was modelled by parametrized pathways of

groups of patients (Hutzschenreuter et al., 2009). Anticipation was realized by

simulating different resource allocation policies on this model offline (optimization

of policy parameters was done using an EA) and then making allocation decisions

online based on the policy that was optimal with respect to some future time span.

Later, when we deal with ERCs, we will encounter time-linkage effects too.

However, there is a crucial difference between problems subject to ERCs and

standard online dynamic problems, such as dynamic job-shop scheduling and the

hospital resource management problem described above: while the objective in

standard online (dynamic) optimization is to optimize some cumulative score,

the objective in problems subject to ERCs is to find a single optimal solution.

Optimizing some cumulative score is also the goal in the field of reinforcement

learning (RL), which we introduce next, as a prelude to explaining how RL can

be used to help tune EAs for challenging domains, such as problems subject to

dynamic constraints.




Figure 3.11: The agent-environment interaction in reinforcement learning: at each time step t, the agent receives some representation of the environment's state s_t and the reward r_t for reaching this state, and on that basis it selects an action a_t.

3.6 Reinforcement learning

The field of reinforcement learning (RL) offers a variety of ideas and algorithms

that have been employed in the evolutionary computation community for tuning

and controlling of EA operators and their parameters. We give a brief introduction

to RL here, and review some work investigating evolutionary search in

combination with RL in the following section. The interested reader is referred

to the book of Sutton and Barto (1998) and the survey of Kaelbling et al. (1996)

for a more complete overview of the RL field.

Generally speaking, RL is a computational approach to learning from interaction

of an agent with an environment whereby the aim of an RL agent is to

learn some optimal policy π*, a mapping from an environmental state s_t ∈ S to an action a_t ∈ A(s_t) (see Figure 3.11), so as to maximize some discounted return:

R_t = \sum_{j=0}^{\infty} \gamma^j r_{t+j+1},    (3.2)

where r_{t+1} ∈ ℝ is a numerical reward received after taking action a_t in state s_t, and γ (0 < γ ≤ 1) is the discount rate. The discount rate determines the

value of future rewards at the present time step. As the value of γ decreases,

the importance of future rewards reduces so that in the case where γ ≈ 0 the

RL agent is myopic and aims only at maximizing the immediate reward. In

learning problems where the agent-environment interaction can be divided into

sub-sequences (known as episodes), such as single runs of an EA, an agent aims

at maximizing the return at the end of each episode. Let us define the above

mentioned concepts including the assumptions made in RL in more detail.

The common assumption made in RL is that environmental states satisfy the



Markov property, meaning that the state and reward at time t+1 depend only on the state and action at time t (rather than on the complete history of an agent), or:

P(s_{t+1} = s, r_{t+1} = r \mid s_t, a_t, r_t, s_{t-1}, a_{t-1}, \ldots, r_1, s_0, a_0) = P(s_{t+1} = s, r_{t+1} = r \mid s_t, a_t).    (3.3)

It is common to assume the Markov property even when states do not summarize

all the relevant information accurately (Sutton and Barto, 1998). A learning task

that satisfies the Markov property is referred to as a Markov decision process

(MDP). MDPs extend Markov chains (see Section 4.3.1) with the notion of actions

and rewards. Equivalently, we can think of MDPs that feature a single action in

each state, and a reward of zero, as Markov chains. To keep the number of actions

and states in an MDP manageable, especially with respect to tabular-based RL

algorithms (of which more later), it is common to quantize the state and action

space if they are continuous.

In the presence of an MDP, some policy π, and the corresponding probability

π(s,a) of taking action a when in state s, we can calculate the expected return

received after starting in some state s_t and then following π according to

V^{\pi}(s) = E_{\pi}\{R_t \mid s_t = s\} = E_{\pi}\Bigl\{ \sum_{j=0}^{\infty} \gamma^j r_{t+j+1} \Bigm| s_t = s \Bigr\},    (3.4)

where V^π is called the state-value function for policy π. In a similar fashion we can define the action-value function for policy π, Q^π(s,a) = E_π{R_t | s_t = s, a_t = a}, as the expected return received after taking action a in state s and following π thereafter. We can say that a policy π is better than or equal to another policy π′ if it has an expected return that is greater than or equal to that of π′ for each state s ∈ S, or V^π(s) ≥ V^{π′}(s), ∀s ∈ S. The goal of an RL agent is to find (ideally quickly and reliably) a policy π* that is at least as good as any other policy; π* is referred to as the optimal policy. The corresponding state-value function for an optimal policy, also called the optimal state-value function, is defined by V*(s) = max_π V^π(s), ∀s ∈ S; we can define the optimal action-value function analogously by Q*(s,a) = max_π Q^π(s,a), ∀s ∈ S, ∀a ∈ A(s).

Many RL agents search for an optimal policy by finding the optimal value

function. If we have complete knowledge of an environment as an MDP, i.e.

we know the state-transition probabilities for all s ∈ S and a ∈ A(s), and the expected rewards associated with




Algorithm 3.3 Tabular TD(λ) for estimating V^π
1: V(s) = 0, e(s) = 0, ∀s ∈ S
2: for each episode do
3:   initialize s
4:   repeat
5:     take action a = π(s), and observe reward r and next state s′
6:     update e(s) for all s ∈ S according to Equation (3.7)
7:     update V(s) for all s ∈ S according to V(s) = V(s) + α(r + γV(s′) − V(s))e(s)
8:     s = s′
9:   until s is a terminal state

any state-action transition, then we can find an optimal policy using dynamic

programming methods (see Chapter 4 in the book of Sutton and Barto (1998)).

However, although many RL methods aim to achieve the same effect as dynamic

programming methods, such methods are limited in their use because a complete

model of the environment is rarely available and because they tend to be computationally

expensive; thus we do not consider dynamic programming methods

any further here.

In the common case where the environmental dynamics are unknown, the

agent estimates the value function from experience it gains by interacting with

the environment. A popular approach for learning from experience is the temporal difference (TD) method of Sutton (1988). This method updates the value of state s_t immediately after visiting that state. After taking action a_t in state s_t and subsequently observing a reward r_{t+1} and the new state s_{t+1}, the TD method updates the estimate V(s_t) by

V_{t+1}(s_t) = V_t(s_t) + \alpha\bigl(r_{t+1} + \gamma V_t(s_{t+1}) - V_t(s_t)\bigr),    (3.5)

where 0 < α ≤ 1 is the learning rate. As α increases, the estimate V_{t+1}(s_t) relies more on r_{t+1} + γV_t(s_{t+1}) and less on its previous estimate V_t(s_t). 3

Equation (3.5) is the update rule used by a specific TD method, referred to

as TD(0). A drawback of TD(0) is that updating a state-value estimate V(s)

only upon visiting the state s is rather inefficient, and the agent may need a long time to estimate the correct value function. A more general version of the

algorithm, TD(λ) (see Algorithm 3.3), updates V(s) not only upon visiting s,

but also upon visiting any subsequent state (during that episode). To realize this

3 The term temporal difference refers to the difference between the two estimates r_{t+1} + γV_t(s_{t+1}) and V_t(s_t).



procedure, TD(λ) maintains a kind of memory, called the eligibility trace e. The

task of the trace is to keep track of previously visited states that are eligible for

being updated based on currently obtained rewards. To account for the trace,

the update rule of V(s) changes to

V_{t+1}(s) = V_t(s) + α(r_{t+1} + γV_t(s_{t+1}) − V_t(s_t)) e_t(s), ∀s ∈ S.   (3.6)

Eligibility traces can be updated in various ways. One example is the replacing trace, which updates the trace according to

e_t(s) = \begin{cases} 1 & \text{if } s = s_t, \\ γλ e_{t−1}(s) & \text{otherwise,} \end{cases}   (3.7)

where 0 ≤ λ ≤ 1 is the decay factor of the trace. For λ = 0 the update rule

is equivalent to Equation (3.5). As λ increases, more importance is given to

the actual reward received at each time step rather than the value estimates of

subsequent states. Another popular trace option is an accumulating trace. This

trace increases the eligibilities of states by some constant upon visiting them,

rather than resetting them to a constant.
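To make the interplay of Equations (3.5)-(3.7) concrete, the following minimal Python sketch performs a single TD(λ) update with replacing traces on a small tabular state space; the integer state encoding and the parameter values are illustrative assumptions only, not a specific implementation used in this thesis.

def td_lambda_step(V, e, s, r, s_next, alpha=0.1, gamma=0.9, lam=0.8):
    # One TD(lambda) step with replacing traces (cf. Equations (3.5)-(3.7) and Algorithm 3.3).
    # V and e are lists indexed by state; both are updated in place.
    delta = r + gamma * V[s_next] - V[s]      # temporal-difference error
    for i in range(len(e)):                   # decay all traces (Equation (3.7), 'otherwise' case)
        e[i] *= gamma * lam
    e[s] = 1.0                                # replacing trace: reset the trace of the visited state
    for i in range(len(V)):                   # value update weighted by eligibility (Equation (3.6))
        V[i] += alpha * delta * e[i]

# Example usage with five states, all values and traces initialized to zero:
V = [0.0] * 5
e = [0.0] * 5
td_lambda_step(V, e, s=2, r=1.0, s_next=3)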

The update rules introduced above tell us how to estimate the state-value function V for an arbitrary deterministic policy π. Ideally we would like to control (improve) a policy with the aim of eventually approximating the optimal policy π*. One way to improve π is to select in each state s the action a* = argmax_a Q^π(s,a) that is expected to yield the greatest return according to the estimates Q^π. Since Q^π(s,a*) ≥ V^π(s), ∀s ∈ S (Bellman, 2003), the new policy will be at least as good as π. To be able to control a policy we thus must explicitly estimate the value of an action, or more generally the action-value function Q^π. 4

Always selecting the best action seems to be a good strategy at first glance because it ensures that we optimally exploit the current estimates. The drawback of this strategy, however, is that by always taking the current best action we will never try new actions, thus limiting our chances of finding an even better policy. Also, our current estimates may be inaccurate due to early sampling errors, preventing the truly best action from being selected. To overcome these issues an agent must explore new actions.

4 Clearly, if we knew the next state s_{t+1} we will move to after taking an action a_t, then we would not need to estimate Q^π(s_t, a_t). We could achieve the same effect by picking the action a_t that yields the state s_{t+1} with the greatest state-value estimate V^π(s_{t+1}).



Algorithm 3.4 Tabular Sarsa(λ), an on-policy TD control method
1: Q(s,a) = 0, e(s,a) = 0, ∀s ∈ S, a ∈ A(s)
2: for each episode do
3: initialize s, a (the action a can be e.g. selected at random from A(s))
4: repeat
5: take action a = π(s) (e.g. ε-greedy), and observe reward r and next state s′
   // the initial policy π can be e.g. to select a random action a from A(s) in each state s
6: choose a′ = π(s′) (e.g. ε-greedy)
7: update e(s,a) for all s ∈ S and a ∈ A(s)
8: update Q(s,a) for all s ∈ S and a ∈ A(s) according to
   Q(s,a) = Q(s,a) + α(r + γQ(s′,a′) − Q(s,a))e(s,a)
9: s = s′, a = a′
10: until s is terminal state

A simple approach for trading off the exploitation of our current knowledge and the exploration of new actions is to select the currently best action a* most of the time and only occasionally, say with probability ε, a randomly chosen action; this action selection method is referred to as ε-greedy. This way we allow each action to be selected with a non-zero probability π(s,a) > 0 at any time. Theoretically, this property ensures that, in the limit as the number of action selections increases, each state-action pair (s,a) will be sampled an infinite number of times, guaranteeing that the estimates of Q(s,a)

converge to the true values; in turn this means we can approximate the optimal

policy (given we have infinite time). Exploration may also be encouraged by initializing

all entries of Q(s,a) with an optimistically large value (Schmidhuber,

1991). This way we ensure that all actions are selected at least once. One way

to test the ability of an action selection method to trade off exploration with exploitation

is to run it on a multi-armed bandit (MAB) problem (see Section 3.1.2

for a description of this problem).
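For illustration, a minimal Python sketch of ε-greedy action selection over tabular action-value estimates is given below; the dictionary-based representation of Q and the example values are assumptions made purely for this example.

import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    # With probability epsilon pick a random action (explore),
    # otherwise pick the action with the highest current estimate (exploit).
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

# Example usage: two actions in state 0, where the estimates currently favour 'b'.
Q = {(0, 'a'): 0.2, (0, 'b'): 0.5}
action = epsilon_greedy(Q, state=0, actions=['a', 'b'], epsilon=0.1)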

An example of a TD algorithm with a policy control mechanism, Sarsa(λ) (Rummery and Niranjan, 1994), is shown in Algorithm 3.4. This RL algorithm belongs to the class of on-policy control methods because it estimates the value of a policy while using that policy for control. RL algorithms that use two different policies for these tasks are known as off-policy control methods; Q(λ) (Watkins, 1989) is a famous example of this algorithm class. The advantage of off-policy methods is that they allow the value function of one policy to be learned from experience generated by another policy. The obvious drawback, however, is that this may reduce



the learning speed. There are many more RL algorithms, each having its own

weaknesses and strengths; for a review please refer to Sutton and Barto (1998).

3.6.1 Reinforcement learning in evolutionary optimization

The application of RL in optimization goes back to the seminal work of Zhang

and Dietterich (1995, 1996); Zhang (1996), which investigated RL in the context

of job-shop scheduling. The aim of this work was to learn a repair-based scheduler

that incrementally repairs (resource) constraint violations with the goal of obtaining schedules of short duration. Zhang and Dietterich considered complete

schedules as states, and used domain-specific heuristics as the actions. The reward

was a hand-crafted function based on the number of violated constraints

and repairs performed. The RL agent learnt a policy offline on problems involving

a small number of jobs, which was then tested on larger problems. Zhang

and Dietterich demonstrated that the repair-based scheduler can outperform a

non-learning iterative repair algorithm.

Within the evolutionary computation community, RL has been used for tuning and controlling EA operators and their parameter values. Müller et al. (2002) were among the first to combine evolutionary search with RL. They employed RL to learn online how to adapt the mutation step size within ESs depending on the success rate achieved with a given step size. The success rate after a fixed number of mutations represented the state, and the actions were the options to increase, decrease, or keep the current step size. The conclusion of this study was that the choice of the reward function, as well as the lower and upper bounds of the mutation step size, is crucial for obtaining optimal performance. Eiben et al. (2007) investigated a similar approach for online adaptation of various parameters involved in a GA, including the mutation and crossover rate, the tournament size used in the selection of parents, and the population size.

Similar to the study of Zhang and Dietterich, Pettinger and Everson (2003);

Pettinger (2002) learnt a policy offline using a test environment and then tested

it in a different environment. The goal here was to learn when to switch between

different crossover and mutation operators employed by an EA during an

optimization process. Interestingly, the RL agent was also permitted to select

the type of individuals (either a random individual from the fittest 10% of the



population, or one from the remaining population) to which the operators are

applied. The (discretized) state space in this application was characterized by

the current time step, the average population fitness, and the diversity of the

population. The reward after applying an operator was the relative fitness

difference between the fittest offspring generated and the fittest parent.

Recently, reinforcement learning has gained popularity in the field of hyper-heuristics (Cowling et al., 2000; Burke et al., 2009). A hyper-heuristic can be seen as a method that automates the process of selecting heuristics from a given set of (low-level) heuristics with the goal of improving search performance. The

heuristics may represent different operators or strategies needed to construct a

solution. RL has been employed in this domain to select online between these

heuristics; see e.g. Nareyek (2003); Meignan et al. (2010); McClymont and Keedwell

(2011).

According to the classification of parameter setting methods of Eiben et al. (1999) (see Section 3.1.2), RL-based approaches that learn a policy π offline belong to the branch of parameter tuning techniques, whereas those that learn online belong to the branch of adaptive or self-adaptive parameter control techniques.

3.7 Chapter summary

We have now reviewed a variety of techniques for analyzing EAs and tuning them to the main problem classes related to closed-loop optimization problems with resourcing issues. In particular, we looked at problems featuring uncertainty (e.g. noise), constraints, changing landscapes, and online decision making. Understanding the techniques reviewed is helpful when trying to tackle novel challenges in optimization. As we will see in the remainder of this thesis, many of the strategies we consider for coping with resourcing issues are related to and inspired by ideas described in this chapter. The use of Markov chains for modeling EA dynamics, for instance, is employed in the next chapter to theoretically analyze the effect of ERCs on evolutionary search.


Chapter 4

Ephemeral Resource Constraints:

Definitions and Concepts

In the previous two chapters we have introduced the class of closed-loop optimization

problems, where solutions are evaluated by conducting experiments,

and reviewed how evolutionary algorithms (EAs) have been extended to cope

with some of the common challenges that arise in such problems and others, such

as noise, constraints, changing environmental conditions, and real-time decision

making. In this chapter we look at a related but novel challenge in closed-loop

optimization deriving from resourcing issues. We introduce the framework to describe

a particular resourcing issue: the temporary non-availability of resources

required in the evaluation process of solutions. This resourcing issue yields an optimization

problem with a static fitness function but dynamic constraints, which

we call ephemeral resource constraints (ERCs). After formally defining problems

subject to ERCs, we introduce a variety of ERC types that were encountered

by Knowles (2009) in his collaborative work and that seem to be common in

real-world applications. Finally, we present an initial theoretical investigation on

the impact of ERCs on evolutionary search. The aim of this investigation is to

gain initial insights into how ERCs can affect the performance of an EA for different

configuration choices. Our analysis looks at configuration choices related

to different selection and reproduction schemes used within a simple EA.


4.1 Ephemeral resource-constrained optimization problems (ERCOPs)

Compared with standard optimization problems, defining an ephemeral resource-constrained optimization problem (ERCOP) is more involved and requires knowledge of the closed-loop environment that is to be modeled. The reason is that the dependency of the ephemeral resource constraints (ERCs) on some variables, such as time, function evaluations, costs and previously evaluated solutions, does not fit into the standard definition of an optimization problem. How exactly these variables are handled or simulated thus needs careful consideration.

Consider again Figure 2.1 of Section 2.1, which illustrates the basic setup of a

closed-loop problem: Solutions are planned on a computer using an optimization

algorithm, e.g. an EA, and then submitted for evaluation to the experimental

platform. Depending on the platform, we may submit for evaluation a single solution,

or, if solutions can be evaluated in parallel, then an entire population

plan (i.e. a generation). When the experiment is completed, a generation of fitness

values is returned, and the loop is repeated. But this iterative process is

complicated by the fact that resources are needed to conduct the experiments,

and these might run out e.g. as a function of time, or previous actions taken (or

both). If the EA requests the evaluation of a solution that cannot be evaluated

then something must happen, and this something must be defined. In a real experiment,

a human might intervene to order some new resources, or allow the EA

to generate some alternative solution(s).

The purpose of an ERCOP is to simulate the above kinds of experimental

scenarios, including the way that non-evaluable solutions arise (i.e. as a function

of parameters such as time, search history, or costs), and how they are to be

handled.

4.1.1 General problem definition of an ERCOP

To define ERCOPs formally let us recall the definition of a generic optimization

problem of Section 2.1, and add to it the notion of a time-ordered search, and

the concept of a non-evaluable solution, as follows.



An optimization problem of generic form can be defined as

maximize y = f(⃗x)   (4.1)
subject to ⃗x ∈ X,

where ⃗x = (x_1, ..., x_l) is a solution vector and X a feasible search space. The static objective function (also known as the fitness function) f : X → Y represents a mapping from X into the objective space Y ⊂ R.

A black box optimization algorithm a, e.g. an EA, for solving the above problem

can be represented as a mapping from previously visited solutions to a single

new solution in X, as suggested by Wolpert and Macready (1997). Formally, a : φ(t) ↦ ⃗x_t, where the search history φ(t) = {(⃗x_1, y_1), ..., (⃗x_{t−1}, y_{t−1})} denotes the time-ordered set of solutions visited until time step t−1, and ⃗x_i and y_i indicate the X value and corresponding Y value, respectively, of the ith successive element in φ(t). We augment this notion of a search algorithm with the ability to visit a null solution, x_null ∉ X, with the effect that the algorithm can ‘wait’

for a time step without evaluating a solution. An optimizer might submit null

solutions, for example, if it wishes to wait until a missing resource is again available.

In fact, this approach has been employed by Schwefel (1968) in his flashing

nozzle problem (see preface of Chapter 1).

Now we are able to define the general ERCOP. While in standard optimization problems the objective value of a feasible solution ⃗x_t ∈ X is simply y_t = f(⃗x_t), in an ERCOP we have

y_t = \begin{cases} f(⃗x_t) & \text{if } ⃗x_t ∈ E_t(σ_t) ⊆ X, \\ \text{null} & \text{otherwise,} \end{cases}   (4.2)

where E_t(σ_t) represents the set of evaluable solutions (or evaluable search region) at time step t. In our case, E_t(σ_t) is defined by a set of schemata into which solutions have to fall in order to be evaluable; 1 we introduced the concept of schemata in Section 3.1.1 in the context of the schema theorem. 2 The set E_t(σ_t)

1 Notice that if schemata represent evaluable solutions, then E_t is the union of the schemata. If schemata represent non-evaluable solutions, as will be the case with random ERCs (see Section 4.2.4), then E_t is the complement of the union of the schemata.
2 Notice that in the context of ERCs, schemata are not related to the Schema Theorem or schema processing per se; here we use the concept of schemata merely as a convenient way to identify any subspace of the solution space in order to specify regions that become non-evaluable.



Figure 4.1: A schematic diagram showing how a population consisting of the solutions 1, 2, 3, 4, 5, 6, 7, and 8 might be distributed across the feasible search space X and the evaluable search space E_t. At time step t, only the solutions 2, 5, 6, and 7 can be evaluated, while the solutions 1, 3, 4, and 8 must be repaired to be evaluable. Each evaluation might cause a change in E_t.

may change over time depending on a set of problem-specific and time-evolving

parameters σ t . The availability of resources required for the evaluation of solutions

depends on these parameters. Thus, the set σ t may include parameters such

as various types of counters (e.g. cost, time and evaluation counters), the search

history (which may be used to encode non-availabilities of resources due to previously

made decisions), random variables (which may encode random events like

machine breakdowns), and so forth. How exactly the set E t changes depending

on the parameters σ t is modeled by the ERCs.

Note that the objective function f (and thus also the global optimum) is static and does not change over time in a standard ERCOP; it is just the set of solutions evaluable at each time step t that may change. In this context, repairing a solution means to modify, through some repair function, the genotype of a solution that is not in E_t such that it is forced into E_t; i.e. the outcome of a repairing step is a solution that falls into all schemata that define E_t (assuming that the schemata are non-contradictory, i.e. E_t ≠ ∅).

Compared to standard (dynamic) constraints, the meaning of ERCs is different:

a solution ⃗x that violates an ERC at time t, or ⃗x ∉ E_t, is not infeasible but

non-evaluable at time step t. That is, the experiment that is associated with ⃗x —

be it involved in the genotype-phenotype mapping or the fitness assessment of the

phenotype — cannot be conducted and thus the objective function is undefined

(or null) for solution ⃗x at time t. 3 Figure 4.1 illustrates a common situation in an

ERCOP with respect to the distribution of solutions across E t and the feasible

3 Notice the difference between a null solution and a solution with a fitness value of null: a null solution is submitted with the purpose of skipping an evaluation, while a solution with fitness null is submitted with the purpose of being evaluated but then cannot be evaluated due to a lack of resources.



search space X.

Time in an ERCOP can be seen as the simulated counterpart of the time defined by the real closed-loop experimental problem that is being modeled. Hence, time may refer not only to function evaluations of single solutions, as is the case in standard optimization problems, but also e.g. to real time units (e.g. seconds) or cost units (e.g. pounds). This notion of time allows ERCs to depend, amongst other things, on the number of evaluated solutions, on expenses, or on a certain date, such as days of the week (independent of the number of evaluated solutions).

The normal assumption is that all evaluations take equal time or resources, but this need not be the case. Generally, experiments may be of different durations

and have non-homogeneous costs in terms of the financial or temporal resources

they require.

4.1.2 Further comments on ERCOPs

The definition of ERCOPs we gave in Section 4.1.1 represents the type of ERCOPs

we consider in this thesis. There are several ways in which this definition can be

extended.

We consider ERCOPs that feature a static objective function f. However, it

is also realistic for ERCOPs to feature a dynamic function f (see Section 3.4) in

addition to dynamic constraints. For instance, resources in the form of raw materials

or instruments may be replaced with new ones during the optimization process,

causing a change in f and potentially a shift in the global optimum. We will

consider optimization problems with a dynamic function f when investigating

problems subject to changing variables in Chapter 6, but this will not be done

within an ERCOP framework.

Another extension to ERCOPs is to optimize not only a single objective f

but several objectives f_1, ..., f_m (Deb, 2001). For instance, in a drug discovery problem one

may not only aim to find potent drug mixtures but also ones that are unlikely

to cause side effects, and involve a small number of drugs (as this reduces costs

when manufacturing drug mixtures in mass quantities).

Moreover, the objective function f is typically noisy in closed-loop scenarios

(see Section 3.2), and there may be standard constraints defining the feasible

search space X (see Section 3.3). All these aspects are important to account for but, for the sake of understanding the effect of ERCs on EAs, we believe it is

best to start with a standard problem setup, meaning non-noisy objective values



and unconstrained search spaces.

Finally, we remark that an ERCOP can be cast as a reinforcement learning

problem rather than a pure optimization problem. For various reasons that

will become clear later, we postpone a discussion related to this subject until

Chapter 7.

4.2 Specific types of ephemeral resource constraints

(ERCs)

In this section we introduce four ERC types that were encountered by Knowles

(2009) in his collaborative work and that seem to be common in real-world applications:

commitment relaxation ERCs, periodic ERCs, random ERCs, commitment

composite ERCs; Table 4.1 gives examples of these constraints (more

detailedexamplesaregiveninthesectionsintroducing therespective ERCtypes),

and the Nomenclature section summarizes notation related to ERCs, which we

will introduce in more detail next. Before we define each ERC type, we want to

introduce three elements that are common to all ERC types considered. These

elements are the constraint time frame, the activation period and the constraint

schema.

4.2.1 Fundamental elements of ERCs

The activation period k(ERC_i) of ERC_i, k ∈ Z^+, is the number of counter units

for which that constraint remains active, once it is ‘switched on’. Similar to time

steps, counter units may refer to function evaluations of a single solution, a set

of solutions (in case experiments are conducted in parallel), real time units (e.g.

seconds), a calendar period (e.g. Tuesday 2-4pm), or something else. In this thesis,

they refer to function evaluations of a single solution.

The constraint time frame (ctf) of ERC_i is {t | t^start_ctf(ERC_i) ≤ t < t^end_ctf(ERC_i)}, where t represents some counter unit, as above. 4 The constraint ERC_i may be active only during the ctf and not outside of it. That is, if we assume an ERCOP to be subject to a single constraint, then we have E_t(σ_t) ⊆ X, ∀t ∈ ctf, and E_t(σ_t) = X, ∀t ∉ ctf. The periods of time 0 ≤ t < t^start_ctf and t^end_ctf ≤ t ≤ T (T is the total optimization time) are the preparation period and the recovery period, respectively (see Figure 4.2). The duration of these two periods has a significant effect on the performance of an EA, as we will see later.

4 We are looking at the optimization scenario where the ctf is a single continuous period of time, but it is also realistic to have a ctf that is separated by unconstrained periods.


Table 4.1: Different types of ephemeral resource constraints (ERCs) and examples.

Commitment relaxation ERC (models the temporary commitment to a certain resource upon using it):
• Commitment to a certain instrument setting for some time once it is selected.
• Commitment to a certain drug for some time (e.g. until the drug is used up) once it is given to the robot for mixing.

Periodic ERC (models availabilities of resources at regular time intervals only):
• Periodic availability of certain engineers required to operate certain instruments.
• Periodic availability of certain drugs due to logistic limitations.

Random ERC (models temporary non-availabilities of randomly selected resources):
• Non-availability of certain instrument settings due to machine/instrument failures.
• Delays in the delivery of drugs.

Commitment composite ERC (models a scenario where variables of a solution define a composite, e.g. a tyre, that needs to be locally available for the solution to be evaluated; composites need to be ordered in advance, are associated with a reuse number and shelf life, and have to be kept in capacity-limited storage):
• When optimizing the crash test abilities of a vehicle, the tyre setup may be subject to tuning. A tyre would represent a composite that costs money and must be ordered in advance, used up within a certain period, and involved in at most a certain number of tests; storage space for tyres may be limited too.
• Drugs that must be ordered in advance and in batches only, have shelf lives, and need to be kept in capacity-limited storage.


The restriction imposed by an ERC during the activation period can be of different forms. In our case, resources are often associated directly with individual solution variables, which allows us to use the notion of schemata conveniently to describe availabilities of resources. We say that solutions have to fall into a particular constraint schema H(ERC_i) (associated with a constraint ERC_i) in order to be evaluable. From Section 3.1.1, where we introduced the concept of schemata in the context of the schema theorem, we know that a schema H represents a particular subset of solutions that share some common properties.


Figure 4.2: An illustration of how the available optimization time T can be divided into the preparation period 0 ≤ t < t^start_ctf, the constraint time frame t^start_ctf ≤ t < t^end_ctf, and the recovery period t^end_ctf ≤ t ≤ T.

To make it more specific, let us consider a drug discovery problem with the objective

being to identify the most potent mixtures of drugs drawn from a library of size

l = 5; each decision variable is binary, indicating whether a drug is present (1) or absent (0) in the mixture, so a drug mixture is encoded as an entire bit string. In this example,

the constraint schema H = (∗1 ∗ ∗0) would describe the set of all solutions

(drug mixtures) for which the drugs associated with the second and fifth bit

position are present and absent, respectively. The wildcard symbol ∗ means that

we do not care whether the drug associated with that bit position is present or

absent in a mixture (i.e. bit values can be either 0 or 1). Recall also the two

general properties of a schema, its order o(H) and its length l(H), representing

the number of defined bit positions and the distance between the first and last

defined bit position, respectively; for the above example we have o(H) = 2 and

l(H) = 3. In the presence of multiple constraints ERC i , solutions have to fall

into the union of the schemata ⃗x ∈ ⋃ i H i associated with the constraints. 5 The

meaning of the ERC related parameters is also summarized in the Nomenclature

section.
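The schema notation is straightforward to operationalize. The following minimal Python sketch (an illustration under the assumption that bit strings are encoded as Python strings, not the implementation used in this thesis) checks whether a solution falls into a constraint schema and computes the schema's order and length:

def in_schema(x, H):
    # True if bit string x (e.g. '01100') falls into schema H (e.g. '*1**0').
    return all(h == '*' or h == b for h, b in zip(H, x))

def order(H):
    # o(H): number of defined (non-wildcard) positions.
    return sum(1 for h in H if h != '*')

def length(H):
    # l(H): distance between the first and last defined position.
    defined = [i for i, h in enumerate(H) if h != '*']
    return defined[-1] - defined[0] if defined else 0

# The drug-mixture example from the text: H = (*1**0) with o(H) = 2 and l(H) = 3.
H = '*1**0'
print(in_schema('01100', H), order(H), length(H))   # True 2 3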

In this thesis we consider discrete search spaces, mainly of pseudo-Boolean nature, i.e. X = {0,1}^l. In non-discrete spaces, H might restrict solution parameters to lie within or outside certain parameter value ranges rather than to take specific

parameter values. This scenario may occur, for example, when optimizing the

concentration of drugs involved in a drug cocktail. Here, we may have available

only a limited amount of the individual drugs, causing the concentration of a

drug to be upper bounded.

5 Alternatively, the schemata H_i may represent resources that are not available for evaluation, and thus define the evaluable region E_t negatively. In this case, solutions are evaluable if they are a member of the space spanned by the complement of the union of schemata, or ⃗x ∈ ¬(⋃_i H_i).


Figure 4.3: An illustration of how a commitment relaxation ERC may partition the optimization time into epochs of length V, and how it may potentially be activated. The activation period k(j) during the jth epoch is represented by the dashed part.

4.2.2 Commitment relaxation ERCs

The first type of ERC we introduce is the commitment relaxation ERC. Generally speaking, this ERC forces or commits an optimizer to a specific variable value

combination (i.e. constraint schema) for some (variable) period of time whenever

it uses this particular combination. Forcing a variable or linked combination of

variables to be fixed for some time models real-world problems involving changeover

costs of one sort or another. In particular, if changing a variable’s value

would incur some (large) change-over cost, such as a cleaning step, a component

replacement, or a testing phase, then such changes to the variable may be made

taboo for some period. Often, the change-over is much cheaper if done at a particular

time step immediately after component replacements or cleaning (which

is commonly done routinely rather than reactively), and so the variable can be

allowed to change at that point.

A commitment relaxation ERC simulates the above scenario, so that some

variable(s) setting (or schema) H is forbidden from changing during a period of

time we refer to as an epoch. We denote by V the duration of the epoch, and

define the activation period k(j),0 ≤ k(j) ≤ V, to be the duration of the period

of time we have to commit to a particular setting H during the jth epoch. Note that

the length of the activation period may change with each new epoch depending

on when the particular setting H is selected by the optimizer. To describe the

setting H we can conveniently use a constraint schema. For example, we would

use H = (∗1∗∗0) to state that a commitment is associated with the instrument

setting for which bit positions 2 and 5 are set to 1 and 0, respectively.

Figure 4.3 illustrates the partition of the optimization time into epochs, and

a possible distribution of activation periods. Imagine the six epochs illustrated

by the figure to represent six working days, each consisting of V = 9 hours (assuming

working hours to be from 8am to 5pm). The limitation that causes the

commitment relaxation ERC to arise can be:



Algorithm 4.1 Implementation of a commitment relaxation ERC

Require: t^start_ctf (start of ctf), t^end_ctf (end of ctf), V (duration of an epoch), H (constraint schema)

1: commitmentRelaxationERC(⃗x,t){

2: if t = 0 then

3: last activation = 0; k = 0 // initialize auxiliary variables

4: if t ∈ ctf then

5: if t−last activation ≥ k then

6: if ⃗x ∈ H then

7: last activation = t; k = V − (t mod V)

8: return true // ⃗x is evaluable

9: else if ⃗x ∉ H then

10: return false // ⃗x is not evaluable

11: else

12: return true // ⃗x is evaluable

13: else

14: return true } // ⃗x is evaluable

“In an optimization problem involving the selection of instrument settings, the

configuration, c, once set, cannot be changed during the remainder of the working

day.”

In the above example, the constraint schema H represents the parameter

combination that corresponds to instrument configuration c. The length of an

activation period is bounded by 0 ≤ k(j) ≤ 9. For instance, imagine we select

instrument configuration c in the middle of the day, say at 1pm, as indicated by

epoch j = 1 in the figure. This will activate the ERC for a period of k(1) = 4

(=5pm-1pm) hours (indicated by the dashed part). Activating the ERC later,

earlier, or not at all during a working day changes k(j) accordingly. Clearly,

whether we actually evaluate solutions during an activation period (and thus

use the setting c), or submit null solutions during this period, depends on the

constraint-handling strategy employed; it is only that if we decide to evaluate a

solution during an activation period, then it needs to be from H.

From Figure 4.3 it is also apparent that the total number of constraint activations

during the optimization can vary between 0 ≤ j ≤ ⌈T/V⌉. That is, we

might be lucky and the ERC may never be activated, e.g. if solutions belonging to H do not lie on the optimizer's search path; but even one activation may introduce

enough solutions from H into the population such that future activations

might be more likely.



Algorithm 4.2 Illustration of how we manage ERCs and the evaluation of solutions

Require: ERC 1 ,...,ERC r (set of ERCs), f (objective function), T (time limit)

1: t (global time variable representing the current time step)

2: functionWrapper(⃗x,t){

3: y t =null

4: if t < T then

5: if ⃗x is a null solution then

6: ⃗x t = ⃗x

7: else

8: if ⃗x satisfies the ERCs ERC 1 ,...,ERC r then

9: ⃗x t = ⃗x; y t = f(⃗x t )

10: else

11: ⃗x t =repair(⃗x,t); y t = f(⃗x t )

12: t++

13: return ⃗x t and y t }

The corresponding implementation of a commitment relaxation ERC is defined

by Algorithm 4.1. The method commitmentRelaxationERC(⃗x,t) is defined

by the parameters t^start_ctf, t^end_ctf, V, and H, and it takes as input a candidate solution

⃗x that is to be checked for evaluability and the current (global) time step t. The

output is a boolean value indicating whether ⃗x is evaluable or not. The method

maintains two auxiliary variables, last activation and k, required to update the

internal state of the constraint: Lines 5 to 7 are responsible for the activation

of the ERC and the setting of the activation period, while Line 9 ensures that

solutions have to be in H during an activation. In future, we will denote a commitment

relaxation ERC of this form by commRelaxERC(t^start_ctf, t^end_ctf, V, H). The

meaning of the parameters related to this and other ERC types is summarized in

the Nomenclature section too.

An extension to this simple commitment relaxation ERC is to maintain not

only one but several commitment relaxation ERCs with different constraint

schemata H i . In this case, we need to consider three aspects: (i) a solution is

non-evaluable if it violates at least one ERC, (ii) a repaired solution has to satisfy

all activated ERCs and not only the ones that were violated, and (iii) it needs

to be checked whether a repaired solution activates an ERC that was not activated

before. We will consider this extension later in Section 5.1.4. At this point,

we would like to describe how we manage an ERC method, such as the method

commitmentRelaxationERC(⃗x,t). To keep things simple, we design a function

wrapper, as shown in Algorithm 4.2, that takes care of the management of ERCs



and the evaluation of solutions. Calling this wrapper is similar to calling the objective

function f in a standard optimization problem. The wrapper is defined by

the set of ERCs ERC_1, ..., ERC_r that an ERCOP is subject to, the objective function f, and the time limit T. It takes as input a solution ⃗x to be evaluated, and the

current (global) time step. If the solution is a null solution (which is checked at

Line 5), then the time counter is only incremented. If the solution is not a null

solution and satisfies all ERCs (i.e. all ERC methods return true at Line 8), then

it is immediately evaluated, otherwise it is first repaired and then evaluated. The

output of the wrapper is the (repaired) solution and its fitness; the fitness is null,

if a null solution is submitted. The repairing method repair(⃗x,t) needs to be implemented

by the optimizer. The structure of the function wrapper depends on

the ERCOP to be simulated and thus needs to be specified alongside the ERCs.
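For concreteness, a minimal Python sketch of such a function wrapper is given below. It assumes that each ERC object exposes a check(x, t) method returning whether x is evaluable at time step t, and that a repair function is supplied by the optimizer; both are assumptions for this illustration rather than a definitive implementation.

def function_wrapper(x, t, f, ercs, T, repair):
    # Mirror of Algorithm 4.2: evaluate solution x at (global) time step t.
    # Returns the (possibly repaired) solution, its fitness (None for a null
    # solution), and the incremented time counter.
    y = None
    x_t = x
    if t < T:
        if x is not None:                             # a null solution only consumes a time step
            if all(erc.check(x, t) for erc in ercs):  # evaluable as submitted
                y = f(x_t)
            else:                                     # non-evaluable: repair first, then evaluate
                x_t = repair(x, t)
                y = f(x_t)
        t += 1
    return x_t, y, t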

4.2.3 Periodic ERCs

A periodic ERC models the availability of a specific resource, represented by a

constraint schema H, at regular time intervals. That is, the ERC is activated

every P time steps (period length) for an activation period of exactly k time steps

(see Figure 4.4). As the ERC models the availability of resources, an individual

has to be a member of H during the activation period. An example of a periodic

ERC is:

“In an optimization problem requiring skilled engineers to operate instruments,

on Mondays, only engineer eng i is available.”

In the above example, the activation period is k = 1 (assuming a time step

is a day), the period length is P = 7 (i.e. a week), and the constraint schema

H represents the parameter combination that corresponds to the instruments (or

their settings) operated by engineer eng i .

The corresponding implementation of a periodic ERC is defined by Algorithm 4.3. The method periodicERC(⃗x,t) is defined by the parameters t^start_ctf, t^end_ctf, k, P, and H, and it takes as input a candidate solution ⃗x that is to be checked for evaluability and the current (global) time step t. The output is a boolean value indicating whether ⃗x is evaluable or not.

In future, we will denote periodic ERCs by perERC(t^start_ctf, t^end_ctf, k, P, H). In Algorithm 4.2, the method periodicERC(⃗x,t) is called at Line 8.
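The activation pattern of a periodic ERC can be checked directly from the condition used in Algorithm 4.3; the short Python sketch below (with illustrative parameter values) lists which time steps of the engineer example are constrained during the first two weeks.

def periodic_erc_active(t, t_start, t_end, k, P):
    # True if the periodic ERC is active at time step t (cf. Algorithm 4.3):
    # t lies within the ctf and falls within the first k steps of the current period.
    return t_start <= t < t_end and (t - t_start) % P < k

# Engineer example: k = 1 (Mondays only), P = 7, ctf covering the first 14 days.
active_days = [t for t in range(14) if periodic_erc_active(t, 0, 14, k=1, P=7)]
print(active_days)   # [0, 7]: on these days solutions must fall into H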


Figure 4.4: An illustration of a periodic ERC perERC(t^start_ctf, t^end_ctf, k, P, H). The ERC is activated every P time steps for an activation period of always k time steps.

A potential extension of a periodic ERC is that the period length and the activation period

refer to different counter units. For example, consider the maintenance of machines.

While maintenance might take hours (i.e. k might be measured in real

time units), machines might need to be maintained after using them a certain

number of times (i.e. P is measured in function evaluations).

4.2.4 Random ERCs

A random ERC models the temporary non-availability of randomly selected resources,

whereby all resources correspond to constraint schemata of pre-determined

fixed order o(H). In other words, if a resource becomes unavailable, then

we activate a new (plain) ERC for an activation period of k time steps and associate it with a constraint schema H for which the o(H) order-defining bit positions and their values are selected at random. As the ERC models the non-availability of a resource, an evaluable solution cannot be a member of H; notice the difference

to periodic and commitment relaxation ERCs where solutions have to be from H

during activation periods. We allow multiple resources to be non-available at the

same time, meaning an evaluable solution must not be a member of any of these

constraint schemata. Let us denote the set of ERCs that is activated at time step

t by ActERCs t . We remove ERCs with an expired activation period from this set.

So far, we have not specified how a resource can become unavailable and thus

how a new ERC is activated. Obviously, there are many options but since we

want to use this ERC type to model random effects, such as unexpected machine

breakdowns, we activate a new ERC at each time step with an independent

probability of p; we call this probability the activation probability.

We want to avoid situations in which there is no evaluable solution. This

is simply achieved by removing any ERCs from the set ActERCs t that have

a constraint schema that is contradictory to the constraint schema of a newly

activated ERC. Clearly, this could be realized differently, for example by not allowing a contradictory ERC to enter ActERCs_t in the first place.



Algorithm 4.3 Implementation of a periodic ERC

Require: t^start_ctf (start of ctf), t^end_ctf (end of ctf), k (length of the activation period), P (period length), H (constraint schema)
1: periodicERC(⃗x,t){
2: if t ∈ ctf ∧ (t − t^start_ctf) mod P < k ∧ ⃗x ∉ H then

3: return false // ⃗x is not evaluable

4: else

5: return true } // ⃗x is evaluable

Alternatively, we may allow contradictions and consequently perhaps be unable to evaluate any

solution from time to time. Figure 4.5 shows how the set ActERCs t may vary

over time and the effects of enforced ERC removals using our approach.

A simplified real-world example of a random ERC is:

“In a multi-stage production process, a manufactured product runs through a series

of stages, where, at each stage, the product is processed by a single machine

which progresses it towards its final manufactured state. We wish to optimize the

production facility/process in the following sense. For each stage there is a choice

from among several machines (or machine types) capable of advancing the product

to the next stage; we wish to select the best machine to use at each stage.

Each machine may break down at any time step with a probability of 2%, in which

case it needs to be repaired; standard repairing takes 5 hours including test runs.

Repairing time may increase if machine failures are more severe than expected

(A in Figure 4.5) but, if necessary, a machine can be used before the test runs are

finished (B in Figure 4.5).”

In this example we have an activation probability of p = 0.02, an activation

period of k = 5 (assuming a time step is one hour) and a fixed order of any

constraint schema of o(H) = 1 (because a single machine might break down).

It is easy to see that a random ERC with a small p and k is unlikely to harm

the optimization much (even if o(H) is high), as machines rarely break down and, in

case they do, they are quickly repaired. Clearly, the opposite is the case if either

of the parameters is large; in this case, even a low order o(H) can severely harm

the optimization.
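A rough back-of-the-envelope estimate makes this trade-off concrete: since a new ERC is activated at each time step with independent probability p and persists for k time steps, the expected number of simultaneously active ERCs at a time step well inside the ctf is approximately

E[|ActERCs_t|] \approx p\,k

(ignoring the removal of contradictory ERCs). For the machine example above (p = 0.02, k = 5) this is only 0.1, so the constraint is rarely active; with, say, k = 500 the same activation probability would already give an expected 10 simultaneously active constraints.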

The corresponding implementation of a random ERC is defined by Algorithm

4.4. The method randomERC(⃗x,t) is defined by the parameters t^start_ctf, t^end_ctf, k, o(H), and p.


Figure 4.5: A random ERC with an order of o(H) = 1, an activation period of k time steps, and an activation probability of p ≈ 2/k (k ≥ 2) is applied to binary strings of length l = 2. The 4 (= 2^{o(H)} \binom{l}{o(H)}) possible constraint schemata, (1∗), (0∗), (∗1), and (∗0), are represented on the left side. Each dashed block represents the activation period of an ERC in the set ActERCs_t. One can see that multiple ERCs can coexist, including ones with identical constraint schemata (A), and that an ERC is removed from ActERCs_t if its activation period expires or if its constraint schema contradicts that of a newly activated ERC (B).

The method takes as input a candidate solution ⃗x that is to

be checked for evaluability and the current (global) time step t, and outputs a

boolean value indicating whether ⃗x is evaluable or not. Inside the method, we use

H i ,i = 1,...,|ActERCs t | to refer to the constraint schema of the ith ERC in the

set ActERCs t at time step t. The call to rand(0,1) returns a random, uniformly

distributed real value between 0 and 1.

In future, we will denote random ERCs by randERC(t^start_ctf, t^end_ctf, k, o(H), p). In Algorithm 4.2, the method randomERC(⃗x,t) is called at Line 8. We want to point out here that we do not analyze the effect of random ERCs in this thesis; we included the definition of this ERC for completeness. For an analysis of the effect of random ERCs on evolutionary search we refer to a technical report (Allmendinger and

Knowles, 2010).

4.2.5 Commitment composite ERCs

The last type of ERC we introduce here is the commitment composite ERC; this ERC type is slightly more complex than the other types because it combines several real-world limitations. A commitment composite ERC occurs when some variables of a candidate solution define a composite that requires resources to be locally available (e.g. in a cache) in order for the solution as a whole to be realized and/or evaluated. We can conveniently use the notion of schemata to describe the resource-requiring composite part of a solution. For example, assuming a binary representation of solutions, we would use H_# = {∗∗###∗∗∗∗∗##} to state that bit positions 3, 4, 5, 11, and 12 define a composite.



Algorithm 4.4 Implementation of a random ERC

Require: t^start_ctf (start of ctf), t^end_ctf (end of ctf), k (length of the activation period), o(H) (fixed order of a newly arising constraint schema), p (activation probability)

1: ActERCs t = ∅ (set of ERCs activated at time step t)

2: randomERC(⃗x,t){

3: if t ∈ ctf then

4: remove any ERCs from ActERCs_t whose activation period has expired

5: if rand(0,1)< p then

6: H′ = createH(o(H)) // create a new constraint schema H′ of order o(H) at random, and time stamp the corresponding ERC with t to keep track of its activation period; add the ERC to the set ActERCs_t, and remove any ERCs from this set whose constraint schema is contradictory to H′

7: for i = 1 to |ActERCs t | do

8: if ⃗x ∈ H i then

9: return false // ⃗x is not evaluable

10: return true // ⃗x is evaluable

11: else

12: return true } // ⃗x is evaluable

We refer to the bit positions denoted by # as the composite-defining bits, and define the order o(H_#) to be the

number of composite-defining bits in the schema (we refer to H # as the high-level

constraint schema). In the problems tackled in this thesis the composite-defining

bits are static, and form a part of the ERC problem definition.

When a solution is to be evaluated, we must look at the composite-defining bits of its genotype and compare them to a local cache of composites (in Algorithm 4.5

this cache is represented by the variable SCInfo). Each composite in the local

cache is indexed by a bit-string of the same length as the order of the high-level

constraint schema. If there is a match, the solution can be evaluated; if not, the

solution may not be evaluated at the current time step.

We define the cache to be made up of a number of storage cells, #SC. Typically,

the number of storage cells is smaller than the space of possible composites,

which is of size 2^{o(H_#)} in a binary search space. A composite available in a storage cell

may be used in the evaluation of more than one solution: each composite may be

used up to RN (reuse number) times and has a shelf life of SL time steps, and

we assume SL ≥ RN. Finally, the composites available in the cache at time t are

a function of previous purchase orders made, and a fixed time lag TL between

a purchase being made and its arrival. When composites arrive at a particular

time, they are immediately put in a storage cell (and any existing composite in

that cell is discarded); which storage cell is selected is defined either at the time

of purchase or at the time of arrival (see Lines 4 and 5 of Algorithm 4.5).


Figure 4.6: A visual example of the commitment composite ERC commCompERC(H_# = {###∗∗}, #SC = 4, TL = 1, RN = 10, SL = 20); each composite order and each time step costs c_order and c_time_step units, respectively. The evaluation step at time step t reduces the reuse number of the composite in cell 2. At the same time step, the shelf life of the composite in cell 4 expires, and two new composites (011 and 110) are ordered. One time step later, at t+1, the ordered composites arrive and are put into cells determined by the EA.

To make the constraint more realistic we associate costs of c order and c time step

units with each submitted composite order and time step, respectively. The available

budget, which cannot be exceeded, is denoted by C; any composite can be

purchased as often as desired, as long as we are within the budget. Figure 4.6

gives a visual example of the ERC.

An example of a commitment composite ERC is:

“In an optimization problem involving the selection/design of vehicle parts least

harmful to pedestrians in case of a crash, we wish, amongst other things, to identify the

most suitable configuration for the tyres of the vehicle. A tyre is made up of

several parameters, such as size, thickness, and rubber material. Upon defining

these parameters, we order the tyres, which is associated with a fixed cost of 500 Pounds and a delivery period of 3 days.



Algorithm 4.5 Implementation of a commitment composite ERC

Require: t^start_ctf (start of ctf), t^end_ctf (end of ctf), H_# (high-level constraint schema), #SC (number of storage cells), TL (time lag), RN (reuse number), SL (shelf life), c_order (cost per submitted composite order), c_time_step (cost per time step), C (budget)

1: c = 0 (cost counter), SCInfo = ∅ (storage cell information), QInfo = ∅ (composite queue

information) // Initialize global state variables

2: commitmentCompositeERC(⃗x,t){

3: if t ∈ ctf then

4: collect composite orders if within budget, and increment c by c order for each submitted

order; add all ordered composites to the queue QInfo and time stamp them with t; let

the optimizer choose the storage cell indexes into which the ordered composites should

be stored upon their arrival

5: empty the storage cells in SCInfo that belong to composites whose shelf life is over;

remove any composites from QInfo whose time lag is over, and add them to the storage

by updating SCInfo; if the optimizer did not specify previously the storage cell indexes

into which arriving composites should be put, then let it choose the cell here

6: if c < C then

7: P = ∅ (auxiliary set containing the indexes of storage cells that contain a composite

that matches the composite required by the solution⃗x; this set is only important when

⃗x requires a composite that is available in storage multiple times)

8: for i = 1 to |SCInfo| do

9: if ⃗x ∈ H i then

10: P = P ∪i

11: if |P| = 1 then

12: increment the reuse number of the composite stored in the storage cell with the

index represented by the element of P; if the composite is used up, then empty the

corresponding storage cell

13: return true // ⃗x is evaluable

14: if |P| > 1 then

15: let the optimizer decide from which storage cell the composite should be taken, and

subsequentlyincrementthereusenumberoftheselectedcomposite; ifthecomposite

is used up, then empty the corresponding storage cell.

16: return true // ⃗x is evaluable

17: return false // ⃗x is not evaluable (because the composite needed is not available)

18: return false // ⃗x is not evaluable (because we are above the budget C)

19: else

20: return true } // ⃗x is evaluable

To allow for a valid assessment of tyres,

a set of tyres can be involved in at most 5 crash test trials, and can be kept in

storage for not more than 1 month. The storage itself is limited in size to 10

sets of tyres. Every day of crash testing involves a fixed charge of 3000 Pounds

including things like labor, rent of venue, and electricity.”

In this example a composite is a tyre and the composite-defining bits are the

variables defining a tyre. Ordering tyres is associated with a time lag of TL = 3

(assuming a time step is one day), and tyres have a reuse number of RN = 5 and



a shelf life of SL = 30 (assuming one month consists of 30 days). The number of

storage cells is #SC = 10, and the costs associated with a composite order and

time step are c order = 500 Pounds and c time step = 3000 Pounds, respectively.

The corresponding implementation of a commitment composite ERC is defined

by Algorithm 4.5. The method commitmentCompositeERC(⃗x,t) is defined

by the parameters t^start_ctf, t^end_ctf, H_#, #SC, TL, RN, SL, c_order, c_time_step, and C.

It takes as input a candidate solution ⃗x that is to be checked for evaluability

and the current (global) time step t, and outputs a boolean value indicating

whether ⃗x is evaluable or not. The variables representing the cost counter c, storage

cell information SCInfo (contents in form of schemata, remaining shelf lives

and remaining reuse number), and the queue QInfo of submitted but not arrived

composites, are global state variables and visible to the optimizer.

In addition to repairing, three reactive tasks need to be defined by the optimizer

to deal with this ERC type: (i) ordering of composites (Line 4), (ii) assigning

storage cell indexes to arriving composites (Line 4 or 5), and (iii) deciding

which composite to use if multiple identical ones are available (Line 15). With

this ERC, repairing involves modifying a non-evaluable solution such that it falls

into the constraint schema of a composite that is currently in storage.

In future, we will denote a commitment composite ERC of this form by

commCompERC(H_#, #SC, TL, RN, SL). 6 In Algorithm 4.2, the method commitmentCompositeERC(⃗x,t) is called at Line 8.

4.3 Theoretical analysis

Having defined ERCOPs and several ERCs, we are now ready to analyze their effect

on evolutionary search. We begin with an initial theoretical investigation

on the impact of periodic ERCs on two selection and reproduction schemes

commonly-used within EAs; the next chapter will continue with an empirical

analysis. The theoretical analysis uses the concept of Markov chains, which we

first mentioned in Section 3.1.1. The next section gives a more detailed introduction

to Markov chains. We then derive the Markov model (transition probabilities)

that account for periodic ERCs in Section 4.3.2, and analyze the simulation

results in Section 4.3.3. Finally, Section 4.3.4 summarizes the insights gained

6 We leave out the variables t^start_ctf, t^end_ctf, c_order, c_time_step, and C from commCompERC(...) for ease of presentation. They will be specified where appropriate.



from the theoretical study.

4.3.1 Markov chains

A Markov process is a random process that has no memory of where it has been

in the past such that only the current state of the process can influence the

next state. If the process can assume only a finite or countable set of states,

then it is usual to refer to it as a Markov chain (Norris, 1998). In other words,

assuming that X t ,t = 0,1,... are random variables with variable X t depending

on X t−1 only, then we can think of a Markov chain as a sequence X 0 ,X 1 ,X 2 ,...

of random events occurring in time (Reeves and Rowe, 2003). Suppose S 0 ,...,S µ

are the µ+1 possible values that each of the variables X t can take. Then, a chain

moves from a state S a at time t, to a state S a ′ at time t+1 with a probability of

p a,a ′ = P(X t+1 = S a ′|X t = S a ). The probabilities p a,a ′ (a,a ′ = 0,...,µ) are called

transition probabilities and form the µ+1×µ+1 matrix P, the transition matrix.

Thus, the probability that the chain is in state S a at time t is the ath entry in

the probability vector

⃗u t = ⃗u 0 P t , (4.3)

where ⃗u 0 is the (µ+1)-dimensional probability (column) vector that represents

the initial distribution over the set of states.

When an EA is modeled by a Markov chain, it is easy to see that the population is the natural choice for describing a state. The transition probabilities then express the likelihoods that an EA changes from a current population to any other possible population after applying the stochastic effects of selection, crossover and/or mutation. It is also possible to consider other effects such as noisy fitness functions (Nakama, 2008), niching (Horn, 1993) and elitism (He and Yao, 2002). Once the transition matrix is calculated, it can be used to compute a variety of quantities, such as the first hitting time of a particular state or the probability of hitting a state at all. An overview of the tools of Markov chain analysis can be found in any general textbook on stochastic processes, such as the books of Norris (1998) and Doob (1953).

The drawback of modeling EAs with Markov chains is that the size of the

required transition matrix grows exponentially in both the population size and

string length. To keep Markov chain models manageable it is therefore common

to use small population sizes and string lengths (Goldberg and Segrest, 1987;



Horn, 1993). Other options, which allow the modeling of more realistic EAs, are

to make simplifying assumptions about the state space (Mahfoud, 1991) or to

use matrix notation only (Vose and Liepins, 1991; Nix and Vose, 1992; Davis and

Principe, 1993).

4.3.2 Modeling ERCs with Markov models

In this section we derive the transition probabilities for EAs optimizing in the

presence of periodic ERCs. Our Markov chain model is based on the model

of Goldberg and Segrest (1987), which considers a simple environment composed

of two individual types: type A always has a fixed objective value (or fitness) of f(A), while type B has a fitness of f(B). This limitation allows for an intuitive definition of states. For a fixed population size of µ, there are µ+1 possible states, where state S_a represents a population with a type A individuals and µ−a type B individuals. Furthermore, in this simple EA model we do not apply mutation and crossover, so that an offspring is simply a copy of the selected parent.

Goldberg and Segrest (1987) used this model to investigate the effect of drift

for a simple EA that used a generational reproduction scheme combined with

fitness proportionate selection. They also extended the model to include mutation.

Horn (1993) extended it further to include niching. We extend it to include

periodic ERCs and use the resulting model to analyze the impact of the ERC on

two selection strategies, fitness proportionate and binary tournament selection,

and two reproduction schemes, generational and steady state reproduction, both

without elitism.

4.3.2.1 Selection probabilities

Under fitness proportionate selection (FPS) we choose an individual of the current population to serve as a parent (in our environment, to be in the next population) with a probability that is proportional to its (relative) fitness. In our simple environment, the probability of choosing a type A individual for the next population while being in a state S_a is simply

$$P_a(A) = \frac{a f(A)}{a f(A) + (\mu - a) f(B)}. \qquad (4.4)$$



As there are only two individual types in total, the probability of choosing a type B individual is P_a(B) = 1 − P_a(A). From the above equation it is apparent

that once a uniform population is reached, i.e. a = 0 or µ, there is no chance of

selecting individuals from the other type. Thus, the two corresponding states S 0

and S µ are absorbing states.

Under tournament selection we first randomly select a number of individuals

from the population (with replacement) and then perform a tournament among

them with the fittest one serving subsequently as a parent. It is common to use

a tournament size of two, which will also be used here; this selection strategy is

known as binary tournament selection (BTS). The result of a tournament is clear:

the individual with the higher fitness wins the tournament; there is a draw if an

individual meets another individual with the same fitness, in which case the winner

is randomly determined; and an individual will be the winner of a tournament

with itself. We distinguish two cases regarding the fitness of the individual types:

(i) f(A) = f(B) and (ii) f(A) > f(B). For case (i) the probability of selecting a type A individual is

$$P_a(A) = \left(\frac{a}{\mu}\right)^2 + \frac{a(\mu - a)}{\mu^2}, \qquad (4.5)$$

where the first term is the probability of selecting two type A individuals, and the second term is the probability of selecting one type A and one type B individual (which is 2a(µ−a)/µ²), accounting for the fact that the type A individual will win such a tournament on average in only half of the cases (i.e. (2a(µ−a)/µ²)/2 = a(µ−a)/µ²). If

f(A) > f(B), then a type A individual will win all tournaments it is involved in,

resulting in the following selection probability:

$$P_a(A) = \left(\frac{a}{\mu}\right)^2 + \frac{2a(\mu - a)}{\mu^2}. \qquad (4.6)$$

4.3.2.2 Transition probabilities

In our environment, the transition probabilities depend on the selected reproduction

scheme, which in turn depends on the selected selection strategy. We

first consider a generational reproduction scheme as already used in the original

genetic algorithm of Holland (1975); we denote this scheme by GGA. With

GGA, the entire current population is replaced by the offspring population. That

is, µ selection steps are carried out per time step (with replacement). Using the

selection probability P_a(A) either for FPS or BTS, the transition probabilities p_{a,a'} = P(X_{t+1} = S_{a'} | X_t = S_a) for GGA of moving at time t from a state S_a with a type A individuals, to a state S_{a'} with a' type A individuals at time t+1, are defined as follows:

For a = 0:
$$p_{a,a} = 1, \qquad p_{a,a'} = 0, \quad a' = 1,\ldots,\mu. \qquad (4.7)$$

For 0 < a < µ and 0 ≤ a' ≤ µ:
$$p_{a,a'} = \binom{\mu}{a'}\, P_a(A)^{a'} \left(1 - P_a(A)\right)^{\mu - a'}.$$

For a = µ:
$$p_{a,a'} = 0, \quad a' = 0,\ldots,\mu-1, \qquad p_{a,a} = 1.$$
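As an illustration of how Equations (4.4)–(4.7) fit together, the sketch below (our own hypothetical helpers, assuming NumPy and SciPy; this is not the code used in the thesis) assembles the (µ+1)×(µ+1) transition matrix for GGA, where each interior row is a binomial distribution over the number of type A offspring.

```python
import numpy as np
from scipy.stats import binom

def p_select_A(a, mu, fA, fB, selection="FPS"):
    """P_a(A): probability of selecting a type A parent in state S_a (Eqs. 4.4-4.6)."""
    if selection == "FPS":                                    # Eq. (4.4)
        return a * fA / (a * fA + (mu - a) * fB)
    # Binary tournament selection; the thesis distinguishes f(A)=f(B) and f(A)>f(B)
    if fA == fB:
        return (a / mu) ** 2 + a * (mu - a) / mu ** 2         # Eq. (4.5)
    if fA > fB:
        return (a / mu) ** 2 + 2 * a * (mu - a) / mu ** 2     # Eq. (4.6)
    return (a / mu) ** 2      # f(A) < f(B): A only wins tournaments against itself

def gga_transition_matrix(mu, fA, fB, selection="FPS"):
    """Transition matrix of Eq. (4.7): each generation draws mu offspring i.i.d.,
    so the next number of type A individuals a' is Binomial(mu, P_a(A))."""
    P = np.zeros((mu + 1, mu + 1))
    P[0, 0] = P[mu, mu] = 1.0                                 # absorbing states S_0 and S_mu
    for a in range(1, mu):
        pA = p_select_A(a, mu, fA, fB, selection)
        P[a, :] = binom.pmf(np.arange(mu + 1), mu, pA)
    return P
```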

With steady state reproduction, the population is updated after each selection

step. Usually, an offspring individual replaces the worst individual in the population.

This replacement strategy, however, is elitist and ensures that the number

of the less fit individual type in the population does not increase. Thus, to allow

for a fair comparison with GGA, an offspring does not replace the worst individual

in the population but a randomly chosen one regardless of its fitness; we

denote this reproduction scheme by SSGA (rri), where ‘rri’ refers to replacing a

random individual. It has been shown elsewhere (Syswerda, 1991) that GGA and

SSGA (rri) yield similar performance. Bearing in mind that one time step corresponds

to one selection step with SSGA (rri), we obtain the following transition

probabilities:

For a = 0:
$$p_{a,a} = 1, \qquad p_{a,a'} = 0, \quad a' = 1,\ldots,\mu. \qquad (4.8)$$

For 0 < a < µ:
$$p_{a,a'} = 0, \quad a' = 0,\ldots,a-2,$$
$$p_{a,a-1} = \left(1 - P_a(A)\right)\frac{a}{\mu},$$
$$p_{a,a} = P_a(A)\,\frac{a}{\mu} + \left(1 - P_a(A)\right)\frac{\mu - a}{\mu},$$
$$p_{a,a+1} = P_a(A)\,\frac{\mu - a}{\mu},$$
$$p_{a,a'} = 0, \quad a' = a+2,\ldots,\mu.$$

For a = µ:
$$p_{a,a'} = 0, \quad a' = 0,\ldots,\mu-1, \qquad p_{a,a} = 1.$$

The transition probabilities of either GGA or SSGA (rri) will be the entries

of the transition matrix P.
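A corresponding sketch for SSGA (rri), again our own hypothetical illustration rather than the thesis code, builds the tridiagonal matrix of Equation (4.8); p_select_A_values would hold the values P_a(A) from Equations (4.4)–(4.6).

```python
import numpy as np

def ssga_transition_matrix(mu, p_select_A_values):
    """Tridiagonal transition matrix of Eq. (4.8) for SSGA (rri).

    p_select_A_values[a] is P_a(A) for a = 0, ..., mu."""
    P = np.zeros((mu + 1, mu + 1))
    P[0, 0] = P[mu, mu] = 1.0                      # absorbing states
    for a in range(1, mu):
        pA = p_select_A_values[a]
        P[a, a - 1] = (1 - pA) * a / mu            # B offspring replaces an A individual
        P[a, a]     = pA * a / mu + (1 - pA) * (mu - a) / mu
        P[a, a + 1] = pA * (mu - a) / mu           # A offspring replaces a B individual
    return P
```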

4.3.2.3 Constrained transition probabilities for a periodic ERC

We have mentioned in the previous section that GGA performs µ selection steps

per time step, while SSGA (rri) performs one selection step per time step. To be

able to compare the effect of an ERC on the two reproduction schemes, we thus

express ERCs in this section in terms of selection steps rather than time steps.

Let us now derive the transition probabilities in the presence of a periodic

ERC. For this, consider the general periodic ERC, perERC(iµ,(i+1)µ,k,µ,H =

(A)) (i ∈ N,0 ≤ k ≤ µ), which is activated at selection step iµ for a period of µ

selection steps, i.e. one time step (or generation) for GGA and µ time steps for

SSGA (rri). During the activation period of k ≤ µ selection steps, we can only

select (and evaluate) type A individuals. Let us assume that if we select a type

B individual during this period, this individual is repaired by simply forcing it

into the right schema; i.e. it is converted into a type A individual. This repairing

procedure is a simple constraint-handling strategy for dealing with non-evaluable

solutions; alternative constraint-handling strategies will be introduced in the next

chapter. Before we derive the constrained transition probabilities for GGA we

want to point out a few aspects:



• If we are in state S 0 and the ERC is activated, then S 0 is not an absorbing

state anymore and we move directly to state S k .

• As a population contains at least k type A individuals after lifting the

constraint, we are not able to move to a state S a ′ with a ′ < k during the

constrained generation (time step).

• The ERC reduces the number of freely selected offspring down to µ' = µ−k.

• Moving to a state S_{a'} with a' > k is already achieved by selecting a'' = a'−k (instead of a') type A individuals from the current population.

Considering these points, we derive for the time step for which the ERC is activated

the following constrained transition probabilities for GGA:

For a = 0:
$$p_{a,a'} = 0, \quad a' = 0,\ldots,k-1,\,k+1,\ldots,\mu, \qquad p_{a,k} = 1. \qquad (4.9)$$

For 0 < a < µ and 0 ≤ a' < k:
$$p_{a,a'} = 0.$$

For 0 < a < µ and k ≤ a' ≤ µ:
$$p_{a,a'} = \binom{\mu'}{a''}\, P_a(A)^{a''} \left(1 - P_a(A)\right)^{\mu' - a''}.$$

For a = µ:
$$p_{a,a'} = 0, \quad a' = 0,\ldots,\mu-1, \qquad p_{a,a} = 1.$$
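Under the same assumptions as the earlier sketches (hypothetical helper names, SciPy assumed), the constrained GGA matrix of Equation (4.9) can be written as follows.

```python
import numpy as np
from scipy.stats import binom

def gga_constrained_matrix(mu, k, p_select_A_values):
    """Constrained GGA transition matrix of Eq. (4.9): k of the mu offspring are
    forced to be of type A, the remaining mu' = mu - k are selected freely."""
    P = np.zeros((mu + 1, mu + 1))
    P[0, k] = 1.0                    # S_0 is no longer absorbing; we jump to S_k
    P[mu, mu] = 1.0
    free = mu - k                    # mu' freely selected offspring
    for a in range(1, mu):
        pA = p_select_A_values[a]
        # a' = k + a'' type A offspring, with a'' ~ Binomial(mu', P_a(A))
        P[a, k:] = binom.pmf(np.arange(free + 1), free, pA)
    return P
```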

The above periodic ERC is set such that the activation period of k selection

steps is upper bounded by the population size µ, and, in the case of GGA,

starts and ends within a single time step (generation). This need not necessarily be the case. In fact, a periodic ERC can feature an activation period

k that is so long that it constrains selection steps within two or more successive

generations, or so short that several activation periods may start during a



single generation. In such scenarios, one needs to constrain all generations that

are subject to constrained selection steps. The number of constrained selection

steps within a generation, referred to as k in Equation (4.9), is then simply the

sum of all selection steps that happen to be constrained during any particular

generation. That is, depending on the ERC, the number of constrained selection

steps may change between generations.

With SSGA (rri), the population is updated after each selection step, which, remember, is a single time step with this scheme. This means that we need to determine for each selection step (time step) separately whether it lies within the activation period and thus is constrained or not. During the activation period, the periodic ERC from above prevents us from moving from a current state S_a

to a state S a−1 , which can only be reached if a type B individual replaces a

type A individual. As above, if the constraint is active, then the state S 0 is not

an absorbing state anymore, and we move directly to state S 1 . We obtain the

following new transition probabilities for each of the k constrained time steps:

For a = 0:
$$p_{a,a'} = 0, \quad a' = 0,2,3,\ldots,\mu, \qquad p_{a,1} = 1. \qquad (4.10)$$

For 0 < a < µ:
$$p_{a,a'} = 0, \quad a' = 0,\ldots,a-1,$$
$$p_{a,a} = \frac{a}{\mu}, \qquad p_{a,a+1} = \frac{\mu - a}{\mu},$$
$$p_{a,a'} = 0, \quad a' = a+2,\ldots,\mu.$$

For a = µ:
$$p_{a,a'} = 0, \quad a' = 0,\ldots,\mu-1, \qquad p_{a,a} = 1.$$
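The constrained SSGA (rri) matrix of Equation (4.10) no longer depends on the selection probabilities; a minimal sketch (our own illustration, hypothetical function name) is:

```python
import numpy as np

def ssga_constrained_matrix(mu):
    """Constrained SSGA (rri) matrix of Eq. (4.10): one forced type A selection per time step."""
    P = np.zeros((mu + 1, mu + 1))
    P[0, 1] = 1.0                        # S_0 is no longer absorbing
    P[mu, mu] = 1.0
    for a in range(1, mu):
        P[a, a] = a / mu                 # forced A offspring replaces an A individual
        P[a, a + 1] = (mu - a) / mu      # forced A offspring replaces a B individual
    return P
```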

We will denote the transition matrix with the constrained transition probabilities

by P c .



4.3.2.4 Calculating proportions of individual types in a population

One way to analyze the impact of an ERC on different selection and reproduction

schemes is to monitor the proportion of the two individual types in a population.

To do so one needs to first calculate the probability of ending up in any of the

possible states S i ,i = 0,...,µ after t time steps. In an unconstrained environment,

this can be done according to Equation (4.3) (see Section 4.3.1) using the

transition matrix P derived in Section 4.3.2.2; in this equation, the µ+1 state

probabilities at time t are represented in form of the probability vector ⃗u t . In

a constrained environment we cannot use the transition matrix P across all t

time steps but have to swap it with the constrained transition matrix P c (see

Section 4.3.2.3) for time steps that consist of constrained selection steps; this dependence

of the transition matrix on time makes it a non-homogeneous Markov

chain. Let us consider the same periodic ERC as in the previous section but this

time with a constraint time frame spanning over g ∈ N periods (as opposed to

exactly one), i.e. perERC(iµ,(i+g)µ,k,µ,H = (A)). For this periodic ERC we

can calculate the probability vector at any time step t for GGA as follows:

$$\vec{u}_t = \vec{u}_0 P^t, \qquad 0 \le t < i,$$
$$\vec{u}_t = \vec{u}_0 P^i P_c^{\,t-i}, \qquad i \le t < g+i,$$
$$\vec{u}_t = \vec{u}_0 P^i P_c^{\,g} P^{t-g-i}, \qquad g+i \le t,$$

where the entries of the transition matrices P and P_c are calculated using Equations (4.7) and (4.9), respectively. The probability vector ⃗u_0 of the initial state

distribution has a value of 1 at the sth entry and a value of 0 in the others, if we

want to start with a population of exactly s type A individuals.

One time step with GGA corresponds to µ time steps with SSGA (rri). To

compute the probability vector ⃗u for SSGA (rri) we thus need to look at the state

distributions at time step tµ:

$$\vec{u}_{t\mu} = \vec{u}_0 P^{t\mu}, \qquad 0 \le t < i,$$
$$\vec{u}_{t\mu} = \vec{u}_0 P^{i\mu} \left(P_c^{\,k} P^{\mu-k}\right)^{t-i}, \qquad i \le t < g+i,$$
$$\vec{u}_{t\mu} = \vec{u}_0 P^{i\mu} \left(P_c^{\,k} P^{\mu-k}\right)^{g} P^{(t-g-i)\mu}, \qquad g+i \le t,$$

where the transition matrices P and P_c are calculated according to Equations (4.8) and (4.10), respectively.


110 CHAPTER 4. ERCS: DEFINITIONS AND CONCEPTS

Having obtained the probabilities of ending up in all the different states,

we can calculate the expected proportions c t (A) and c t (B) of type A and B

individuals in a population at time step t (or tµ in the case of SSGA (rri)) as

follows (note that this calculation does not depend on whether a problem is

subject to ERCs or not):

$$c_t(A) = \frac{1}{\mu}\sum_{j=0}^{\mu} j\, u_t^j, \qquad c_t(B) = 1 - c_t(A), \qquad (4.11)$$

where u_t^j is the jth entry of the probability vector ⃗u_t.
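Putting the pieces together, the following sketch (our own illustration, assuming NumPy; it would reuse the hypothetical matrix builders from the earlier sketches) propagates the state distribution for GGA, swapping in the constrained matrix P_c for the g constrained generations, and then evaluates Equation (4.11).

```python
import numpy as np

def expected_proportion_B(u0, P, Pc, i, g, t):
    """c_t(B) for GGA under perERC(i*mu, (i+g)*mu, k, mu, H=(A)): propagate the
    state distribution with P, swap in Pc for the g constrained generations,
    then evaluate Eq. (4.11)."""
    mu = len(u0) - 1
    u = np.asarray(u0, dtype=float)
    for step in range(t):
        u = u @ (Pc if i <= step < i + g else P)
    c_A = np.dot(np.arange(mu + 1), u) / mu
    return 1.0 - c_A

# Hypothetical usage, combining the earlier sketches:
#   P  = gga_transition_matrix(mu, fA, fB, "BTS")
#   Pc = gga_constrained_matrix(mu, k, [p_select_A(a, mu, fA, fB, "BTS") for a in range(mu + 1)])
#   expected_proportion_B(u0, P, Pc, i=1, g=7, t=28)
```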

4.3.3 Simulation results

This section uses the measure of the expected individual type proportion to analyze the impact of periodic ERCs on two selection strategies, FPS and BTS, and two reproduction schemes, GGA and SSGA (rri). We consider first the case where both individual types have equal fitness values, and then the case where they are different. If not otherwise stated, the population size is set to µ = 50.

4.3.3.1 Identical fitness values: f(A) = f(B)

In this case there is no selection pressure and thus both selection strategies behave identically. Ideally, an EA maintains an equal proportion of the two individual types in the population. However, because of genetic drift this is impossible and an EA eventually converges to a uniform population (S_i, i ∈ {0, µ}). As the probability of ending up in one of the two states is proportional to the initial state, the expected individual type proportion is identical to the initial proportion, which is specified by ⃗u_0. Thus, for a random initialization, the expected proportion is 0.5.

From Figure 4.7 we can see that an expected proportion of 0.5 is achieved until selection step 400, at which we activate the periodic ERC, perERC(400,450,20,50, H = (A)), which has a unique activation period of k = 20 selection steps. 7 This ERC forces us to evaluate k = 20 type A individuals and subsequently reduces (increases) the proportion of type B (A) individuals in the population. After the

7 Note, in an EA performing optimization of a function, the number of performed selection steps displayed on the x-axes of Figure 4.7 would be equivalent to the number of performed function evaluations.



Figure 4.7: A plot showing the proportion of type B individuals c_t(B) for GGA and SSGA (rri) as a function of the number of selection steps for the ERC perERC(400,450,20,50, H = (A)). Both individual types have equal fitness and the constraint settings used are given above the plot. The terms real and expected refer to proportions obtained by actually running the EA and by running the Markov chain, respectively. The EA results are averaged across 500 independent runs.

ERC is lifted at selection step 420, the expected individual type proportion does

not get back to the initial proportion. Although this effect can be put down to

the specifics of the model (no selection pressure towards either individual type), we will see, in the following theoretical and experimental studies, several results that display a similar pattern. That is, a constraint can have a permanent or

long-lived effect on search performance even if it was active for a short time only.

From the figure we can also see that the proportion is affected more severely

for GGA than for SSGA (rri). The reason that SSGA (rri) is more robust is

that with this reproduction scheme there is a chance that an offspring of type A

replaces another type A individual that is currently in the population. Of course,

if an offspring replaces a solution of the same type, then this will not affect the

proportion. By contrast, with GGA, all offspring are carried over to the population

of the next generation. This causes the proportion of type B individuals in

the population to be a linear function of the activation period. This effect is also

apparent from Figure 4.8, where the performance of both reproduction schemes

is shown as a function of the activation period k. From the figure one can see

that SSGA (rri) is able to maintain a proportion of around c_t(B) = 0.2 after an activation period of k = 50, which is equal to the population size. On the other hand, GGA cannot maintain a single type B individual in the population because of its linear dependence on k. Note, in the case where k > 50, the constraint is

activated for more than one time step when using GGA. For example, for k = 70



Figure 4.8: A plot showing the proportion of type B individuals c_t(B) for GGA and SSGA (rri) at selection step 200 as a function of the activation period k for the ERC perERC(50,200,k,150, H = (A)). Both individual types have equal fitness.

the constraint restricts all 50 selection steps within one time step and 20 selection

steps within the subsequent one.

As the Markov chain results are exact, we will omit the experimentally obtained proportions in the following plots.

4.3.3.2 Different fitness values: f(A) ≠ f(B)

When both individual types have different fitness values, the aim of an EA is to

converge as quickly as possible to a population state consisting only of the fitter

individual type. We focus our investigations mainly on the more interesting case

where an ERC has a negative effect on the convergence behavior. Hence, the

fitness of the individual type that we have to select during the activation period,

in our case type A, needs to be lower than the fitness of type B individuals. If

not otherwise stated, the fitness values are set to f(A) = 1.0 and f(B) = 1.3,

meaning there is little selection pressure towards selecting the fitter individual

type (which slows down convergence speed); later we will investigate the impact

of the fitness ratio between type A and B individuals in more detail.

As the basis for our analysis we use the periodic ERC perERC(50,400,20,50,

H = (A)). This ERC is activated after the initialization (i.e. at selection or

evaluation step 50) for seven periods, each consisting of P = 50 selection steps

whereby k = 20 of them are constrained. Figure 4.9 shows the impact of the periodic

ERC on the expected proportion c t (B) for all combinations of the selection

and reproduction schemes: GGA with FPS and SSGA (rri) with FPS (top plot),



and GGA with BTS and SSGA (rri) with BTS (bottom plot). 8

Figure 4.9: Plots showing the proportion of type B individuals c_t(B) for FPS (top) and BTS (bottom) as a function of the number of selection steps for the ERC perERC(50,400,20,50, H = (A)). The term unconstrained refers to the proportions obtained in an ERC-free environment.

We want to point out that during activation periods, SSGA (rri) with BTS and FPS perform identically, since, independently of the selection type, an A offspring will replace an individual selected at random. But during the inactive periods, the stronger selection pressure of BTS recovers more of the B-to-A replacements, so that overall BTS maintains a higher proportion of Bs. This behavior can be seen in the zigzag shape, where there is the same steep fall-off of fitness in both methods, but a steeper recovery for BTS. Overall, the same is true for GGA (BTS is better for the same reason), but it is not possible to see this so clearly in the plots.

Figures 4.10 to 4.12 indicate how the proportion of type B individuals is

8 We get the zigzag-shaped line for SSGA (rri) during the constraint time frame because c_t(B) is plotted after each time step, consisting here of one selection step. For GGA the change in c_t(B) is smooth because a time step consists of µ selection steps.



Figure 4.10: Plots showing the proportion of type B individuals c_t(B) at selection step 1500 as a function of the start of the constraint time frame t_ctf^start (left) and the activation period k (right) for the ERCs perERC(t_ctf^start, t_ctf^start + 350, 20, 50, H = (A)) and perERC(50,400,k,50, H = (A)), respectively.

affected when altering the constraint parameters. We can observe that:

• Longer activation periods degrade the performance of all EAs (see right

plot of Figure 4.10).

• Fixing the constraint time frame duration but translating it (see left plot of Figure 4.10) yields a non-monotonic effect on performance (for all EAs, but most apparently with FPS): more preparation time gives more time to fill the population with fit individuals, whereas little recovery time is detrimental to the final fitness. These two effects trade off against each other.

• Increasing the duration of the constraint time frame (see Figure 4.11) degrades

performance.

• Changing the fitness ratio (see Figure 4.12) has only a switching effect on

BTS (when the fitter individual changes), but for FPS the ratio smoothly

affects final proportion up to a saturation point.

• Overall, comparing GGA with SSGA we see that SSGA achieves the higher proportion of fit individuals during the constraint time frame, and it starts recovering sooner after the constraint is lifted, but its rate of recovery does not reach the rate achieved by GGA, and ultimately GGA reaches a higher proportion (see Figures 4.8 and 4.9). This can be explained by the replacement strategy of SSGA (rri): offspring may replace individuals in the population that are from the same type. During the activation period,



Figure 4.11: A plot showing the proportion of type B individuals c_t(B) at selection step 1500 as a function of the end of the constraint time frame t_ctf^end for the ERC perERC(0, t_ctf^end, 20, 50, H = (A)).

this is beneficial as the number of poor type A individuals in the population

does not increase linearly with the activation period. However, during

the unconstrained selection steps, this may be disruptive in the sense that

fit type B offspring may replace other type B individuals of the current

population, which slows down the convergence.

4.3.4 Summary of theoretical study

We used Markov chains to analyze the impact of periodic ERCs for a simple

environment and EA model. The environment was composed of only two individual

types and the EA model applied only a selection operator. In the EA

model we considered two selection strategies, FPS and BTS, and two reproduction schemes, GGA and SSGA (rri). We observed that for one and the same reproduction scheme, BTS is more robust than FPS due to its independence of the fitness values of the individual types. However, FPS was able to match and even outperform the performance of BTS if the ratio of the individual type fitnesses was high, i.e. if a larger selection pressure than for BTS was obtained. The crucial difference between the two reproduction schemes we considered is that GGA carries out many selection steps before the population is updated, while SSGA (rri), or steady state reproduction in general, carries out only a single one. This enables SSGA (rri) during the activation periods to replace less fit individuals with other less fit individuals of the current population, but in the long run it also prevents SSGA (rri) from a quicker convergence in the remaining periods. By contrast, the performance of GGA depends linearly on the activation period but there are


Figure 4.12: Plots showing the proportion of type B individuals c_t(B) at selection step 1500 as a function of the fitness ratio f(B)/f(A) for the ERCs perERC(50,400,20,50, H = (A)) (left) and perERC(50,550,25,50, H = (A)) (right).

no drawbacks if the ERC is not activated. This crucial difference between the

reproduction schemes means that SSGA (rri) is able to outperform GGA during

the activation period and in situations where the advantage over GGA gained

in the activation period(s) can be maintained until the next activation period or

until the end of the optimization. In terms of the constraint parameters, this

occurs when there is a long activation period, a short recovery period, and the

constraint time frame is set late.

4.4 Chapter summary

Ephemeral resource-constrained optimization problems (ERCOPs) are problems

where feasible solutions can be temporarily non-evaluable due to a lack of resources

required for their evaluation. In the first part of this chapter, we formally

defined such problems, and introduced several types of real-world ephemeral resource

constraints (ERCs).

In the second part of the chapter, we have conducted an initial theoretical analysis on the effect of one of the ERC types, periodic ERCs, on two selection and two reproduction schemes commonly used within EAs. We employed the theory of Markov chains to model a simple optimization environment, and obtained two main conclusions: (i) binary tournament selection tends to be more robust and allows for a quicker convergence towards a better population state than fitness proportionate selection, and (ii) a non-elitist steady state reproduction scheme

tends to be more robust than a generational reproduction scheme during periods



where the ERC is active, while the opposite is the case during inactive periods

of the ERC. Although a larger theoretical investigation is necessary to draw definite conclusions regarding other ERC types and more complex optimization environments, our findings indicate that the choice of algorithm setup is important to obtain optimal performance. Moreover, the analysis indicated that the optimal algorithm setup depends on the parameter settings of an ERC. 9 While

this chapter has dealt with basic EAs only, the next chapter proposes and empirically

analyzes the performance of a number of constraint-handling strategies

(augmented on basic EAs) for coping with the different ERC types introduced in

this chapter.

9 In (Allmendinger and Knowles, 2010) we have empirically shown that the findings gained in our theoretical investigation can be valid also for more complex optimization environments, EA models, and other ERC types. We do not include this particular empirical analysis in this thesis but instead focus on the development and empirical analysis of different constraint-handling strategies.


Chapter 5

Strategies for Coping with

Ephemeral Resource Constraints

In the previous chapter we formally defined ephemeral resource-constrained optimization

problems (ERCOPs), and introduced several types of ephemeral resource

constraints (ERCs). We also presented a theoretical study of the effect of

one of the ERC types, periodic ERCs, on the choice of simple EA configurations,

and observed that algorithm setup is crucial to obtain optimal performance. In

this chapter we continue our analysis of ERCOPs using an empirical approach.

Our objective is to develop and empirically analyze the performance of various

strategies for coping with the ERCs introduced in the previous chapter. We first

propose and investigate static constraint-handling strategies for dealing with non-evaluable solutions in Section 5.1. We then investigate whether learning online

and offline when to switch between these static strategies during an optimization

process is more efficient than using the static strategies themselves (Section 5.2).

Finally, we develop online resource-purchasing strategies for coping with commitment

composite ERCs in Section 5.3.

5.1 Static constraint-handling strategies

In this section, we introduce and analyze five static constraint-handling strategies

for dealing with non-evaluable solutions in an ERCOP. We first introduce

the strategies in Section 5.1.1. Then we outline the experimental setup as used

in the analysis of the strategies in Section 5.1.2, and conduct the analysis itself

in Section 5.1.3. The analysis will consider commitment relaxation ERCs and




Figure 5.1: An illustration of how the targeted search direction of an optimizer (indicated by the arrow starting at ⃗x_t), which might be towards an optimal solution ⃗x_optimal, may change due to ERCs (which define the evaluable search space E_t). The reason is that repairing a solution causes a kind of 'drift' effect that alters the search direction, which, depending on the fitness landscape, may be towards a local optimum; the solutions ⃗x_1 to ⃗x_{t−1} indicate the search history.

periodic ERCs (both ERC types were introduced in Sections 4.2.2 and 4.2.3, respectively) but the strategies are applicable (in similar form) also to other ERC

types. In Section 5.1.4 we present a case study that illustrates one way in which

one might select a suitable strategy for an ERCOP with largely unknown search

space properties, which is a common situation in the real world. In Section 5.1.5

we draw together the findings from the experimental analyses.

5.1.1 Specific static constraint-handling strategies

This section introduces five constraint-handling strategies for dealing with non-evaluable

solutions in an ERCOP. Three of the strategies (forcing, regenerating,

and the subpopulation strategy) actually apply repairing (i.e. modify the genotype

of a solution) and two (waiting and penalizing) avoid it in some way in order

to prevent drift-like effects in the search direction; see Figure 5.1 for a graphical

illustration of how repairing can trigger a drift in the search direction.

In the description of the strategies we assume that multiple ERCs of a particular ERC type (commitment relaxation or periodic ERC) with non-overlapping (or non-contradictory) constraint schemata H_i, i = 1,...,r, may be activated at a given time step. That is, there is always an evaluable solution, or ∃⃗x ∈ ⋂_i H_i. We also assume that we know which resources are available and thus that the schemata H_i are known to the optimizer.

1. Forcing: This strategy simply forces a non-evaluable solution ⃗x into the

constraint schemata H i of all activated ERCs. In other words, all bits that do



not match the order-defining bit values of the schemata H i of all activated ERCs

are flipped, and the solution so obtained is returned for evaluation. We employed

this strategy in the theoretical analysis presented in Section 4.3 (to force type B

individuals into type A individuals during activation periods of a periodic ERC).

Repairing strategies of this kind have been used previously, particularly, in the

optimization of constrained combinatorial problems (see Section 3.3).

A potential drawback of this strategy is that enforcing a change in the values

of some solution bits may destroy a potentially good combination of bit values

between the changed and the remaining bits present in the original solution.

Later, we will investigate this aspect more closely.
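A minimal sketch of the forcing repair (our own illustration, not the implementation used in the experiments; schemata are assumed to be given as strings over {0,1,*} and force_into_schema is a hypothetical name) is:

```python
def force_into_schema(x, schemata):
    """Forcing: flip every bit of x that contradicts an order-defining position
    of any activated constraint schema (schemata are strings over '0', '1', '*')."""
    x = list(x)
    for H in schemata:
        for i, h in enumerate(H):
            if h != '*':          # order-defining position: enforce its bit value
                x[i] = int(h)
    return x

# Example: repairing the solution 1101 against the schema H = (0**1)
print(force_into_schema([1, 1, 0, 1], ["0**1"]))   # -> [0, 1, 0, 1]
```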

2. Regenerating: The aim of this strategy, which is similar to the death penalty

approach (see Section 3.3) often used in the evolution strategies community, is

to overcome the potential drawback of forcing. In fact, as the name of the strategy suggests, upon encountering a non-evaluable solution, regenerating iteratively

generates new solutions from the empirical distribution of the current offspring

population (i.e. it generates new offspring from the current parent set) until it

generates one that is evaluable, i.e. falls into the schemata H i of all activated

ERCs, or until L trials have passed without success. In the latter case, we select

the solution, generated within the L trials, that is closest to the schemata H i of

all activated ERCs and apply forcing to it. Here, closest refers to the solution

with the smallest sum of Hamming distances to the schemata H i of all activated

ERCs; 1 ties between several equally-closest solutions are broken randomly. Thus

the method always returns an evaluable solution (except in the ‘deadlock’ situation

where multiple ERCs with overlapping H i are activated simultaneously, in

which case no solution is evaluable).

A potential drawback of this strategy is that for large L it can be computationally and/or resource expensive if checking the constraints is costly, while for small L it may often reduce to the forcing strategy.
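For regenerating, the fallback step relies on the Hamming distance to a schema counted over its order-defining bits (see the footnote below); a small sketch of this distance and of picking the closest of the L generated candidates (our own hypothetical helpers; ties are broken here by first occurrence rather than randomly) is:

```python
def distance_to_schema(x, H):
    """Hamming distance between solution x and schema H, counted over the
    order-defining (non-'*') positions of H only."""
    return sum(1 for xi, h in zip(x, H) if h != '*' and str(xi) != h)

def closest_to_schemata(candidates, schemata):
    """Among the L generated candidates, pick the one with the smallest sum of
    Hamming distances to all activated constraint schemata."""
    return min(candidates,
               key=lambda x: sum(distance_to_schema(x, H) for H in schemata))
```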

3. Subpopulation strategy: Let us assume there is only one ERC, i.e. r = 1.

In this case, alongside the actual population, we maintain also a subpopulation

SP of maximum size J that contains the fittest solutions from H 1 evaluated so

far. A non-evaluable solution is then dealt with by generating a new solution

based on this subpopulation. If the maximum population size of SP, J, is not

1 Notice that the Hamming distance between a solution ⃗x and a schema H is calculated based on the order-defining bits of H only (as the non-order-defining bits can be either 0 or 1-bits).



reached, then a new solution from H 1 is generated at random, otherwise we apply

one selection and variation step using the same algorithm as the one we augment

the constraint-handling strategies on; if the new solution is non-evaluable, which

may happen due to mutation, we apply forcing to it. To update the subpopulation

upon evaluating a solution from H 1 we use a steady state or (J + 1)-ES

reproduction scheme. We use this reproduction scheme because, depending on

the ERC, the number of evaluated solutions from H 1 might be small, in which

case a generational reproduction scheme is likely to result in a slow convergence.

Generally, a drawback of the subpopulation strategy is that if we have more than one ERC, i.e. r > 1, then the number of subpopulations we need is upper-bounded by 2^r, the size of the power set of the set of ERCs. With multiple ERCs,

we generate a solution using the subpopulation that is defined by the (set of)

schemata H i of all activated ERCs.

4. Waiting: This strategy does not repair but it waits with the evaluation of

a non-evaluable solution and the generation of new solutions until the activation

periods of all ERCs that are violated by the solution have passed; i.e. the

optimization freezes. The freezing period is bridged by submitting as many null solutions as required until the solution becomes evaluable. 2 The effect of this waiting strategy is identical to the way Schwefel (1968) handled unavailable conical rings in his flashing nozzle design problem (see preface of Section 1).

The advantage and disadvantage of waiting are obvious: it should prevent drift-like effects in the search direction caused by ERCs, but the waiting might result

in a smaller number of solutions being evaluated (this depends upon whether

time is a limiting factor).

5. Penalizing: Like waiting, this strategy does not repair. However, instead

of freezing the optimization, a non-evaluable solution is penalized by assigning a

poor objective value c to it. The effect is that evaluated solutions coexist with

non-evaluated ones in the same population. However, due to selection pressure

in parental and environmental selection, non-evaluated solutions are likely to be

discarded as time goes by. As we will use the strategy within an elitist EA, and

2 We want to point out that in the implementation of this strategy that we use here, we do not submit null solutions to bridge the freezing period. Instead, we bridge the freezing period by setting the (global) time counter t directly to the end of the longest activation period of all currently violated ERCs. The effect of this is identical to submitting null solutions, but this approach allows for a cleaner presentation of the EA and the function wrapper in Algorithm 5.1; in this algorithm, the bridging of the freezing period is performed in Line 25.



because we use a c that is the minimal fitness in the search space, a non-evaluated

solution will never be inserted in a population (that is filled with previously

evaluated solutions) in the first place.

This kind of penalizing strategy has been used by Schwefel, and Reynolds

and Corne to cope with execution errors occurring in simulations (we mentioned

this application in the preface of Section 1). But the approach is also generally

popular in the genetic algorithm (GA) community; here, this penalizing approach

can be seen as a static penalty function method (see Section 3.3).

The advantage of penalizing over waiting is that the optimization does not

freeze upon encountering a non-evaluable solution; i.e. the solution generation process continues and thus solutions might actually be evaluated (without needing to penalize them) during an activation period. However, since solutions evaluated

during an activation period will have to fall into the schemata H i of all currently

activated ERCs, penalizing might be subject to drift-like effects, thus potentially

losing the advantage of waiting.

Figure 5.2 visualizes how the strategies forcing, regenerating, and the subpopulation

strategy may repair a non-evaluable solution.

5.1.2 Experimental setup

This section describes the test functions f, the EA on which we augment the

different constraint-handling strategies, and the parameter settings as used in the

subsequent experimental analysis, which investigates the impact of commitment

relaxation and periodic ERCs.

5.1.2.1 Search algorithm

We augment the different static constraint-handling strategies on a standard EA.

The EA uses a standard (µ + λ)-ES reproduction scheme, an elitist approach,

which we believe would be generally applicable in this domain. The algorithm

uses binary tournament selection (with replacement) for parental selection, which has been shown to be a robust operator in the theoretical study (see Section 4.3.3). The algorithm also uses uniform crossover (Syswerda, 1989) and bit flip mutation.

Algorithm 5.1 shows the pseudocode of the EA including the function wrapper

as employed with the different static constraint-handling strategies. Additional

performance-enhancing mechanisms commonly used in EAs, such as diversity



Figure 5.2: A depiction of the current population Pop (filled circles and squares) and an offspring individual ⃗x_t, which is feasible but not evaluable (because it is in X but not in E_t). Solutions indicated by the filled squares coexist in both the actual EA population Pop and the population SP maintained by the subpopulation strategy. The three solutions ⃗x_{t,repaired} indicate repaired solutions that might have resulted after applying one of the three 'repairing' strategies to ⃗x_t: while forcing simply flips incorrectly set bits of ⃗x_t and thus creates a repaired solution that is as close as possible to ⃗x_t but not necessarily fit, regenerating creates a new solution in E_t using the genetic material available in Pop. Similarly, the subpopulation strategy also creates a new solution, but it uses the genetic material available in the subpopulation SP (empty and filled squares), which contains only solutions from E_t.

preservation techniques or adaptive parameter control (see Section 3.1.2), may

affect the results, but are not considered in this particular study. Similarly, since

we want to investigate the performance of a standard EA, the algorithm does not

check whether a currently non-evaluable solution has been evaluated previously,

i.e. identical solutions may be evaluated multiple times.

5.1.2.2 Test functions f

Since our aim in this study is to understand the effect of ERCs on EA performance on real closed-loop problems (ultimately), it might be considered ideal to use, for testing, some set of real-world ERCOPs, that is: real experimental problems featuring real resource constraints. That way we could see the effects of EA design choices (the constraint-handling strategy used) directly on a real-world problem of interest. But even granting this to be an ideal approach, it would be very difficult to achieve in practice due to the inherent cost of conducting closed-loop experiments and the difficulty of repeating them to obtain any statistical



Algorithm 5.1 Generational EA with static constraint-handling strategies

Require: ERC_1,...,ERC_r (set of ERCs), f (objective function), T (time limit), µ (parent population size), λ (offspring population size), #Strategy (number of selected constraint-handling strategy; see Section 5.1.1)

1:  t = 0 (global variable representing the current time step), Pop = ∅ (current population), OffPop = ∅ (offspring population)
2:  while |Pop| < µ ∧ t < T do
3:    generate solution ⃗x at random
4:    ⃗x = functionWrapper(⃗x, t)
5:    Pop = Pop ∪ {⃗x}; t++
6:  while t < T do
7:    OffPop = ∅
8:    repeat
9:      generate two offspring ⃗x^(1) and ⃗x^(2) by selecting two parents from Pop, and then recombining and mutating them
10:     OffPop = OffPop ∪ {⃗x^(1)} ∪ {⃗x^(2)}
11:   until |OffPop| = λ
12:   for i = 0 to |OffPop| do
13:     if t < T then
14:       ⃗x_i = functionWrapper(⃗x_i, t)   // ⃗x_i represents the ith solution of OffPop
15:       t++
16:   form new Pop by selecting the best µ solutions from the union population Pop ∪ OffPop

17: functionWrapper(⃗x, t){
18:   y_t = null
19:   if ⃗x satisfies the ERCs ERC_1,...,ERC_r then
20:     ⃗x_t = ⃗x; y_t = f(⃗x_t)
21:   else
22:     if #Strategy = 1 ∨ #Strategy = 2 ∨ #Strategy = 3 then
23:       ⃗x_t = repair(⃗x, t); y_t = f(⃗x_t)
24:     if #Strategy = 4 then
25:       t = t + δ; ⃗x_t = ⃗x   // δ is the number of time steps we have to wait until the activation periods of all ERCs that are currently violated by ⃗x have passed
26:       if t < T then
27:         y_t = f(⃗x_t)
28:     if #Strategy = 5 then
29:       ⃗x_t = ⃗x; y_t = c   // c is a constant, representing poor fitness
30:   return ⃗x_t and y_t }

confidence in results seen. For this reason, most of our study will use more familiar artificial test problems augmented with ERCs (although in the case study, in Section 5.1.4, we do use data and constraints from a real closed-loop problem).

The use of test functions to understand aspects of EA performance has long

been a staple of EA empirical studies. It is a good approach generally, and appropriate

also here, because problem features can be controlled and experiments

can be repeated many times, allowing a thorough analysis of results. Our set of

selected test functions comprises: (i) OneMax, (ii) a competing optima problem,



TwoMax, and (iii) a problem instance with many local optima, MAX-SAT. In

the subsequent case study we will also see a variant of NK landscapes (NKα

landscapes) being used to tune (offline) our method; we will introduce this test

function here too. The use of these different landscape types should be sufficient

to draw some tentative conclusions about the effects of ERCOPs generally,

depending on results observed.

OneMax: The OneMax function is a bit counting function that is used frequently

in the theoretical analysis of GAs (see e.g. Mühlenbein and Schlierkamp-

Voosen (1993); He and Yao (2002)). For a binary solution vector ⃗x ∈ {0,1}^l, it takes the sum over the bit values of all bit positions,

$$\text{maximize } f(\vec{x}) = \sum_{i=1}^{l} x_i,$$

and has its optimum f = l for the bit string consisting only of 1-bits. Although the OneMax problem features a unimodal landscape and so is supposed to be relatively easy to solve for an EA, it helps us to gain detailed insights into things like how ERCs affect the convergence speed and may cause premature convergence. As we will see in the experimental study, these insights form the foundation for understanding the impact of ERCs on more complex fitness landscapes.

TwoMax: The TwoMax function is a bimodal function that can be seen as the

multimodal counterpart of OneMax. It has two local optimal solutions: solutions

consisting only of 1-bits and 0-bits. In its original version, the slopes leading

to these two solutions are symmetric. Like the OneMax function, this function has been intensively studied in theoretical works on the analysis of GAs (see e.g. Pelikan and Goldberg (2000); Hoyweghen et al. (2002)). To make the TwoMax function more realistic and interesting, its symmetric property has often been broken (see e.g. Pelikan and Goldberg (2000); He and Yao (2002); Friedrich et al. (2008)), turning it into a function with a local and a global optimal solution. The two common modifications are to change the steepness of one of the slopes or to simply increase the fitness value of one of the two optima. We opt for the

former, making the slope leading to the solution consisting only of 1-bits steeper.

Hence, if #1s denotes the number of 1-bits in a solution vector ⃗x and b > 1 the

factor by which the global optimal solution shall be fitter than the local optimal



Figure 5.3: A TwoMax function with a local optimal solution at #1s = 0 and a global optimal solution at #1s = l. The parameter b > 1 specifies the factor by which the global optimal solution shall be fitter than the local optimal solution.

solution, then the TwoMax function is defined by

$$\text{maximize } f(\vec{x}) = \begin{cases} l - \#1s & \text{if } \#1s \le l/2, \\ b\,\#1s & \text{otherwise.} \end{cases}$$

Figure 5.3 illustrates this TwoMax function. Our main reason for using an asymmetric

version of the TwoMax function is to understand how much drift, coming

from an ERC, is required to cause an optimizer that uses a specific strategy to

climb up the suboptimal slope. Analyzing this property will help us to understand

better the impact of ERCs on more challenging multimodal test problems,

such as the MAX-SAT problem.
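A minimal sketch of the two bit-counting test functions as defined above (our own illustration; the default b = 1.1 follows the setting given later in Table 5.3) is:

```python
def onemax(x):
    """OneMax: number of 1-bits in the binary vector x; optimum f = l at the all-1s string."""
    return sum(x)

def twomax(x, b=1.1):
    """Asymmetric TwoMax as defined above: l - #1s on the left slope, b * #1s on the right."""
    l, ones = len(x), sum(x)
    return l - ones if ones <= l / 2 else b * ones

print(twomax([0] * 30))   # local optimum: fitness 30
print(twomax([1] * 30))   # global optimum: fitness 33.0 (with b = 1.1)
```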

MAX-SAT: Given a boolean expression consisting of a set of clauses, which

in turn are formed by binary variables x i ,i = 1,...,l, the maximum satisfiability

(MAX-SAT) problem (Zhang, 2001; Coppersmith et al., 2004) asks for the maximum

number of clauses that can be satisfied by any assignment ⃗x. A MAX-SAT

problem is the optimization version of the decision problem, satisfiability (SAT),

which only asks whether there is a satisfying assignment at all or not. MAX-SAT

and SAT problems are of great theoretical interest, of course, due to SAT being

the original problem proved NP-Complete by Cook, and still at the heart of

attempts to determine the answer to the P vs NP problem. But it is of high practical

relevance too, as many challenging real-world optimization problems can be

efficiently formulated in SAT form (De Jong and Spears, 1989), e.g. hardware

and software verification problems, and routing in FPGAs.

The instance considered is a uniform random 3-SAT problem and can be



downloaded online. 3 The instance has l = 50 variables and 218 clauses and is

satisfiable. We treat this 3-SAT instance as a MAX-SAT optimization problem,

with fitness calculated as the proportion of satisfied clauses.
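A small sketch of this fitness computation (our own illustration; it assumes clauses are given in the usual DIMACS list-of-integer-literals form used by the SATLIB files) is:

```python
def maxsat_fitness(assignment, clauses):
    """Fitness of a 0/1 assignment: fraction of clauses satisfied.

    Each clause is a list of non-zero integers, where literal v stands for x_v
    and -v for NOT x_v (variables are 1-indexed)."""
    satisfied = sum(
        any((lit > 0) == bool(assignment[abs(lit) - 1]) for lit in clause)
        for clause in clauses
    )
    return satisfied / len(clauses)

# Toy example with l = 3 variables and two clauses: (x1 OR NOT x2) AND (x2 OR x3)
print(maxsat_fitness([1, 0, 1], [[1, -2], [2, 3]]))   # -> 1.0
```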

Note, although there are local search algorithms for SAT or MAX-SAT problems,

such as greedy SAT (Selman et al., 1992; Dechter, 2003), that can solve

small problem instances easily, for optimizers that do not explicitly make use of

the structure of the problem, such as standard EAs, the instances are generally

hard to solve (due to the presence of many local optimal solutions) (Zhang, 2001;

Prügel-Bennett, 2004; Qasem and Prügel-Bennett, 2010).

Apart from the relevance of MAX-SAT problems in theoretical and practical

analysis, another reason for choosing this test problem is the convenient property

of a backbone (the backbone of a MAX-SAT instance is the schema into which all

the optimal solutions fall). We also conducted experiments on other challenging

multimodal test problems, such as NKα landscapes (Kauffman, 1989), which we

introduce next.

NKα landscapes: The general idea of the NKα landscapes (Hebbron et al.,

2008) is to extend Kauffman’s original NK model (Kauffman, 1989) to model

epistatic network topologies that are more realistic in mapping the epistatic connectivity

between genes in natural genomes. The NKα model achieves this by

affecting the distribution of influences of genes in the network in terms of their

connectivity, through a preferential attachment scheme. The model uses a parameter

α to control the positive feedback in the preferential attachment so that

larger α result in a more non-uniform distribution of gene connectivity. There are

three tunable parameters involved in the generation of an NKα landscape: the

total number of variables N (in our notation this variable is denoted as l), the

number of variables that interact epistatically at each of the N loci, K, and the

model parameter α that allows us to specify how influential some variables may

be compared to others. Larger values of the parameter K yield more epistatic

(rugged) landscapes, while larger values of α assign more influence (on the fitness)

to a minority of variables; 4 for a value of α = 0, the NKα model reduces to

3 http://people.cs.ubc.ca/~hoos/SATLIB/benchm.html; the name of the instance is

“uf50-218/uf50-01.cnf”.

4 We can think of the loci as nodes in a network with the number of edges going into a node

and out of it representing the in-degree and out-degree of a node, respectively. Each node has

the same in-degree of K +1 (including the self-connection) but the out-degree approximates a

power-law (with the parameter α being in the exponent) as a consequence of the preferential

attachment scheme.



Table 5.1: EA parameter settings as used in the study of static constraint-handling strategies.

  Parameter                        Setting
  Parent population size µ         50
  Offspring population size λ      50
  Per-bit mutation probability     1/l
  Crossover probability            0.7

Table 5.2: Parameter settings of static constraint-handling strategies.

  Strategy                  Parameter                                        Setting
  Regenerating              Number of regeneration trials L                  10000
  Penalizing                Fitness c assigned to non-evaluable solutions    0
  Subpopulation strategy    Maximal size of subpopulation SP, J              30

Kauffman's original NK model with neighbors being selected at random. The NK

model has already been used previously to analyze certain aspects of real-world

closed-loop problems (see e.g. Thompson (1996b, 1997)).

5.1.2.3 Parameter settings

The parameter settings of the EA and the strategies are given in Tables 5.1 and 5.2, respectively. The settings of the test functions are outlined in Table 5.3.

We choose these search space sizes l as they correspond to typical search space

sizes we have seen (e.g. a drug combinations problem with a library of about

30 drugs (Small et al., 2011)). The reason for setting the scaling factor b of the

TwoMax function so small is that it makes the problem more challenging due to

the low selection pressure to climb up the optimal slope. Regarding the optimization

time T, remember that one time step corresponds to a function evaluation

of a single solution or a single submitted null solution. In an ERC-free search,

the optimization times as specified in Table 5.3 are usually not sufficient for an

optimizer to locate a global optimal solution reliably. The reason we set T this

way is that it gives us the opportunity to assess both positive and negative effects



Table 5.3: Parameter settings of test functions f as used in the study of static constraint-handling strategies.

  Test function     Parameter                    Setting
  OneMax            Solution parameters l        30
                    Optimization time T          700
  TwoMax            Solution parameters l        30
                    Scaling factor b             1.1
                    Optimization time T          700
  MAX-SAT           Solution parameters l        50
                    Optimization time T          800
  NKα landscapes    Solution parameters N = l    15
                    Neighbors K                  {2,6}
                    Model parameter α            {0,2}
                    Optimization time T          2250

of an ERC on the convergence speed and the solution quality obtained at the end

of an algorithm run. In addition to the parameters above, we analyze the impact

of the preparation and recovery time by considering different settings for T than

specified in Table 5.3, but this will be pointed out where applicable.

Any results shown are average results across 500 independent algorithm runs.

To allow for paired testing of the strategies, we use a different seed for the random number generator for each EA run but the same seeds for all strategies.

5.1.3 Experimental results

The performance of a strategy depends inter alia on the potential impact of an

ERC on the population diversity and the optimization direction. To assess the

impact on these two factors one has to consider: (i) what genetic material represented

by a constraint schema H needs to be introduced into a population to

cause a performance impact, (ii) how much of it, or, rather, how many individuals

of a constraint schema need to be introduced into a population to cause a performance

impact, (iii) at what stage during a run does it need to be introduced to

yield a performance impact, and (iv) the effects of the preparation and recovery

durations. We give detailed observations on these effects here, and summarise

the key findings in Section 5.1.5.



5.1.3.1 Commitment relaxation ERCs

We first analyze the case where a constraint schema H represents poor genetic

material. For OneMax and TwoMax, this means that the order-defining bits of

H are set to 0. For the MAX-SAT instance, we represent a poor bit by flipping

a randomly selected bit from the backbone of our instance 5 ; for ease of presentation,

also on this problem, we will write 1-bits to refer to ‘good’ bits, which are

randomly selected unflipped bits from the backbone, and 0-bits to refer to their

complements. 6
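To make the schema notation concrete, the following sketch, which is our own illustration rather than code from the implementation used here, shows one way to represent a constraint schema H, compute its order o(H), and test whether a solution falls into it.

# A schema is a tuple with 0, 1, or None ('*') per position.
# H = (0, 0, None, None, ...) corresponds to H = (00***...).

def schema(pattern, length):
    """Build a schema tuple from a string such as '00*' padded with '*' to `length`."""
    symbols = {'0': 0, '1': 1, '*': None}
    padded = pattern.ljust(length, '*')
    return tuple(symbols[c] for c in padded)

def order(H):
    """Order o(H): number of order-defining (fixed) bits."""
    return sum(1 for h in H if h is not None)

def matches(x, H):
    """True if solution x (list of 0/1) agrees with H on all order-defining bits."""
    return all(h is None or h == xi for h, xi in zip(H, x))

H = schema("00", 30)             # H = (00***...), o(H) = 2
x = [0] * 30
print(order(H), matches(x, H))   # -> 2 True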

Figure 5.4 gives an impression of how the final average best solution fitness is

affected for the different strategies on OneMax. The results obtained on TwoMax

and the MAX-SAT instance are very similar and are shown in Appendix A.1. For

ease of presentation we normalize the fitness values of all test functions so that

they lie in the range [0,1]. We make the following observations from the figure.

• Generally, the ERCs affect search negatively, and strategy choice is important.

• The subpopulation strategy tends to perform better than forcing and regenerating (which perform similarly) for the majority of constraint parameters. The reason is that the subpopulation strategy generates fitter solutions from H and thus allows the EA to converge more quickly to a (suboptimal) population state containing many (copies of) optimal solutions from H. The subpopulation strategy performs poorly for constraint settings that can cause a premature convergence towards search regions covered by H (see range 4 ≤ o(H) ≤ 8 in the top left plot).

• With respect to the order of the constraint schema o(H) (top left plot), we

observe that there is a value that has the largest negative effect on the optimization;

it is around 4 for the ‘repairing’ strategies (forcing, regenerating

or the subpopulation strategy), and lower for penalizing and waiting.

5 We identified the backbone of our MAX-SAT instance from optimal solutions obtained from running a generational GA for 1000 generations, 500 times independently. The backbone is of order 44.

6 If not otherwise stated, then the order-defining bits of a constraint schema are chosen at random for each algorithm run; if an order-defining bit is a 1-bit, then the position of this bit is also chosen at random among the order-defining bits; i.e. a constraint schema denoted by H = (010∗∗∗∗) might actually be e.g. H = (∗∗1∗0∗0) or H = (∗∗00∗∗1) in an algorithm run. Nevertheless, all strategies will be optimizing subject to the same constraint schemata.


Figure 5.4: Plots showing the average best solution fitness found and its standard error on OneMax as a function of the order of the constraint schema o(H) (top left), the epoch duration V (top right), the optimization time T (bottom left), and the start of the constraint time frame t_ctf^start (bottom right). The panels correspond to the ERCs commRelaxERC(0,700,15,H=(0^o(H) ***...)), commRelaxERC(0,700,V,H=(00***...)), commRelaxERC(0,700,15,H=(00***...)), and commRelaxERC(t_ctf^start, t_ctf^start+700,15,H=(00***...)) with T = t_ctf^start+700, respectively. Note, while the optimization time in the top plots is fixed to T = 700, as specified in Table 5.3, the parameter T varies in the bottom plots.

The non-monotonic performance impact on the repairing strategies, and

partially on penalizing, is due to two competing forces: (i) the probability

of activating a constraint, which decreases exponentially with o(H), and

(ii) the probability that a constraint activation causes a shift in the search

focus, which is greater for low orders. With respect to these two forces, an

order of o(H) ≈ 4 tends to have the worst trade-off. Penalizing performs

better than the repairing strategies in the range 2 < o(H) < 8 because

the probability of having to penalize solutions increases exponentially with

o(H). We remark that (results not shown) the order for which the worst

trade-off is obtained is a function of the string length l and the population

size µ. In general, the worst trade-off shifts to only a slightly higher order

than 4 as l and/or µ increase; the shift is only slight because the probability of activating the ERC decreases exponentially with the order.

For waiting, the performance only depends on the probability of activating a constraint, causing the performance to be poorest at o(H) = 1 and to improve exponentially thereafter.

• Longer epoch durations (larger V) degrade performance of all methods because

of potentially longer activation periods during epochs (see top right

plot). With penalizing, forcing, regenerating, and the subpopulation strategy

a saturation point is reached beyond which further increases in the

epoch duration have no effect. With waiting there is no saturation point

because an increase in V results in longer waiting periods and thus a poorer

performance. The reason that waiting performs best for small V (see range

0 < V ≤ 14) is that the waiting periods are short in this regime, allowing

an optimizer to converge quickly away from search regions covered by H

and prevent future constraint activations.

• When providing some recovery time, all strategies improve in performance

(see bottom left plot). The recovery speed depends on how much time is

required to introduce first diversity among the previously constrained bits

before one can generate better solutions.

• With later start times of the constraint time frame, or, equivalently, longer

preparation times, there is a positive effect on the performance of all strategies

(see bottom right plot). This is because with a commitment relaxation

ERC the later in the optimization one is, the less likely it is to enter a poor

schema (a schema not on the optimization path) and activate a constraint;

also, due to elitism, repaired solutions are less likely to be inserted into the

population the later a constraint is activated. Thus later constraints are

less disruptive.

Let us now analyze how constraint schemata that represent genetic material

of different qualities affect the performance obtained with the constraint-handling strategies. Figure 5.5 shows the effect of different constraint schemata on the average best solution fitness (left plots) and the number of times the optimal solution has been evaluated (right plots) for forcing (top plots) and waiting (bottom plots)

on OneMax. The plots on the left-hand side indicate the effect of an ERC on the

convergence speed towards an optimal solution.

Although ERCs can have large effects on performance, one can see from the

plots of Figure 5.5 that the majority of the constraint schemata do not have an


Figure 5.5: Plots showing the average best solution fitness obtained (left) and the average number of times the optimal solution has been evaluated during the optimization (which is an indication of the convergence speed) (right) by forcing (top) and waiting (bottom) on OneMax as a function of the order of the constraint schema o(H), and the number of order-defining bits in H with value 1 for the ERC commRelaxERC(0,700,15,H). The straight line represents the expected performance when picking a schema (i.e. the order-defining bits and their values) with a particular order at random. The performance obtained in an unconstrained environment is represented by the square at o(H) = #1s = 0.

impact on the performance at all compared to the unconstrained performance

(which is represented by the square at o(H) = #1s = 0). These are schemata

that are unlikely to cause an activation at all because they either do not lie on

an optimizer’s search path (schemata with few 1-bits) or are associated with a

generally low probability of being met by any individual (higher order schemata


Figure 5.6: Plots showing the ratio P(f(x) > f_Penalizing)/P(f(x) > f_Waiting) (left) and P(f(x) > f_Penalizing)/P(f(x) > f_Subpop. strategy) (right) on OneMax as a function of the order of the constraint schema o(H), and the number of order-defining bits in H with value 1 for the ERC commRelaxERC(0,700,15,H); here, x is a random variable that represents the best solution from a set of solutions drawn uniformly at random from the search space and f_* the average best solution fitness obtained with strategy *. If P(f(x) > f_*)/P(f(x) > f_**) > 1, then strategy ** is able to achieve a higher average best solution fitness than strategy * and a greater advantage of ** is indicated by a darker shading in the heat maps; similarly, if P(f(x) > f_*)/P(f(x) > f_**) < 1, then * is better than ** and a lighter shading indicates a greater advantage of *.

around the straight line). Constraint schemata that represent poor genetic material (i.e. consist of many 0-bits) have an impact only if their order is low, because an optimizer is searching in a different direction. Hence, constraint schemata that have a significant effect on the performance of both strategies are either of low order or contain many 1-bits (schemata along and near the diagonal). For these constraint setting regimes, we observe a similar non-monotonic effect on the performance of both strategies as we have seen previously for schemata representing poor genetic material only (indicated here by the row of squares with #1s = 0). The difference is that the more 1-bits there are in H (i.e. as we go up the rows of squares), the less apparent this non-monotonic (negative) performance effect becomes. From Figure 5.6 it is apparent that the performance differences between the strategies observed previously are also largely maintained (as we go up the rows of squares).

On the MAX-SAT instance, the range of constraint schemata causing a performance

impact is smaller (see Figure 5.7). From the plot it is apparent that

while again low-order schemata affect the performance significantly, higher-order


Figure 5.7: A plot showing the average best solution fitness obtained by the subpopulation strategy on a MAX-SAT problem instance as a function of the order of the constraint schema o(H), and the number of order-defining bits in H set correctly for the ERC commRelaxERC(0,800,15,H). The straight line represents the expected performance when picking a schema (i.e. the order-defining bits and their values) with a particular order at random.

schemata that represent near-optimal or optimal genetic material have little or no effect; the reason is that good genetic material is difficult to detect on

this challenging problem, particularly within 800 time steps.

5.1.3.2 Periodic ERCs

With the insights we gained about the strategies when applying them to commitment

relaxation ERCs, we can understand their behavior in the presence of a

second type of ERC, periodic ERCs, more easily.

Figure 5.8 shows how the performance of the different strategies is affected

by various constraint parameters of a periodic ERC on OneMax when H represents

poor genetic material. Again, the results obtained on TwoMax and the

MAX-SAT instance are similar and are shown in Appendix A.2. In comparison

to commitment relaxation ERCs, the main difference we observe from Figure 5.8

is that waiting performs poorly and is also clearly dominated by penalizing for

the majority of constraint settings. This is due to the fact that activation periods

are set deterministically with periodic ERCs. In essence, waiting is likely to

freeze the optimization during each activation period because of the low probability

of generating solutions from H (regardless of the quality of the genetic

material represented by H). With penalizing one is also unlikely to evaluate any

solutions during an activation period. However, the fact that the optimization is


Figure 5.8: Plots showing the average best solution fitness found and its standard error on OneMax as a function of the order of the constraint schema o(H) (top left), the activation period k (top right), the optimization time T (bottom left), and the start of the constraint time frame t_ctf^start (bottom right). The panels correspond to the ERCs perERC(0,700,20,50,H=(0^o(H) ***...)), perERC(0,700,k,50,H=(0000***...)), perERC(0,700,20,50,H=(0000***...)), and perERC(t_ctf^start, t_ctf^start+700,20,50,H=(0000***...)) with T = t_ctf^start+700, respectively.

not frozen is beneficial because offspring are generated using a more up-to-date

parent population during unconstrained optimization periods.

The fact that activation periods are set deterministically with periodic ERCs

means also that high-order constraint schemata have an impact on the performance

and this is the case regardless of the genetic material they represent. In

fact, from Figure 5.9 we see that the average best solution fitness obtained with

the subpopulation strategy on OneMax decreases rather smoothly for all orders

as the quality of the represented genetic material worsens. Comparing this average

best solution fitness with the fitness obtained by penalizing (left plot of

Figure 5.10), we observe that repairing is particularly beneficial for high-order

constraint schemata that represent very good genetic material. Waiting, in turn,

is inferior to penalizing across all the different constraint schemata but in particular

for low-order schemata representing very good genetic material (see right

plot of Figure 5.10).


Figure 5.9: A plot showing the average best solution fitness obtained by the subpopulation strategy on OneMax as a function of the order of the constraint schema o(H), and the number of order-defining bits in H with value 1 for the ERC perERC(0,700,20,50,H). The straight line represents the expected performance when picking a schema (i.e. the order-defining bits and their values) with a particular order at random.

On the MAX-SAT instance (results not shown here), one makes similar observations to those on OneMax. However, because of the difficulty of finding good genetic material, even when setting a small number of bits correctly, a smooth decrease in the average best solution fitness, and differences between the performance of strategies, are more obvious for schemata of higher orders. Compared to OneMax, there are also small differences apparent in the results obtained on TwoMax; we show the results and discuss these differences in Appendix A.2.

5.1.4 Case study

In this section we demonstrate one way in which an appropriate constraint-handling strategy for an ERCOP may be selected in a real-world application.

For this we use the same experimental setup as used in the instrument configuration

application of O’Hagan et al. (2005, 2007). We mentioned this application

in our review of closed-loop problems in Section 2.2, but give a more detailed

description of the problem here.

Application description: The application is concerned with detecting as many

metabolites in a biological sample as possible by optimizing the configuration of

the instrument the samples are run through. Methods currently in use to analyze


Figure 5.10: Plots showing the ratio P(f(x) > f_Penalizing)/P(f(x) > f_Subpop. strategy) (left) and P(f(x) > f_Penalizing)/P(f(x) > f_Waiting) (right) on OneMax as a function of the order of the constraint schema o(H), and the number of order-defining bits in H with value 1 for the ERC perERC(0,700,20,50,H); here, x is a random variable that represents the best solution from a set of solutions drawn uniformly at random from the search space and f_* the average best solution fitness obtained with strategy *. If P(f(x) > f_*)/P(f(x) > f_**) > 1, then strategy ** is able to achieve a higher average best solution fitness than strategy * and a greater advantage of ** is indicated by a darker shading in the heat maps; similarly, if P(f(x) > f_*)/P(f(x) > f_**) < 1, then * is better than ** and a lighter shading indicates a greater advantage of *.

samples are ones that employ a separation step coupled to mass spectrometric

detection. As the separation method, gas chromatography is considered as a

highly resolving technique and the method of choice. The challenge with this

approach, however, is that many instrumental parameters are involved in the

process and that small changes in these parameters may have significant effects

on the chromatographic performance. Variables or instrumental parameters represent

things like the sample volume, split ratio, oven temperatures, ramp rates,

and other settings of a gas chromatograph; in this application, we have l = 15 integer variables or instrument parameters, making up a search space of 7.32×10^9 different instrument configurations. The fitness of a configuration is determined by running a biological sample through it and measuring properties of the instrument's output signal, such as the number of peaks (or metabolites) detected in the chromatogram, the signal-to-noise ratio of the peaks, and the run time. O'Hagan

et al. (2005, 2007) cast this application as a multi-objective optimization problem

but here we consider only one objective, namely the number of peaks detected.

In general, when dealing with complex and sensitive instruments, such as mass spectrometer/gas chromatography equipment, the stability of the instrument and the results can be affected when changing instrument parameters across wide

ranges. In particular, if the oven temperature of the instrument is set low in one

experiment, then it is expected that some metabolites remain stuck to the column

(a tube in which the sample is passed through solvent in order to separate the

elements within the sample) and may elute from it in a later experiment where

the temperature is set higher unless the column is cleaned or replaced between

experiments (Dunn, March 2011). To avoid this, the column should ideally be

cleaned or replaced whenever an instrument configuration with a low oven temperature

is tested. However, usually, cleaning or replacing of instrument parts is

done overnight because it incurs additional costs in time and money when done

after every analysis run. Hence, using a low oven temperature enforces a temporary

commitment to this temperature for the rest of the day, or, equivalently,

activates a commitment relaxation ERC; as the application of O’Hagan et al.

(2005, 2007) makes use of two ovens, the optimization is subject to two commitment

relaxation ERCs. In the particular application of O’Hagan et al. (2005,

2007), however, the experimentalists avoided the ERCs by setting a smaller parameter

range of the oven temperature than would have been ideal to optimize

this instrument parameter (Dunn, March 2011). We want to investigate how the

performance might have been affected in the presence of the two ERCs.

ERCs: To keep things simple, we assume that the maximal number of instrument

configurations that can be tested on a day is fixed at V = 15, and the

total number of days available for the optimization is 150, resulting in T =

15×150 = 2250 available time steps or fitness evaluations. We set the first two

variables to represent the oven temperatures, and the value 0 to represent the low

temperatures. Hence, we have the following two commitment relaxation ERCs:

commRelaxERC(0,2250,15,H = (0 ∗ ∗ ∗ ...)) and commRelaxERC(0,2250,15,

H = (∗0∗∗∗...)). Little is known of the fitness landscape before optimization

begins but, as in (O’Hagan et al., 2007), it would be expected that there is some

degree of epistasis in the problem. We would not know which of the two schemata

represent good or poor instrument configurations.

Offline testing: As algorithm designers, we are now faced with the challenge

to select an optimization algorithm or constraint-handling strategy for the above

described ERCOP. The common approach is to first design appropriate problem

functions that simulate the problem at hand, and then to test several algorithms

offline on these functions and use the best one for the real-world problem. In this


Figure 5.11: Plots showing the average best solution fitness obtained on NKα landscapes with N = 15 and K = 2, α = 0.0 (top left), K = 6, α = 0.0 (top right), K = 2, α = 2.0 (bottom left), and K = 6, α = 2.0 (bottom right) as a function of the time counter t; results are averaged over 500 independent runs using a different randomly generated problem instance for each run. All instances were subject to the two commitment relaxation ERCs commRelaxERC(0,2250,15,H = (0∗∗∗...)) and commRelaxERC(0,2250,15,H = (∗0∗∗∗...)).

case study, we use NKα landscapes as the test problems because they allow us to model different degrees of epistasis. We introduced this problem in Section 5.1.2.2, and also provided the settings of the problem parameters N, K, and α in Table 5.3. The (four) selected settings will give us some insights into how the landscape topology and the degree of epistasis may influence the performance. To cope with the integer representation, we need to modify the mutation operator, which shall now select a random setting from the set of possible ones for the instrument parameter that is to be modified. Otherwise, we can use the same algorithm setup as previously (see Algorithm 5.1).

Let us now analyze how the different constraint-handling strategies perform

on the four NKα landscapes. Figure 5.11 shows the average best solution fitness

obtained by the different strategies on the landscape models as a function of the



time counter (we do not show the standard error as it was negligible). The plots

confirm what we observed in the experimental study that similar patterns are

obtained for different landscapes. In fact, we observe that a repairing strategy

(forcing, regenerating, or the subpopulation strategy) should be clearly favoured

over a waiting or penalizing strategy. More precisely, a trend is apparent that the

subpopulation strategy and regenerating perform best in the initial stages of the

optimization, while forcing is slightly better in the final part of the optimization.

Also, the performance advantage of forcing at T = 2250 over the other two

repairing strategies tends to increase with K and/or α. The waiting strategy

does not perform well because the likelihood that either or both of the ERCs is

active is relatively high, causing the optimization to freeze for too long. Although penalizing performs significantly better than waiting, the probability of penalizing many solutions, and thus making little or no progress in the optimization, is too high to match the performance of the repairing strategies.

Based on these results, if a single strategy is to be chosen, then we would select forcing for the real-world instrument optimization problem, as it performs best after 2250 time steps.

Testing on real-world landscape: To test this choice now on the real problem, we cannot run it on the real closed-loop problem (and certainly we would not be able to compare different policies). However, we are able to do the next best thing. Since we have available the actual fitness values collected during the real experimental trials reported by O'Hagan et al. (2005, 2007) (i.e. the number of peaks detected) for around 315 instrument configurations tested, we can use this data to construct an interpolated fitness landscape using, for example, the Kriging approach (Cressie, 1993).7 In comparison to the NKα landscapes considered for offline testing, the interpolated landscape is smoother and contains significantly fewer local optima;8 both aspects are attributed to the low number of data points. Nevertheless, as is apparent from Figure 5.12, the performance of the strategies on the interpolated landscapes is largely in alignment with the findings made on the NKα landscapes. In particular, the repairing strategies tend to perform

7 In essence, Kriging is a technique that interpolates the fitness value of an unobserved data

point from observations of values of nearby data points. To generate the fitness landscape we

used a Kriging function Krig() from the fields package of the statistical software, R.

8 To obtain this information, we followed the experimental methods employing adaptive walks already used by Hebbron et al. (2008).


Figure 5.12: A plot showing the average best solution fitness obtained on a fitness landscape interpolated from real-world data as a function of the time counter t. The optimization was subject to the two commitment relaxation ERCs commRelaxERC(0,2250,15,H = (0∗∗∗...)) and commRelaxERC(0,2250,15,H = (∗0∗∗∗...)).

better than waiting and penalizing, and, while the subpopulation strategy performs

best at the beginning of the optimization, all repairing strategies tend to

perform identically at the end of the optimization run. Clearly, in reality one is

usually able to perform only a single optimization run, meaning that the result might be different from the one we obtained from averaging over many runs. Nevertheless, this case study demonstrates how one can approach and solve an ERCOP, beginning with the definition of the ERCs, followed by the modelling of the simulated environment, the selection of appropriate test functions, and finally the comparison of different optimizers and the selection of the most suitable one to be used in the real-world application.
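As a rough illustration of the interpolation step: the actual landscape was generated with the Krig() function from R's fields package, and the sketch below uses a Gaussian process regressor from scikit-learn as a loose Python analogue; the data arrays are placeholders, not the measurements of O'Hagan et al.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Placeholder data: ~315 evaluated instrument configurations (15 integer
# parameters each) and the corresponding number of detected peaks.
rng = np.random.default_rng(0)
X_observed = rng.integers(0, 10, size=(315, 15)).astype(float)
y_observed = rng.uniform(3000, 5500, size=315)

# Fit a GP as a stand-in for Kriging; the interpolated mean then acts as the
# fitness function for offline testing of the constraint-handling strategies.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=5.0), normalize_y=True)
gp.fit(X_observed, y_observed)

def interpolated_fitness(config):
    """Fitness of an untested configuration = GP posterior mean at that point."""
    return float(gp.predict(np.asarray(config, dtype=float).reshape(1, -1))[0])

print(interpolated_fitness(rng.integers(0, 10, size=15)))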

5.1.5 Summary and conclusion

In this study, we have proposed and analyzed various static constraint-handling

strategies for dealing with non-evaluable solutions arising in ERCOPs. We considered

two types of ERCs, commitment relaxation ERCs and periodic ERCs,

and four test functions, OneMax, TwoMax, a MAX-SAT problem instance, and

NKα landscapes. In addition, a demonstration of how one may approach and

solve a new ERCOP in the common case where knowledge of the fitness landscape

is poor has been given in the form of a case study (see Section 5.1.4).

We made several key observations from the experimental analysis that may

determine how one should proceed with an ERCOP. Generally, ERCs affect the



performance of an optimizer, and clear patterns emerge relating ERC parameters

to performance effects. The later the constraint time frame of an ERC begins,

the less disruptive is the impact on search. This could mean that investment

in resources at early stages of the optimization should be preferred, where possible.

For commitment relaxation constraints, the probability of activating the

constraint is dependent on the order and the quality of the genetic material represented

by the constraint schemata. Thus, to some degree, we may be able to

predict the extent of impact if information about these schemata is available.

Although periodic constraints are activated at regular time intervals (independently

of the constraint schemata defining them), their impact on search still

depends upon the order and quality of the constraint schemata in predictable

ways. We also clearly see that the impact on EA performance is modulated by

the choice of constraint-handling strategy adopted. Which choice of strategy is

best is dependent on the details of the ERC, as we have set out in our results.

Importantly, we also observed that the patterns of performance impact seen

on the same ERC type are quite similar across different search problems with

different types of fitness landscape. Whereas, between the two ERCs, even on the

same problem, the impact on performance is quite different. If this pattern turns

out to be more generally true, then it is good news because we usually have more

knowledge about the ERCs than about the fitness landscape. Therefore we would

not need to be ‘right’ about the fitness landscape in order to choose the right

strategy. Nevertheless, as indicated in the case study, an a priori analysis of the

problem at hand can be beneficial when it comes to selecting a suitable strategy.

With respect to the impact of the individual ERC types, for commitment relaxation ERCs our analysis tentatively concluded that repairing strategies should not be used (i.e. the genotype of a solution should not

be modified) if non-evaluable solutions are unlikely to be encountered, resources

are non-available for only short periods, or much recovery or optimization time

is available. With periodic ERCs, resources become non-available at regular time

intervals. Because of this, solutions are likely to be non-evaluable during activation

periods, which seems to cause strategies that do not repair to perform quite

poorly. An exception may be the situation where the available resources are poor

because, there, repairing solutions and inserting them into the population may

cause an EA to prematurely converge to a suboptimal population state.



For both ERC types, in situations where repairing is appropriate, we can tentatively suggest that a strategy that aims at creating fit repaired solutions should be preferred over the naïve forcing strategy. An exception may again be the case where

the available resources are poor. In the latter situation, the fitter a repaired solution

is, the more likely it is to be inserted into the population and thus cause

further constraint activations. Ultimately, this may increase the probability of

prematurely converging to a suboptimal population state.

Although we are able to draw these conclusions, our study has of course

been very limited, and there remains much else to learn about the effects of

ERCs and how to handle them. The next section takes a step further and investigates

learning-based strategies for switching between the static constraint-handling

strategies introduced here during an optimization process.

5.2 Learning-based constraint-handling strategies

The previous section provided evidence that it is possible to select a suitable

(static) constraint-handling strategy for an ERCOP offline if the ERCs are known

in advance. Inspired by this observation, this section investigates whether an EA

that learns offline when to switch between the static constraint-handling strategies

during the optimization process is more efficient than the static strategies

themselves. Offline learning is performed by a reinforcement learning (RL) agent,

and this agent aims at optimizing the average fitness of the final population (reward)

by selecting the most suitable static strategy (the static strategies serve as the actions)

in a given state during the optimization. The next section introduces this offline

learning strategy in more detail. We also consider an EA that learns the same

switching task as the RL agent but online using a multi-armed bandit algorithm;

Section 5.2.2 will introduce this approach. In Section 5.2.3 we experimentally

compare the two learning-based strategies against the static constraint-handling

strategies themselves for commitment relaxation ERCs. Section 5.2.4 draws together

the findings from the experimental analyses.



5.2.1 Offline learning-based strategy

To learn offline when to switch between static constraint-handling strategies during

an optimization run we use the tabular RL algorithm, Sarsa(λ) (see Algorithm

3.4 in Section 3.6). We employ Sarsa(λ) in combination with replacing

eligibility traces (see Equation (3.7)), and the ǫ-greedy action selection method.

Our learning task is episodic with each episode representing a single EA run.

We reset the eligibility trace to e = 0 at the beginning of each episode, meaning

we forget about the state-action pairs visited in the previous episode (see

Line 11 of Algorithm 5.2). This modification to Algorithm 3.4 resulted in a

slightly better performance in our case. To encourage exploration we use optimistic

initial values for the action-value estimates Q(s,a); we set all values to

Q(s,a) = 1.0,∀s ∈ S,a ∈ A(s) (1.0 is the maximal expected return in our case).

To characterize a state s we use the current population average fitness and the

current time step. The fitness values are normalized so that they lie in the range

[0;1], while the optimization time is limited by T. Each of the two variables

is binned into 5 equally-sized intervals, resulting in the intervals (j/5; (j+1)/5] and (T·j/5; T·(j+1)/5], j = 0,1,2,3,4, respectively. That is, we have 25 states in total, and 5 actions (the static constraint-handling strategies) are available in each state.

Our learning goal is to achieve a high population or individual fitness at

the end of the search. Consequently, the only reward we provide is the average

fitness of the population at the end of an episode; i.e. r_i = 0 for i = 1,...,T, and r_{T+1} ∈ [0,1]. Alternatively, the reward may be the best solution fitness found.
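A minimal Python sketch of this set-up, covering the 5×5 state encoding, the optimistic initialization, ε-greedy selection, and a tabular Sarsa(λ) update with replacing traces, is given below; it is our own illustration with simplified episode bookkeeping, not the exact implementation used in this chapter.

import random

N_BINS = 5                      # 5 fitness bins x 5 time bins -> 25 states
N_ACTIONS = 5                   # the five static constraint-handling strategies
ALPHA, GAMMA, LAM, EPS = 0.1, 1.0, 1.0, 0.1   # settings as in Table 5.4

def encode_state(avg_fitness, t, T):
    """Map (normalized population average fitness, time step) to one of 25 states."""
    f_bin = min(int(avg_fitness * N_BINS), N_BINS - 1)
    t_bin = min(int(t * N_BINS / T), N_BINS - 1)
    return f_bin * N_BINS + t_bin

# Optimistic initial action-value estimates (1.0 is the maximal expected return).
Q = {(s, a): 1.0 for s in range(N_BINS * N_BINS) for a in range(N_ACTIONS)}

def select_action(s, rng):
    """epsilon-greedy action selection over the five static strategies."""
    if rng.random() < EPS:
        return rng.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[(s, a)])

def sarsa_lambda_episode_update(trajectory, final_reward):
    """Tabular Sarsa(lambda) with replacing traces applied to one recorded episode;
    the reward is 0 at every step except the last (the final population fitness)."""
    e = {}                                          # eligibility traces, reset per episode
    for i, (s, a) in enumerate(trajectory):
        r = final_reward if i == len(trajectory) - 1 else 0.0
        if i + 1 < len(trajectory):
            s_next, a_next = trajectory[i + 1]
            delta = r + GAMMA * Q[(s_next, a_next)] - Q[(s, a)]
        else:
            delta = r - Q[(s, a)]                   # terminal step: no bootstrap term
        e[(s, a)] = 1.0                             # replacing trace
        for key in e:
            Q[key] += ALPHA * delta * e[key]
            e[key] *= GAMMA * LAM

# Usage (illustrative): record (state, action) pairs whenever a non-evaluable
# solution is met, then update once per episode with the final population fitness.
rng = random.Random(0)
s = encode_state(avg_fitness=0.4, t=350, T=2000)
a = select_action(s, rng)
sarsa_lambda_episode_update([(s, a)], final_reward=0.7)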

There are some further aspects related to the particular learning problem of

switching between constraint-handling strategies. First, note that a non-evaluable

solution may be encountered at different stages of the optimization process or

not at all in a single EA run. In other words, the number of states visited in an

episode, and the number of non-evaluable solutions encountered in a state, may

vary between episodes and states, respectively. In the case where a non-evaluable

solution is encountered, the first action selected in a particular state is applied

to all non-evaluable solutions encountered in this state (see Line 17 and 18 of

Algorithm 5.2). This selection approach tends to perform better than allowing an

optimizer to reselect actions when in one and the same state, because it is more

direct in terms of credit assignment.



5.2.2 Online learning-based strategy

To learn online when to switch between static constraint-handling strategies, we

use an adaptive operator selection method known as the dynamic multi-armed

bandit (D-MAB) algorithm (Hartland et al., 2006, 2007; Costa et al., 2008). 9

The idea of D-MAB is to consider the learning problem as a multi-armed bandit

problem with the static strategies serving as independent arms. D-MAB extends

the upper confidence bound 1 (UCB1) algorithm (Auer et al., 2002) with the

statistical Page-Hinkley (PH) test (Page, 1954), which has the purpose to detect

changes in the sequence of rewards obtained, and then to restart the multi-armed

bandit. The algorithm, D-MAB, is described in detail in Appendix B.
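A rough sketch of the mechanism is given below; it is our own illustration and assumes that the Page-Hinkley statistics are maintained per arm, with a detected change restarting all arms (mirroring the variables n_j, p̄_j, m_j, and M_j updated later in Algorithm 5.2), and it uses one common formulation of the PH statistic that may differ in detail from the version of Hartland et al.

import math

class DMAB:
    """Minimal dynamic multi-armed bandit: UCB1 arm selection plus a per-arm
    Page-Hinkley test that restarts all arms when a change-point is detected."""

    def __init__(self, n_arms, c=1.0, delta=0.01, lambda_ph=0.1):
        # c, delta and lambda_ph correspond to the scaling factor C, the tolerance
        # parameter delta and the threshold lambda_PH of Table 5.4
        self.n_arms, self.c, self.delta, self.lambda_ph = n_arms, c, delta, lambda_ph
        self.reset()

    def reset(self):
        # per-arm statistics: play counts, mean rewards, PH statistics m and M
        self.n = [0] * self.n_arms
        self.mean = [0.0] * self.n_arms
        self.m = [0.0] * self.n_arms
        self.M = [0.0] * self.n_arms

    def select(self):
        """Play each arm once, then pick the arm maximizing the UCB1 index."""
        for j in range(self.n_arms):
            if self.n[j] == 0:
                return j
        total = sum(self.n)
        return max(range(self.n_arms),
                   key=lambda j: self.mean[j]
                   + self.c * math.sqrt(2.0 * math.log(total) / self.n[j]))

    def update(self, j, reward):
        """Credit arm j with `reward`, update its PH statistics, restart if needed."""
        self.n[j] += 1
        self.mean[j] += (reward - self.mean[j]) / self.n[j]
        # Page-Hinkley (one common formulation): a sustained drop in the rewards
        # of arm j makes m fall away from its running maximum M
        self.m[j] += reward - self.mean[j] + self.delta
        self.M[j] = max(self.M[j], self.m[j])
        if self.M[j] - self.m[j] > self.lambda_ph:
            self.reset()          # change-point detected: restart the whole bandit

# Usage (illustrative): j = bandit.select(); apply strategy j; bandit.update(j, reward)
bandit = DMAB(n_arms=5)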

D-MAB requires that the play of an arm is followed by a subsequent reward.

We provide a reward immediately after the play of an arm, and it is the (normalized)

raw fitness of the resulting solution. Note that if the arm associated

with the strategy, waiting, is played, then the reward may be available only a

few time steps later. Also, in the case of penalizing, the reward will always be

some poor fitness value c. However, this does not mean that penalizing is never

selected because (i) UCB1 maintains always a certain degree of exploration, and

(ii) a restart of the multi-armed bandit triggered by the PH test puts all arms

in the same initial position. Alternative credit assignment schemes and UCB algorithms

were tested but the combination used in this chapter tends to perform,

on average, best and most robustly on the test problems considered.

We want to point out here that certain credit assignment schemes cannot be

readily applied when dealing with ERCs. For instance, a common scheme is to

use a credit based on the fitness improvement of an offspring compared to its

parent after applying a variation operator to it. In our scenario the parent would

be the individual that is to be repaired and the offspring the repaired individual

after applying a constraint-handling strategy to the parent. As we do not know

the fitness of the parent because it is non-evaluable, we cannot quantify by how

much its fitness differs from the one of the repaired individual.

9 In (Hartland et al., 2006, 2007), various versions of a D-MAB algorithm are proposed but

the version we use here is referred to as γ-restart.


Table 5.4: Parameter settings of static and learning-based constraint-handling strategies.

Strategy                 Parameter                                         Setting
Regenerating             Number of regeneration trials L                   10000
Penalizing               Fitness c assigned to non-evaluable solutions     0
Subpopulation strategy   Maximal size of all subpopulations SP_h, J_h      25
RL-EA                    Decay factor λ                                    1.0
                         Discount rate γ                                   1.0
                         Learning rate α                                   0.1
                         Probability ε of selecting a random action        0.1
                         #Training episodes                                5000
                         #Testing episodes                                 100
D-MAB                    Threshold parameter λ_PH                          0.1
                         Tolerance parameter δ                             0.01
                         Scaling factor C                                  1

5.2.3 Experimental analysis

Experimental setup: Here, we use a slightly different experimental setup

than we used in the analysis of the static constraint-handling strategies (see Section

5.1.3). The changes concern the reproduction scheme of the EA on which

we augment the constraint-handling strategies, and the maximal size of the subpopulation

SP, J, used within the subpopulation strategy. The reason for using

a modified setup is that we specifically tuned the EA and constraint-handling

strategies to perform well on the test problems considered.

Rather than using an elitist generational reproduction scheme, we employ an

EA with a steady state or (µ+1)-ES reproduction scheme. Algorithm 5.2 shows

the interplay between the static and learning-based constraint-handling strategies within this EA. The method currentState(Pop,t) (Lines 17 and 18) called by the RL-based approach returns the current state s_t based on the state variables Pop and t; the method giveAction(s_t, rand(0,1)) (Line 18) returns the action a_t based on the ε-greedy method.

The parameter settings of the EA are identical to the settings used in the

previous study (see Table 5.1) with the difference that λ = 1. Table 5.4 shows


Algorithm 5.2 Steady-state EA with static and learning-based constraint-handling strategies

Require: ERC_1,...,ERC_r (set of ERCs), f (objective function), T (time limit), µ (parent population size), offlineLearning (boolean variable indicating whether offline learning is applied (true) or online learning (false))
 1: t = 0 (global time counter), Pop = ∅ (current population), s_t = −1 (variable representing current state), #Strategy = −1 (variable indicating the number of the static constraint-handling strategy currently in use)
 2: while |Pop| < µ ∧ t < T do
 3:   generate solution x at random
 4:   x = functionWrapper(x, t)
 5:   Pop = Pop ∪ {x}; t++
 6: while t < T do
 7:   generate two offspring by selecting two parents from Pop, recombining and mutating them; pick one of the two offspring x at random
 8:   x = functionWrapper(x, t)
 9:   form new Pop by selecting the best µ solutions from the union population Pop ∪ {x}; t++
10: if offlineLearning then
11:   set reward r_{T+1} to the average fitness of Pop and update the function Q(s,a) (see Line 8 of Algorithm 3.4); clear trace e(s,a) = 0, ∀s ∈ S, a ∈ A(s), and reset s_t = −1 and #Strategy = −1

12: functionWrapper(x, t){
13:   y_t = null
14:   if x satisfies the ERCs ERC_1,...,ERC_r then
15:     x_t = x; y_t = f(x_t)
16:   else
17:     if offlineLearning ∧ s_t ≠ currentState(Pop, t) then
18:       s_t = currentState(Pop, t); a_t = #Strategy = giveAction(s_t, rand(0,1)); r_{t+1} = 0; update trace e according to Equation (3.7) and the function Q(s,a) (see Line 8 of Algorithm 3.4)
19:     if !offlineLearning then
20:       #Strategy = giveBestArm() // select the best arm using the UCB1 algorithm
21:     if #Strategy = 1 ∨ #Strategy = 2 ∨ #Strategy = 3 then
22:       x_t = repair(x, t); y_t = f(x_t)
23:     if #Strategy = 4 then
24:       t = t + τ; x_t = x; y_t = f(x_t) // τ is the number of time steps we have to wait until x is evaluable
25:     if #Strategy = 5 then
26:       x_t = x; y_t = c // c is a constant, representing poor fitness
27:     if !offlineLearning then
28:       normalize y_t and assign it as reward to arm j = #Strategy; update the variables n_j, p̄_j, m_j, and M_j, and perform the PH test; if a change-point is detected, then set n_j, p̄_j, m_j, and M_j (j = 1,...,J) to 0
29:   return x_t and y_t }

the parameter settings of the constraint-handling strategies. For the RL-based

strategy, denoted in Table 5.4 by RL-EA, we use a training and testing scheme

(similar to Pettinger and Everson (2003)). In the training phase, the RL agent

estimates the function Q(s,a), while, in the testing phase, the Q-function is frozen and the greedy actions a* are always selected; i.e. during the testing phase we set ε = 0. As specified in Table 5.4, the testing phase involves 100 episodes or EA runs, and this is also the number of runs for which we run the other constraint-handling strategies; i.e. any results shown are average results across 100 EA runs.

To allow for a fair comparison of the strategies, we use a different seed for the

random number generator for each EA run but the same seeds for all strategies.

We also use a different seed for each episode of the training phase of RL-EA.

Experimental results: Let us imagine that we are faced with a similar closed-loop scenario as in the case study of Section 5.1.4. Suppose our optimization problem is binary and subject to the following two, a priori known, commitment relaxation ERCs: commRelaxERC(0,2000,20,H = (10101∗∗∗...)) and commRelaxERC(0,2000,20,H = (∗...∗∗101)). That is, one ERC constrains the first 5 solution bits, while the other constrains the last three bits. Solutions shall be represented by binary strings of length l = 30, and we set the optimization time to T = 2000 time steps. As the test function f let us consider the standard NK landscapes or, equivalently, NKα landscapes with α = 0 (see Section 5.1.2.2). For

the two tunable parameters of NK landscapes, N and K, we consider four different

settings for K, namely, K = 1,2,3, and 4, while N is fixed to N = l = 30.

These settings will allow us to cover reasonable degrees of epistasis.

Let us now analyze the performance of the static and learning-based constraint-handling strategies on the different NK landscapes. Figure 5.13 shows the population

average fitness obtained with the different constraint-handling strategies

on the four NK landscapes as a function of the time counter. The average fitness

of the best solution in a population is in alignment with the population average

fitness. From the plots we can see that the ERCs affect the performance of an

EA. Also, the plots confirm what we mentioned before, namely that similar patterns are

obtained for all the test functions. In fact, with respect to the static strategies,

we observe a trend that the subpopulation strategy performs best in the initial

stages of the optimization, forcing and penalizing in the middle part of the optimization,

and waiting in the final stages of the optimization. Also, the time step

at which waiting outperforms penalizing shifts further to the right as the degree

of epistasis increases. The behavior of the static constraint-handling strategies is similar to what we observed in the case study of Section 5.1.4.

For RL-EA, the results in Figure 5.13 were obtained using a training phase

involving 5000 different NK landscapes with N = 30 and K = 2. After that


Figure 5.13: Plots showing the population average fitness (we do not show the standard error as it was negligible) obtained by the different constraint-handling strategies on NK landscapes with N = 30 and K = 1 (top left), K = 2 (top right), K = 3 (bottom left), and K = 4 (bottom right) as a function of the time counter t; results are averaged over 100 independent runs using a different randomly generated NK problem instance for each run. All instances were subject to the commitment relaxation ERCs commRelaxERC(0,2000,20,H = (10101∗∗∗...)) and commRelaxERC(0,2000,20,H = (∗...∗∗101)). The results of ‘Unconstrained EA’ were obtained by running the EA on the same problem instances but without the ERCs.

training phase, the same frozen Q-function was used in the testing phase of all

four NK landscape types. Consequently, this allows us to assess the robustness

of the control policy learnt as training and testing are done in environments with

different properties (at least for the cases K ≠ 2). From the figure we can see

that RL-EA is the best performing strategy on all four NK landscape types at

T = 2000. For a low level of epistasis, RL-EA is even able to almost match the performance of an EA optimizing in an ERC-free environment. We observe also that the performance advantage over waiting, the second best strategy, tends to increase with the degree of epistasis; when comparing the two strategies against each other using more than 100 algorithmic runs, then, according to the Kruskal-Wallis test (significance level of 5%), RL-EA is also significantly better than


Figure 5.14: A plot showing the greedy actions a* learnt by the RL agent for each state s. Training was done on NK landscapes with N = 30 and K = 2.

waiting for K = 3 and K = 4. This aspect indicates the robustness of RL-EA.

On the other hand, although D-MAB is not able to perform as well as RL-EA, it

is able to match the performance of waiting for N = 30,K = 4. Another benefit

of offline learning is that if the optimization time were shorter, say around T = 500, then RL-EA would learn a different control policy, while D-MAB would not. That is, RL-EA adapts to the real-world problem at hand; we have confirmed this experimentally (results not shown).

Figure 5.14 shows the greedy action a ∗ in each state s learnt by the RL

agent. From the plot it is apparent that the agent learnt to use mainly waiting

at the beginning of the optimization process, penalizing in the middle part of the

optimization, and, depending on the population average fitness, either forcing,

waiting, or the subpopulation strategy, in the final part of the optimization. From Figure 5.13 one may conclude that a policy that uses a repairing strategy (forcing, regenerating, or the subpopulation strategy) at the beginning of the optimization, and waiting or penalizing towards the end, should also perform well. The agent did not learn this policy because the repairing strategies may quickly lead to a homogeneous population containing many solutions that fall into both or either of the constraint schemata. If the schemata are poor, then one should escape from this population state, but this is difficult as diversity needs to be introduced into the population again, and this takes too long using waiting or penalizing.

Figure 5.15 illustrates what strategies have been used on average by D-MAB

during different periods of the optimization process for N = 30, K = 4. From


Figure 5.15: A plot showing the average number of times each strategy is played by D-MAB at different periods of the optimization process. Results are shown for N = 30 and K = 4.

the plot we can see that penalizing is selected least often, which is due to the

low fixed reward received when playing the associated arm. A trend is obvious

that waiting is used less often than the repairing strategies at the end of the

optimization process, which is in alignment with the policy learnt by the RL

agent. The reason that the performance of D-MAB is nevertheless poorer than

the one of RL-EA is that the repairing strategies are used too often throughout

the optimization process but in particular at the beginning of the optimization.

As mentioned before, this may result in a homogeneous population state from

which it is difficult to escape; the fact that the total number of plays increases

towards the end of the optimization confirms that D-MAB actually ends up in

this poor population state. On NK instances with lower epistasis, the PH test is

triggered less often, and waiting is used almost as often as the repairing strategies

throughout the optimization.

Overall, the strong performance of the RL-EA is encouraging, but we want to

mention that in order to achieve that performance, some tuning of the agent may

be required. This is due to two aspects. First, the stochastic nature of an EA in the sense that: (i) taking the same action in a particular state but in different episodes

may cause an EA to end up in different future states, and (ii) visiting the same

state-action pairs in different episodes may result in different rewards obtained at

the end of an episode. Secondly, since a different problem instance is used in each

training episode, the constraint schemata of the two ERCs may represent both

good and poor sets of solutions. Hence, as the quality of a schema has an impact


Figure 5.16: A plot showing the population average fitness obtained by RL-EA for different values of the decay factor λ and the discount rate γ as a function of the number of training episodes. The population average fitness values themselves represent average values across 30 learning trials; each trial used different randomly generated problem instances and random number generator seeds for training and testing, but the same instances and seeds for any combination of λ and γ values. Training and testing was done on NK landscapes with N = 30 and K = 2.

on what strategy is most appropriate, the current optimal control policy learnt

by the RL agent may change constantly during training. The approach taken

here to deal with these two issues was to train the agent for different numbers of

training episodes, using different settings of the parameters involved in the agent

algorithm. The parameter setting combination that performed, on average, best

and most robustly on the training or validation problems was used.

Figure 5.16 illustrates how the population average fitness may be affected by the number of training episodes and by different settings of the decay factor λ and the discount rate γ. From the figure it is apparent that the RL agent is able to learn an optimal policy after around 1000 episodes or algorithmic runs. Furthermore, while the agent is relatively robust to the setting of λ and γ, optimal performance tends to be obtained for configurations where both parameters take large values. The meaning of these configurations is that we should not discount any reward obtained during an episode (large γ), nor should action-value estimates Q(s,a) be updated based on other action-value estimates (large λ). These are sensible configurations because the aim of discounting (small γ) is to reach a terminal state more quickly; this is not the case in our scenario because we always provide a reward after the same amount of simulated time has passed. The fact that we obtain better performance by updating visited state-action pairs (s,a) based on the actual rewards obtained (large values of λ) is an indication that the states



[Figure 5.17 appears here: population average fitness versus the time counter t (0 to 2000) for Forcing, Regenerating, Waiting, Subpop. strategy, Penalizing, RL-EA, D-MAB, and an unconstrained EA.]

Figure 5.17: A plot showing the population average fitness obtained in a single run on a randomly generated NK landscape with N = 30 and K = 3 as a function of the time counter t; the instance was subject to the two commitment relaxation ERCs commRelaxERC(20, H = (10101***...)) and commRelaxERC(20, H = (*...**101)).

we visited in an episode are not significantly dependent on the actions taken; this

is true in our scenario because the EA passes through approximately the same

state trajectory in an episode. Of course, different settings of γ and λ may be

more suitable for more fine-grained state space realizations.

Overall, based on the results shown, we would select RL-EA for the real-world closed-loop problem as it performs best after 2000 time steps. Let us assume that the real-world problem is an NK landscape instance with N = l = 30 and K = 3. Hence, to verify our selection, we perform one run with all strategies on a single newly generated NK landscape instance with N = 30 and K = 3; the results are shown in Figure 5.17, and note that RL-EA uses the control policy of Figure 5.14. The plot confirms the findings made on the test functions. In particular, RL-EA, D-MAB, and waiting perform best at the end of the run, while the static repairing strategies perform best at the beginning of the optimization.

5.2.4 Summary and conclusion

In this study we have investigated two learning-based constraint-handling strategies

for coping with commitment relaxation ERCs. The learning-based strategies

aim at learning when to switch between static constraint-handling strategies,

which we introduced in Section 5.1.1, during the optimization process. However,

while one strategy learns this task offline using a reinforcement learning (RL)

agent, here Sarsa(λ), the other strategy performs online learning using the UCB



algorithm extended with the statistical Page-Hinkley test to detect changes in

the sequence of rewards obtained. Offline learning is possible in this optimization

scenario because the performance of an EA depends largely on the type of ERC,

with similar effects observable for different fitness landscapes, as indicated by the study conducted in Section 5.1.3.

The experimental analysis compared the learning-based constraint-handling

strategies against the static strategies themselves, and it concluded that ERCs

affect the performance of an EA but an RL-based strategy is able to get close to

the performance of an EA optimizing in an ERC-free environment, particularly

on fitness landscapes with little epistasis. The online learning algorithm did not perform as well as the RL-based algorithm, mainly because it does not look ahead in the optimization process and so may quickly end up in a poor population state from which it is difficult to escape; this is a general issue of online learning within an ERCOP scenario and it is as yet unclear how to avoid it. Static repairing

strategies are well suited if little optimization time is available, while a static

waiting or penalizing strategy is more suited for longer optimization times.

To improve the performance of learning-based constraint-handling strategies, in particular offline learning-based techniques, one may look at the design and tuning of RL agents that also account for the stochasticity present in evolutionary search (e.g. the P-Trace and Q-Trace algorithms of Pendrith (1994), or the Kalman filtering approach of Chang et al. (2004)). Analyzing constraint-handling and learning strategies on different and perhaps more realistic fitness landscapes than NK landscapes is another important research avenue. Furthermore, in addition to learning the task of switching between constraint-handling strategies, an agent may also be required to arrange the resources needed in the evaluation of solutions. The next section investigates strategies for coping with such scenarios.

Whilst these strategies do not rely on RL algorithms, extending them with ideas

from RL is certainly an interesting and promising research direction.

5.3 Online resource-purchasing strategies

So far, this chapter has investigated strategies for dealing with non-evaluable solutions arising due to ERCs. In this section, our focus shifts to online resource-purchasing strategies to cope with commitment composite ERCs, which we introduced in Section 4.2.5. Recall that with this type of ERC, some part of the



solution vector defines a complex subpart or composite (expressed by a high-level

constraint schema H # ) that must be purchased online and pre-emptively so that

it arrives in time to be used in the evaluation process of a solution. Once the

composite arrives (after some lag of TL time steps) it can be stored for a limited

period of time only (because composites have a shelf life of SL time steps) and/or

re-used in different experiments only a certain number of times (RN denotes the

reuse number); we assume SL ≥ RN. The number of composites available simultaneously

is upper bounded by the number of storage cells #SC. We also assume

a budget C limiting the usage of the composites and/or the available time. In the next section we devise three resource-purchasing strategies (for use in a generational EA) to cope with this type of ERC. Before we deploy and test these strategies

empirically over a number of resource-constraint settings and budget regimes in

Section 5.3.3, we define the experimental setup in Section 5.3.2. Section 5.3.4

draws together the findings of the experimental study.
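For concreteness, the following minimal Python sketch (not from the thesis; all names are illustrative) gathers the quantities that parameterize a commitment composite ERC together with the state one would track for a composite held in a storage cell; the same quantities reappear in the sketches later in this section.

from dataclasses import dataclass

@dataclass
class CommCompERC:
    # Parameters of a commitment composite ERC (illustrative names)
    order: int          # o(H#): number of composite-defining bits
    num_cells: int      # #SC: number of storage cells
    time_lag: int       # TL: time steps between ordering a composite and its arrival
    reuse_number: int   # RN: maximum number of evaluations per composite
    shelf_life: int     # SL: time steps a composite can be stored (we assume SL >= RN)

@dataclass
class StoredComposite:
    # A composite currently held in a storage cell
    schema: str         # composite-defining bit pattern, e.g. "10101"
    psi: int            # remaining shelf life in time steps
    rho: int            # remaining number of reuses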

5.3.1 Specific online purchasing strategies

This section proposes three online resource-purchasing strategies for coping with commitment composite ERCs: a just-in-time strategy, a just-in-time strategy with repairing, and a sliding window strategy. All strategies are augmented on a generational EA and designed such that the EA never has to deal with solutions that have a null fitness value (although null solutions may need to be submitted purposely to bridge waiting periods between arriving composite orders).

In general, a strategy designed to deal with commitment composite ERCs

is composed of three mechanisms, which are concerned with: (i) selecting the

composite that is ordered and the point in time at which it is ordered, (ii) selecting

the storage cell into which an arrived composite is put, which may mean selecting

an existing composite that is to be replaced by a new one, and (iii) selecting an

available composite to repair a non-evaluable solution (given it is to be repaired).

Each of these mechanisms is described for our strategies in the following. As

before, we assume that time steps refer to function evaluations of single solutions.

5.3.1.1 Just-in-time strategy

Without repairing and without any composites in storage, the minimum number of time steps for evaluating all solutions of a population Pop (of size µ) is µ + TL (TL


5.3. ONLINE RESOURCE-PURCHASING STRATEGIES 157

time steps are needed for the first composite to arrive; this period is bridged by

submitting TL null solutions). This is achieved if orders are processed in contiguous

groups organized by the composites they require, e.g. ccbbbadd..., where

a,b,c, and d shall represent different composites required by solutions. Thus, the

first main mechanism of the just-in-time (JIT) strategy is to arrange solutions of

a population Pop into these contiguous groups, and then to make purchase orders

so that composites arrive just in time for the scheduled experiment time. This

mechanism, which is performed by the method alignOrdersAndSolutions(Pop)

(see below), prolongs the availability of resources.

When composites are already in storage (we call them old composites) savings

in purchase orders may be made if those composites are used first. For example, suppose the optimizer requires the composites ccdadcac, and composite a is available in one of the storage cells with 3 reuses and 5 time steps of its shelf life remaining. Then, by placing a first, the permutation aaccccdd will save us a purchase order: only two evaluations with composite a are required, and both can be served by the stored composite before its shelf life expires. Thus, the second main

mechanism of the JIT strategy is to efficiently schedule the evaluation of solutions

in a population Pop using old composites. This is performed by the method

useUpOldComposites(Pop) (see below). Notice that, at any given time, JIT (and

JIT with repairing) maintain non-identical composites in storage.

During the optimization, the JIT strategy checks at each generation whether old composites can be used in the evaluation of solutions of the current population. If so, the method useUpOldComposites(Pop) is applied first and then the method alignOrdersAndSolutions(Pop); otherwise we proceed directly with the method alignOrdersAndSolutions(Pop).

alignOrdersAndSolutions(Pop): Recall that solutions are grouped by the composite they require; this method just has to choose the order in which these groups of solutions are submitted for evaluation. To obtain a permutation

of groups we select groups one by one using roulette wheel selection (without

replacement) based on the number of solutions in Pop associated with them,

and then reverse the order so obtained. This strategy increases the chances that

left-over composites at the end of the generation will be useful next generation

(because the solutions associated with them are over-represented in the population).

After aligning composite orders with the evaluation sequence of (groups

of) solutions, we know exactly when composite orders need to be submitted for

them to arrive just in time for being used in the evaluation of solutions.
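As an illustration, a minimal Python sketch of this group-ordering step follows (not the thesis implementation; the function name and data layout are assumptions). It performs roulette wheel selection without replacement on the group sizes and then reverses the resulting order.

import random

def order_groups(group_sizes, rng=random.Random(0)):
    # group_sizes maps a composite to the number of solutions in Pop requiring it.
    # Pick groups one by one with probability proportional to their size
    # (roulette wheel selection without replacement), then reverse the order,
    # so that large groups tend to be evaluated last and their left-over
    # composites are more likely to be useful in the next generation.
    remaining = dict(group_sizes)
    picked = []
    while remaining:
        r = rng.uniform(0, sum(remaining.values()))
        acc = 0.0
        for comp, size in list(remaining.items()):
            acc += size
            if r <= acc:
                picked.append(comp)
                del remaining[comp]
                break
    picked.reverse()
    return picked

# Example: groups of solutions requiring composites a, b, and c
print(order_groups({"a": 1, "b": 3, "c": 4}))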



After the arrival of a composite we need to decide the storage cell into which we put it. The default is to place it in an empty storage cell. If no cell is empty, then the arriving composite replaces the old composite that can be used in the fewest evaluations within the subsequent generation. That is, if P is the current set of old composites and associated with each p ∈ P there is a remaining shelf life ψ_p and a remaining number of reuses ρ_p, then an arriving composite replaces the composite p* ∈ P that minimizes min(ψ_p − (t_new generation − t), ρ_p), where t_new generation is the time step at which the subsequent generation begins, and t is the current time step. Remember that we know t_new generation because the entire

schedule for the evaluation of Pop is done beforehand. In the case where there

are multiple composites p ∗ , ties are broken by replacing the composite that has

the shortest shelf life remaining; further ties are broken randomly.
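A minimal Python sketch of this replacement rule follows (not the thesis code; the data layout and example numbers are assumptions).

def choose_cell_victim(stored, t, t_new_generation):
    # stored maps a composite id to (psi, rho) = (remaining shelf life, remaining reuses).
    # Replace the composite usable in the fewest evaluations before the next
    # generation starts, i.e. the one minimizing min(psi - (t_new_generation - t), rho);
    # ties are broken by the shortest remaining shelf life, then arbitrarily.
    def key(item):
        comp, (psi, rho) = item
        return (min(psi - (t_new_generation - t), rho), psi)
    victim, _ = min(stored.items(), key=key)
    return victim

# Example: composite "b" can be used least often before the next generation begins.
print(choose_cell_victim({"a": (12, 8), "b": (4, 5), "c": (6, 6)},
                         t=100, t_new_generation=103))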

useUpOldComposites(Pop): Our aim when using up old composites is to save

as many composite orders as possible. To achieve this when there are several storage

cells to consider, we solve the lexicographical optimization problem lexmax_{i∈π(Z)} (F_1, F_2) over the permutation set of

\[
Z = \{\zeta \mid \zeta \in P \cap C \,\wedge\, \theta_\zeta \bmod RN \le \min(\psi_\zeta, \rho_\zeta)\}, \tag{5.1}
\]

where P, ψ_p and ρ_p have the same meaning as before, and C is the set of composites required by the optimizer to evaluate Pop; associated with each c ∈ C there is a required number θ_c. The first condition, ζ ∈ P ∩ C, says that a composite needs to be currently in storage and it needs to be required by at least one solution of the population Pop. The second condition, θ_ζ mod RN ≤ min(ψ_ζ, ρ_ζ), ensures that the only left-over solutions considered for evaluation with the old composites are ones that do not require the ordering of a new composite. 10

The objective functions to be maximized are F_1 = Σ_{j=1,...,|Z|} f_1(π_ij(Z)) and F_2 = Σ_{j=1,...,|Z|} f_2(π_ij(Z)), where π_ij is the jth element of permutation i,

\[
f_1(\pi_{ij}(Z)) =
\begin{cases}
1 & \text{if } j = 1\\
1 & \text{if } \psi_{\pi_{ij}} - \left( \sum_{k=1}^{j-1} f_1(\pi_{ik}(Z)) \cdot (\theta_{\pi_{ik}} \bmod RN) \right) \ge \theta_{\pi_{ij}} \bmod RN\\
0 & \text{otherwise,}
\end{cases}
\tag{5.2}
\]

and

\[
f_2(\pi_{ij}(Z)) = f_1(\pi_{ij}(Z)) \cdot (\theta_{\pi_{ij}} \bmod RN). \tag{5.3}
\]

10 Remember that we assume SL ≥ RN, meaning that the maximum number of evaluation steps associated with a composite is limited by its reuse number. If the shelf life were the limiting factor, then the condition would be θ_ζ mod SL ≤ min(ψ_ζ, ρ_ζ).

F_1 determines the number of purchase orders saved for each permutation of Z, while F_2 breaks ties among the best permutations by selecting the permutation(s) that evaluate the most solutions; further ties are broken randomly. Required composites left in storage after evaluating the last solution of the selected permutation (i.e. after max(F_2) time steps) are used up further in a random fashion; i.e. we first select a composite at random and then select a random (unscheduled) solution from Pop that requires this composite, and we repeat this until no future evaluations can be scheduled. For small sizes of Z, the lexicographical optimization problem lexmax_{i∈π(Z)} (F_1, F_2) can be solved to optimality by exhaustive search (see Figure 5.18); this is the approach taken here.
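The following Python sketch illustrates such an exhaustive search (it is not the thesis code; the representation of Z as a dictionary, the parameter names and the example numbers are assumptions). Each permutation is scored with (F_1, F_2) as in equations (5.2) and (5.3) and the lexicographically best one is kept.

from itertools import permutations

def best_schedule(Z, RN):
    # Z maps a composite id to (psi, theta): remaining shelf life and the number
    # of solutions in Pop requiring that composite. For every permutation of Z,
    # count the purchase orders saved (F1) and the solutions evaluated (F2);
    # return the lexicographic maximum of (F1, F2).
    best = None
    for perm in permutations(Z):
        elapsed, F1, F2 = 0, 0, 0
        for j, comp in enumerate(perm):
            psi, theta = Z[comp]
            need = theta % RN  # left-over evaluations for this composite
            if j == 0 or psi - elapsed >= need:
                F1 += 1
                F2 += need
                elapsed += need
        if best is None or (F1, F2) > best[0]:
            best = ((F1, F2), perm)
    return best

# Example with RN = 10: three old composites with (psi, theta) values
print(best_schedule({"b": (4, 13), "d": (6, 21), "e": (5, 15)}, RN=10))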

Having devised a schedule for using up the old composites, and selecting the

solutions to be evaluated with these composites, we can now specify at which

time step the first new composite order needs to be submitted (to evaluate the

remaining solutions in Pop) in order to reduce our waiting time for it to arrive.

Subsequently, we know when and how many null solutions need to be submitted.

5.3.1.2 Just-in-time strategy with repairing

Although the JIT strategy avoids repairing of solutions and thus allows an optimizer

to perform an unconstrained optimization, this may be associated with a

waste of up to µ×(RN −1) reuses per generation (if each solution of the population

requires a different composite). To reduce wastage, the JIT with repairing

(JITR) strategy extends the basic JIT strategy with a repairing procedure.

The method performRepairing(Pop) (see Algorithm 5.3) performs the repairing



[Figure 5.18 appears here: a worked example of the five steps of useUpOldComposites(Pop), showing the parameters ψ_p, ρ_p, and θ_c of the composites, the resulting set Z = {b, d, e} (composite f is not a member of Z because θ_f mod RN > min(ψ_f, ρ_f)), the values of f_1, F_1, and F_2 for all permutations of Z, the random choice among equally good permutations, and the random use of left-over composites.]

Figure 5.18: Visualization of the internal calculations made by the method useUpOldComposites(Pop). We assume a commitment composite ERC with RN = 10 and SL = 20. We need to perform five steps to process the population Pop (containing in this example Σ_c θ_c = 50 individuals): 1) determining the parameters ψ_p, ρ_p, and θ_c; 2) determining the permutation set Z; 3) computing the objective functions F_1 and F_2 for each permutation π; 4) selecting a random permutation in case there are equally good permutations; 5) matching unscheduled solutions of Pop and composites (from set P) in a random fashion if any composites are left after realizing the selected permutation.

and this method is always called before calling the method alignOrdersAndSolutions(Pop)

(but after useUpOldComposites(Pop)). The idea of the repairing strategy

is to repair solutions such that they use a composite that is nearly the one

required. Solutions to be repaired are identified by first clustering their composites,

and then trying to find an assignment of solutions to clusters that minimizes

the total Hamming distance of all repairs.

performRepairing(Pop): The first step of this method is to filter out all solutions

from the population Pop that use up a composite entirely (without being

repaired); Pop′ shall denote the resulting set of solutions. If there are more than RN solutions

requiring the same composite type, then we filter out a subset of RN solutions

among them at random. The remaining solutions take part in the clustering

process and are subject to being repaired. The number of clusters k into which

we can partition these solutions (or their required composites) varies between



Algorithm 5.3 Just-in-time strategy with repairing
Require: w (weighting factor), mIter (number of shifting rounds)
1: performRepairing(Pop){
2: create Pop′ by filtering out solutions from Pop that use up composites entirely
3: for ⌈|Pop′|/RN⌉ ≤ k ≤ #CompsInPop do
4: cluster the solutions in Pop′ using k-medoids
5: if required, shift solutions between clusters (perform mIter rounds of shifting); retain the configuration with the smallest Hamming distance loss and record HDL_k
6: normalize the values HDL_k and k of all configurations, and calculate the scores HDLN_k · (1 − w) + kN · w, ∀k, where HDLN_k = (HDL_k − min_k(HDL_k)) / (max_k(HDL_k) − min_k(HDL_k)) and kN = (k − ⌈|Pop′|/RN⌉) / (#CompsInPop − ⌈|Pop′|/RN⌉)
7: repair solutions in Pop′ according to the configuration associated with the smallest score, min_k(HDLN_k · (1 − w) + kN · w); add the filtered-out solutions back to Pop′ to create a new population Pop′′.}

⌈|Pop′|/RN⌉ ≤ k ≤ #CompsInPop, where ⌈·⌉ is the ceiling function and #CompsInPop is the number of non-identical composites in Pop′. Let us first describe how we

perform the clustering for a given k, and then how we select a particular cluster

configuration according to which we repair.

To obtain a partition with k clusters we apply k-medoids (Kaufman and

Rousseeuw, 1990) on the different composites. 11 The distance between two composites

is the Hamming distance between their constraint schemata. The medoid

composite of a cluster is the composite that would be used to repair (using forcing)

all solutions in that cluster that require a different composite. For diversity

reasons, we want to avoid ordering a medoid composite more than once, or, in

other words, we want to keep the number of solutions within a cluster no larger than the reuse number RN. A simple way to ensure this is to randomly shift

solutions from clusters that are overly large to clusters that can accommodate

further solutions. We perform mIter shifting rounds in total and select the configuration

with the smallest total Hamming distance of all repairs HDL k to be the

best configuration obtained with k clusters; formally we can define this selection

step as:

\[
HDL_k = \min_{i=1,\ldots,mIter} \left( \sum_{j=1}^{k} \sum_{r \in CL_{j,i}} HD(H_j, H_{j,r,i}) \right), \tag{5.4}
\]


11 The k-medoids algorithm divides a set of n data points into k groups (clusters), with k being known a priori, in such a way that the sum of dissimilarities between the data points of a cluster and the cluster's medoid is minimal. The medoid of a cluster is the data point whose average dissimilarity to all other points in that cluster is minimal. In other words, the medoid is the most centrally located data point in a cluster.



where H_j represents the constraint schema of the medoid composite in cluster j, and H_{j,r,i} the constraint schema required by the rth solution in the set CL_{j,i}, which is cluster j under shifting configuration i; note that while the medoid remains the same for different shifting rounds, the set CL_{j,i} depends on how solutions are shifted between clusters. The function HD(·,·) computes the Hamming distance between two schemata.

The cluster configuration according to which we repair is the one with the smallest score HDLN_k · (1 − w) + kN · w, where HDLN_k and kN are the normalized values of HDL_k and k (see Line 6 of Algorithm 5.3), and w ∈ [0,1] is a predetermined weighting factor representing the degree of repairing. Roughly speaking, the larger the value of w the more solutions are repaired; note that a setting of w = 0 would result in the basic JIT strategy because the smallest total Hamming distance of all repairs is achieved by not repairing at all. We can now add the solutions that were filtered out back to Pop′ to form a new population Pop′′, which is processed further as usual using the method alignOrdersAndSolutions(Pop′′).
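A minimal Python sketch of this configuration-selection step follows (not the thesis code; the data layout and the example numbers are assumptions).

def select_cluster_configuration(hdl, k_min, k_max, w):
    # hdl maps a number of clusters k to its Hamming distance loss HDL_k.
    # Normalize HDL_k and k to [0, 1] and return the k with the smallest
    # score HDLN_k * (1 - w) + kN * w; the weighting factor w in [0, 1]
    # controls the degree of repairing (w = 0 reduces to plain JIT).
    lo, hi = min(hdl.values()), max(hdl.values())
    hdl_span = (hi - lo) or 1       # guard against division by zero
    k_span = (k_max - k_min) or 1
    def score(k):
        hdln = (hdl[k] - lo) / hdl_span
        kn = (k - k_min) / k_span
        return hdln * (1 - w) + kn * w
    return min(hdl, key=score)

# Example: more clusters mean less repairing (smaller HDL_k) but more orders.
print(select_cluster_configuration({2: 14, 3: 9, 4: 3, 5: 0}, k_min=2, k_max=5, w=0.3))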

5.3.1.3 Sliding window strategy

While JIT and JITR operate in a sequential mode, in that they first devise a schedule of solution evaluations and purchase orders upon receiving a population from the EA, the sliding window (SW) strategy deals with the working of the algorithm and the purchasing of orders in parallel. More precisely, the strategy submits solutions for evaluation in the order they are generated, non-evaluable solutions are always repaired, and it aims for the 'most useful' composites to be kept in storage by (i) replenishing composites so that there can never be an empty storage cell and (ii) maintaining composites that were recently requested by the optimizer.

The first aspect, avoiding empty storage cells, is achieved by making purchase

orders to fill all storage cells every RN time steps (because SL ≥ RN). The

second aspect, maintaining recently requested composites, is achieved by ordering

composites from the sliding window, which we define here as a set κ(t) containing

composites that were requested most recently but were unavailable at the time

of the request; κ(t) is limited in size such that |κ(t)| ≤ WS, where WS is the

window size. Here, a requested composite means one that was part of a solution

that the EA submitted to the function wrapper for evaluation. In this thesis, we

simply order the #SC composites from κ(t) that have been added to this set most recently. That is, if we set WS = #SC, then we always order all composites in



κ(t); note, this means that all composites in the storage cells are replaced upon

the arrival of new composites. If we cannot order a composite for each storage

cell because |κ(t)| < #SC, then we order random composites for the remaining

storage cells; this is, for example, the case at t = 0 where the set κ(0) is empty.

To repair a non-evaluable solution, we use the composite from the storage cell

that has the smallest Hamming distance to the actually required composite; ties

between equally distant composites are broken randomly. This repairing step is

encoded within the repairing method, repair(⃗x,t), of the function wrapper (see

Line 36 of Algorithm 5.4). This is different to the JITR strategy, which applies

repairing after generating an offspring population (see Line 15 of Algorithm 5.4).
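A minimal Python sketch of this repair step follows (not the thesis code; the schema representation and example are assumptions): the stored composite closest in Hamming distance to the required one is used, with ties broken randomly.

import random

def sw_repair(required, in_storage, rng=random.Random(0)):
    # required and the stored composites are bit strings of equal length.
    # Return the stored composite with the smallest Hamming distance to the
    # required one; ties between equally distant composites are broken randomly.
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    best = min(hamming(required, c) for c in in_storage)
    return rng.choice([c for c in in_storage if hamming(required, c) == best])

# Example: the solution requires "10110" but only three composites are stored.
print(sw_repair("10110", ["10011", "10100", "01110"]))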

5.3.2 Experimental setup

We augment the three online resource-purchasing strategies on the same elitist

generational EA as used in the analysis of the static constraint-handling strategies

(see Section 5.1.2.1); we also use the same parameter settings (see Table 5.1).

Algorithm 5.4 shows how we incorporate the purchasing strategies into this EA. 12

As the objective function f we consider again the MAX-SAT problem instance

introduced in Section 5.1.2.2 (i.e. solutions are represented as binary strings of

length l = 50). Any results shown are average results across 100 independent

algorithm runs on this instance. Similarly to our previous studies, we choose the

order-defining bits of the high-level constraint schema H # at random at each run

but, of course, use the same schemata across the strategies analyzed.

Table 5.5 gives the settings of the online purchasing strategies. JIT is parameter-free

and thus not included in this table. For JITR, we anneal the weighting

factor w stepwise as a function of the cost counter c as w = max(0.1, 0.1 × (4 − ⌊c/250⌋)). This means that the weighting factor is decreasing in the

range [0.4,0.1], resulting in a decreasing number of repairs as the optimization

proceeds. The factor remains constant at w = 0.1 (i.e. only little repairing is

applied) for c ≥ 1000. This setting performed most robustly on the test problem

considered here. For SW, we found that mutating a composite taken from κ(t) before ordering it yields better performance. Preliminary experiments revealed a per-bit mutation rate of p_κ = 0.05, which will also be used here, to be

promising. This modification encourages exploration in case the EA gets stuck at

12 The purpose of Lines 27 and 28 of Algorithm 5.4 is to permit the submission and arrival of composite orders also during time steps at which null solutions are submitted.



Algorithm 5.4 Generational EA with online resource-purchasing strategies

Require: commCompERC(...) (a commitment composite ERC), f (objective function), T

(time limit), C (budget), c time step (cost per time step), µ (parent population size), λ

(offspring population size), #Strategy (selected resource purchasing strategy)

1: t = 0, c = 0 (global variables representing the current time step and costs), Pop = ∅ (current

population), OffPop = ∅ (offspring population)

2: while |Pop| < µ ∧ t < T ∧ c < C do

3: generate solution ⃗x at random

4: ⃗x = functionWrapper(⃗x,t)

5: Pop = Pop∪{⃗x}; t++

6: while t < T ∧ c < C do

7: OffPop = ∅

8: repeat

9: generate two offspring ⃗x (1) and ⃗x (2) by selecting two parents from Pop, and then

recombining and mutating them

10: OffPop = OffPop∪{⃗x (1) }∪{⃗x (2) }

11: until |OffPop| = λ

12: if Z ≠ ∅ ∧ (#Strategy=JIT ∨ #Strategy=JITR) then

13: useUpOldComposites(OffPop)

14: if #Strategy=JITR then

15: performRepairing(OffPop)

16: if (#Strategy=JIT ∨ #Strategy=JITR) then

17: alignOrdersAndSolutions(OffPop)

18: for i = 0 to |OffPop| do

19: if t < T ∧ c < C then

20: ⃗x i = functionWrapper(⃗x i ,t) // ⃗x i represents the ith solution of OffPop

21: t++; c += c_time_step

22: form new Pop by selecting the best µ solutions from the union population Pop ∪ OffPop

23: functionWrapper(⃗x,t){ // Notice that Lines 32 to 36 can only be entered by the SW strategy

24: y t = null

25: if ⃗x is a null solution then

26: ⃗x t = ⃗x;

27: do the same task as in Line 4 of Algorithm 4.5

28: do the same task as in Line 5 of Algorithm 4.5

29: else

30: if ⃗x satisfies the ERC commCompERC(...) then

31: ⃗x t = ⃗x; y t = f(⃗x t )

32: else

33: if |κ(t)| = WS then

34: remove the element of κ(t) that has been added earliest

35: κ(t) = κ(t) ∪ {⃗x_comp} // ⃗x_comp denotes the composite required by ⃗x

36: ⃗x t =repair(⃗x,t); y t = f(⃗x t )

37: return ⃗x t and y t }

suboptimal composites. It is likely that the appropriate mutation rate depends

on the string length l of solutions, and the number of composite-defining bits; a

more thorough investigation is needed to confirm this.



Table 5.5: Parameter settings of online resource-purchasing strategies.

Strategy   Parameter                                                     Setting
JITR       Number of shifting rounds mIter                               500
JITR       Weighting factor w                                            max(0.1, 0.1 × (4 − ⌊c/250⌋)), where c is the current cost counter
SW         Per-bit mutation probability for composites from κ(t), p_κ    0.05

5.3.3 Experimental results

Our focus in this study is to analyze the impact of commitment composite ERCs in the presence of budgetary constraints rather than time limitations. We thus set

the time limit T to an arbitrarily large number and terminate the optimization

upon reaching some budget C.

Figure 5.19 shows the probability of SW and JIT of achieving the population average fitness of our base algorithm obtained in an ERC-free environment. For SW the performance impact depends crucially on the number of storage cells #SC and the reuse number RN. From the left plot we observe that the performance improves the more storage cells are available and the lower the reuse number is. The reason is that for these settings we order a larger number of potentially different composites more frequently. This in turn reduces the probability of needing to repair a solution and, if repairing is needed, increases the diversity of the repaired solutions among the composite-defining bits.

From the right plot of Figure 5.19 we observe that the performance of JIT improves

as the time lag decreases and the reuse number increases. As this strategy

does not repair and composites are gratis in this experiment, any positive effect

on the performance comes from a shorter waiting period between transitions of

population generations. While a shorter time lag has a direct positive effect on

the waiting period, a greater reuse number allows an optimizer to evaluate more

solutions from a new population using old composites and thus to compensate for a slightly larger time lag. An increase in the number of storage cells #SC (results not shown here) can have a positive performance effect because one is more likely

to have a required composite in storage and thus can reduce the waiting time

between population generations.



[Figure 5.19 appears here: two heat maps showing the probability of achieving the population average fitness of an ERC-free search, for the sliding window strategy as a function of the number of storage cells #SC and the reuse number RN (left), and for the just-in-time strategy as a function of the time lag TL and the reuse number RN (right).]

Figure 5.19: Plots showing the probability of SW (left) and JIT (right) of achieving the population average fitness of our base algorithm obtained in an ERC-free environment given a budget and time limit of C = T = 1500. For SW this probability is shown as a function of #SC and RN for the ERC commCompERC(o(H#) = 30, #SC, TL = 10, RN, SL = RN), and for JIT it is shown as a function of TL and RN for the ERC commCompERC(o(H#) = 10, #SC = 10, TL, RN, SL = RN); costs were set to c_order = 0, c_time_step = 1, C = 1500.

The performance of JITR obtained at large budgets is not significantly different

from the performance of JIT as a function of TL and RN. This is due to the

rather conservative cooling scheme of w. Let us now analyze whether there is a

performance difference between the two strategies when we vary the budget and

associate composite orders with costs c order > 0. Figure 5.20 shows pairwise performance

comparisons of the three strategies as a function of the cost counter c and the cost per time step c_time_step. From the top left plot it is apparent that JITR

is able to locate fit solutions at a lower budget than required by JIT (see region

0 < c ≤ 600, 0 ≤ c time step ≤ 0.5). That is, repairing yields a more economical

search and should be preferred in situations where the number of evaluation steps

is limited due to a low budget and/or high costs.

From the top right plot of Figure 5.20 we observe that the weakness of JIT of

being too expensive or wasteful at low budgets and/or high costs is also apparent

when comparing it with SW. The bottom plot of Figure 5.20 indicates that

repairing improves performance significantly in these constraint setting regimes,

though the performance of SW cannot be matched. The reason that SW performs

so well in the presence of small budgets is that it avoids waiting for composite

orders to arrive. However, this is achieved at the cost of many repairs, which,

ultimately, may cause a shift in the search direction towards suboptimal composites.

An indication that such shifts in the search direction take place is the fact



[Figure 5.20 appears here: three heat maps of the ratios P(f(x) > f_JITR)/P(f(x) > f_JIT) (top left), P(f(x) > f_JIT)/P(f(x) > f_SW) (top right), and P(f(x) > f_JITR)/P(f(x) > f_SW) (bottom), each as a function of the cost counter c and the cost per time step c_time_step.]

Figure 5.20: Plots showing the ratios P(f(x) > f_JITR)/P(f(x) > f_JIT) (top left), P(f(x) > f_JIT)/P(f(x) > f_SW) (top right), and P(f(x) > f_JITR)/P(f(x) > f_SW) (bottom) as a function of c and c_time_step for the ERC commCompERC(o(H#) = 10, #SC = 5, TL = 5, RN = 30, SL = 30), c_order = 1. Here, x is a random variable that represents solutions drawn uniformly at random from the search space and f_* the population average fitness obtained with strategy *. If P(f(x) > f_*)/P(f(x) > f_**) > 1, then strategy ** is able to achieve a higher population average fitness than strategy *, and a greater advantage of ** is indicated by a darker shading in the heat maps; similarly, if P(f(x) > f_*)/P(f(x) > f_**) < 1, then * is better than ** and a lighter shading indicates a greater advantage of *.

that JIT and JITR can match, and sometimes outperform, SW for larger budgets

(see region 1000 < c ≤ 1600, 0 ≤ c time step ≤ 0.5 in the top right and bottom

plot).

The previous experiment looked at commitment composite ERCs featuring a rather small number of storage cells #SC, which is in favour of SW. We also used a rather high order o(H#) = 30 in the previous experiment. For SW, a high order o(H#) tends to reduce the risk of having identical composites in storage and thus benefits the diversity in the population with respect to the composite-defining solution bits. Let us now investigate whether the performance advantage



[Figure 5.21 appears here: two heat maps of the ratio P(f(x) > f_JIT)/P(f(x) > f_SW) as a function of the number of storage cells #SC and the order of the high-level constraint schema o(H#), for budgets C = 1500 (left) and C = 3000 (right).]

Figure 5.21: Plots showing the ratio P(f(x) > f_JIT)/P(f(x) > f_SW) for a budget of C = 1500 (left) and C = 3000 (right) as a function of the number of storage cells #SC and the order o(H#) for the ERC commCompERC(o(H#), #SC, TL = 25, RN = 25, SL = 25), c_order = c_time_step = 1. Here, x is a random variable that represents solutions drawn uniformly at random from the search space and f_* the population average fitness obtained with strategy *. If P(f(x) > f_*)/P(f(x) > f_**) > 1, then strategy ** is able to achieve a higher population average fitness than strategy *, and a greater advantage of ** is indicated by a darker shading in the heat maps; similarly, if P(f(x) > f_*)/P(f(x) > f_**) < 1, then * is better than ** and a lighter shading indicates a greater advantage of *.

of SW over the other two strategies can still be preserved when we vary #SC and

o(H # ). Figure 5.21 compares the performance between JIT and SW as a function

of #SC and o(H#) for a budget of C = 1500 (left plot) and C = 3000 (right plot);

JITR achieved a similar performance to JIT for the considered budget regimes.

For the lower budget (left plot), we observe that SW outperforms JIT for rather

large orders o(H # ) and few storage cells #SC (see range 8 < o(H # ) < 40,

#SC < 15); SW is too wasteful in the presence of more storage cells. As the

budget increases (right plot), JIT is able to match the performance of SW and

eventually to overtake it, particularly if both #SC and o(H # ) are large.

5.3.4 Summary and conclusion

In this study we have considered an optimization scenario in which costly and

expendable resources (composites) are required in the evaluation of solutions,

and these composites had to be ordered in advance, kept in capacity-limited storage,

and used within a certain time frame. We proposed three online resource-purchasing strategies (see Section 5.3.1) and analyzed them over a number of resource-constraint settings. A simple just-in-time (JIT) strategy, which can be seen as the minimal strategy, has been shown to be generally effective but too expensive at early optimization stages. This drawback has been eliminated by intelligently



repairing solutions during these stages (JITR). Besides these two reactive strategies

we also considered a sliding window (SW) strategy, which applied some from

of anticipation. This (simple) strategy performs better than JITR and JIT for

some constraint settings, although it is costly in the presence of much storage

space.

We can try out various modifications to the purchasing strategies to improve

their performance. For instance, for the SW strategy one may ensure that only distinct composites are ordered from the sliding window, and also avoid having all

storage cells filled at any time. Furthermore, rather than ordering the composites

that were added to κ(t) most recently, one may order composites that have been

requested most frequently, or have proved to be successful in the past. In essence,

a sophisticated ordering strategy may trade off exploration of rarely-requested

composites (from the sliding window) with exploitation of composites that have

proved to be successful and/or were requested frequently. Alternative repairing

strategies for the JITR strategy may also prove fruitful.

5.4 Chapter summary

In this chapter we proposed and analyzed a variety of strategies for coping with

ERCs. We first looked at static strategies for dealing with non-evaluable solutions

arising due to ERCs. Experiments over a number of test functions suggest that

the appropriate strategy depends on the constraint parameters of the ERC rather

than the fitness landscape it is optimized over. If this turns out to be generally

true, then this means we can select a strategy offline given the common situation

that the ERCs are known beforehand. Motivated by this observation we then

showed that learning offline when to switch between the static strategies during

an optimization process can yield better performance than the static strategies

themselves. Finally, we proposed several online resource-purchasing strategies for

coping with an ERC type that leaves the arrangement of resources to the optimizer.

Overall we can tentatively say that the most suitable constraint-handling

strategy depends on the constraint settings of the ERCs considered.

The next chapter focuses on the two remaining resourcing issues considered in

this thesis: optimization subject to changes of variables and lethal environments.


Chapter 6

Further Resourcing Issues in Closed-Loop Optimization

In this chapter we propose and analyze strategies for coping with two further

resourcing issues in closed-loop optimization: problems subject to changes of

variables and lethal environments. The first resourcing issue, optimization subject

to changes of variables, is motivated by an experimental problem involving

the identification of effective drug combinations drawn from a non-static drug

library. The second resourcing issue, optimization in lethal environments, causes

the population of an EA to decrease in size (permanently) upon evaluating a poor

or lethal solution. We motivate this scenario, and devise and validate guidelines

for tuning EAs to cope with it.

6.1 Optimization on problems subject to changes of variables

Optimization problems subject to changes of variables belong to the class of dynamic

optimization problems, which we introduced in Section 3.4. The dynamic

component is related to the fitness landscape rather than the constraints (as was

the case with ephemeral resource-constrained problems). The next section motivates

optimization problems subject to changes of variables, and in Section 6.1.2

we propose a method, which we call fair mutation, specifically designed for coping

with such problems. The test problems we use to compare fair mutation against

four standard strategies from dynamic optimization are described in Section 6.1.3.




[Figure 6.1 appears here: the library of drugs {a, b, c, d, e} at t = 0 becomes {a, b, f, d, g} at t = ∆g after replacing drugs c and e with drugs f and g.]

Figure 6.1: An illustration of the effect of changing variables. In the (drug discovery) scenario considered, this corresponds to replacing a number of drugs (here #v = 2) of a library with new ones (every ∆g generations during an optimization process).

The experimental setup is outlined in Section 6.1.4, and an empirical study follows

in Section 6.1.5. Section 6.1.6 concludes this study.

6.1.1 Motivation

Our motivation for considering optimization problems subject to changes of variables

is a completed experimental study (Small et al., 2011), in which we were

involved, concerned with the identification of effective drug combinations using

EAs. For a detailed description of the experimental setup please refer to Small

et al. (2011); here we focus solely on the effect of changing variables. In this

study, a human experimentalist needs to arrange a library (of size l) of promising

or relevant drugs to be considered by an optimizer. In our case, each drug i

corresponds to a single binary variable x_i indicating whether the drug is included in a drug mixture (x_i = 1) or not (x_i = 0); the drug mixture itself represents

an entire solution ⃗x. In applications of this type, the experimental equipment is

often set up such that multiple experiments (which are here the mixing of drugs

and subsequent testing of the mixtures) can be done in parallel. In our scenario, a

robot is taking care of the mixing of drugs, and this robot is able to mix up to 50

combinations in parallel. During the running experimental optimization, which

may take several months, the human experimentalist may decide to replace drugs

from the library with new ones (as indicated in Figure 6.1) because, for instance,

the old drugs have an undesired effect on the efficiency of a drug cocktail or are

simply not of interest anymore. 1 Hence, whenever a change of drugs takes place,

the corresponding variables and their effects on the fitness are changed as well.

Note that a change of drugs does not change the effectiveness of cocktails that did not

use any of the replaced drugs; i.e. solutions with a 0-bit at any of the replaced

1 The situation where drugs are only added to or removed from an existing library is realistic

too but will not be considered in this study.



variables retain their fitness. Obvious ways of dealing with this are to carry on

regardless, or to restart optimization entirely. We wish to see if other strategies

fare any better.

Similar to other dynamic optimization problems, problems subject to changes

of variables feature a changing fitness function and thus a changing fitness landscape.

An example of a conceptually similar dynamic optimization problem is the dynamic traveling salesman problem (DTSP) or vehicle routing problem (Larsen, 2000),

where cities may be replaced with new ones during the optimization. The two

main differences between our dynamic combinatorial problem and these existing

dynamic problems are that: (i) we do not need to detect changes in the landscape

because it is always known when variables are changed (sometimes the

exact generation of a change can even be controlled), and (ii) the fitness value

remains unchanged for a known subset of solutions (see above). The fact that in

our case solutions are evaluated by conducting physical experiments, whose number

is limited by resource restrictions, can be considered as another distinction to

previous analyses of dynamic problems. The presence of resource limitations requires

a quick tracking of new optima. Fortunately, thanks to (i) and (ii), fulfilling

this task is usually easier in our scenario than, say, in DTSP.

6.1.2 Fair mutation

In Section 3.4 we introduced three main approaches for tracking optima in a

dynamic optimization environment: diversity control, memory-based, and multipopulation

approaches. The strategy we propose here for dealing with changes

of variables, which we call fair mutation, can be regarded as a diversity-control

approach. Unlike the majority of strategies of this type, we do not need to answer the question of how to detect environmental changes. Instead, the question is rather how best to exploit the (often expensively obtained and partially intact)

information contained in the current population in the generation of a new population

after exchanging some variables. Fair mutation introduces diversity into

the population only in the initial generations following an environmental change.

This is done by effectively using the information stored in the current population and by making use of the fact that only bits with value 1 potentially have an

effect. We explain the strategy in detail in the following.



Algorithm 6.1 Fair mutation

Require: p r (raised mutation rate), ∆h (number of generations for which p r is used)

1: applyFairMutation(Pop,λ,V,∆t){

2: if ∆t = 0 then

3: for each variable k ∈ V do

4: for 1 ≤ i ≤ ⌊λ/|V|⌋ do

5: generate an offspring ⃗x off as normal (if the application of selection and variation

operators yields two offspring, then select one of them at random); mutate ⃗x off

using a rate of p r ; set variable k of ⃗x off to 1 and add ⃗x off to offPop; i.e. offPop =

offPop ∪⃗x off

6: in the situation where |offPop| < λ, fill offPop in the same way as above but instead of

setting variable k to 1, set to 1 a variable selected from V at random without replacement

7: else

8: if 0 < ∆t ≤ ∆h then

9: generate offspring population as normal but for any offspring containing a 1-bit at any

of the variables in V, mutate the other bits using p r

10: else

11: generate offspring population as normal

12: return offPop}

6.1.2.1 Algorithm

The idea of fair mutation is to allow an optimizer both to continue optimization

as normal and at the same time to rapidly explore the space of solutions that use

any of the new variables. In our context the new variables represent a set of new

drugs that replaces a set of existing drugs in the drug library. We employ two

mechanisms to facilitate the exploration of the search space spanned by the new

variables: (i) allowing an optimizer to test each new variable (i.e. its bit being set

to 1) within the same number of offspring solutions, and (ii) enabling an optimizer

to explore solutions with a 1-bit at any of the new variables by applying a raised

mutation rate p r for ∆h generations (p r and ∆h are user-defined parameters).

Algorithm 6.1 describes the method applyFairMutation(Pop,λ,V,∆t), which we

call to generate the offspring population OffPop at each generation. The method

takes as input the current population Pop, the number of offspring solutions λ to

be generated, the set of changed variables V, and the number of generations ∆t

passed after the last environmental change (or variable replacement step). We

assume that fair mutation is augmented on a generational EA with a (µ + λ)-ES

reproduction scheme. 2

From Lines 8 and 9 in Algorithm 6.1 one can see that, in the first ∆h generations

after an environmental change, we apply a raised mutation rate (from which

2 For a steady state EA or λ = 1, we would call the method applyFairMutation(Pop,1,V,∆t)

multiple times thereby ensuring that each variable is tested an equal number of times.



1-bits among the new variables are protected) if an offspring has one or more 1-

bits among any of the new variables. Although this step allows an optimizer to

perform exploration among solutions that use one or more of the new variables,

it also runs the risk that an offspring may be perturbed so much that it turns into

a poor one. However, when we embed this strategy within an (elitist) population

update scheme it achieves both the continued normal optimization of strings that

do not contain any of the new variables, and the fair and rapid exploration of

each of the subspaces (schemata) that contain a new variable. Alternatively, one

may decrease the mutation rate p r during the period of ∆h generations, or look

at other parameter control techniques (see Section 3.1.2) for tuning p_r.

6.1.3 Testing environment

To validate the performance of fair mutation we use a main-and-joint effects (MJ)

problem, which models the real drug mixture problem we are interested in. We

present the MJ model in the following section. We also consider variable changes

on NK landscapes as a comparison.

6.1.3.1 MJ model

In the application described in Small et al. (2011), the experience of the human

experimentalist and a previously performed pilot study suggested that only some

drugs (bits) of the library have a positive effect while others have a negative side

effect or no effect at all. Moreover, it is believed that interactions among the

considered drugs can be described by main and joint effects only, with higher-order interaction effects being negligible.

Let M denote the number of randomly selected bits x_h(q), q = 1,...,M ≤ l (l is the total number of bits) which are assigned a main effect; h(q) is the index of the variable carrying the qth main effect. And let J denote the number of randomly selected distinct bit pairs (x_i(r), x_j(r)), r = 1,...,J, i(r) ≠ j(r), which are assigned a joint effect; i(r) and j(r) are the indices of the two variables that have the rth joint effect. In our case, the strengths of all main effects m_h(q) and joint

effects g i(r),j(r) are drawn uniformly from the interval [−3,1]. That is, on average,

a quarter of all main and joint effects will be positive; negative effects can model

undesired side effects associated with (pairs of) drugs. As the purpose of this



model is to model the real drug mixture problem at hand, we assume that variables or drugs only have an effect if they are included in a drug mixture. Hence, the strength of a main effect m_h(q) or joint effect g_i(r),j(r) is only considered if the variable x_h(q) or, respectively, both variables x_i(r) and x_j(r) are 1-bits. To evaluate a candidate solution, we look up all the main and joint effects which are switched on, sum their values, and divide by l. In other words, the fitness function f to be maximized is defined as

\[
f(\vec{x}) = \frac{1}{l} \left( \sum_{q=1}^{M} m_{h(q)} \cdot x_{h(q)} + \sum_{r=1}^{J} g_{i(r),j(r)} \cdot x_{i(r)} \cdot x_{j(r)} \right). \tag{6.1}
\]
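A minimal Python sketch of this fitness function follows (not the thesis code; the instance generation and all names are illustrative assumptions).

import random

def make_mj_model(l, M, J, rng=random.Random(0)):
    # Generate a random MJ instance: M bits carry a main effect and J distinct
    # bit pairs carry a joint effect, with all strengths drawn uniformly from
    # [-3, 1]. Returns a fitness function over 0/1 vectors of length l, as in
    # equation (6.1): only effects whose bits are set to 1 contribute.
    main = {h: rng.uniform(-3, 1) for h in rng.sample(range(l), M)}
    pairs = set()
    while len(pairs) < J:
        i, j = rng.sample(range(l), 2)
        pairs.add((min(i, j), max(i, j)))
    joint = {p: rng.uniform(-3, 1) for p in pairs}

    def fitness(x):
        total = sum(m * x[h] for h, m in main.items())
        total += sum(g * x[i] * x[j] for (i, j), g in joint.items())
        return total / l
    return fitness

f = make_mj_model(l=30, M=30, J=30)
print(f([1] * 30))  # fitness of the all-ones drug mixture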

6.1.3.2 NK landscapes

The second test problem we consider is the class of NK landscapes or, equivalently, NKα landscapes with the setting α = 0. Please refer to Section 5.1.2.2 for a description of the test problem. Compared to an MJ model, an NK landscape assigns a positive effect (drawn here from [0,1]) to each of the possible 2^(K+1) bit-wise neighborhood configurations; i.e. 0 and 1-bits are treated equally. For fair mutation

this means that (i) offspring solutions devoted to a new variable need to

be set to 1 and to 0 in equal number (this concerns the for-loop in Line 4 of

Algorithm 6.1), and (ii) new variables are always protected from mutation (this

concerns the if-clause in Line 8 of Algorithm 6.1).

6.1.3.3 Realizing a change of variables

On an NK landscape, a change of a variable involves reselecting its K neighbors and reinitializing the corresponding 2^(K+1) effects (of the neighborhood configurations). Similarly, on an MJ model, if the replaced variables had a main and/or joint effect, then these effects are reinitialized and the joint bits reselected. Recall that we select the joint bits such that we end up with J distinct bit pairs only. In this study, a change of variables is performed every ∆g generations, whereby #v randomly selected variables are always changed. That is, while ∆g controls the frequency of environmental changes, #v controls the severity of a change.
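As an illustration, the following Python sketch realizes such a change on an MJ instance represented by plain dictionaries (not the thesis code; in particular, the way the joint bits are reselected is only one possible reading of the description above).

import random

def change_variables(main, joint, l, num_changed, rng=random.Random(1)):
    # main maps a bit index to its main-effect strength; joint maps a pair
    # (i, j) with i < j to its joint-effect strength. For each of the #v
    # replaced variables, redraw its main effect (if it had one) and replace
    # any joint effect it participated in by a freshly drawn, distinct pair,
    # so that the number of joint pairs J stays constant.
    changed = rng.sample(range(l), num_changed)
    for v in changed:
        if v in main:
            main[v] = rng.uniform(-3, 1)
        for pair in [p for p in joint if v in p]:
            del joint[pair]
            while True:
                a, b = sorted(rng.sample(range(l), 2))
                if (a, b) not in joint:
                    joint[(a, b)] = rng.uniform(-3, 1)
                    break
    return changed

main = {0: -1.2, 5: 0.4}
joint = {(0, 5): 0.7, (2, 9): -2.1}
print(change_variables(main, joint, l=30, num_changed=2), main, joint)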



6.1.4 Experimental setup

We compare the performance of fair mutation (denoted as #Strategy=1 in Algorithm

6.2) on dynamic NK landscapes and MJ models against four standard

strategies from dynamic optimization: triggered hypermutation (#Strategy=2),

restart of the optimization (#Strategy=3), re-evaluating the current population

after changing variables and then carrying on as normal (#Strategy=4), and a

niching algorithm (#Strategy=5). We will also show partial results of a strategy

called random immigrants (#Strategy=6).

We introduced the ideas of triggered hypermutation (Cobb, 1990) and random immigrants (Grefenstette, 1992) in Section 3.4. Here, we will outline the working procedures of these two diversity-control approaches in more detail. The

triggered hypermutation approach is typically employed within a generational

EA, with crossover and a small per-bit mutation probability p m,b (also referred

to as the base mutation rate). The method monitors the fitness of the best individual in the population over time, and when this measure declines, it

is assumed that the environment has changed and the mutation rate is increased

drastically (by some factor HR) for a single generation. The obvious drawback

of this approach is that changes in the landscape occurring in regions that are

not populated by an EA will not trigger the hypermutation. The random immigrants

approach circumvents this drawback by promoting exploration of the

search space continuously: every generation, it replaces a fraction p_RI of the population with randomly generated individuals.
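A minimal Python sketch of these two mechanisms follows (not the thesis code; the trigger test on the moving average and all names are simplifying assumptions).

import random

def maybe_hypermutate(pop, best_fitness_history, p_base, HR, rng=random.Random(0)):
    # Triggered hypermutation: if the moving average (over five generations) of
    # the best-of-generation fitness has declined, flip every bit with
    # probability p_base * HR for one generation; otherwise use p_base.
    h = best_fitness_history
    declined = len(h) >= 6 and sum(h[-5:]) / 5 < sum(h[-6:-1]) / 5
    rate = p_base * HR if declined else p_base
    return [[1 - b if rng.random() < rate else b for b in ind] for ind in pop]

def random_immigrants(pop, p_RI, l, rng=random.Random(0)):
    # Random immigrants: each generation, replace a fraction p_RI of the
    # population with randomly generated individuals.
    for idx in rng.sample(range(len(pop)), int(p_RI * len(pop))):
        pop[idx] = [rng.randint(0, 1) for _ in range(l)]
    return pop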

As the niching algorithm, we use deterministic niching, whose procedure we

already described in Algorithm 3.2 of Section 3.4.

Parameter settings: We augment all dynamic optimization strategies (except

the niching algorithm) on an EA with a (µ + λ)-ES reproduction scheme; the niching

algorithm is elitist in its own way as offspring solutions are allowed to replace

their parents only if they are fitter. The EA is identical to the one described in

Section 5.1.2.1, also using the same setup (see Table 5.1). Algorithm 6.2 shows

how we integrate the different dynamic optimization strategies as well as variable

changes (performed in Line 9) into this EA.

The parameter settings of the different dynamic optimization strategies are

given in Table 6.1. Note from the table that the triggered hypermutation is set

up such that, once triggered, we use a mutation rate of p m,b × HR = 0.5 for

a single generation. The effect of mutating a solution with this probability is



Algorithm 6.2 Generational EA with dynamic optimization strategies for handling

problems subject to changes of variables

Require: f (objective function), G (maximal number of generations), µ (parent population

size), λ (offspring population size), #Strategy (number of selected dynamic optimization

technique)

1: g = 0 (generation counter), ∆t = 0 (number of generations passed since the last variable

change), MA = −1 (moving average of the best-of-generation fitness), Pop = ∅ (current

population), OffPop = ∅ (offspring population)

2: while |Pop| < µ ∧ g < G do

3: generate solution ⃗x at random

4: evaluate ⃗x using f

5: Pop = Pop∪{⃗x}; g++; update MA

6: while g < G do

7: OffPop = ∅

8: if g mod∆g = 0 then

9: select #v variables to be replaced at random, and add their indices to the set of

changed variables, V; reinitialize the fitness effects of the new variables in case the

replaced variables were previously associated with any (i.e. generate a new objective

function f ); set ∆t = 0

10: if #Strategy=3 then

11: reinitialize Pop with random solutions

12: else

13: re-evaluate all solutions in Pop using (the new objective function) f

14: if #Strategy=1 then

15: OffPop = applyFairMutation(Pop,λ,V,∆t)

16: else

17: if #Strategy=2 ∧ MA declined then

18: generate λ solutions at random to form the population OffPop

19: if #Strategy=5 then

20: create new population Pop according to Algorithm 3.2 of Section 3.4

21: else

22: for i = 0 to λ/2 do

23: generate two offspring ⃗x (1) and ⃗x (2) by selecting two parents from Pop, and then

recombining and mutating them

24: OffPop = OffPop∪{⃗x (1) }∪{⃗x (2) }

25: if #Strategy=6 then

26: replace p RI ×λ solutions of OffPop with random solutions

27: if #Strategy≠5 then

28: evaluate all solutions of OffPop using f

29: update MA based on the best solution fitness in OffPop

30: form new Pop by selecting the best µ solutions from the union population Pop ∪

OffPop

31: g++; ∆t++

The effect of mutating a solution with this probability is identical to generating a solution at random. Hypermutation is triggered when there is a decrease in the moving average of the best-of-generation fitness over five generations. These are standard parameter setting combinations for this strategy, and so is the setting of the replacement rate p_RI for the random immigrants approach.



Table 6.1: Parameter settings of dynamic optimization techniques.

  Strategy                  Parameter                                             Setting
  Fair mutation             Raised mutation rate p_r                              1/l
                            Number of generations for which p_r is applied, ∆h    0
  Triggered hypermutation   Base mutation rate p_{m,b}                            0.001
                            Hypermutation rate HR                                 500
  Random immigrants         Replacement rate p_RI                                 0.3

Note that triggered hypermutation is typically employed within a non-elitist EA. We integrate it into an elitist EA because the performance obtained with that EA proved to be better on the test problems considered.

We are mainly interested in the dynamics resulting from changing a small number of variables frequently, as this is the scenario we are faced with in (Small et al., 2011). On this class of problems, the parameter setting for which fair mutation performed best is ∆h = 0 and p_r = 1/l, which will also be used here (see Table 6.1). The interpretation of this parameter setting is that new variables are protected from mutation only at the generation at which an environmental change occurs, and that the raised mutation rate is the same as the standard mutation rate (i.e. 1/l). Higher values of ∆h and p_r are required for more severe environmental changes and different interval ranges of main and joint effects. At the generation of an environmental change, all strategies (apart from the restart strategy, see Lines 10 and 11 of Algorithm 6.2) re-evaluate the current population before they generate an offspring population.

Our aim is to understand what effect changes of variables have on evolutionary search for different settings of #v and ∆g. For ease of analysis we therefore fix the problem parameters of the NK landscape and the MJ model to N = l = 30 and K = 2, and to M = 30 and J = 30, respectively. That is, in the MJ model all bits have a main effect and 30 bit pairs have a joint effect. The search space size l also corresponds to the size of the drug library considered in our experimental study (see Small et al. (2011)). Table 6.2 summarizes the test problem settings.

Any results shown are average results across 100 independent algorithm runs.

Performance measurement: To measure the performance of the investigated strategies we use three measurements: accuracy, adaptability, and average best-of-generation fitness.



Table 6.2: Parameter settings of test functions f modelling problems subject to changes of variables.

  Test function    Parameter                    Setting
  MJ model         Solution parameters l        30
                   Number of main effects M     30
                   Number of joint effects J    30
  NK landscapes    Solution parameters N = l    30
                   Neighborhood size K          2

Accuracy and adaptability have been proposed by Trojanowski and Michalewicz (1999) and rate an algorithm's overall performance with a single number. Both metrics are offline performance measures (De Jong, 1975) and rely on the assumption that the time steps at which the environment changes are known beforehand. The accuracy value represents the fitness difference between the best solution in the population at the generation just before a change and the optimal fitness in that environment, averaged over the entire run. Let S denote the total number of environmental changes during an optimization process, and, as before, let ∆g be the number of generations between successive environmental changes.

We can define the accuracy metric as

    Acc = (1/S) ∑_{i=1}^{S} ∆e_{i,∆g−1} ,        (6.2)

where ∆e_{i,j} is the fitness difference between the best solution at the jth generation after the last environmental change (j ∈ [0, ∆g − 1]) and the optimal fitness of the landscape after the ith environmental change (i ∈ [0, S − 1]).
The adaptability value represents the fitness difference between the best fitness of each generation and the optimal fitness in the current environment, averaged

over the entire run. We can formally define this metric as

    Ada = (1/S) ∑_{i=1}^{S} ( (1/∆g) ∑_{j=0}^{∆g−1} ∆e_{i,j} ) .        (6.3)

For both metrics the optimal solution in an environment is approximated by

using the best solution found from running an elitist generational EA for 100



generations, 100 times independently. It is easy to see that an algorithm is better the smaller its accuracy and adaptability measurements are. As Acc decreases, the performance of an algorithm in tracking the optima within the ∆g generations improves; a value of Acc = 0 means that the algorithm was always able to find the optimum within ∆g generations. A small value of Ada indicates that the best solution fitness found during an optimization run was always close to the optimal fitness; a value of Ada = 0 means that the optimal fitness has never been lost. The average best-of-generation fitness is a more standard per-generation measure, suitable for plotting.
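As an illustration of how these two metrics could be computed from logged fitness values, the following Python sketch assumes we have recorded, for each environmental change i and each generation j after it, the gap ∆e_{i,j} between the (approximated) optimal fitness and the best population fitness; this data layout and the toy values are assumptions made for the example.

```python
def accuracy(delta_e, delta_g):
    """Eq. (6.2): average gap of the best solution in the generation
    just before each environmental change (j = delta_g - 1)."""
    S = len(delta_e)                       # number of environmental changes
    return sum(delta_e[i][delta_g - 1] for i in range(S)) / S

def adaptability(delta_e, delta_g):
    """Eq. (6.3): gap of the best solution averaged over all generations
    between successive changes, then averaged over all changes."""
    S = len(delta_e)
    return sum(sum(delta_e[i][j] for j in range(delta_g)) / delta_g
               for i in range(S)) / S

# delta_e[i][j]: gap at the j-th generation after the i-th change
# (both metrics equal 0 if the optimum is always tracked perfectly)
delta_e = [[0.3, 0.1, 0.0], [0.4, 0.2, 0.1]]   # toy data: S = 2 changes, delta_g = 3
print(accuracy(delta_e, 3), adaptability(delta_e, 3))
```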

6.1.5 Experimental results

Figures 6.2 to 6.7 show the best-of-generation fitness of all dynamic optimization

strategies (except random immigrants) on MJ models and NK landscapes for

three different settings of #v and ∆g: #v = 2 and ∆g = 10 (Figure 6.2 and 6.3),

#v = 8 and ∆g = 25 (Figure 6.4 and 6.5), and #v = 15 and ∆g = 40 (Figure 6.6

and 6.7). The accuracy and adaptability measurements of all strategies (including

random immigrants) on these problem instances are shown in Table 6.3. For

comparison, the table also includes the results obtained by triggered hypermutation using a base mutation rate of p_{m,b} = 1/l = 1/30 and a hypermutation rate of HR = 15. This base mutation rate is identical to the per-bit mutation rate p_m employed by the other dynamic optimization strategies; the hypermutation rate is again set such that hypermutated solutions are equivalent to random solutions.

For each of the three problem instances it is apparent that all strategies perform

better on the MJ model than on the NK landscape. This is due to the

higher degree of variable interactions featured by the NK landscape, which tends

to make the problem more difficult to solve. Let us analyze the performance of

the strategies on each of the problem instances in turn.

Figures 6.2 and 6.3 investigate the situation where the environment changes only a little but frequently. This is the most realistic and, for us, most interesting case. Here we observe that:

• Environmental changes are easier to detect on NK landscapes (Figure 6.3)

than on the MJ model (Figure 6.2) because all variables have an effect

regardless of whether they are 0 or 1-bits (as opposed to 1-bits only on the



[Figure: panels "MJ model, #v=2, ∆g=10" and "Zoom on MJ model"; y-axis: average best-of-generation fitness; x-axis: #Generations; curves: Restart, Reevaluate, Niching, Trig. hypermutation, Fair mutation, Optimal fitness.]
Figure 6.2: Plots showing the average best-of-generation fitness obtained by various dynamic optimization strategies on a MJ model with #v = 2, ∆g = 10. The bottom plot zooms into the area indicated by the box in the top plot.

MJ model).

• Using a restart approach or triggered hypermutation results in poor performance, especially on NK landscapes (Figure 6.3). In the case of the restart strategy, the reasons are the high frequency and low severity of environmental changes; there is simply insufficient time for the EA to converge to a good population state before the environment changes again. The poor performance of triggered hypermutation is due to a similar reason: while environmental changes are reliably detected on this problem, the low base mutation rate does not allow for a sufficiently quick convergence to fit regions in the search space. From the accuracy and adaptability measures in Table 6.3 we observe that a higher base mutation rate improves performance significantly.



[Figure: panels "NK landscape, #v=2, ∆g=10" and "Zoom on NK landscape"; y-axis: average best-of-generation fitness; x-axis: #Generations; curves: Restart, Reevaluate, Niching, Trig. hypermutation, Fair mutation, Optimal fitness.]
Figure 6.3: Plots showing the average best-of-generation fitness obtained by various dynamic optimization strategies on NK landscapes with #v = 2, ∆g = 10. The bottom plot zooms into the area indicated by the box in the top plot.

• The other three strategies yield similar performances though a slight advantage

is apparent for fair mutation, which is closely followed by the reevaluation

strategy; this is also confirmed by the accuracy and adaptability

measurements. Fair mutation seems to recover significantly faster than

the other strategies when there is a severe change in the landscape; see

generation 90 and 100 on the NK landscape (Figure 6.3) and MJ model

(Figure 6.2), respectively.

• The niching algorithm maintains a relatively high population diversity

throughout the search as it searches within several promising niches simultaneously.

This slows down the convergence, as can be seen at the beginning

of the optimization, but it also increases the probability of finding new optima

quicker after undergoing an environmental change; see e.g. generation

50 and 80 on the NK landscape (Figure 6.3) and MJ model (Figure 6.2),

respectively.



[Figure: panels "MJ model, #v=8, ∆g=25" and "Zoom on MJ model"; y-axis: average best-of-generation fitness; x-axis: #Generations; curves: Restart, Reevaluate, Niching, Trig. hypermutation, Fair mutation, Optimal fitness.]
Figure 6.4: Plots showing the average best-of-generation fitness obtained by various dynamic optimization strategies on a MJ model with #v = 8, ∆g = 25. The bottom plot zooms into the area indicated by the box in the top plot.

From Figures 6.4 and 6.5, where more variables are changed less frequently, we observe that:

• Changing a larger number of variables alters the fitness landscape more

severely and thus simplifies detecting environmental changes.

• The restart approach can sometimes outperform the other strategies on the

NK landscape (Figure 6.5). This is, for example, the case at generation 50,

where the landscape seems to undergo a relatively severe change.

• Fair mutation again slightly outperforms the other strategies. However, the fact that only a small number of offspring solutions can be devoted to each new variable reduces its ability to explore the search region spanned by the new variables effectively.

• The deficit in the convergence speed of the niching algorithm is now more

apparent. This is because a more severe landscape change reduces the probability

that any of the currently occupied niches is close to the new optimal



[Figure: panels "NK landscape, #v=8, ∆g=25" and "Zoom on NK landscape"; y-axis: average best-of-generation fitness; x-axis: #Generations; curves: Restart, Reevaluate, Niching, Trig. hypermutation, Fair mutation, Optimal fitness.]
Figure 6.5: Plots showing the average best-of-generation fitness obtained by various dynamic optimization strategies on NK landscapes with #v = 8, ∆g = 25. The bottom plot zooms into the area indicated by the box in the top plot.

search region; i.e. the population needs to shift its search focus more severely, and this takes time when trying to maintain several niches simultaneously.

Figures 6.6 and 6.7 consider the case where half of all solution variables are replaced with new ones every ∆g = 40 generations. This setting may simulate the case where a human experimentalist wants to test many different drugs without being able (e.g. due to storage or budget limitations) or wanting to change the drug library size. From Figure 6.7 and the accuracy and adaptability measurements in Table 6.3 it is apparent that a restart approach significantly outperforms the other strategies on the NK landscape. On the MJ model (Figure 6.6), a restart approach yields the best results in terms of the accuracy metric, with fair mutation being best in terms of the adaptability metric. That is, a restart approach is able to find, on average, a better solution fitness than fair mutation within the ∆g generations available between successive environmental changes. Fair mutation, on the other hand, tends to be on average closer to the optimal fitness at all times than the restart approach; this is due to the quicker convergence of fair



[Figure: panels "MJ model, #v=15, ∆g=40" and "Zoom on MJ model"; y-axis: average best-of-generation fitness; x-axis: #Generations; curves: Restart, Reevaluate, Niching, Trig. hypermutation, Fair mutation, Optimal fitness.]
Figure 6.6: Plots showing the average best-of-generation fitness obtained by various dynamic optimization strategies on a MJ model with #v = 15, ∆g = 40. The bottom plot zooms into the area indicated by the box in the top plot.

mutation at the initial phase upon an environmental change (see bottom plot of

Figure 6.6). The very small number of offspring solutions devoted to each new

variable reduces the local search abilities of fair mutation considerably, and this

translates into long-term performance effects. The niching algorithm performs

worst on the NK landscape. The reason is again its deficit in the convergence

speed combined with the severity of the environmental changes. On the NK

landscape, triggered hypermutation sometimes performs quite well, often losing only to the restart approach (see e.g. generations 200, 250, and 440 in Figure 6.7).

6.1.6 Summary and conclusion

In this study we have compared different dynamic strategies for enabling a generational

EA to cope with problems that are subject to changes of variables,

motivated by a real problem in drug discovery. We proposed a strategy, which

we call fair mutation, specifically designed for dealing with such problems, and

compared it against five standard strategies applied in dynamic optimization.



[Figure: panels "NK landscape, #v=15, ∆g=40" and "Zoom on NK landscape"; y-axis: average best-of-generation fitness; x-axis: #Generations; curves: Restart, Reevaluate, Niching, Trig. hypermutation, Fair mutation, Optimal fitness.]
Figure 6.7: Plots showing the average best-of-generation fitness obtained by various dynamic optimization strategies on NK landscapes with #v = 15, ∆g = 40. The bottom plot zooms into the area indicated by the box in the top plot.

The results have shown that little, if any, additional diversity needs to be introduced into the population when changing a few variables frequently, or when optimizing a landscape with a low degree of variable interactions. Here, very good results were also obtained using a niching algorithm or a standard elitist EA that simply re-evaluates the population after a variable change and then carries on as normal. When changing many variables (around 10 or more), particularly on a rather epistatic fitness landscape, restarting the optimization from scratch has often proven to be the best choice.

Future work should investigate how to further exploit the correlation between the changed variables and the parts of the landscape undergoing a change, a feature of optimization problems subject to changes of variables. If the

optimizer were allowed to decide the point of time (generation) at which variables

are exchanged, then optimizing this aspect might be worth considering in the design

of strategies too; this kind of scenario has been considered by Olhofer et al.

(2001) for a shape design optimization problem but more work is needed to fully



Table 6.3: Accuracy and adaptability results of various dynamic optimization strategies on different problem instances of dynamic NK landscapes and MJ models. For each problem instance and metric, we highlighted all strategies in bold face that are not significantly worse than any other strategy. The statistical test applied here is the non-parametric Kruskal-Wallis test with a significance level of 5% (2-sided).

  Dynamic optimization strategy       Metric   #v=2, ∆g=10        #v=8, ∆g=25        #v=15, ∆g=40
                                               NK       MJ        NK       MJ        NK       MJ
  Fair mutation                       Acc      0.0163   0.0155    0.0150   0.0002    0.0193   0.0011
                                      Ada      0.0242   0.0344    0.0298   0.0210    0.0480   0.0279
  Restart                             Acc      0.0520   0.0815    0.0151   0.0022    0.0076   0.0003
                                      Ada      0.0942   0.2133    0.0552   0.1232    0.0395   0.0609
  Re-evaluation                       Acc      0.0170   0.0163    0.0202   0.0003    0.0202   0.0012
                                      Ada      0.0252   0.0350    0.0302   0.0220    0.0516   0.0282
  Niching                             Acc      0.0242   0.0290    0.0188   0.0032    0.0264   0.0012
                                      Ada      0.0321   0.0471    0.0314   0.0271    0.0560   0.0296
  Triggered hyperm.                   Acc      0.0387   0.0313    0.0376   0.0334    0.0241   0.0290
  (p_{m,b} = 0.001, HR = 500)         Ada      0.0492   0.0554    0.0586   0.0778    0.0552   0.0833
  Triggered hyperm.                   Acc      0.0194   0.0195    0.0163   0.0004    0.0146   0.0015
  (p_{m,b} = 1/l, HR = 15)            Ada      0.0299   0.0442    0.0360   0.0354    0.0487   0.0444
  Random immigrants                   Acc      0.0214   0.0228    0.0209   0.0011    0.0231   0.0019
                                      Ada      0.0297   0.0434    0.0357   0.0266    0.0537   0.0337

understand how algorithmic choices affect performance. Furthermore, in closed-loop applications like our scenario, variables might not only be associated with resources (e.g. drugs), but their use may also be subject to resource constraints: e.g. the use of drugs might be associated with costs, or drugs might have to be bought in batches. Here, a good strategy for dealing with changing variables has to account for both fitness gradients and variable experimental costs. For instance, while a restart approach is likely to be quite expensive under these circumstances, a strategy that aims to waste as few resources as possible might take too long to find fit solutions. Thus, future research into the development of strategies that perform well in the presence of resource constraints is needed too.

6.2 Optimization in lethal environments

In this section we consider a resourcing issue related to what we term ‘lethal

environments’. Unlike problems subject to ephemeral resource constraints and



changes of variables, this resourcing issue yields an optimization problem featuring

a static feasible region and fitness landscape. However, the population size

may reduce over the course of the optimization as a function of poor (lethal) solutions

evaluated. We define a lethal solution as one with a fitness below a certain

threshold. This setup models certain closed-loop evolution scenarios that may

be encountered, for example, when evolving nano-technologies or autonomous

robots. The next section motivates this optimization scenario with a specific example.

We then look at the tuning of some of the basic EA configuration parameters

for coping with lethal environments: degree-of-elitism, population size,

selection pressure, mutation mode and strength, and crossover rate. We also vary

aspects of the fitness landscape we are optimizing to observe which landscape

topologies pose a particular challenge when optimizing in a lethal environment.

Section 6.2.2 will describe the setup of this study, and the study itself will be

conducted in Section 6.2.3. Finally, Section 6.2.4 concludes this study.

6.2.1 Motivation

We consider the use of EAs for optimization in a setting where the notions of an individual and of a population size are slightly different from the standard ones, which forces us to reconsider how to configure an EA appropriately. The general setting is closed-loop optimization, but our concern is with a particular sort of closed-loop setting where the hardware on which individuals are tested is reconfigurable, destructible and non-replaceable.

Consider the following scenario. We are developing, by evolution, the control

software for an autonomous endoscopic robot (similar to Moglia et al. (2007);

Ciuti et al. (2010)), which is ultimately intended to be swallowed by a patient in

capsule form, and then used within the body for screening, diagnosis and therapeutic

procedures. Before robotic capsules are used on humans, however, their reliability and effectiveness typically need to be validated first through in vivo

animal trials. Imagine we have available µ prototypes of a robotic capsule, and

we can “radio in” new control software to each of these individual capsules and

evaluate them. However, our control software may cause a capsule to malfunction

in a lethal (for it) way, in which case it is no longer available (and falls out of the

evolving population). In such a scenario, our population size for the remainder

of the evolution is now at most µ −1, in terms of how many pieces of software

can be tested each generation. If we continue to explore too aggressively, we may



upload other pieces of software that cause the loss of individuals, which will cost

evolution in terms of the extent of parallelism (and hence efficiency in time) we

enjoy, and also perhaps in the loss of expensive hardware. So whilst we wish to

innovate by testing new control policies, this must be balanced carefully against

considerations of maintaining a reasonable population size.

This setting may seem far-fetched to some readers. However, as we see evolution

being applied more and more at the boundary between a full embodied

type of evolution (where the hardware individuals may reproduce or exchange information)

3 and a fully in silico simulation-only approach, especially as EAs are

taken up in the experimental sciences more and more, it is likely that variants of

the above scenario will emerge. It need not be robotic capsules; it could equally

be nano-machines or drone planes we are dealing with; it could be robot swarms,

or software released on the Internet, or anything that is reconfigurable remotely

in some way, and also destructible.

Clearly, the principle of "survival of the fittest" is going to be somewhat dampened by the constraint that we wish all designs to survive. Indeed, no innovation at all will be possible if we enforce the evaluation of only safe designs, given the assumption that any unseen design could be lethal. So, as is usual with evolution, we should expect that some rate of loss of individuals is going to be desirable, but set against this is the fact that individuals cannot be replaced, and hence population size and parallelism will be compromised. So the question is: how should one control the exploration/exploitation trade-off in this optimization setup?

In more natural settings than we consider here, the notion of robustness and

the tendency of evolution to build in mutational robustness for free has been

previously studied, especially in the context of evolution on neutral plateaux (see

e.g. Bullock (2003); Schonfeld and Ashlock (2004); Schonfeld (2007)). Bullock's work in particular shows that certain in-built biases toward mutational robustness can retard innovation (our primary concern here). He proposed methods that avoid the bias and hence explore neutral plateaux more rapidly. Although related, this work has a different purpose to ours, and also involves different assumptions.

6.2.2 Experimental setup

This section describes the search algorithms for which we investigate the impact

of a lethal optimization scenario, and the parameter settings as used in the

3 For concrete examples of embodied evolution applications please refer to Section 2.2.



subsequent experimental analysis.

6.2.2.1 Search Algorithms

We consider three types of search algorithms in this study: a tournament selection

based genetic algorithm (TGA), a modified version of it, which we call

RBS (standing for reproduction of best solutions), and a population of stochastic

hill-climbers (PHC). Similar types of algorithms have been considered in the

previously mentioned studies related to mutational robustness.

Before we describe each algorithm, we first set out the procedures common

to all of them. These are the generation process of the initial population of candidate

solutions, the setting of the fitness threshold below which solutions are

deemed lethal, the mutation operators, and the handling of duplicate solutions.

Regarding the initialization, our aim is to simulate the scenario where the initial population, whose size we denote by µ_0 (the reason for adding a subscript to µ is described below), consists of evolving entities (e.g. robotic capsules, robot swarms or software) that are of a certain (high or state-of-the-art) quality. We achieve this by first generating a sample set of S random and non-identical candidate solutions, and then selecting the fittest µ_0 solutions from this set to form the initial population. The lethal fitness threshold f_LFT is set to the fitness value of the qth fittest solution of the sample set. This threshold is kept constant throughout an algorithmic run.
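The following Python sketch illustrates this initialization and threshold-setting procedure; the bit-string representation and the helper names (fitness, the returned tuple layout) are assumptions made for the example.

```python
import random

def initialize(fitness, l, S, mu0, q):
    """Sample S distinct random bit strings, keep the mu0 fittest as the initial
    population, and set the lethal fitness threshold to the q-th fittest sample."""
    samples = set()
    while len(samples) < S:
        samples.add(tuple(random.randint(0, 1) for _ in range(l)))
    ranked = sorted(samples, key=fitness, reverse=True)
    population = [list(x) for x in ranked[:mu0]]   # fittest mu0 solutions
    f_lft = fitness(ranked[q - 1])                 # lethal fitness threshold
    return population, f_lft
```

With the default settings of Table 6.4 this would be called with S = 1000, mu0 = 30 and q = 250.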

For mutation, we will investigate two modes (similar to Barnett (2001)): (i) Poisson mutation, where each bit is flipped independently with a mutation rate of p_m, and (ii) constant mutation, where exactly d randomly selected bits are flipped. Poisson mutation is the more commonly used mutation mode of the two and also the mode we have used in this thesis so far. Furthermore, all search algorithms ensure that a solution is not evaluated multiple times. That is, all evaluated solutions are cached and compared against a new solution (offspring) before performing an evaluation; preliminary experimentation showed that avoiding duplication improves the performance significantly in the presence of lethal environments. If a solution has been evaluated previously, then the GA iteratively generates new solutions (offspring) until it generates one that is evaluable, i.e. has not been evaluated previously, or until L trials have passed without success. In the latter case, we set the mutation rate to p_m = p_m + 0.5/l (l is the total number of bits) and d = d + 1 in the case of Poisson mutation and



Algorithm 6.3 Tournament selection based GA (TGA)
Require: f (objective function), G (maximal number of generations), µ_0 (initial parent population size), λ_0 (initial offspring population size), L (maximal number of regeneration trials)
1: g = 0 (generation counter), Pop = ∅ (current population), OffPop = ∅ (offspring population), AllEvalSols = ∅ (set of solutions evaluated so far), trials = 0 (regeneration trials counter), counter = 0 (auxiliary variable indicating the number of solutions evaluated during a generation)
2: initialize Pop and set lethal fitness threshold f_LFT; copy all solutions of Pop also to AllEvalSols, and set µ_g = µ_0, λ_g = λ_0
3: while g < G ∧ µ_g > 0 do
4: OffPop = ∅; trials = 0; counter = 0
5: set mutation rate to initial value
6: while counter < λ_g do
7: generate two offspring ⃗x(1) and ⃗x(2) by selecting two parents from Pop, and then recombining and mutating them
8: for i = 1 to 2 do
9: if ⃗x(i) ∉ AllEvalSols ∧ counter < λ_g then
10: evaluate ⃗x(i) using f; counter++; trials = 0
11: AllEvalSols = AllEvalSols ∪ {⃗x(i)}
12: if f(⃗x(i)) ≥ f_LFT then
13: OffPop = OffPop ∪ {⃗x(i)} // non-lethal solutions only are added to the offspring population
14: else trials++
15: if trials = L then
16: depending on the mutation operator, reset mutation rate to p_m = p_m + 0.5/l (Poisson mutation) or d = d + 1 (constant mutation)
17: reset new population sizes to µ_g = λ_g = |OffPop|, and form new population Pop by selecting the best µ_g solutions from the union population of Pop ∪ OffPop
18: g++

constant mutation, respectively; note, due to its deterministic nature, it is more

likely that this reset step is applied with constant mutation. For TGA and RBS

we set the mutation rates back to their initial values at the beginning of each

new generation. For PHC it makes more sense to set the mutation rates back

whenever a new hill-climber of the current population is considered for mutation

because of the independent search of the hill-climbers. The idea of increasing

the mutation strength temporarily is to increase an EA’s exploration abilities to

allow it to create solutions that have not been evaluated in the past but are close

to the search region currently under investigation.
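A minimal sketch of the two mutation modes and the duplicate-avoidance loop described above; the function names and the representation of solutions as lists of bits are assumptions for illustration.

```python
import random

def poisson_mutation(x, p_m):
    """Flip each bit independently with probability p_m."""
    return [1 - b if random.random() < p_m else b for b in x]

def constant_mutation(x, d):
    """Flip exactly d randomly selected bits."""
    y = list(x)
    for i in random.sample(range(len(y)), d):
        y[i] = 1 - y[i]
    return y

def generate_evaluable(parent, evaluated, p_m, L, l):
    """Regenerate offspring until one has not been evaluated before; if L trials
    pass without success, the mutation strength is increased by 0.5/l and the
    loop continues (analogously, d would be increased for constant mutation)."""
    trials = 0
    while True:
        child = poisson_mutation(parent, p_m)
        if tuple(child) not in evaluated:
            return child, p_m
        trials += 1
        if trials == L:
            p_m += 0.5 / l
            trials = 0
```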

Tournament Selection Based GA (TGA): The algorithm is an EA with a

(µ+λ)-ES reproduction scheme (see Algorithm 6.3), as we have used in previous

studies in this thesis. This elitist scheme performed significantly better than a

standard generational reproduction scheme in lethal environments. We equip



the EA with tournament selection with replacement for parental selection (TS

shall denote the tournament size) and uniform crossover (p c shall denote the

crossover rate). We explained above the two mutation modes to be investigated

later. We set the parent and offspring population sizes to be identical, i.e. µ = λ. As the population size may decrease during the optimization process (due to evaluated lethal solutions), in the following we will denote the population size at generation g (0 ≤ g ≤ G) by µ_g = λ_g, with G being the maximum number of generations. Clearly, the maximum number of generations can only be reached if the population does not "die out" beforehand, i.e. if µ_g = 0 does not occur for any g < G.

Reproduction of best solutions (RBS): This algorithm is based on TGA and motivated by the netcrawler of Barnett (2001). The netcrawler is designed to efficiently search fitness landscapes featuring neutral networks. It reproduces only a single best solution at each generation using mutation only, and the mutation strength adapts to the size of each neutral network of the landscape. Similar to the netcrawler, RBS reproduces only the best solutions. More precisely, it selects parents for reproduction exclusively, and at random, from the set of solutions of the current population with the highest fitness. Hence, it can be seen as a tournament selection based GA with TS = µ_g, where the tournament participants are selected without replacement. Apart from the selection procedure, RBS uses the same algorithmic setup as TGA.
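As a small illustration, RBS parental selection amounts to drawing parents uniformly from the current set of best-fitness solutions; this sketch uses an assumed list-of-solutions representation and an external fitness function.

```python
import random

def rbs_select_parent(population, fitness):
    """RBS parental selection: restrict the candidate set to the solutions with
    the highest fitness in the current population and pick one at random."""
    best = max(fitness(x) for x in population)
    elite = [x for x in population if fitness(x) == best]
    return random.choice(elite)
```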

Population of Hill-Climbers (PHC): This search algorithm maintains a population of stochastic hill-climbers, which explore the landscape independently of each other. Each hill-climber undergoes mutation, and, if the mutant (offspring) is not lethal, it replaces the original solution if it is at least as fit. Algorithm 6.4 shows the pseudocode of PHC. In essence, PHC with constant mutation is identical to running a set of random-mutation hill-climbers (Forrest and Mitchell, 1993) in parallel, with the difference that, as in the case of TGA, the mutation strength is temporarily increased if a previously unseen solution cannot be generated within L trials. PHC has similarities to Barnett's netcrawler, and also to the plateau crawler of Bullock (2003). The working procedure of the plateau crawler closely resembles a population of independent hill-climbers. Similar to the netcrawler, it has been designed to be immune to selection for mutational robustness when optimizing fitness landscapes featuring neutral networks. The property common to all three approaches (netcrawler, plateau crawler and PHC) is that they guarantee that each parent potentially leaves a copy or a slightly



Algorithm 6.4 Population of hill-climbers (PHC)
Require: f (objective function), G (maximal number of generations), µ_0 (initial parent population size), L (maximal number of regeneration trials)
1: g = 0 (generation counter), Pop = ∅ (current population), OffPop = ∅ (offspring population), AllEvalSols = ∅ (set of solutions evaluated so far), trials = 0 (regeneration trials counter), lethal = false (auxiliary boolean variable indicating whether a hill-climber is lethal)
2: initialize Pop and set lethal fitness threshold f_LFT; copy all solutions of Pop also to AllEvalSols, and set µ_g = µ_0
3: while g < G ∧ µ_g > 0 do
4: OffPop = ∅; trials = 0
5: for i = 1 to µ_g do
6: set mutation rate to initial value and lethal = false
7: repeat
8: mutate solution ⃗x_i of Pop to obtain mutant ⃗x′_i
9: if ⃗x′_i ∉ AllEvalSols then
10: evaluate ⃗x′_i using f; trials = 0
11: AllEvalSols = AllEvalSols ∪ {⃗x′_i}
12: if f(⃗x′_i) ≥ f_LFT then
13: if f(⃗x′_i) ≥ f(⃗x_i) then
14: OffPop = OffPop ∪ {⃗x′_i}
15: else OffPop = OffPop ∪ {⃗x_i}
16: else trials++
17: if trials = L then
18: depending on the mutation operator, reset mutation rate to p_m = p_m + 0.5/l (Poisson mutation) or d = d + 1 (constant mutation)
19: until ⃗x′_i is evaluable
20: reset new population size to µ_g = |OffPop|, and form new population Pop by copying all solutions from OffPop to Pop
21: g++

modified version of itself in the population. This ensures that offspring, parents, or other ancestors are equally likely to be selected as future parents. For fitness landscapes with neutral networks, this feature can cause a population to be unaffected by mutational robustness, as it allows the population to spend an equal amount of time at each point on a neutral network (Hughes, 1995; Bullock, 2003).
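The per-individual update rule of PHC (Algorithm 6.4, Lines 8-15) can be summarised in a few lines; the sketch below omits the duplicate-avoidance bookkeeping and uses assumed helper names.

```python
def phc_step(x, fitness, f_lft, mutate):
    """One PHC update for a single hill-climber x: evaluate the mutant; if it is
    lethal the hill-climber slot is lost (return None, shrinking the population);
    otherwise keep the fitter of mutant and original (ties go to the mutant)."""
    mutant = mutate(x)
    if fitness(mutant) < f_lft:
        return None                      # lethal mutant: slot is removed
    return mutant if fitness(mutant) >= fitness(x) else x

# one generation: apply the step to every hill-climber and drop lost slots
# new_pop = [y for y in (phc_step(x, f, f_lft, mutate) for x in pop) if y is not None]
```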

6.2.2.2 Parameter Settings

The experimental study will investigate different settings of the parameters involved

in the three search algorithms (TGA, RBS and PHC). However, if not

otherwise stated, we use the default settings as given in Table 6.4 with constant

mutation being the default mutation operator. Notice that this setup, especially the settings of p_c, TS, and the mutation mode, is significantly different from previous algorithm setups employed in this thesis. Recall that RBS uses the same



Table 6.4: Default parameter settings of search algorithms.

  Algorithm        Parameter                                               Setting
  TGA, RBS, PHC    Initial parent population size µ_0                      30
                   Constant mutation rate d                                1
                   Regeneration trials L                                   1000
                   Number of generations G                                 200
                   Sampling size S                                         1000
                   Solution rank q for lethal fitness threshold setting    250
  TGA              Initial offspring population size λ_0                   30
                   Tournament size TS (selection with replacement)         4
                   Crossover rate p_c                                      0.0
  RBS              Initial offspring population size λ_0                   30
                   Tournament size TS (selection without replacement)      µ_g, 0 ≤ g ≤ G
                   Crossover rate p_c                                      0.0

default parameter settings as TGA, with the difference that reproductive selection is done with a tournament size of TS = µ_g, 0 ≤ g ≤ G, and tournament participants being chosen without replacement.

To investigate the effect of lethal environments on the performance of the

algorithms introduced above we consider NKα landscapes (see Section 5.1.2.2 for

a description of this problem). We analyze the impact of lethal solutions using

different settings of the neighborhood size K and the model parameter α. Recall

that the parameter K allows us to control the degree of epistasis or ruggedness

of the landscape (larger values of K represent more epistasis), while the model

parameter α allows us to specify how influential some variables may be compared

to others (larger values of α assign more influence to a minority of variables).

We fix the search space size to N = l = 50, a setting we have used and justified

previously.
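For readers unfamiliar with this family of test functions, the sketch below shows a standard NK landscape evaluation with lazily drawn contribution tables and randomly chosen neighborhoods; the NKα variant used in this thesis (defined in Section 5.1.2.2) additionally weights how influential individual variables are via the parameter α, and that weighting is deliberately omitted here. All names are illustrative.

```python
import random

def make_nk_landscape(N, K, seed=0):
    """Standard NK landscape: each bit i contributes a value that depends on
    itself and K other bits via a random lookup table; fitness is the mean
    contribution over all N bits."""
    rng = random.Random(seed)
    neighbors = [rng.sample([j for j in range(N) if j != i], K) for i in range(N)]
    tables = [{} for _ in range(N)]

    def fitness(x):
        total = 0.0
        for i in range(N):
            key = (x[i],) + tuple(x[j] for j in neighbors[i])
            if key not in tables[i]:          # contribution drawn lazily on first use
                tables[i][key] = rng.random()
            total += tables[i][key]
        return total / N
    return fitness

f = make_nk_landscape(N=50, K=4)
print(f([random.randint(0, 1) for _ in range(50)]))
```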

Any results shown are average results across 100 independent algorithm runs.

As before, we will use a different randomly generated problem instance for each

run but, of course, the same instances for all algorithms. To allow for a fair

comparison, we also use the same initial population and (hence) lethal fitness

threshold for all algorithms in any particular run.



[Figure: four heat maps over the model parameter α (y-axis) and #Neighbors K (x-axis), one per quantity described in the caption below.]
Figure 6.8: Plots showing the average best solution fitness found by TGA (using TS = 2, p_c = 0.0 and Poisson mutation with p_m = 1/N) in a lethal environment (top left), the average absolute difference between this fitness and the fitness achieved in a non-lethal environment, i.e. f_LFT = 0 (top right), the average proportion of the population that survived at generation G = 200, or µ_G/µ_0 (bottom left), and the average absolute difference between the best solution fitness found and the lethal fitness threshold in a non-lethal environment (bottom right), on NKα landscapes as a function of the neighborhood size K and the model parameter α.

6.2.3 Experimental results

Let us first analyze the effect of lethal environments on evolutionary search for

different topologies of the NKα landscapes. Figure 6.8 shows the effect on the

best solution fitness found (top plots) and the (remaining) population size (bottom

left plot) for TGA with a rather traditional setup (TS = 2, p_c = 0.0 and Poisson mutation with p_m = 1/l) as a function of the parameters K and α; the



bottom right plot indicates the scope available for optimization after the initialization

of the population. From the top plots we can see that the performance is

most affected for fitness landscapes that are rugged (large K) but contain some

structure in the sense that some solution bits are more important than others

(large α). As is apparent from the bottom left plot, the reason for this pattern

is that the population size at the end of the optimization decreases as K and/or

α increase, i.e. lethal solutions are here more likely to be encountered. In fact,

for large K and α, the optimization terminates on average after only about 25

generations (out of 200); this result is not shown here.

The reason that the population size reduces for large K (bottom left plot of Figure 6.8) is that the fitness landscape becomes more rugged, which increases the risk of evaluating lethal solutions despite them being located close to high-quality solutions. Although an increase in α introduces more structure into the fitness landscape, it also has the effect that solutions which have some of these important

bits set incorrectly are more likely to be poor. A plausible alternative explanation

for the poor performance at large α may be that the lethal fitness threshold is

very close to the best solution fitness that can be found, thus reducing the scope

of optimization possible and causing many solutions to be lethal; the bottom

right plot rules this explanation out. Regarding the (remaining) population size,

a greater selection pressure, i.e. a larger tournament size, can help to maintain

a large population size for longer, particularly for large K and small α. On the

other hand, being more random in the solution generation process by using, for

example, a larger mutation and/or crossover rate, has the opposite effect, i.e. the

population size decreases more quickly.

The effect on the (remaining) population size translates into an impact on

the average best solution fitness. From Figure 6.9, we can see that, for rugged

landscapes with structure (i.e. in the range K > 5,α > 1), TGA is outperformed

by a GA with a rather traditional setup and a PHC in a non-lethal optimization

scenario (right plots) but performs better than both algorithms in a lethal

optimization scenario (left plots). In fact, as we will also see later, in the range

K > 5,α > 1, PHC performs best among all the search algorithms considered in

the non-lethal environment but worst in the lethal environment. The poor performance

of PHC in this range is due to the fact that the hill-climbers in the

population are searching the fitness landscape independently of each other. This

may be an advantage in a non-lethal environment because it can help to maintain



[Figure: four heat maps over the model parameter α (y-axis) and #Neighbors K (x-axis) with logarithmic color scales; the panels correspond to the ratios and environments described in the caption below.]
Figure 6.9: Plots showing the ratio P(f(x) > f_TGA)/P(f(x) > f_{TGA,TS=2,p_c=0.0,p_m=1/l}) (top plots) and P(f(x) > f_TGA)/P(f(x) > f_PHC) (bottom plots) in a lethal (left plots) and non-lethal environment (right plots) on NKα landscapes as a function of K and α. Here, x is a random variable that represents the best solution from a set of solutions drawn uniformly at random from the search space, and f_* is the average best solution fitness obtained with strategy *. If P(f(x) > f_*)/P(f(x) > f_**) > 1, then strategy ** is able to achieve a higher average best solution fitness than strategy *, and a greater advantage of ** is indicated by a darker shading in the heat maps; similarly, if P(f(x) > f_*)/P(f(x) > f_**) < 1, then * is better than ** and a lighter shading indicates a greater advantage of *.

diversity in the population and prevent premature convergence. However, in a

lethal environment, uncontrolled diversity is rather a drawback as it increases the

probability of generating solutions with genotypes that are significantly different

from the ones in the population, which in turn increases the probability of generating

lethal solutions. The less diverse and random optimization of TGA is also

the reason that it outperforms a GA with a more traditional setup on rugged



landscapes with structure.

Let us now analyze how the search algorithms fare with different parameter

settings. Table 6.5 shows the average best solution fitness obtained by different

algorithm setups in a lethal and a non-lethal environment (values in parentheses)

on NKα landscapes with K = 4,α = 2.0; Table 6.6 shows the same information

for NKα landscapes with K = 10,α = 2.0. Note that due to the nature of NKα

landscapes (particularly due to the absence of plateaus or neutral networks in

the landscapes), there was only a single best solution in the population at each

generation. For RBS, this has the effect that the performance is independent

of the crossover rate because crossover is applied to identical solutions. The

two tables largely confirm the observation made above: while a relatively high

degree of population diversity and randomness in the solution generation process

may be beneficial in a non-lethal environment (which is particularly true for

rugged landscapes, as can be seen from Table 6.6), it is rather a drawback in

a lethal environment as it may increase the risk of generating lethal solutions

and subsequently limit the effectiveness of evolutionary search. More precisely,

for a lethal environment, the tables indicate that one should use a GA instead

of a PHC, and reduce randomness in the solution generation process by avoiding

crossover (generally), and using constant mutation as well as a relatively large

tournament size. The poor performance of Poisson mutation, which is a typical

mutation mode for GAs, is related to situations where an above-average number of solution bits is flipped at once. Such variation steps lead to offspring that are

located in the search space far away from their non-lethal parents, which can in

turn increase the probability of generating lethal solutions.

Finally, Figure 6.10 analyzes whether evolving a large set of entities for a small number of generations yields better performance than evolving only a few entities for many generations; i.e. what is the trade-off between the initial population size µ_0 and the maximum number of generations G. For a non-lethal environment (right plots), we make two observations: (i) the best performance is achieved with PHC using a rather small population size of about µ_0 = 30, and (ii) for population sizes µ_0 > 50 one should use a GA, whereby the applied tournament size or selection pressure should increase with the population size. As the population size increases, the maximum number of generations G available for optimization is simply insufficient for PHC or a GA with low selection pressure and/or much



[Figure: four line plots of average best solution fitness versus initial population size µ_0, panels "K=4, α=2.0" (top) and "K=10, α=2.0" (bottom) in lethal (left) and non-lethal (right) environments; curves: TGA (TS=1), TGA (TS=2), TGA (TS=4), RBS, PHC, and TGA (TS=2, p_c=0.0, p_m=1/N).]
Figure 6.10: Plots showing the average best solution fitness obtained by different search algorithms in a lethal environment (left) and non-lethal environment (right) on NKα landscapes with K = 4, α = 2.0 (top) and K = 10, α = 2.0 (bottom) as a function of the initial population size µ_0; the maximum number of fitness evaluations available for optimization was fixed at 6000 (corresponding to the default setting G = 200, µ_0 = 30), i.e. G = ⌊6000/µ_0⌋. If not otherwise stated, an algorithm used a constant mutation mode with d = 1 and no crossover, i.e. p_c = 0.0.

randomness in the solution generation process to converge quickly to a high-quality part of the search space. On the other hand, a GA with too much selection pressure may cause the population to get stuck at local optima, especially on rugged fitness landscapes (see the result of RBS in the bottom right plot). In a lethal optimization scenario (left plots), a GA clearly outperforms PHC (regardless of the population size) for the reasons mentioned above. Furthermore, unlike in the non-lethal case, we observe that a GA should use an above-average tournament size already for small population sizes of µ_0 > 30. Similar to the non-lethal case, however, there is a saturation point with RBS, beyond which a further increase in the population size has no effect (see e.g. the bottom left plot).



6.2.4 Summary and conclusion

In this study we have conducted an initial investigation of the impact of lethal solutions (here modeled as candidate solutions with a fitness below a certain threshold) on evolutionary search. In essence, the effect of evaluating a lethal solution is that the solution is immediately removed from the population and the population size is reduced by one. This kind of scenario can be found in natural evolution, where the presence of lethal mutants may promote robustness over evolvability, but it is also a characteristic of certain closed-loop evolution applications, for example, in autonomous robots or nano-technologies. When faced with this kind of scenario in optimization, the challenge is to discover innovative and fit candidate solutions without reducing the population too rapidly; i.e. to trade off evolvability with robustness.

Our analysis has been focused on the following three main aspects: (i) analyzing

the impact of lethal solutions on a standard evolutionary algorithm (EA)

framework, (ii) determining challenging fitness landscape topologies in a lethal

environment, and (iii) tuning EAs to perform well within a lethal optimization

scenario. Generally, the presence of lethal solutions can affect the performance

of EAs, but the largest (negative) impact was observed for fitness landscapes that

are rugged and possess some structure in the sense that some solution bits are

more important than others; in terms of our test functions, which were NKα

landscapes, this type of landscape corresponds to large values of K and α. For

this fitness landscape topology, we obtained best performance in a non-lethal

environment using a small population of stochastic hill-climbers. In a lethal environment,

however, significantly better results were obtained using an EA that

limits randomness in the solution generation process by employing elitism, a relatively

large selection pressure, a constant mutation mode (i.e. flipping exactly

one solution bit as opposed to flipping each bit independently with some low

probability), and no or a small crossover rate. Also, in a lethal optimization scenario

and on the test problems considered, we observed that an EA that evolves

a large set of entities (e.g. autonomous robots or software) for a small number of

generations performs better than one that evolves a few entities for many generations.

The practical implication of this is that, if possible, a larger budget should

be allocated for the production of the entities in the first place rather than for

attempting to prolong the testing or optimization phase.

As with the other resourcing issues considered in this thesis, there remains



much else to learn about optimization problems subject to lethal environments.

The next step could be to design strategies that specifically account for lethal

environments (rather than to tune EA operators).

6.3 Chapter summary

In this chapter we have investigated the impact of two resourcing issues on evolutionary search applied to closed-loop optimization: changes of variables, and lethal environments. Problems subject to changes of variables feature a dynamically changing fitness landscape (but a static feasible region) and thus belong to the class of dynamic optimization problems. However, unlike standard problems in this domain, it is known when changes are applied to the landscape (the point of time may even be controlled), and which part of the search space undergoes a change. We proposed and analyzed a strategy, which we call fair mutation, that specifically exploits these two properties. Lethal environments yield optimization problems that limit the number of solutions below a certain fitness threshold that can be evaluated; we refer to these poor solutions as lethal solutions. This setup models certain closed-loop evolution scenarios that may be encountered, for example, when hardware (e.g. autonomous robots) on which solutions (e.g. a control software) are tested is available in limited quantity only. To cope with lethal environments, we looked at the tuning of some of the basic EA configuration parameters: degree of elitism, population size, selection pressure, mutation mode and strength, and crossover rate. The main finding of our study was that lethal environments can be dealt with effectively, even when tuning only some of the basic parameters of EAs; however, the optimal EA configuration depends on the fitness landscape to be optimized.

In the following chapter we will conclude the thesis. We will draw together

the findings and their implications, and discuss directions for further research.



Table 6.5: The table shows the average best solution fitness found in a lethal environment after G = 200 generations and, in parentheses, the fitness found in a non-lethal environment, for different algorithm setups on K = 4, α = 2.0; the results of random sampling were obtained by generating G × µ_0 = 200 × 30 = 6000 solutions per run at random and averaging over the best solution fitness values found. The numbers in the subscript indicate the rank of the top five algorithm setups within the respective environment. We highlighted all algorithm setups (among the top five) in bold face that are not significantly worse than any other setup. A Friedman test revealed a significant difference between the search algorithm setups in general, but differences among the individual setups were tested for in a post-hoc analysis using (paired) Wilcoxon tests (significance level of 5%) with Bonferroni correction.

  Search algorithm             Constant mutation                        Poisson mutation
                               d = 1               d = 2               p_m = 0.5/N          p_m = 1.0/N
  TGA, TS = 1, p_c = 0.0       0.722_4 (0.7297_4)  0.6998 (0.7286)     0.7174 (0.7272)      0.7141 (0.7285)
  TGA, TS = 1, p_c = 0.25      0.7188 (0.7272)     0.6969 (0.7296_5)   0.7128 (0.7275)      0.7029 (0.7296)
  TGA, TS = 1, p_c = 0.5       0.713 (0.7279)      0.6901 (0.7287)     0.7091 (0.7276)      0.7078 (0.7277)
  TGA, TS = 2, p_c = 0.0       0.7223_3 (0.7269)   0.7053 (0.7287)     0.7198 (0.7271)      0.7175 (0.7275)
  TGA, TS = 2, p_c = 0.25      0.7217 (0.7263)     0.7039 (0.7284)     0.7172 (0.7263)      0.7119 (0.7275)
  TGA, TS = 2, p_c = 0.5       0.7178 (0.7261)     0.7003 (0.728)      0.7176 (0.7268)      0.7135 (0.727)
  TGA, TS = 4, p_c = 0.0       0.7218_5 (0.7266)   0.7107 (0.726)      0.7203 (0.7269)      0.7184 (0.7269)
  TGA, TS = 4, p_c = 0.25      0.7238_1 (0.7257)   0.7085 (0.7254)     0.7196 (0.7273)      0.7169 (0.7278)
  TGA, TS = 4, p_c = 0.5       0.7224_2 (0.7251)   0.7101 (0.7256)     0.7172 (0.7263)      0.7167 (0.726)
  RBS                          0.7179 (0.7226)     0.7129 (0.723)      0.7177 (0.7229)      0.7172 (0.7223)
  PHC                          0.7117 (0.7334_2)   0.6843 (0.7271)     0.7033 (0.7343_1)    0.6912 (0.7329_3)
  Random sampling              0.6197 (0.6358)


Table 6.6: The table shows the average best solution fitness found in a lethal environment after G = 200 generations and, in parentheses, the fitness found in a non-lethal environment, for different algorithm setups on K = 10, α = 2.0; the results of random sampling were obtained by generating G × µ_0 = 200 × 30 = 6000 solutions per run at random and averaging over the best solution fitness values found. The numbers in the subscript indicate the rank of the top five algorithm setups within the respective environment. We highlighted all algorithm setups (among the top five) in bold face that are not significantly worse than any other setup. A Friedman test revealed a significant difference between the search algorithm setups in general, but differences among the individual setups were tested for in a post-hoc analysis using (paired) Wilcoxon tests (significance level of 5%) with Bonferroni correction.

  Search algorithm             Constant mutation                        Poisson mutation
                               d = 1               d = 2               p_m = 0.5/N          p_m = 1/N
  TGA, TS = 1, p_c = 0.0       0.6918 (0.7317)     0.6637 (0.7383_4)   0.6812 (0.7338)      0.6767 (0.7356_5)
  TGA, TS = 1, p_c = 0.25      0.6752 (0.7324)     0.6546 (0.7332)     0.6623 (0.7299)      0.6578 (0.7324)
  TGA, TS = 1, p_c = 0.5       0.6653 (0.7298)     0.6472 (0.7331)     0.6478 (0.7264)      0.6472 (0.7287)
  TGA, TS = 2, p_c = 0.0       0.702_4 (0.7299)    0.6697 (0.7318)     0.6938 (0.7307)      0.6821 (0.7305)
  TGA, TS = 2, p_c = 0.25      0.6897 (0.7277)     0.662 (0.732)       0.6717 (0.7281)      0.6643 (0.7302)
  TGA, TS = 2, p_c = 0.5       0.6768 (0.7288)     0.6537 (0.7296)     0.6607 (0.7263)      0.659 (0.7274)
  TGA, TS = 4, p_c = 0.0       0.7072_2 (0.7275)   0.677 (0.7294)      0.6981_5 (0.7302)    0.6902 (0.7272)
  TGA, TS = 4, p_c = 0.25      0.6979 (0.7294)     0.6701 (0.7298)     0.6834 (0.7279)      0.6805 (0.7263)
  TGA, TS = 4, p_c = 0.5       0.6936 (0.7265)     0.6635 (0.7296)     0.6715 (0.7258)      0.67 (0.7255)
  RBS                          0.7102_1 (0.7207)   0.6871 (0.7219)     0.7056_3 (0.7217)    0.6987 (0.7215)
  PHC                          0.6651 (0.7434_2)   0.6476 (0.7345)     0.6568 (0.7438_1)    0.6517 (0.7421_3)
  Random sampling              0.632 (0.6524)



Chapter 7

Conclusion

In this thesis we have established an understanding of why and how three resourcing

issues — ephemeral resource constraints, changing variables, and lethal

environments — affect evolutionary search when applied to closed-loop optimization,

and devised search strategies for combating these issues. Generally, we observed

that all three resourcing issues may negatively impact the performance

of an optimizer, but that the strategies that we devised can be effective against

these issues. We will now discuss what we have learnt about each resourcing issue

and the practical implications surrounding it.

Optimization subject to ephemeral resource constraints (ERCs): The main resourcing issue we focused on in this thesis (Chapters 4 and 5) was optimization subject to ephemeral resource constraints (ERCs). These are a type of dynamic constraint that can cause certain solutions to be temporarily non-evaluable during the optimization process. We define solutions as non-evaluable when the resources required in their evaluation are temporarily not available.

Clearly, ERCs are less likely to arise if we ensure that all the resources needed

to cover all possible search paths are available at any point during an optimization

process. However, this strategy is rather costly and inefficient. Instead, we

need to develop strategies that interplay with the workings of an evolutionary

algorithm (EA) to ensure that promising and relevant resources are available; if a

required resource is not available, then a strategy should also be able to efficiently

utilize those resources that are available.
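
As a minimal illustration of this interplay, the sketch below (not taken from the thesis) shows a generic evaluation step in which the optimizer checks resource availability before evaluating a solution and, if a required resource is missing, applies one simple handling option, namely repairing the solution so that it only uses available resources. The helper names resources_available and repair_to_available are hypothetical placeholders for whatever resource model and constraint-handling strategy is in use.

    # Minimal sketch, assuming a user-supplied resource model; not the thesis
    # implementation. resources_available() and repair_to_available() are
    # hypothetical helpers.
    def evaluate_population(population, fitness, resources_available, repair_to_available):
        evaluated = []
        for x in population:
            if resources_available(x):
                # all resources needed to realize x are currently in stock
                evaluated.append((x, fitness(x)))
            else:
                # one simple handling option: repair x so that it only uses
                # resources that are available right now
                y = repair_to_available(x)
                evaluated.append((y, fitness(y)))
        return evaluated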

Generally, we observed that ERCs may affect the performance of an optimizer

in some (but by no means all) cases, and clear patterns emerge that relate ERC

parameters to performance effects. For instance, we observed that the later the


constraint time frame of an ERC begins, the less disruptive is the impact on

search. This could mean that investment in resources at early stages of the optimization

should be preferred, where possible. We also observed that the impact of

an ERC on the performance of evolutionary search depends on the order and the

quality of the genetic material represented by the constraint schemata of an ERC.

Thus, to some degree, we may be able to predict the extent of impact if information

about these schemata is available. Consequently, this prediction can serve as

the basis for investment decisions related to resources. For an ERC that leaves

the arrangement of resources to the optimizer (see Section 5.3), we observed that,

at the beginning of an optimization process, it is necessary to limit the number

of different resources purchased in the presence of budgetary constraints. This

policy is cost-effective and does not impact the performance significantly in the

long run.

We have also clearly seen (theoretically and empirically) that the impact on

the performance of an evolutionary algorithm (EA) is modulated by the choice of

constraint-handling strategy adopted and EA parameters. Which choice of strategy

is best is dependent on the details of the ERC, and the time and budgetary

constraints present. For summaries of (theoretical and empirical) results relating

ERC parameters to suitability of EA configurations and search strategies please

refer to Sections 4.3.4, 5.1.5, 5.2.4, and 5.3.4.

Finally, we also observed that knowing about the type of ERC may be sufficient

to select an effective search strategy for dealing with it a priori, even when

knowledge of the fitness landscape is limited. If this observation turns out to be

more generally true, then it is good news because we usually have more knowledge

about the ERCs than about the fitness landscape. Therefore we would not

need to be ‘right’ about the fitness landscape in order to choose the right search

strategy. Nevertheless, as we indicated in the case study of Section 5.1.4, a comprehensive

a priori analysis of the ERCs at hand can be beneficial when it comes

to selecting a suitable policy.

Optimization subject to changes of variables: Motivated by a real closed-loop

problem in drug discovery, the second resourcing issue we considered in

this thesis (Chapter 6) was concerned with optimizing subject to changes of

variables (drugs) and thus a changing fitness landscape. The general objective

when tackling such dynamic problems is to track the global optimum, which may

shift upon a change in the landscape. Quick tracking of global optima is essential



in a closed-loop optimization scenario because evaluations are expensive, so their

number is typically limited. Clearly, we will not encounter this resourcing issue in

the first place if the experimentalist knows in advance all the drugs she wants to

test during the optimization process. However, knowing that evolutionary search

can cope with changing variables gives an experimentalist the opportunity to

interfere in the optimization process if necessary.

Compared to standard dynamic problems, problems subject to changes of

variables feature two important advantages for tracking shifting optima quickly:

(i) changes in the fitness landscape do not need to be detected explicitly because

an optimizer knows and may even be allowed to control when and which variables

(drugs) are replaced, and (ii) tracking of optima is simplified because there is a

correlation between the variables changed and the parts of the landscape undergoing

a change. We proposed a strategy (see Section 6.1.2), which we call fair

mutation, that specifically exploits these two aspects for tracking shifting optima.

Our analyses showed that the difficulty of tracking optima depends on the

number of variables changed and the frequency with which they are changed.

Fair mutation has two user-defined parameters that allow the experimentalist

to account for these two factors. We have shown in Section 6.1.5 that, when

tuned appropriately, fair mutation can track shifting optima more quickly than

standard strategies from the dynamic optimization literature. However, when more variables are changed, and changed less frequently, restarting the optimization from scratch

upon a variable change has often turned out to be the best choice on the test

problems considered (see Section 6.1.6).
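
One way such a scheme could be realized is sketched below; this is only an illustration of the two properties above (known change points and known changed variables), not the exact formulation of fair mutation given in Section 6.1.2. The bias factor boost and the set changed_indices are assumptions introduced for the sketch.

    import random

    # Illustrative sketch only: bias mutation towards positions whose underlying
    # variable (e.g. drug) has just been replaced, exploiting that the optimizer
    # knows which variables changed and when. 'boost' is a hypothetical parameter.
    def biased_mutation(genome, changed_indices, p_m, boost=5.0):
        child = list(genome)
        for i in range(len(child)):
            rate = boost * p_m if i in changed_indices else p_m
            if random.random() < rate:
                child[i] = 1 - child[i]  # bit flip for a pseudo-Boolean encoding
        return child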

Optimization in lethal environments: The final resourcing issue we considered

in this thesis (Chapter 6) was related to lethal optimization environments.

This resourcing issue has an effect on the population size of an optimizer as poor

solutions ‘die at birth’ and are permanently lost from the evolving population.

This issue arises because the hardware on which individuals (representing, e.g.

control software) are tested is reconfigurable, destructible and non-replaceable.

The hardware may be, for instance, prototypes of autonomous robots, drone

planes, or nano-machines, all of which are expensive pieces of equipment and commonly only available in limited numbers for experimentation. Clearly, the hardware is expensive

and we would like to avoid damaging it, but set against this is that we want

to improve the control software that runs on the hardware. This conflict requires

an optimizer to trade off exploration and exploitation rather carefully.
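
The mechanism can be made concrete with the following sketch, which simulates a lethal environment inside a generational loop: offspring whose fitness falls below a lethality threshold are lost permanently, so the pool of usable individuals may shrink over time. It is a simplified illustration under assumed helpers breed and fitness, not the exact experimental setup of Section 6.2.

    # Simplified simulation of a lethal environment (not the setup of Section 6.2):
    # offspring below the threshold 'die at birth' and the hardware they occupied
    # is lost, so the effective population size can shrink.
    def next_generation(parents, breed, fitness, lethal_threshold):
        offspring = breed(parents)          # selection + crossover + mutation
        survivors = []
        for x in offspring:
            f = fitness(x)
            if f >= lethal_threshold:
                survivors.append((x, f))    # solution survives and remains usable
            # else: the solution (and its test hardware) is permanently lost
        return survivors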



To cope with lethal environments, we looked (in Section 6.2.3) at the tuning

of some of the basic EA configuration parameters: degree-of-elitism, population

size, selection pressure, mutation mode and strength, and crossover rate. We also

varied aspects of the fitness landscape we are optimizing to observe which landscape

topologies pose a particular challenge when optimizing in a lethal environment.

The main finding of our investigation was that randomness in the solution

generation process should be limited in a lethal optimization environment. That

is, elitism and a relatively high selection pressure should be employed, and so should a more deterministic mutation operator; the mutation and crossover

rates should be low too. We have also observed that an EA that evolves many

individuals (e.g. control software) for a small number of generations performs

better than one that evolves a few entities for many generations. The practical

implication of this is that, if possible, a larger budget should be allocated for the

production of the entities in the first place rather than for attempting to prolong

the testing or optimization phase.

7.1 Future work

Although this work allows us to draw valuable conclusions, our study has of

course been very limited. There remains much else to learn about the effects of

resourcing issues in closed-loop optimization and how to handle them. We now

discuss several directions for future research towards achieving this goal.

Gaining a more robust understanding of the search strategies developed.

To gain a more robust understanding of the behavior of the search strategies

developed, it would be beneficial to consider further and perhaps more realistic

fitness landscapes than the ones we employed here. Of course, it would be

ideal to validate the search strategies on real-world closed-loop problems featuring

real resource constraints. However, this approach is generally not realistic due

to the time and/or budgetary requirements. The next best thing we can do is to

simulate a fitness landscape based on data obtained from real-world experiments.

This is the approach we have taken in the case study of Section 5.1.4, and more

studies of this kind are needed.

The test problems considered in this thesis were mainly of pseudo-Boolean

nature; it would also be important to investigate how the search strategies behave

for search spaces with real or mixed integer variables. Also, our analysis



has looked at the impact of different resourcing issues isolated from each other,

which was necessary to gain an initial understanding of their impact on optimization.

However, in real applications it is possible to encounter closed-loop

problems that are subject to multiple resourcing issues simultaneously. Hence,

further work should investigate how to trade off conflicting properties of different

search strategies to cope efficiently with multiple resourcing issues.

Further theoretical analysis of resourcing issues. In Chapter 4 we have

used Markov chains to analyze theoretically the effect of a particular ERC type

on simple EAs. Although our analysis used a simplified optimization environment

(two solution types only), valuable observations were made with respect to

the applicability of different selection and reproduction schemes. We also gained

some understanding about the impact of ERCs on evolutionary search, which,

ultimately, may help us in the design of effective and efficient search strategies

for closed-loop optimization. However, our theoretical results were limited in the

sense that we did not derive mathematical equations relating, for instance, ERC

configurations to optimal EA parameter settings. It remains to be seen whether

it is possible to derive such expressions, and how applicable they would be in

practice. A number of recent advances in EA theory might present the possibility

of understanding ERCs (and other resourcing issues) more deeply, including

drift analysis (Auger and Doerr, 2011) and the fitness level method (Chen et al.,

2009a; Lehre, 2011).
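
For readers unfamiliar with this style of analysis, the generic sketch below indicates the basic computation involved: given a transition matrix over population states (e.g. the number of copies of the superior solution type in the simplified two-type environment), the state distribution after t generations is obtained by repeated application of the matrix. The matrix P is left unspecified here; constructing it for a given ERC and EA is precisely the modeling step of Section 4.3.

    import numpy as np

    # Generic Markov-chain computation, assuming a user-built transition matrix P;
    # an illustration of the approach, not the model of Section 4.3.
    def distribution_after(P, start_state, t):
        """Distribution over population states after t generations."""
        p = np.zeros(P.shape[0])
        p[start_state] = 1.0
        for _ in range(t):
            p = p @ P                        # one generation = one Markov step
        return p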

Understanding the effects of non-homogeneous experimental costs in

closed-loop optimization. So far, we have made the assumption that all solution

evaluations take equal time or resources. This need not be the case. For

instance, when dealing with commitment composite ERCs, it is a very realistic

scenario that the composites to be ordered vary in their prices and delivery periods.

Under a limited budget, this scenario might cause an optimizer not only to

follow fitness gradients but also to account for variable experimental costs. Hence,

further work should investigate how to trade off these two aspects effectively. For

inspiration, we may look at strategies employed in the Robot Scientist study

of King et al. (2004), where, as noted earlier, this scenario has been encountered

within an inference problem rather than an optimization problem. Alternatively,

we may also cast the problem of trading off fitness with costs as a multi-objective

optimization problem and employ multi-objective EAs to tackle this problem.

Once we have gained an understanding of the effect of non-homogeneous costs



on optimization, the next step would be to identify challenging problem instances

to investigate whether or not it may be worth developing sophisticated search

strategies for dealing with non-homogeneous costs, or whether the same level of

performance can be obtained using more straightforward naïve search strategies.
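
Purely as an illustration of the multi-objective view mentioned above, and with c denoting a hypothetical per-evaluation cost function, the problem could be written as

    \max_{x \in X} f(x) \qquad \text{and} \qquad \min_{x \in X} c(x),

where f(x) is the experimentally measured fitness of solution x and c(x) aggregates, for example, the prices and delivery periods of the composites needed to realize x; a multi-objective EA would then approximate the Pareto front of fitness versus cost rather than a single optimum.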

Developing strategies for coping with lethal environments. Our investigation

of the impact of lethal environments (see Section 6.2) on evolutionary

search was limited to the tuning of simple EA configuration parameters, such as

population size, variation operators and their settings. Clearly, it is possible that

a better performance may be achieved using strategies specifically designed to

cope with lethal environments. From preliminary experiments, we know already

that a learning approach, similar to the one proposed in Section 5.2.1, which learns a switching policy offline, can yield better performance. Alternatively, we

may augment an EA with a strategy that uses assumptions of local fitness correlation

to pre-screen the designs and forbid the upload of potential lethals. Such

a strategy is similar to brood selection with repair and/or some fitness approximation

schemes used in EAs to filter solutions before evaluation (Walters, 1998;

Jin and Sendhoff, 2003; Jin, 2005).
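
A rough sketch of this pre-screening idea is given below; it is hypothetical and not taken from the thesis. It assumes local fitness correlation in a pseudo-Boolean search space and estimates a candidate's fitness from its nearest already-evaluated neighbour before deciding whether it may be uploaded to the hardware.

    # Hypothetical pre-screening sketch (not from the thesis): estimate fitness
    # from the nearest already-evaluated design and forbid uploading candidates
    # whose estimate falls below the lethality threshold.
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    def predicted_fitness(candidate, archive):
        """Nearest-neighbour estimate from an archive of (genome, fitness) pairs."""
        nearest = min(archive, key=lambda item: hamming(candidate, item[0]))
        return nearest[1]

    def safe_to_upload(candidate, archive, lethal_threshold, margin=0.0):
        return predicted_fitness(candidate, archive) >= lethal_threshold + margin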

Improving and broadening the application of machine learning techniques

in closed-loop optimization. We have shown (in Section 5.2) that

evolutionary search augmented with machine learning techniques, such as reinforcement

learning (RL), can be a powerful optimization tool to cope with

ERCs. Similar learning approaches can be applied to other resourcing issues.

As mentioned above, we employed a learning-based approach to cope with lethal

environments, and the results obtained are promising. However, as mentioned in

Section 5.2.3, some tuning of the reinforcement learning agent may be required

to achieve this strong performance. The tuning concerns aspects such as the parameter settings of the RL agent, the state space partitioning, and the reward

function. To increase the applicability of learning-based optimizers to different

types of optimization problems, one could also try combining offline learning with

online learning. For instance, RL can be used to learn a policy offline up to some

distant point in time, and this policy can then be refined or slightly modified

online using the anticipation approach of Bosman (2005). Another avenue worth

pursuing is to extend search methods that use surrogate models (see Section 2.6),

which are commonly used in real-world problems, to cope with the resourcing

issues considered in this thesis.
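
To indicate roughly what such a learning-based component involves, the sketch below shows a simple epsilon-greedy agent that selects, once per generation, which constraint-handling strategy the EA should apply, and updates its value estimates from the observed reward (e.g. fitness improvement). The state encoding, reward definition, and the simple running-average update are placeholders; they are not the agent of Section 5.2.

    import random
    from collections import defaultdict

    # Placeholder agent: epsilon-greedy selection over constraint-handling
    # strategies with a running-average value update (simpler than Sarsa(lambda)).
    class StrategySwitcher:
        def __init__(self, strategies, epsilon=0.1, alpha=0.2):
            self.strategies = strategies          # available constraint-handling strategies
            self.epsilon, self.alpha = epsilon, alpha
            self.q = defaultdict(float)           # value estimates over (state, strategy)

        def select(self, state):
            if random.random() < self.epsilon:    # explore
                return random.choice(self.strategies)
            return max(self.strategies, key=lambda a: self.q[(state, a)])  # exploit

        def update(self, state, strategy, reward):
            key = (state, strategy)
            self.q[key] += self.alpha * (reward - self.q[key])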



Casting ERCOPs as RL problems. We indicated in Section 4.1.2 that an

ERCOP can be cast as an RL problem rather than a pure optimization problem

(perhaps this perspective is more intuitive now that we have seen how RL can be

used to learn when to switch between different constraint-handling strategies).

When cast as an RL problem, we could use a standard RL technique (such as

Sarsa(λ)) to generate solutions to an ERCOP, rather than augment an EA with

constraint-handling strategies under RL control. As we know, to make this approach

work, we need to define the state space, actions, and the reward function.

To define a state, we may now need to consider also the set of evaluable solutions

at the current time step (in addition to other indicators, such as the current time

step and repairs performed so far). The acti