slides available - FBK | SE
Search-Based Testing<br />
Phil McMinn<br />
University of Sheffield, UK
Overview<br />
How and why Search-Based Testing Works<br />
Examples<br />
Temporal, Functional, Structural<br />
Applications in Mutation Testing<br />
Test suite prioritisation<br />
Search-Based Test Data Generation<br />
Testability Transformation<br />
Input Domain Reduction<br />
Empirical and Theoretical Studies<br />
Future Directions
Acknowledgements<br />
The material in some of these slides has kindly<br />
been provided by:<br />
Mark Harman (KCL/UCL)<br />
Joachim Wegener (Berner & Mattner)
Conventional Testing<br />
manual design of test cases / scenarios<br />
tedious<br />
laborious,<br />
time-consuming<br />
difficult!<br />
(where are the faults?)<br />
the hard part...
Random Test Data<br />
Generation<br />
Input
Search-Based Testing is an<br />
automated search of a<br />
potentially large input space<br />
the search is guided by a<br />
problem-specific ‘fitness function’<br />
the fitness function guides the search<br />
to the test goal
Fitness Function<br />
The fitness function scores different inputs<br />
to the system according to the test goal<br />
which ones are ‘good’<br />
(that we should develop/evolve further)<br />
<br />
which ones are useless<br />
(that we can forget about)
Fitness-guided search<br />
Fitness<br />
Input
First publication on SBST<br />
Automatic Generation of Floating-Point Test Data<br />
IEEE Transactions on Software Engineering, 1976<br />
Webb Miller<br />
David Spooner<br />
Winner of the 2009 Accomplishment by a Senior Scientist Award<br />
of the International Society for Computational Biology<br />
2009 Time100<br />
Scientists and Thinkers
Publications since 1976<br />
source: SEBASE publications repository http://www.sebase.org
International Search-Based Testing<br />
Workshop<br />
co-located with ICST<br />
4th event at ICST 2011<br />
next year<br />
check websites for<br />
submission deadlines etc.:<br />
http://sites.google.com/site/icst2011/<br />
http://sites.google.com/site/icst2011workshops/
International Symposium on<br />
Search-Based Software<br />
Engineering<br />
www.ssbse.org<br />
Benevento, Italy.<br />
7th - 9th September 2010
International Symposium on<br />
Search-Based Software Engineering<br />
www.ssbse.org<br />
2011: Co-location with FSE.<br />
Szeged, Hungary<br />
Phil McMinn<br />
General Chair<br />
Myra Cohen and Mel Ó Cinnéide<br />
Program Chairs
Fitness Functions<br />
Often easy<br />
We often define metrics<br />
Need not be complex
Daimler Temporal Testing<br />
[Diagram: deceleration over time t, between t min and t max, showing the optimal point in time for triggering the airbag igniter]<br />
Fitness = duration
J. Wegener and M. Grochtmann. Verifying timing constraints of<br />
real-time systems by means of evolutionary testing. Real-Time<br />
Systems, 15(3):275– 298, 1998.
Conventional testing<br />
manual design of test cases / scenarios<br />
laborious,<br />
time-consuming<br />
tedious<br />
difficult!<br />
(where are the faults?)<br />
Search-Based Testing:<br />
automatic - may sometimes be<br />
time consuming, but it is not a<br />
human’s time being consumed<br />
a good fitness function<br />
will lead the search to<br />
the faults
Generating vs Checking<br />
Conventional Software Testing Research<br />
Write a method to construct test cases<br />
Search-Based Testing<br />
Write a fitness function<br />
to determine how good a test case is
Daimler Autonomous<br />
Parking System<br />
Input<br />
psi<br />
gap<br />
dist2space<br />
space width<br />
space length
Daimler Autonomous<br />
Parking System<br />
Fitness
Generation 0 Generation 10<br />
critical<br />
collision<br />
Generation 20
Test setup<br />
Usual approach to testing<br />
Manually generated input situations<br />
Simulation<br />
environment<br />
Software<br />
under test<br />
Output of test
Test setup<br />
Search-Based Testing approach<br />
Automatically generated inputs<br />
Search<br />
Algorithm<br />
Simulation<br />
environment<br />
Software<br />
under test<br />
Feedback loop<br />
outputs (converted to fitness values)
O. Bühler and J. Wegener. Evolutionary functional testing. Computers & Operations Research, 2008.<br />
O. Buehler and J. Wegener. Evolutionary functional testing of an automated parking system. In<br />
International Conference on Computer, Communication and Control Technologies and The<br />
9th. International Conference on Information Systems Analysis and Synthesis, Orlando,<br />
Florida, USA, 2003.
Structural testing<br />
fitness function analyses<br />
the outcome of<br />
decision statements and the<br />
values of variables in<br />
predicates<br />
More later ...
Assertion testing<br />
assertion condition<br />
speed < 150mph<br />
fitness function:<br />
f =150 - speed<br />
fitness is minimised<br />
If f is zero or less, a fault has been found
Bogdan Korel, Ali M. Al-Yami: Assertion-Oriented<br />
Automated Test Data Generation. ICSE, 1996: pp. 71-80
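The idea above can be sketched in a few lines. This is a minimal illustration, not Korel and Al-Yami's implementation; the helper name and the candidate inputs are invented for the example, while the formula f = 150 - speed is the one from the slide.

```python
def assertion_fitness(speed):
    """Distance to violating the assertion `speed < 150`.

    As on the slide: f = 150 - speed. Positive values mean the
    assertion still holds; zero or less means it has been violated.
    """
    return 150 - speed

# The search minimises f over the input space; a fault is found
# as soon as f drops to zero or below.
candidates = [30, 120, 149, 150, 180]
violating = [s for s in candidates if assertion_fitness(s) <= 0]
```

In a real search the candidates would be produced by hill climbing or an evolutionary algorithm rather than enumerated.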
Search Techniques
Hill Climbing<br />
Fitness<br />
No better solution in neighbourhood<br />
Stuck at a local optimum<br />
Input
Hill Climbing - Restarts<br />
Fitness<br />
Input
Simulated Annealing<br />
Fitness<br />
Worse solutions<br />
temporarily<br />
accepted<br />
Input
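A minimal simulated annealing loop, sketched on a toy one-dimensional landscape. The cooling schedule, parameters, and objective function are illustrative assumptions of this sketch, not values from the slides; the key point is the acceptance rule, which temporarily accepts worse solutions.

```python
import math
import random

def simulated_annealing(fitness, start, neighbour, t0=10.0, cooling=0.95,
                        iterations=2000, seed=0):
    """Minimise `fitness`. A worse neighbour is temporarily accepted with
    probability exp(-delta / temperature), letting the search escape
    local optima; the temperature falls as the search proceeds."""
    rng = random.Random(seed)
    current, best, temp = start, start, t0
    for _ in range(iterations):
        candidate = neighbour(current, rng)
        delta = fitness(candidate) - fitness(current)
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
        if fitness(current) < fitness(best):
            best = current
        temp = max(temp * cooling, 1e-3)  # cool, but never divide by zero
    return best

# Toy landscape with a single minimum at x = 42; the search converges near it.
best = simulated_annealing(lambda x: abs(x - 42), start=0,
                           neighbour=lambda x, rng: x + rng.choice([-1, 1]))
```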
Evolutionary Algorithm<br />
Fitness<br />
Input
Evolutionary Algorithms<br />
inspired by Darwinian Evolution and concept of<br />
survival of the fittest<br />
Crossover<br />
Mutation
Crossover<br />
Parent 1: a=10, b=10, c=20, d=40<br />
Parent 2: a=20, b=-5, c=80, d=80<br />
Child 1: a=10, b=10, c=80, d=80<br />
Child 2: a=20, b=-5, c=20, d=40
Mutation<br />
Before: a=10, b=10, c=20, d=40<br />
After: a=20, b=10, c=20, d=20 (mutated genes: a, d)
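The two operators can be sketched directly. The chromosomes and the crossover point below reproduce the slides' example; the mutation step size of 10 is an illustrative assumption.

```python
import random

def crossover(p1, p2, point):
    """One-point crossover: the children swap the parents' tails after `point`."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(chromosome, rng, step=10):
    """Mutation: perturb one randomly chosen gene by +/- `step`."""
    child = list(chromosome)
    i = rng.randrange(len(child))
    child[i] += rng.choice([-step, step])
    return child

parent1 = [10, 10, 20, 40]   # genes a, b, c, d
parent2 = [20, -5, 80, 80]
child1, child2 = crossover(parent1, parent2, point=2)
# child1 == [10, 10, 80, 80] and child2 == [20, -5, 20, 40], as on the slides
mutant = mutate(child1, random.Random(1))
```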
Evolutionary Testing<br />
Test cases → Execution → Monitoring → Fitness Evaluation → Selection → Crossover → Mutation → Insertion → ... (repeat until End)
Which search method?<br />
Depends on characteristics of the search landscape
Which search method?<br />
Some landscapes are<br />
hard for some<br />
searches but easy for<br />
others<br />
...and vice<br />
versa...<br />
more on<br />
this later...
Ingredients for<br />
an optimising search<br />
algorithm<br />
Representation<br />
Neighbourhood<br />
Fitness Function
Ingredients for<br />
Search-Based Testing<br />
Representation<br />
Neighbourhood<br />
Fitness Function<br />
A method of encoding all possible inputs<br />
Usually straightforward<br />
Inputs are already in data structures
Ingredients for<br />
Search-Based Testing<br />
Representation<br />
Neighbourhood<br />
Part of our understanding of the problem<br />
We need to know our near neighbours<br />
Fitness Function
Ingredients for<br />
Search-Based Testing<br />
Representation<br />
Neighbourhood<br />
Fitness Function<br />
Transformation of the test goal to a<br />
numerical function<br />
Numerical values indicate how ‘good’ an<br />
input is
More search algorithms<br />
Tabu Search<br />
Particle Swarm Optimisation<br />
Ant Colony Optimisation<br />
Genetic Programming<br />
Estimation of Distribution Algorithms
SBST Surveys & Reviews<br />
Phil McMinn: Search-based software test data generation: a survey. Software Testing, Verification and<br />
Reliability 14(2): 105-156, 2004<br />
Wasif Afzal, Richard Torkar and Robert Feldt: A Systematic Review of Search-based Testing for Non-<br />
Functional System Properties. Information and Software Technology, 51(6):957-976, 2009<br />
Shaukat Ali, Lionel Briand, Hadi Hemmati and Rajwinder Panesar-Walawege: A Systematic Review of the<br />
Application and Empirical Investigation of Search-Based Test-Case Generation. IEEE Transactions on<br />
Software Engineering, To appear, 2010
Getting started in SBSE<br />
M. Harman and B. Jones: Search-based software engineering. Information and Software Technology, 43(14):<br />
833–839, 2001.<br />
M. Harman: The Current State and Future of Search Based Software Engineering, In Proceedings of the<br />
29th International Conference on Software Engineering (ICSE 2007), 20-26 May, Minneapolis, USA (2007)<br />
D. Whitley: An overview of evolutionary algorithms: Practical issues and common pitfalls. Information and<br />
Software Technology, 43(14):817–831, 2001.
More applications of<br />
Search-Based Testing
Mutation Testing<br />
mutant<br />
higher order mutant<br />
mutant<br />
Fewer Equivalent Mutants<br />
A. J. Offutt. Investigations of the software testing coupling effect. ACM<br />
Transactions on Software Engineering and Methodology 1(1):5–20, Jan. 1992.
Finding Good HOMs<br />
Due to the large number of potential HOMs finding the<br />
ones that are most valuable is hard<br />
We want:<br />
Subtle mutants<br />
HOMs that are hard to kill,<br />
corner cases where<br />
undiscovered faults reside<br />
Reduced Test<br />
Effort<br />
HOMs that subsume as<br />
many first-order mutants as<br />
possible
Subtle Mutants -<br />
Fitness Function<br />
ratio of the killability of the HOM to<br />
the killability of the constituent FOMs<br />
> 1: the HOM is weaker than the FOMs<br />
< 1: the HOM is stronger than the FOMs<br />
= 0: potential equivalent mutant<br />
Y. Jia and M. Harman. Constructing subtle faults using higher order mutation testing. In 8th International Working Conference on Source Code Analysis and Manipulation (SCAM 2008), Beijing, China, 2008. IEEE Computer Society.
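This fitness can be sketched as follows. Note two simplifying assumptions of this sketch, not taken from Jia and Harman's paper: "killability" is taken as the fraction of a test set that kills a mutant, and the constituent FOMs are combined as "killed if any constituent FOM is killed".

```python
def killability(kills):
    """Fraction of the test set that kills a mutant (1 = killed, 0 = not)."""
    return sum(kills) / len(kills)

def hom_fitness(hom_kills, fom_kills_list):
    """Ratio of the HOM's killability to that of its constituent FOMs.
    > 1: the HOM is weaker than the FOMs; < 1: stronger (a subtle mutant);
    0: a potential equivalent mutant."""
    fom_kills = [any(kills) for kills in zip(*fom_kills_list)]
    return killability(hom_kills) / killability(fom_kills)

# A HOM killed by 1 of 4 tests, built from two FOMs killed by 3 of 4 overall:
f = hom_fitness([1, 0, 0, 0], [[1, 1, 0, 0], [0, 1, 1, 0]])
# f < 1, so this HOM is harder to kill than its constituent FOMs
```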
Time aware test suite<br />
prioritisation<br />
K. R. Walcott, M. L. Soffa, G. M. Kapfhammer, and R. S. Roos.<br />
Time aware test suite prioritization. In Proc. ISSTA, pages 1–11, 2006.
Time aware test suite<br />
prioritisation<br />
Suppose a 12 minute time<br />
budget
Time aware test suite<br />
prioritisation<br />
Order by considering the<br />
number of faults that can be<br />
detected
Time aware test suite<br />
prioritisation<br />
Order by considering the<br />
time only
Time aware test suite<br />
prioritisation<br />
Average % of faults detected
Time aware test suite<br />
prioritisation<br />
Intelligent heuristic search
Time aware test suite<br />
prioritisation<br />
Intelligent heuristic search<br />
Same no. of<br />
faults found,<br />
less time
Fitness Function<br />
The tester is unlikely to know the location of faults<br />
Need to estimate how likely a test is to find defects<br />
% of code coverage is used to estimate a suite's fault-finding potential<br />
×<br />
the time taken to execute the test suite
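One way to sketch such a fitness function. This is a simplification of Walcott et al.'s formulation: the reward-earlier-coverage weighting and the test data below are assumptions of this example.

```python
def suite_fitness(ordering, tests, budget):
    """Score a test ordering: coverage achieved within the time budget,
    with earlier coverage rewarded more. `tests` maps a test name to
    (duration, lines covered)."""
    covered, elapsed, score = set(), 0.0, 0.0
    for name in ordering:
        duration, lines = tests[name]
        if elapsed + duration > budget:
            break                      # out of budget: later tests never run
        elapsed += duration
        covered |= set(lines)
        score += len(covered) * (budget - elapsed)  # weight by time remaining
    return score

tests = {"t1": (5, [1, 2]), "t2": (4, [3]), "t3": (10, [1, 2, 3, 4])}
# Within a 12-minute budget, running the two quick tests first scores
# better than starting with the slow, comprehensive test.
```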
Results<br />
Mutations used to seed faults
Multi-objective Search<br />
Instead of combining the objectives into one<br />
fitness function, handle them as distinctive goals<br />
Time<br />
Faults
Multi-objective Search<br />
Fitness Function A<br />
The Pareto Front<br />
Fitness Function B
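Pareto dominance and the resulting front can be sketched in a few lines. The objective values below are invented for illustration, with both objectives to be maximised.

```python
def dominates(a, b):
    """a dominates b if it is at least as good on every objective and
    strictly better on at least one (here: larger is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the points not dominated by any other point: the Pareto front."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Objectives: (coverage, -execution time), both maximised.
points = [(10, -5), (8, -2), (10, -2), (6, -1), (5, -4)]
front = pareto_front(points)
# Only the trade-off points survive: full coverage at moderate cost,
# and lower coverage at the lowest cost.
```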
Some Results<br />
Pareto Efficient Multi-Objective Test Case<br />
Selection, Shin Yoo and Mark Harman.<br />
Proceedings of ACM International Symposium<br />
on Software Testing and Analysis 2007 (ISSTA<br />
2007), p140-150
Three objectives
Other applications of<br />
Search-Based Testing<br />
Combinatorial Interaction Testing<br />
M.B. Cohen, M.B. Dwyer and J. Shi, Interaction testing of highly-configurable systems in<br />
the presence of constraints, International Symposium on Software Testing and Analysis<br />
(ISSTA), London, July 2007, pp. 129-139.<br />
GUI Testing<br />
X. Yuan, M.B. Cohen and A.M. Memon, GUI Interaction Testing: Incorporating Event<br />
Context, IEEE Transactions on Software Engineering, to appear
Other applications of<br />
Search-Based Testing<br />
Stress testing<br />
L. C. Briand, Y. Labiche, and M. Shousha. Stress testing real-time systems with genetic<br />
algorithms. In Proceedings of the Genetic and Evolutionary Computation Conference<br />
(GECCO 2005), pages 1021–1028, Washington DC, USA, 2005. ACM Press.<br />
State machine testing<br />
K. Derderian, R. Hierons, M. Harman, and Q. Guo. Automated unique input output<br />
sequence generation for conformance testing of FSMs. The Computer Journal,<br />
39:331–344, 2006.
Search-Based<br />
Structural Test Data<br />
Generation
Covering a structure<br />
TARGET
Fitness evaluation<br />
The test data<br />
executes the<br />
‘wrong’ path<br />
TARGET
Analysing control flow<br />
The outcomes at<br />
key decision<br />
statements<br />
matter.<br />
TARGET<br />
These are the<br />
decisions on<br />
which the target<br />
is control<br />
dependent
Approach Level<br />
= 2 at the outermost missed decision, = 1 at the next, = 0 at the decision nearest the TARGET<br />
the search proceeds by minimisation of this value
Approach Level<br />
Roy P. Pargas, Mary Jean Harrold, Robert Peck: Test-Data Generation<br />
Using Genetic Algorithms. Softw. Test., Verif. Reliab. 9(4): 263-282 (1999)<br />
Joachim Wegener, André Baresel, Harmen Sthamer: Evolutionary test environment for<br />
automatic structural testing. Information & Software Technology 43(14): 841-854 (2001)
Analysing predicates<br />
Approach level alone gives us coarse values<br />
a = 50, b = 0<br />
a = 45, b = 5<br />
a = 40, b = 10<br />
a = 35, b = 15<br />
a = 30, b = 20<br />
a = 25, b = 25 !<br />
getting ‘closer’<br />
to being true
Branch distance<br />
Associate a distance formula with different<br />
relational predicates<br />
a = 50, b = 0 branch distance = 50<br />
a = 45, b = 5 branch distance = 40<br />
a = 40, b = 10 branch distance = 30<br />
a = 35, b = 15 branch distance = 20<br />
a = 30, b = 20 branch distance = 10<br />
a = 25, b = 25 branch distance = 0<br />
getting ‘closer’<br />
to being true
Branch distances for<br />
relational predicates<br />
Webb Miller & D. Spooner. Automatic Generation of Floating-<br />
Point Test Data. IEEE Trans. Software Eng. 1976<br />
Bogdan Korel. Automated Software Test Data Generation.<br />
IEEE Trans. Software Eng. 1990<br />
Nigel Tracey, John Clark & Keith Mander.<br />
The Way Forward for Unifying Dynamic Test-Case Generation:<br />
The Optimisation-Based Approach.1998
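The distances on the previous slide compute directly. A sketch for two common predicate forms, following the slides' values (some formulations add a small constant k when the predicate is unsatisfied; it is omitted here, as on the slides):

```python
def branch_distance_eq(a, b):
    """Distance to making `a == b` true: |a - b|, zero when satisfied."""
    return abs(a - b)

def branch_distance_ge(a, b):
    """Distance to making `a >= b` true: b - a when false, zero when true."""
    return max(b - a, 0)

# The slide's sequence for `a == b`, getting 'closer' to true:
pairs = [(50, 0), (45, 5), (40, 10), (35, 15), (30, 20), (25, 25)]
distances = [branch_distance_eq(a, b) for a, b in pairs]
# distances == [50, 40, 30, 20, 10, 0]
```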
Putting it all together<br />
Fitness = approach level + normalised branch distance<br />
if a >= b (true → continue; false → TARGET MISSED: Approach Level = 2, Branch Distance = b - a)<br />
if b >= c (true → continue; false → TARGET MISSED: Approach Level = 1, Branch Distance = c - b)<br />
if c >= d (true → TARGET; false → TARGET MISSED: Approach Level = 0, Branch Distance = d - c)<br />
normalised branch distance between 0 and 1<br />
indicates how close the approach level is to being penetrated
Normalisation Functions<br />
Since the ‘maximum’ branch distance is generally unknown<br />
we need a non-standard normalisation function<br />
Baresel (2000): norm(d) = 1 - alpha^(-d), with alpha = 1.001<br />
Arcuri (2010): norm(d) = d / (d + beta), with beta = 1
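Both normalisation functions are easily expressed in code. The formulas are as usually given in the literature; the slide itself shows only the parameter values.

```python
def normalise_baresel(d, alpha=1.001):
    """Baresel (2000): 1 - alpha**(-d). Maps [0, inf) into [0, 1)."""
    return 1 - alpha ** -d

def normalise_arcuri(d, beta=1.0):
    """Arcuri (2010): d / (d + beta). Also maps [0, inf) into [0, 1)."""
    return d / (d + beta)

# A branch distance of 10 normalises to roughly 0.00995 (Baresel)
# or 10/11 (Arcuri); either way,
# fitness = approach level + normalised branch distance.
```

With alpha = 1.001, a branch distance of 10 gives 1 - 1.001^-10 ≈ 0.009945, which matches the fitness value in the worked IGUANA example later in these slides.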
Alternating Variable Method<br />
‘Probe’ moves<br />
increase increase increase<br />
void fn( input1, input2, input3 .... )<br />
decrease decrease decrease
Alternating Variable Method<br />
Accelerated hill climb<br />
Fitness<br />
Input variable value
Alternating Variable Method<br />
1. Randomly generate start point<br />
a=10, b=20, c=30<br />
2. ‘Probe’ moves on a<br />
a=9, b=20, c=30<br />
a=11, b=20, c=30<br />
(no effect)<br />
3. ‘Probe’ moves on b<br />
a=10, b=19, c=30<br />
(improved branch distance)<br />
4. Accelerated moves in direction of improvement
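The four steps can be sketched as follows. This is a minimal AVM without the random restarts a real implementation would use; a doubling step size stands in for the 'accelerated moves', and the example objective (the branch distance for `a == b and b == c`) is an illustrative assumption.

```python
def avm(fitness, start, max_iters=10000):
    """Alternating Variable Method: probe each variable with +/-1 moves and,
    when a probe improves fitness, accelerate in that direction (doubling
    the step) until no further improvement, then move to the next variable."""
    x = list(start)
    best = fitness(x)
    improved, iters = True, 0
    while improved and iters < max_iters:
        improved = False
        for i in range(len(x)):
            for direction in (-1, 1):            # probe moves
                step = 1
                while iters < max_iters:
                    candidate = list(x)
                    candidate[i] += direction * step
                    iters += 1
                    if fitness(candidate) < best:
                        x, best = candidate, fitness(candidate)
                        improved = True
                        step *= 2                # accelerate on improvement
                    else:
                        break
    return x

# Branch distance for `a == b and b == c`, starting from (10, 20, 30):
result = avm(lambda v: abs(v[0] - v[1]) + abs(v[1] - v[2]), [10, 20, 30])
```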
Key Publications<br />
Alternating variable method<br />
Bogdan Korel. Automated Software Test Data Generation.<br />
IEEE Trans. Software Eng. 1990<br />
Evolutionary structural test data generation<br />
S. Xanthakis, C. Ellis, C. Skourlas, A. Le Gall, S. Katsikas, and K. Karapoulios.<br />
Application of genetic algorithms to software testing (Application des algorithmes génétiques au test des logiciels). In 5th International Conference on Software Engineering and its Applications, pages 625–636, Toulouse, France, 1992
A search-based<br />
test data generator tool
Input Generation Using Automated Novel Algorithms
IGUANA<br />
(Java)<br />
search algorithm → inputs → Test object (C code compiled to a DLL)<br />
test object instrumentation → information from test object → fitness computation<br />
the Java and C sides communicate via the Java Native Interface
A function for testing
Test Object Preparation<br />
1. Parse the code and extract the control dependency graph<br />
(“which decisions are key for the execution of individual structural targets?”)<br />
if a == b<br />
if b == c<br />
return 1 (TARGET)
Test Object Preparation<br />
2. Instrument the code, for monitoring control flow and variable values in predicates<br />
if a == b<br />
if b == c<br />
return 1 (TARGET)
Test Object Preparation<br />
3. Map inputs to a vector<br />
a b c<br />
Straightforward in many cases<br />
Inputs composed of dynamic data<br />
structures are harder to encode<br />
Kiran Lakhotia, Mark Harman and Phil McMinn.<br />
Handling Dynamic Data Structures in Search-Based Testing.<br />
Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2008), Atlanta,<br />
Georgia, USA, July 12-16 2008, pp. 1759-1766, ACM Press.
Instrumentation
Each branching<br />
condition is<br />
replaced by a call<br />
to the function<br />
node(...)<br />
the instrumentation should only observe<br />
the program and not alter its behaviour
The first parameter is<br />
the control flow<br />
graph node ID of the<br />
decision statement<br />
The second parameter is a boolean condition that<br />
replicates the structure in the original program<br />
(i.e. including short-circuiting)
Relational predicates are replaced with functions that<br />
compute branch distance.
The instrumentation tells us:<br />
Which decision nodes were executed<br />
and their outcome (branch distances)<br />
Therefore we can find the decision at which control flow<br />
diverged from a target for an input...<br />
... and compute the approach level<br />
from the control dependence graph<br />
... and look up the branch distance<br />
giving the fitness value for that input
Input:<br />
NODE T F<br />
1 0 1<br />
2 10 0<br />
4 20 0<br />
....<br />
Diverged at node 2<br />
if a == b<br />
if b == c<br />
return 1<br />
TARGET<br />
approach level: 0<br />
branch distance: 10<br />
fitness = 0.009945219
Testability<br />
Transformation
The ‘Flag’ Problem
Program Transformation
Programs will inevitably have features that<br />
heuristic searches handle less well<br />
Testability transformation:<br />
change the program to improve test data generation<br />
... whilst preserving test adequacy<br />
Mark Harman, Lin Hu, Rob Hierons, Joachim Wegener, Harmen Sthamer, Andre Baresel and Marc Roper.<br />
Testability Transformation.<br />
IEEE Transactions on Software Engineering. 30(1): 3-16, 2004.
Nesting<br />
Phil McMinn, David Binkley and Mark Harman<br />
Empirical Evaluation of a Nesting Testability Transformation for Evolutionary Testing<br />
ACM Transactions on Software Engineering and Methodology, 18(3), Article 11, May 2009
Testability Transformation<br />
Note that the programs are no longer equivalent<br />
But we don’t care - so long as the test data is<br />
still adequate
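The idea behind the nesting transformation can be illustrated with fitness functions for the running `if (a == b) { if (b == c) ... }` example. The formulas below, including the d/(d+1) normalisation, are a sketch of the principle rather than the exact transformation of the paper.

```python
def fitness_nested(a, b, c):
    """Original nested form: the inner distance |b - c| only guides the
    search once the outer predicate `a == b` is already satisfied."""
    if a == b:
        return abs(b - c)              # approach level 0
    d = abs(a - b)
    return 1 + d / (d + 1)             # approach level 1 + normalised distance

def fitness_transformed(a, b, c):
    """Transformed form: both branch distances contribute at once,
    guiding the search on `b == c` before `a == b` holds."""
    d = abs(a - b) + abs(b - c)
    return d / (d + 1)
```

With a = 4, b = 5 the nested fitness cannot distinguish c = 9 from c = 90, while the transformed version can; that is exactly the local-optima problem the transformation removes.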
Nesting & Local Optima
Results - Industrial &<br />
Open source code<br />
[Chart: change in success rate after applying the transformation (%), from -100 to 100, for nested branches]
Dependent &<br />
Independent Predicates<br />
Independent<br />
Predicates influenced by<br />
disjoint sets of input variables<br />
Can be optimised<br />
in parallel<br />
e.g. ‘a == 0’ and ‘b == 0’
Dependent &<br />
Independent Predicates<br />
Dependent<br />
Predicates influenced by<br />
non-disjoint sets of input variables<br />
Interactions between predicates<br />
inhibit parallel optimisation<br />
e.g. ‘a == b’ and ‘b == c’
[Chart: change in success rate after applying the transformation (%), for nested branches and dependent predicates]
Independent and<br />
some dependent predicates<br />
[Chart: change in success rate after applying the transformation (%), for nested branches and dependent predicates]
How not preserving program<br />
equivalence can go wrong
we are testing to cover structure<br />
... but the structure is the problem<br />
so we transform the program<br />
... but this alters the structure<br />
so we need to be careful:<br />
are we still testing according to the same<br />
criterion?
Input Domain<br />
Reduction
Mark Harman, Youssef Hassoun, Kiran Lakhotia, Phil McMinn and Joachim Wegener.<br />
The Impact of Input Domain Reduction on Search-Based Test Data Generation.<br />
The 6th joint meeting of the European Software Engineering Conference and the<br />
ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/<br />
FSE 2007), Cavtat, Croatia, September 3-7 2007, pp. 155-164, ACM Press.
Effect of Reduction<br />
-100,000 ... 100,000<br />
-100,000 ... 100,000<br />
-100,000 ... 100,000<br />
approx 10^16
Effect of Reduction<br />
-100,000 ... 100,000<br />
-100,000 ... 100,000<br />
-100,000 ... 100,000<br />
200,001
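The arithmetic behind these figures, as a quick check:

```python
domain = 200_001              # integers from -100,000 to 100,000
full_space = domain ** 3      # three input variables
assert 10**15 < full_space < 10**16   # roughly 10^16 candidate inputs
reduced_space = domain        # two irrelevant variables removed
assert reduced_space == 200_001
```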
Variable Dependency<br />
Analysis
Empirical Study<br />
Studied the effects of reduction with:<br />
Random Search<br />
Alternating Variable Method<br />
Evolutionary Testing<br />
Case studies:<br />
Defroster<br />
F2<br />
Gimp<br />
Spice<br />
Tiff
Effect on Random Testing<br />
-49 ... 50<br />
-49 ... 50<br />
-49 ... 50<br />
Probability of<br />
executing target:<br />
(100 × 100 × 1) / (100 × 100 × 100) = 1/100
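The probability works out the same before and after reduction; a quick check, assuming (as in the slide) that the target is executed for exactly one value of the one relevant variable:

```python
from fractions import Fraction

# Each variable ranges over 100 values (-49 ... 50).  The target
# depends on one variable only; the other two are irrelevant.
p_before = Fraction(100 * 100 * 1, 100 * 100 * 100)
p_after = Fraction(1, 100)    # only the relevant variable remains
assert p_before == p_after    # random search gains nothing
```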
Results with Random Testing
Results with AVM
Effect on AVM<br />
Saves probe moves (and thus wasteful fitness<br />
evaluations) around irrelevant variables<br />
increase increase increase increase<br />
void fn( irrelevant1, irrelevant2, irrelevant3, required1 .... )<br />
decrease decrease decrease decrease
Effect on AVM<br />
Saves probe moves (and thus wasteful fitness<br />
evaluations) around irrelevant variables<br />
increase<br />
void fn( irrelevant1, irrelevant2, irrelevant3, required1 .... )<br />
decrease
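The Alternating Variable Method itself can be sketched in a few lines (an illustrative sketch; real implementations add move limits and restarts):

```python
def avm(fitness, x):
    """Alternating Variable Method sketch: probe each variable with
    +1/-1 moves; when a probe improves fitness, accelerate in that
    direction with doubling steps; repeat until no variable improves
    or the target (fitness 0) is reached."""
    x, best = list(x), fitness(x)
    improved = True
    while improved and best > 0:
        improved = False
        for i in range(len(x)):
            for direction in (1, -1):
                step = direction
                while True:
                    candidate = list(x)
                    candidate[i] += step
                    f = fitness(candidate)
                    if f < best:
                        x, best = candidate, f
                        improved = True
                        step *= 2      # pattern (accelerated) move
                    else:
                        break          # probe failed; try next direction
    return x, best
```

For example, with fitness = lambda v: abs(v[0] - 10) + abs(v[1] + 3), avm(fitness, [0, 0]) reaches fitness 0 at [10, -3]. Probes on irrelevant variables are exactly the wasted fitness evaluations that input domain reduction avoids.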
Effect on ET<br />
Saves mutations on irrelevant variables<br />
Mutations concentrated on the variables<br />
that matter<br />
Likely to speed up the search
Results with ET
Conclusions for<br />
Input Domain Reduction<br />
Variable dependency analysis can be used to<br />
reduce input domains<br />
This can reduce search effort for the<br />
AVM and ET<br />
Perhaps surprisingly, there is no overall<br />
change for random search
Which search<br />
algorithm?
Empirical Study<br />
Bibclean<br />
Defroster<br />
F2<br />
Eurocheck<br />
Gimp<br />
Space<br />
760 branches in<br />
~5 kLOC<br />
Spice<br />
Tiff<br />
Totinfo<br />
Mark Harman and Phil McMinn.<br />
A Theoretical and Empirical Study of Search Based Testing: Local, Global and Hybrid Search.<br />
IEEE Transactions on Software Engineering, 36(2), pp. 226-247, 2010.
Interesting branches<br />
[Venn diagram: 8, 20 and 9 branches, comparing the<br />
Alternating Variable Method and Evolutionary Testing]
Wins for the AVM<br />
[Bar chart: success rate (%) of Evolutionary Testing vs Hill Climbing on branches from f2, space, spice and totinfo]
Wins for the AVM<br />
[Bar chart, log scale: average number of fitness evaluations for Evolutionary Testing vs Hill Climbing on branches from gimp, spice and tiff]
When does the AVM win?<br />
[Fitness landscape sketches: fitness plotted against input]
Wins for Evolutionary Testing<br />
[Bar chart: success rate (%) of Evolutionary Testing vs Hill Climbing on branches of check_ISBN and check_ISSN]
When does ET win?<br />
The branches in question were part of a<br />
routine for validating ISBN/ISSN strings<br />
When a valid character is found, a counter<br />
variable is incremented
When does ET win?<br />
Evolutionary algorithms incorporate<br />
a population of candidate solutions<br />
crossover<br />
Crossover enables valid characters to be<br />
crossed over into different ISBN/ISSN strings
Schemata<br />
Subsets of useful genes<br />
e.g. substrings of 1’s in a binary all-ones problem:<br />
1010100011110000111010<br />
1111101010000000101011<br />
0001001010000111101011<br />
The schema theory predicts that schemata of<br />
above-average fitness will proliferate in subsequent<br />
generations of the evolutionary search
Schemata<br />
1**1 *11*<br />
1111<br />
crossover of fit schemata<br />
can lead to even fitter,<br />
higher-order schemata
Royal Roads<br />
landscape structures where there is a<br />
‘red carpet’ for crossovers to follow
The Genetic Algorithm<br />
Royal Road<br />
S1: 1111****************************<br />
S2: ****1111************************<br />
S3: ********1111********************<br />
S4: ************1111****************<br />
S5: ****************1111************<br />
S6: ********************1111********<br />
S7: ************************1111****<br />
S8: ****************************1111<br />
S9: 11111111************************<br />
S10: ********11111111****************<br />
S11: ****************11111111********<br />
S12: ************************11111111<br />
S13: 1111111111111111****************<br />
S14: ****************1111111111111111<br />
S15: 11111111111111111111111111111111
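One-point crossover can combine two of these building blocks directly (a sketch using 16-bit strings for brevity, rather than the 32-bit schemata above):

```python
def one_point_crossover(p1, p2, point):
    # Classic one-point crossover: swap tails at the cut point.
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

p1 = "1111" + "0" * 12          # carries the leading building block
p2 = "0" * 12 + "1111"          # carries the trailing building block
child, _ = one_point_crossover(p1, p2, 8)
assert child == "1111" + "0" * 8 + "1111"   # both blocks inherited
```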
When Crossover Helps<br />
Executes the target<br />
R1<br />
R2<br />
Q1<br />
Q2<br />
Q1<br />
Q2<br />
P1<br />
P2<br />
P3<br />
P4<br />
P1<br />
P2<br />
P3<br />
P4
When Crossover Helps<br />
Executes the target<br />
Contains 4 valid<br />
characters<br />
Contains 4 valid<br />
characters<br />
Contains 2 valid<br />
characters<br />
Contains 2 valid<br />
characters<br />
Contains 2 valid<br />
characters<br />
Contains 2 valid<br />
characters<br />
Contains<br />
1 valid<br />
character<br />
Contains<br />
1 valid<br />
character<br />
Contains<br />
1 valid<br />
character<br />
Contains<br />
1 valid<br />
character<br />
Contains<br />
1 valid<br />
character<br />
Contains<br />
1 valid<br />
character<br />
Contains<br />
1 valid<br />
character<br />
Contains<br />
1 valid<br />
character
Headless Chicken Test<br />
T. Jones, “Crossover, macromutation and population-based search,” Proc.<br />
ICGA ’95. Morgan Kaufmann, 1995, pp. 73–80.<br />
[Bar chart: success rate (%) of Evolutionary Testing vs the Headless Chicken Test on branches of check_ISBN and check_ISSN]
Investigations into<br />
Crossover<br />
Royal Roads<br />
M. Mitchell, S. Forrest, and J. H. Holland, “The royal road for genetic algorithms: Fitness<br />
landscapes and GA performance,” Proc. 1st European Conference on Artificial Life. MIT<br />
Press, 1992, pp. 245–254.<br />
HIFF<br />
R. A. Watson, G. S. Hornby, and J. B. Pollack, “Modeling building-block interdependency,” Proc. PPSN V.<br />
Springer, 1998, pp.97–106.<br />
Real Royal Roads<br />
T. Jansen and I. Wegener, “Real royal road functions - where crossover provably is essential,” Discrete<br />
Applied Mathematics, vol. 149, pp. 111–125, 2005.<br />
Ignoble Trails<br />
J. N. Richter, A. Wright, and J. Paxton, “Ignoble trails - where crossover is provably harmful,” Proc.<br />
PPSN X. Springer, 2008, pp. 92–101.
Evolutionary Testing<br />
Schemata<br />
{(a, b, c) | a = b}<br />
(50, 50, 25)<br />
(100, 100, 10)<br />
...<br />
{(a, b, c) | a > 0}<br />
(50, 10, 25)<br />
(100, -50, 10)<br />
...
Crossover of<br />
good schemata<br />
building block<br />
subschema<br />
building block<br />
subschema<br />
{(a, b, c) | a = b} {(a, b, c) | b ≥ 100}<br />
{(a, b, c) | a = b ∧ b ≥ 100}<br />
superschema
Crossover of<br />
good schemata<br />
{(a, b, c) | a = b ∧ b ≥ 100} {(a, b, c) | c ≤ 10}<br />
{(a, b, c) | a = b ∧ b ≥ 100 ∧ c ≤ 10}<br />
covering<br />
schema
What types of program and program<br />
structure enable Evolutionary Testing to<br />
perform well through crossover, and how?<br />
1. Large numbers of conjuncts in the input condition<br />
{(a, b, c, ...) | a = b ∧ b ≥ 100 ∧ c ≤ 10 ∧ ...}<br />
each represents a ‘sub’-test data generation<br />
problem that can be solved independently and<br />
combined with other partial solutions<br />
P. McMinn, How Does Program Structure Impact the Effectiveness of the Crossover Operator<br />
in Evolutionary Testing? Proc. Symposium on Search-Based Software Engineering, 2010
What types of program and program<br />
structure enable Evolutionary Testing to<br />
perform well through crossover, and how?<br />
2. Conjuncts should reference disjoint sets of variables<br />
{(a, b, c, d, ...) | a = b ∧ b = c ∧ c = d ∧ ...}<br />
the solution of each conjunct independently does<br />
not necessarily result in an overall solution<br />
P. McMinn, How Does Program Structure Impact the Effectiveness of the Crossover Operator<br />
in Evolutionary Testing? Proc. Symposium on Search-Based Software Engineering, 2010
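Both points can be illustrated with a toy sketch (conditions and values are hypothetical):

```python
# Disjoint conjuncts (a == b ∧ b >= 100, and c <= 10): each parent
# solves one sub-problem, and crossover splices the partial
# solutions into a full solution.
parent1 = {"a": 150, "b": 150, "c": 99}   # a == b and b >= 100 hold
parent2 = {"a": 1, "b": 2, "c": 7}        # c <= 10 holds
child = {**parent1, "c": parent2["c"]}
assert child["a"] == child["b"] >= 100 and child["c"] <= 10

# Shared variables (a == b ∧ b == c): gluing together partial
# solutions does not necessarily give an overall solution.
parent3 = {"a": 5, "b": 5, "c": 0}        # a == b holds
parent4 = {"a": 0, "b": 9, "c": 9}        # b == c holds
child2 = {**parent3, "c": parent4["c"]}
assert not (child2["a"] == child2["b"] == child2["c"])
```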
Progressive Landscape<br />
[Fitness landscape sketch: fitness plotted against input]
Crossover - Conclusions<br />
1. Large numbers of conjuncts in the input condition<br />
2. Conjuncts should reference disjoint sets of variables<br />
Crossover lends itself to programs/units that process<br />
large data structures (e.g. strings, arrays) resulting in<br />
input condition conjuncts with disjoint variables<br />
... or units that require large sequences of method<br />
calls to move an object into a required state<br />
e.g. testing for a full stack - push(...), push(...), push(...)
Other Theoretical Work<br />
A. Arcuri, P. K. Lehre, and X. Yao, “Theoretical runtime analyses of search algorithms<br />
on the test data generation for the triangle classification problem,” SBST workshop<br />
2008, Proc. ICST 2008. IEEE, 2008, pp. 161–169.<br />
A. Arcuri, “Longer is better: On the role of test sequence length in software<br />
testing,” Proc. ICST 2010. IEEE, 2010.
Future directions...
The Oracle Problem<br />
Determining the correct output<br />
for a given input is called the<br />
oracle problem<br />
Software engineering research has<br />
devoted much attention to<br />
automated oracles<br />
... but many systems do not have automated oracles
Human Oracle Cost<br />
Typically the responsibility falls<br />
on the human<br />
Reducing this human effort - the<br />
human oracle cost - remains an<br />
important problem
Test data generation<br />
and human oracle cost<br />
Quantitative<br />
generate test data that<br />
maximises coverage but<br />
minimises the number<br />
of test cases<br />
M. Harman, S. G. Kim, K. Lakhotia, P. McMinn, and S. Yoo.<br />
Optimizing for the number of tests generated in search based test<br />
data generation with an application to the oracle cost problem.<br />
3rd Search-Based Software Testing workshop (SBST 2010), 2010. IEEE<br />
digital library.<br />
reduce size of test cases<br />
A. Leitner, M. Oriol, A. Zeller, I. Ciupa, and B. Meyer.<br />
Efficient unit test case minimization.<br />
ASE 2007, pp. 417–420. ACM.
Test data generation<br />
and human oracle cost<br />
Qualitative<br />
e.g. how easily can the<br />
scenario comprising<br />
generated test data<br />
be understood so that<br />
the output can be<br />
evaluated?<br />
Phil McMinn, Mark Stevenson and Mark Harman.<br />
Reducing Qualitative Human Oracle Costs associated with Automatically Generated Test Data.<br />
Proceedings of the 1st International Workshop on Software Test Output Validation (STOV 2010), to appear.
Calendar program<br />
Takes two dates<br />
(represented by 3<br />
integers each)<br />
Finds the number of days<br />
between the dates
Calendar program<br />
Some example<br />
dates generated:<br />
-4048/-10854/-29141<br />
10430/3140/6733<br />
3063/31358/8201<br />
Machine-generated test data tends not to fit the<br />
operational profile of a program particularly well
Seeding knowledge<br />
Programmer test case<br />
used as the starting<br />
point of the search<br />
process<br />
16/1/2010<br />
2/1/2009<br />
2/32/2010
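Seeding can be sketched as starting the search at the programmer's test case and exploring its neighbourhood, so generated inputs stay close to realistic values (all names and values here are illustrative):

```python
import random

seed = (16, 1, 2010)    # day, month, year from a programmer's test case

def neighbour(date, rng):
    # Small mutation of one field, keeping the candidate near the
    # realistic seed rather than at an arbitrary random point.
    date = list(date)
    i = rng.randrange(len(date))
    date[i] += rng.choice((-1, 1))
    return tuple(date)

rng = random.Random(0)
candidate = neighbour(seed, rng)
# candidate differs from the seed in exactly one field, by one unit
assert sum(abs(x - y) for x, y in zip(candidate, seed)) == 1
```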
In what ways can operational profile<br />
knowledge be obtained?<br />
Programmers<br />
Likely to have run the program<br />
with a sanity check<br />
The Program<br />
These can be seeded to bias the<br />
test data generation process
In what ways can operational profile<br />
knowledge be obtained?<br />
Programmers<br />
Identifier names<br />
The Program<br />
Code comments<br />
Sanitisation routines
Identifier names<br />
Give clues to the sort of inputs that<br />
might be expected<br />
dayOfTheWeek<br />
url<br />
country_of_origin<br />
Can be analysed in conjunction with large-scale<br />
natural language lexicons such as WordNet
Sanitisation Routines<br />
Defensive<br />
programming<br />
routines might<br />
be used to<br />
‘correct’ a<br />
program’s own<br />
test data
Test Data Re-use
Program similarity and<br />
test data re-use<br />
Code structure<br />
Code comments<br />
clone detection<br />
plagiarism detection<br />
Identifier names
Will these<br />
techniques work<br />
Will they compromise<br />
fault-finding capability?
web: http://www.dcs.shef.ac.uk/~phil<br />
email: p.mcminn@sheffield.ac.uk<br />
twitter: @philmcminn
Questions &<br />
Discussion