
Mathematical Optimization
in Graphics and Vision

Paulo Cezar Pinto Carvalho
Luiz Henrique de Figueiredo
Jonas Gomes
Luiz Velho

Instituto de Matemática Pura e Aplicada, IMPA
Rio de Janeiro, Brazil


Preface

Mathematical optimization has a fundamental importance for the solution of many problems in computer graphics and vision. This fact is apparent from a quick look at the SIGGRAPH proceedings and other relevant publications in these areas, where a significant percentage of the papers employ, in one way or another, mathematical optimization techniques.

The book provides a conceptual analysis of the problems in computer graphics and discusses the mathematical models used to solve them. This motivates the reader to understand the importance of optimization techniques in graphics and vision.

The book gives an overview of combinatorial, continuous and variational optimization methods, focusing on graphical applications. The prerequisites for the book are: (i) a background in linear algebra and calculus of one and several variables; (ii) a computational background in algorithms and programming; (iii) knowledge of geometric modeling, animation, image processing, image analysis and visualization.

This book originated from a set of notes in Portuguese that we wrote for a course on this topic at the Brazilian Mathematical Colloquium in July of 1999 and at the Brazilian Congress of Applied Mathematics in October of 2000. After that, these notes were expanded and translated into English. This material was then presented in two highly successful SIGGRAPH tutorial courses in August of 2002 and 2003.

We wish to thank Anderson Mayrink, who collaborated with us to produce the chapter on Probability and Optimization.


The book is conceptually divided into five parts: Computer Graphics and Optimization; Continuous and Variational Optimization; Combinatorial Optimization; Global Optimization Methods; Probability and Optimization.

The first part of the book gives a conceptual overview of computer graphics, focusing on problems and describing the mathematical models used to solve them. This is a short introduction to motivate the study of optimization techniques. The subject is presented in such a way as to make clear the necessity of using optimization techniques in the solution of a large family of problems.

The second part of the book discusses the optimization of functions f: S → R, where S is a subset of the Euclidean space R^n. Optimality conditions are discussed for optimization with and without constraints. Different algorithms are described, such as Newton methods, linear programming, conjugate gradient and sequential quadratic programming. Applications to camera calibration, color correction, animation and visualization are given. The problem of variational optimization is properly posed, with special emphasis on variational modeling using plane curves.

In the third part of the book the problem of combinatorial optimization is posed and different strategies to devise good algorithms are discussed. Special emphasis is given to dynamic programming and integer programming. Dijkstra's algorithm and branch-and-bound methods are discussed. Examples are given in image and color quantization, level of detail computation, interactive visualization, minimum paths in maps and surface reconstruction from sections.

In the fourth part of the book the difficulties of global optimization are clearly pointed out and some strategies to find solutions are discussed, including simulated annealing and genetic algorithms. The role of interval and affine arithmetic in the solution of problems is also discussed. Applications to geometric modeling and animation are given.

The fifth part of the book discusses the subtle relationships between probability theory and mathematical optimization.



Contents

1 Computer Graphics 1
1.1 What is computer graphics? . . . 1
1.1.1 Related areas . . . 2
1.1.2 Is there something missing? . . . 4
1.2 Mathematical modeling and abstraction paradigms . . . 5
1.2.1 Terrain representation . . . 7
1.2.2 Image representation . . . 9
1.3 Graphical Objects . . . 11
1.4 Revisiting the diagram . . . 13
1.5 Description, representation and reconstruction . . . 13
1.5.1 Description . . . 13
1.5.2 Representation . . . 14
1.5.3 Reconstruction . . . 14
1.5.4 Semantics and Reconstruction . . . 17

2 Optimization: an overview 21
2.1 What is optimization? . . . 21
2.1.1 Classification of optimization problems . . . 22
2.2 Continuous optimization problems . . . 23
2.3 Discrete optimization problems . . . 24
2.4 Combinatorial optimization problems . . . 26
2.5 Variational optimization problems . . . 27
2.6 Other classifications . . . 27
2.6.1 Classification based on constraints . . . 28
2.6.2 Classification based on the objective function . . . 28


2.7 The problem of posing problems . . . 29
2.7.1 Well-posed problems . . . 30
2.8 How to solve it? . . . 33
2.8.1 Global versus local solutions . . . 34
2.8.2 NP-complete problems . . . 34
2.9 Comments and references . . . 34

3 Optimization and computer graphics 37
3.1 A unified view of computer graphics problems . . . 37
3.1.1 The classification problem . . . 38
3.2 Geometric Modeling . . . 39
3.2.1 Model reconstruction . . . 40
3.3 Visualization . . . 41
3.4 Computer Vision . . . 41
3.5 The Virtual Camera . . . 43
3.5.1 Camera Specification . . . 44
3.6 Image Processing . . . 46
3.6.1 Warping and morphing . . . 46
3.7 Image Analysis . . . 49
3.8 Animation and Video . . . 51
3.9 Character recognition . . . 53

4 Variational optimization 55
4.1 Variational problems . . . 55
4.1.1 Solve the variational problem directly . . . 58
4.1.2 Reduce to a continuous optimization problem . . . 58
4.1.3 Reduce to a discrete optimization problem . . . 58
4.2 Applications in computer graphics . . . 59
4.2.1 Variational Modeling of Curves . . . 60
4.3 Comments and references . . . 66

5 Continuous optimization 69
5.1 Optimality conditions . . . 70
5.1.1 Convexity and global minima . . . 75
5.2 Least squares . . . 78



5.2.1 Solving least squares problems . . . 81
5.3 Algorithms . . . 82
5.3.1 Newton's method . . . 83
5.3.2 Unidimensional search algorithms . . . 84
5.3.3 Conjugate gradient . . . 86
5.3.4 Quasi-Newton algorithms . . . 87
5.4 Constrained optimization . . . 88
5.4.1 Optimality conditions . . . 88
5.4.2 Least squares with linear constraints . . . 91
5.4.3 Algorithms . . . 92
5.5 Linear programming . . . 96
5.5.1 Simplex algorithm for linear programs . . . 99
5.5.2 The complexity of the simplex method . . . 100
5.6 Applications in Graphics . . . 101
5.6.1 Camera calibration . . . 101
5.6.2 Registration and color correction for a sequence of images . . . 105
5.6.3 Visibility for real-time walkthrough . . . 109

6 Combinatorial optimization 113
6.1 Introduction . . . 113
6.1.1 Solution methods in combinatorial optimization . . . 114
6.1.2 Computational complexity . . . 116
6.2 Description of combinatorial problems . . . 116
6.2.1 Graphs . . . 117
6.2.2 Adjacency matrix . . . 117
6.2.3 Incidence matrix . . . 118
6.3 Dynamic programming . . . 119
6.3.1 Constructing the state of the problem . . . 122
6.3.2 Remarks about general problems . . . 124
6.3.3 Problems of resource allocation . . . 124
6.4 Shortest paths in graphs . . . 125
6.4.1 Shortest paths in acyclic oriented graphs . . . 125
6.4.2 Directed graphs possibly with cycles . . . 126
6.4.3 Dijkstra algorithm . . . 128
6.5 Integer programming . . . 128



6.5.1 Linear and Integer programming . . . 129
6.5.2 Minimal cost flow problem . . . 130
6.5.3 Linear programs with integer solutions . . . 131
6.5.4 Branch-and-Bound methods . . . 132
6.6 Applications in computer graphics . . . 133
6.6.1 Image quantization . . . 133
6.6.2 Interactive visualization . . . 140
6.6.3 Geometric modeling . . . 146
6.6.4 Surface reconstruction . . . 150

7 Global Optimization 157
7.1 Two Extremes . . . 157
7.2 Local Search . . . 158
7.3 Stochastic Relaxation . . . 159
7.4 Genetic Algorithms . . . 159
7.4.1 Genetics and evolution . . . 160
7.4.2 Genetic algorithms for optimization . . . 161
7.4.3 The traveling salesman problem . . . 162
7.4.4 Optimization of a univariate function . . . 163
7.5 Guaranteed Global Optimization . . . 165
7.5.1 Branch-and-bound methods . . . 166
7.5.2 Accelerating interval branch-and-bound methods . . . 168
7.5.3 Constrained optimization . . . 169
7.6 Range analysis . . . 170
7.6.1 Interval arithmetic . . . 171
7.6.2 Affine arithmetic . . . 175
7.7 Examples in Computer Graphics . . . 177
7.7.1 Designing a stable table . . . 177
7.7.2 Curve fitting . . . 178
7.7.3 Collision detection . . . 181
7.7.4 Surface intersection . . . 181
7.7.5 Implicit curves and surfaces . . . 185
7.7.6 Adaptive approximation of parametric surfaces . . . 186

8 Probability and Optimization 189



8.1 The Second Law of Thermodynamics . . . 189
8.2 Information and Measures . . . 191
8.3 Information Theory . . . 193
8.4 Properties of Entropy . . . 195
8.5 Symbol Coding . . . 196
8.6 Decision Trees . . . 197
8.7 Conditional Entropy . . . 200
8.8 Mutual Information . . . 201
8.9 Divergence . . . 202
8.10 Continuous Random Variables . . . 203
8.11 Principle of Maximum Entropy - MaxEnt . . . 204
8.12 Distributions obtained with MaxEnt . . . 206
8.13 Principles of Optimization by Entropy . . . 208

Bibliography . . . 210




List of Figures

1.1 Computer graphics: transforming data into images. . . . 1
1.2 Computer graphics: related areas. . . . 3
1.3 Animation and related areas. . . . 4
1.4 The four-universe paradigm. . . . 6
1.5 Aboboral mountain, scale 1:50000 (Yamamoto, 1998). . . . 8
1.6 A grid on the function domain. . . . 9
1.7 Grayscale image and the graph of the image function. . . . 10
1.8 Circle with normal and tangent vector fields. . . . 12
1.9 Representation and reconstruction. . . . 15
1.10 Uniform sampling of the circle with reconstruction. . . . 16
1.11 Different reconstructions from the same terrain data. . . . 17
1.12 Ambiguous representation. . . . 19

2.1 Three examples of solution sets. . . . 24
2.2 Discrete solution sets. . . . 25
2.3 Polygonal representation of a circle. . . . 31
2.4 Well-posed formulation of a problem. . . . 33

3.1 Ambiguous reconstructions. . . . 42
3.2 Inverse specification to frame a point in the image (Gomes & Velho, 1998). . . . 45
3.3 Image deformation. . . . 47
3.4 Image metamorphosis. . . . 47
3.5 Correcting distortions. . . . 48
3.6 Punctual specification. . . . 49
3.7 Edges of an image. . . . 50


3.8 Edge detection using snakes: (a) initial curve; (b) final edge curve (P. Brigger & Unser, 1998). . . . 52

4.1 Minimum-length path on a surface. . . . 56
4.2 Detecting a boundary segment. . . . 57
4.3 Discretizing a variational problem. . . . 59
4.4 Attractors: punctual (a) and directional (b). . . . 62
4.5 Basis of B-spline curves. . . . 64
4.6 Control points and B-spline curves. . . . 64
4.7 Interactive variational modeling. . . . 67

5.1 Descent directions. . . . 71
5.2 Quadratic function with a unique local minimum. . . . 75
5.3 Quadratic function without local extrema. . . . 76
5.4 Quadratic function without local extrema. . . . 77
5.5 Quadratic function with an infinite number of local extrema. . . . 78
5.6 Linear regression. . . . 79
5.7 Geometric interpretation of the pseudo-inverse. . . . 81

5.8 Gradient descent. . . . 85
5.9 d and d′ are conjugate directions. . . . 87
5.10 Equality constraint. . . . 89
5.11 Inequality constraint. . . . 90
5.12 Projected gradient algorithm. . . . 95
5.13 Optimal solutions of linear programs. . . . 98
5.14 The pinhole camera. . . . 102
5.15 Camera coordinate systems. . . . 103
5.16 A panoramic image. . . . 106
5.17 Image sequence with overlapping. . . . 107
5.18 Visualization of building environments. . . . 110

6.1 Network of connections between A and B. . . . 115
6.2 Graph. . . . 117
6.3 Oriented graph. . . . 118
6.4 Network of connections between A and B. . . . 121
6.5 State variable of the problem. . . . 122



6.6 Monochromatic discrete image. . . . 134
6.7 Quantization levels and graph of a one-dimensional quantization map. . . . 135
6.8 Quantization from 256 (a) to 4 (b) gray levels. . . . 139
6.9 Spock represented with 3 levels of detail. . . . 141
6.10 Human body with three levels of clustering. . . . 143
6.11 Visualization process in two stages. . . . 145
6.12 Neighborhoods of a pixel. . . . 149
6.13 Elementary triangle. . . . 152
6.14 Toroidal graph. . . . 153

7.1 Local maxima × global maxima in hill climbing (Beasley et al., 1993). . . . 158
7.2 Sexual reproduction by chromosomal crossover (Beasley et al., 1993). . . . 162
7.3 Graph of f(x) = x sin(10πx) + 1 on [−1, 2]. . . . 164
7.4 Domain decomposition performed by branch-and-bound. . . . 168
7.5 Genetic formulation for the design of a stable table (Bentley, 1999). . . . 177
7.6 Crossed reproduction of algebraic expressions. . . . 179
7.7 Curve fitting by evolution. . . . 180
7.8 Intersection of parametric surfaces (Snyder, 1991). . . . 182
7.9 Trimming curves (Snyder, 1991). . . . 182
7.10 Intersection of parametric surfaces. . . . 183
7.11 Intersection curves computed with interval arithmetic and affine arithmetic. . . . 184
7.12 Approximating an implicit curve (Snyder, 1991). . . . 185
7.13 Approximating an implicit curve with interval arithmetic and affine arithmetic. . . . 186
7.14 Approximating parametric surfaces (Snyder, 1991). . . . 187

8.1 Decomposition from the choice of three possibilities. . . . 194
8.2 Graph of S = −p log_2 p − (1 − p) log_2(1 − p). . . . 195
8.3 Codification trees (a) α (b) β. . . . 198
8.4 Trees for schemes (a) and (b). . . . 199




Chapter 1

Computer Graphics

1.1 What is computer graphics?

The usual definition of computer graphics is the following: a set of models, methods and techniques to transform data into images that are displayed in a graphics device. Attempting to define an area is a difficult, if not impossible, task. Instead of trying to devise a good definition, perhaps the best way to understand an area is through a deep knowledge of its problems and the methods to solve them. From this point of view, the definition above has the virtue of emphasizing a fundamental problem of computer graphics: the transformation of data into images (Figure 1.1).

Figure 1.1: Computer graphics: transforming data into images.

In applied mathematics, the solution of problems is directly related to the mathematical models used to understand and pose the problem. For this reason, the dividing line between solved and open problems is more subtle than in the case of pure mathematics. In fact, in pure mathematics different solutions to the same problem, in general, do not constitute great innovations from the scientific point of view; in applied mathematics, on the other hand, different solutions to the same problem, arising from the use of different models, usually bring a significant advance in terms of applications.

This book discusses the solution of various problems in computer graphics using optimization techniques. The underlying idea is to serve as a two-way channel: on one hand, to stimulate the computer graphics community to study optimization methods; on the other hand, to call the attention of the optimization community to the extremely interesting real-world problems in computer graphics.

1.1.1 Related areas

Since its origin, computer graphics has been concerned with the study of models, methods and techniques that allow the visualization of information using a computer. Because, in practice, there is no limitation on the origin or nature of the data, computer graphics is used today by researchers and users from many different areas of human activity.

The basic elements that compose the fundamental problem of computer graphics are data and images. There are four related areas that deal with these elements, as illustrated in Figure 1.2.

Geometric modeling deals with the problem of describing, structuring and transforming geometric data in the computer.

Visualization interprets the data created by geometric modeling to generate an image that can be viewed using a graphical output device.

Image processing deals with the problem of describing, structuring and transforming images in the computer.

Computer vision extracts from an input image various types of information (geometric, topological, physical, etc.) about the objects depicted in the image.

In the literature, computer graphics is often identified with the area called visualization in the diagram of Figure 1.2. We believe it is more convenient to consider computer graphics as the mother area that encompasses these four sub-areas: geometric modeling, visualization, image processing and computer vision. This is justified because these related areas work with the same objects (data and images), which forces a strong trend of integration. Moreover, the solution of certain problems requires the use of methods from all of these areas under a unified framework. As an example, we can mention the case of a GIS (Geographical Information System) application, where a satellite image is used to obtain the elevation data of a terrain model and a three-dimensional reconstruction is visualized with texture mapping from different viewpoints.

Figure 1.2: Computer graphics: related areas.

Combined techniques from these four related sub-areas have great potential to be exploited in obtaining new results and applications of computer graphics. This brings to computer graphics a large spectrum of models and techniques, providing a path to new achievements: the combined result is more than the sum of its parts.

The trend is so vigorous that new research areas have been created. This was the case of image-based modeling and rendering, a recent area that combines techniques from computer graphics, geometric modeling, image processing and vision. Furthermore, in some application domains, such as GIS and medical imaging, it is natural to use techniques from these related areas, to the point that there is no separation between them.

We intend to show in this book how mathematical optimization methods can be used in this unified framework in order to pose and solve a reasonable number of important problems.

1.1.2 Is there something missing?

The diagram of Figure 1.2 is classical. But it tells only part of the computer graphics story: the area of geometric modeling and its related areas. In our holistic view of computer graphics, several other areas are missing from the diagram. Where is animation? Where is digital video?

In fact it is possible to reproduce a similar diagram for animation, as shown in Figure 1.3.

Figure 1.3: Animation and related areas.



Note the similarity with the previous diagram. Motion modeling is the area concerned with describing the movement of the objects in the scene. This involves time, dynamic scene objects, trajectories, and so on. It is also called motion specification. Similarly, we can characterize motion analysis, motion synthesis and motion processing.

We could now extend this framework to digital video, but this is not the best approach. In fact this repetition process is an indication that some concept is missing, a concept that could bring all of these distinct frameworks together into a single one. This is the concept of a graphical object.

In fact, once we have devised the proper definition of a graphical object, we can associate to it four distinct and related areas: modeling, visualization, analysis and processing of graphical objects. The concept of graphical object should encompass, in particular, geometric models of any kind, animation and digital video.

We will introduce this concept and develop some related properties and results in the next two sections.

1.2 Mathematical modeling and abstraction paradigms

Applied mathematics deals with the application of mathematical methods and techniques to problems of the real world. When computer simulations are used in the solution of problems, the area is called computational mathematics. In order to use mathematics in the solution of real-world problems it is necessary to devise good mathematical models, which can be used to understand and pose the problem correctly. These models consist of abstractions that associate real objects from the physical world with abstract mathematical concepts. Following the strategy of divide and conquer, we should create a hierarchy of abstractions, and for each level of the hierarchy we should use the most adequate mathematical models. This hierarchy is called an abstraction paradigm.

In computational mathematics a very useful abstraction paradigm consists of devising four abstraction levels, called universes: the physical universe, the mathematical universe, the representation universe and the implementation universe (Figure 1.4).

The physical universe contains the objects from the real (physical) world which we intend to study; the mathematical universe contains the abstract mathematical description of the objects from the physical world; the representation universe contains discrete descriptions of the objects from the mathematical universe; and the implementation universe contains data structures and machine models of computation, so that objects from the representation universe can be associated with algorithms in order to obtain an implementation in the computer.

Figure 1.4: The four-universe paradigm.

Note that using this paradigm we associate to each object from the real (physical) world three mathematical models: a continuous model in the mathematical universe, a discrete model in the representation universe and a finite model in the implementation universe. This abstraction paradigm is called the four-universe paradigm (Gomes & Velho, 1995).

In this book we take the computational mathematics approach to computer graphics. That is, a problem in computer graphics will be solved considering the four-universe paradigm. This amounts to saying that computer graphics is indeed an area of computational mathematics. As a consequence, the four-universe paradigm will be used throughout the book.

Example 1 (Length representation). Consider the problem of measuring the length of objects from the real world. To each object we should associate a number that represents its length. In order to attain this we must introduce a standard unit of measure, which will be compared with the object to be measured in order to obtain its length.

From the point of view of the four-universe paradigm, the mathematical universe is the set R of real numbers. In fact, to each measure we associate a real number; rational numbers correspond to objects which are commensurable with the adopted unit of measure, and irrational numbers are used to represent the length of objects which are incommensurable.



In order to represent the different length values (real numbers) we must look for a discretization of the set of real numbers. A widely used technique is the floating point representation. Note that in this representation the set of real numbers is discretized into a finite set of rational numbers. In particular, this implies that the concept of commensurability is lost when we pass from the mathematical to the representation universe.

From the point of view of the implementation universe, the floating point representation can be attained by using the IEEE standard, which is implemented in most computers. A good reference for this topic is (Higham, 1996).
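To make this loss of information concrete, here is a minimal sketch in Python (assuming IEEE 754 double precision, the default for Python floats, and only the standard library): the rational number 1/10 is commensurable with the unit in the mathematical universe, but its floating point representative is only an approximation.

    # Minimal sketch: information lost between the mathematical and
    # representation universes. 1/10 has no exact binary floating point
    # representation, so ten copies of 0.1 do not sum exactly to 1.
    from decimal import Decimal

    x = 0.1
    print(Decimal(x))            # the exact rational actually stored for 0.1
    print(sum([x] * 10) == 1.0)  # False: rounding errors accumulate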

The previous example, although simple, illustrates the fundamental problem of modeling in computational mathematics. Of particular importance in this example is the loss of information we face when moving from the mathematical universe to the representation universe (from real numbers to their floating point representations), where the concept of commensurability is lost. In general, passing from the mathematical universe to the representation universe implies a loss of information. This is a very subtle problem that we must face when working with modeling in computational mathematics and, in particular, in computer graphics.

The modeling process consists of choosing an object from the physical world, associating a mathematical model to it, discretizing the model and implementing it. This implementation should provide solutions to properly posed problems involving the initial physical object. Note that the loss of information mentioned above is a critical factor in this chain.

We should remark that in the modeling process the association of an object from one abstraction level (universe) to the next is as generic as possible. In fact, the same object from one abstraction level may have distinct correspondents in the next abstraction level (we leave to the reader the task of providing examples of this). Instead, in the next two sections we will provide two examples which show that different objects from the physical world may be described by the same mathematical model.

1.2.1 Terrain representation

Consider the problem of representing a terrain in the computer (a mountain, for instance). In cartography a terrain may be described using a height map: we devise a reference level (e.g. the sea level) and we take, for each point, the terrain height at this point. In the mathematical universe this map of heights corresponds to a function f: U ⊂ R^2 → R, z = f(x, y), where (x, y) are the coordinates of a point of the plane R^2 and z is the corresponding height. Geometrically, the terrain is described by the graph G(f) of the height function f:

    G(f) = {(x, y, f(x, y)); (x, y) ∈ U}.

Figure 1.5 shows a sketch of the graph of the height function of part of the Aboboral mountain in the state of São Paulo, Brazil (Yamamoto, 1998).

Figure 1.5: Aboboral mountain, scale 1:50000 (Yamamoto, 1998).

How can we represent the terrain? A simple method consists in taking a uniform partition

    P_x = {x_0 < x_1 < · · · < x_n}

of the x-axis¹, a uniform partition

    P_y = {y_0 < y_1 < · · · < y_m}

of the y-axis, and taking the cartesian product of both partitions to obtain the grid of points

    (x_i, y_j), i = 1, . . . , n, j = 1, . . . , m

in the plane, as depicted in Figure 1.6. At each vertex (x_i, y_j) of the grid, we take the value of the function z_ij = f(x_i, y_j), and the terrain representation is defined by the height matrix (z_ij). This representation is called representation by uniform sampling, because we take the vertices of a uniform grid and we sample the function at these points. The implementation can be easily attained using a matrix as a data structure.

¹ Uniform means that ∆_j = x_j − x_{j−1} is constant for any j = 1, . . . , n.

Figure 1.6: A grid on the function domain.
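A minimal sketch of this representation in Python with NumPy (the height function and the grid sizes below are hypothetical, chosen only for illustration):

    import numpy as np

    def sample_grid(f, x0, x1, y0, y1, n, m):
        """Uniform sampling: represent z = f(x, y) by its height matrix."""
        xs = np.linspace(x0, x1, n)   # uniform partition of the x-axis
        ys = np.linspace(y0, y1, m)   # uniform partition of the y-axis
        # height matrix z_ij = f(x_i, y_j) over the grid P_x x P_y
        return np.array([[f(x, y) for y in ys] for x in xs])

    terrain = lambda x, y: np.sin(3 * x) * np.cos(2 * y)  # hypothetical terrain
    heights = sample_grid(terrain, 0.0, 1.0, 0.0, 1.0, 64, 64)
    print(heights.shape)  # (64, 64): the matrix data structure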

1.2.2 Image representation

Now we will consider the problem of image modeling. We will take a photograph as the image model in the real world. This physical image model is characterized by two properties:

• it has a support set (a rectangular piece of paper);
• there is a color attribute associated to each point of the support set.

We will suppose that we have a black and white photo; this means that we associate a different luminance of black (a gray value) to each point of the support set. This is called a grayscale image in the computer graphics literature. In this case the “color” of each point can be modeled by a real number in the interval [0, 1], where 0 represents black, 1 represents white, and each number 0 < t < 1 represents a gray value in between. The rectangular support set of the image can be modeled in the mathematical universe as a rectangular subset U ⊂ R^2 of the plane.

Therefore, the mathematical model of a grayscale image is a function f: U ⊂ R^2 → R that maps each point (x, y) to the value z = f(x, y), which gives the corresponding gray value at the point. This function is called the image function. Figure 1.7 shows an image on the left and the graph of the corresponding image function on the right.

Figure 1.7: Grayscale image and the graph of the image function.

In sum, we see that an image and a terrain can be modeled by the same mathematical object: a function f: U ⊂ R^2 → R. Therefore we can use uniform sampling to represent the image, as we did for the terrain.
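Since the mathematical model is the same, the same sampling operator applies verbatim. Assuming the sample_grid sketch from the terrain example, a synthetic (hypothetical) image function can be discretized into a matrix of gray values:

    import numpy as np

    # Hypothetical grayscale image function on U = [0, 1] x [0, 1],
    # with values in [0, 1] (0 = black, 1 = white): a soft white spot.
    spot = lambda x, y: np.exp(-((x - 0.5) ** 2 + (y - 0.5) ** 2) / 0.05)
    image = sample_grid(spot, 0.0, 1.0, 0.0, 1.0, 256, 256)
    print(image.min(), image.max())  # gray values stay inside [0, 1]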

The representation of a terrain and an image by a function is not accidental. In fact a huge number of objects from the physical world can be represented by functions f: X → Y, where X and Y are properly chosen topological spaces (in general they are subsets of Euclidean spaces). Indeed, in the next section we will characterize a graphical object by a function.



1.3 Graphical Objects

In the quest for a better definition of computer graphics, we could define: computer graphics is the area that deals with the description, analysis and processing of graphical objects. This definition only makes sense if we are able to devise a precise definition of the concept of a graphical object. We do this as follows: a graphical object is a subset S ⊂ R^m together with a function f: S ⊂ R^m → R^n. The set S is called the geometric support, and f is called the attribute function of the graphical object. The dimension of the geometric support S is called the dimension of the graphical object. This concept of graphical object was introduced in the literature by (Gomes et al., 1996).

Let's take a look at some concrete examples to clarify the definition.

Example 2 (Subsets of Euclidean space). Any subset of the Euclidean space R^m is a graphical object. Indeed, given S ⊂ R^m, we immediately define an attribute function

    f(p) = { 1 if p ∈ S,
             0 if p ∉ S.

It is clear that p ∈ S if, and only if, f(p) = 1. In general, the value f(p) = 1 is associated with a color, called the object color. The attribute function in this case simply characterizes the points of the set S, and for this reason it is called the characteristic function of the graphical object.

The characteristic function completely defines the geometric support of the graphical object: if p is a point of the space R^m, then p ∈ S if, and only if, f(p) = 1.

The two problems below are, therefore, equivalent:

1. devise an algorithm to compute f(p) at any point p ∈ R^m;
2. decide if a point p ∈ R^m belongs to the geometric support S of the graphical object.

The second problem is called the point-membership classification problem. A significant part of the problems in the study of graphical objects reduces to the solution of a point-membership classification. Therefore, the existence of robust and efficient algorithms to solve this problem is of great importance.
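A minimal sketch in Python (taking S to be the unit disk in R^2, an assumption made only for illustration) shows how an algorithm for the characteristic function answers point-membership queries directly:

    def characteristic(p):
        """Characteristic function of the unit disk S = {(x, y) : x^2 + y^2 <= 1}."""
        x, y = p
        return 1 if x * x + y * y <= 1.0 else 0

    # point-membership classification: p belongs to S iff f(p) = 1
    print(characteristic((0.3, 0.4)))  # 1: the point is in S
    print(characteristic((1.0, 1.0)))  # 0: the point is not in S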



Example 3 (Image). A grayscale image (see Section 1.2.2) is a function f: U ⊂ R^2 → R, and therefore it is a graphical object. More generally, an image is a function f: U ⊂ R^2 → R^n, where R^n is the representation of a color space. In this way, we see that an image is a graphical object whose geometric support is a subset U of the plane (in general, a rectangle), and whose attribute function associates a color to each point of the plane.

Example 4 (Circle and vector field). Consider the unit circle S^1 centered at the origin, whose equation is given by

    x^2 + y^2 = 1.

The mapping of the plane N: R^2 → R^2, given by N(x, y) = (x, y), defines a field of unit vectors normal to S^1. The map T: R^2 → R^2, given by T(x, y) = (y, −x), defines a field of vectors tangent to the circle (Figure 1.8).

Figure 1.8: Circle with normal and tangent vector fields.

The circle is a one-dimensional graphical object of the plane, and the two vector fields are attributes of the circle (they can represent, for example, physical attributes such as tangential and radial acceleration). The attribute function is given by f: S^1 → R^4 = R^2 ⊕ R^2, f(p) = (T(p), N(p)).
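A short sketch of this graphical object in Python (the parameterization of S^1 by angle is an assumption made for convenience) evaluates the attribute function at points of the circle:

    from math import cos, sin, pi

    def attribute(p):
        """Attribute function f(p) = (T(p), N(p)) of the unit circle."""
        x, y = p
        T = (y, -x)    # tangent field T(x, y) = (y, -x)
        N = (x, y)     # normal field N(x, y) = (x, y), of unit length on S^1
        return T + N   # a vector of R^4 = R^2 (+) R^2

    p = (cos(pi / 4), sin(pi / 4))  # a point of S^1
    print(attribute(p))             # (0.707..., -0.707..., 0.707..., 0.707...)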



1.4 Revisiting the diagram

Now that we have discussed the concept of a graphical object, it is time to go back to the diagram of Figure 1.2.

Every output graphics device has an internal representation of the class of graphical objects that the device is able to display. The space of graphical objects with this representation is called the representation universe of the graphical display. As an example, the representation universe of a regular personal computer (graphics card + monitor) is a matrix representation of an image. Thus, for each output graphics device, the operation of visualization on the device consists of an operator R: O → O′ from the space of graphical objects to be displayed to the representation universe of the graphics display.

Besides the generic visualization operation described above, we have the operations of graphical object description, graphical object processing and graphical object analysis. This completes the elements needed to obtain a generic diagram for graphical objects, which contains, as particular cases, the diagrams in Figures 1.2 and 1.3.

1.5 Description, representation and reconstruction

In this section we discuss the three most important problems related to graphical objects:

• How to describe a (continuous) graphical object?
• How to discretize a graphical object?
• How to obtain a continuous graphical object from one of its discretizations?

1.5.1 Description

Describing a graphical object consists in devising its (continuous) mathematical definition. Therefore, this is a problem posed in the mathematical universe. Depending on the nature of the graphical object, this topic is an important one in different areas of computer graphics. The description of the geometry and topology of objects is dealt with in geometric modeling, the description of motion is studied in the area of animation, and so on.

Independently of the nature of the graphical object, there are essentially three mathematical techniques to describe a graphical object: parametric description, implicit description and piecewise description. For more details about these methods the reader should take a look at (Velho et al., 2002).

1.5.2 Representation

The representation of graphical objects constitutes a separate chapter in computer graphics. Because discretization is the essence of computational methods, representation techniques assume great importance in the area.

We have pointed out that the representation process implies a loss of information about the graphical object. In fact, representing a graphical object amounts to representing its geometric support and attribute function. Therefore we may lose geometric as well as attribute information.

From the mathematical point of view, we can formulate this problem in the following way: given a space of graphical objects O_1 and a space of discrete graphical objects O_2, an operator R: O_1 → O_2 is called a representation operator. This operator associates to each graphical object x ∈ O_1 its discrete representation R(x) ∈ O_2.

1.5.3 Reconstruction

An important problem consists in recovering the continuous graphical object in the mathematical universe from its representation. This is called the reconstruction problem.

Given an object y ∈ O_2, a reconstruction of y is a graphical object x ∈ O_1 such that R(x) = y. The reconstruction of y is denoted by R^+(y). When R is an invertible operator, we have that R^+(y) = R^{-1}(y). Note, however, that invertibility is not a necessary condition for reconstruction. In fact, it suffices that the operator R possesses a left inverse R^+, with R^+(R(x)) = x, to make reconstruction possible.

We should remark that the representation problem is a direct problem, while the reconstruction problem is an inverse problem.² A representation that allows more than one possibility of reconstruction is called an ambiguous representation.³ In this case, the reconstruction problem is ill-posed (it has more than one solution).

Although the representation problem constitutes a direct problem, the computation of the representation operator could lead to an inverse problem. An example of such a situation can be seen in the point sampling representation of an implicit shape F^{-1}(0), where F: R^3 → R. In this case, the representation consists in obtaining solutions of the equation F(x, y, z) = 0, which constitutes an inverse problem.

Figure 1.9: Representation and reconstruction.

The representation takes a continuous graphical object O and associates to it a discrete representation O_d. The reconstruction process does the opposite: from the discrete representation O_d we obtain a reconstructed graphical object O_r. When O_r = O we say that we have exact reconstruction, and the representation is also called exact. We should remark, however, that in general we do not have exact reconstruction; that is, in general we obtain upon reconstruction only an approximation O_r ≃ O, and this approximation is enough for most applications.

² Direct and inverse problems are discussed in Chapter 2.

³ It would be more correct to adopt the term "ambiguous reconstruction", but the term "ambiguous representation" is already widely adopted in computer graphics.

The reader might be intrigued because we are talking about a continuous graphical object in the context of computer applications. How is it possible to have a continuous graphical object on the computer? Let's make this clear: we say that a graphical object f: U → R^n is continuous when we are able to access any point x ∈ U and compute the value f(x) of the attribute function at that point.

Example 5 (Circle). Consider the unit circle S^1 = {(x, y) ∈ R^2; x^2 + y^2 = 1} of the plane. A representation of S^1 can be obtained as follows: take the parameterization ϕ: [0, 1] → S^1, ϕ(θ) = (cos 2πθ, sin 2πθ), and a partition 0 = θ_1 < θ_2 < · · · < θ_n of the interval [0, 1]; the representation is given by the sample vector S^1_r = (ϕ(θ_1), . . . , ϕ(θ_n)) (Figure 1.10(a) shows a representation for n = 5). A non-exact reconstruction of the circle from S^1_r is given by the polygon with vertices ϕ(θ_1), . . . , ϕ(θ_n), ϕ(θ_1) (Figure 1.10(b) illustrates this reconstruction for n = 5).

Figure 1.10: Uniform sampling of the circle, with reconstruction.

On the other hand, we know that the circle is completely defined by three distinct points; therefore, by taking a three-sample representation we are able to obtain an exact representation of the circle.
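The uniform convergence of the polygonal reconstruction can be checked numerically. The following Python sketch (our illustration, not part of the original text; it assumes NumPy is available) samples the circle uniformly and measures the worst deviation of the reconstructed polygon from S^1:

    import numpy as np

    def sample_circle(n):
        # Representation: n uniform samples of the parameterization phi.
        theta = np.arange(n) / n
        return np.column_stack((np.cos(2 * np.pi * theta),
                                np.sin(2 * np.pi * theta)))

    def polygon_error(n):
        # The worst deviation of the reconstructed polygon from the circle
        # occurs at the midpoint of each edge.
        p = sample_circle(n)
        mid = (p + np.roll(p, -1, axis=0)) / 2
        return np.max(np.abs(1 - np.linalg.norm(mid, axis=1)))

    for n in (5, 50, 500):
        print(n, polygon_error(n))  # the error shrinks roughly like 1/n^2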

We should remark that exact representations, although very useful in the solution of many problems, are not the holy grail. In fact, the polygonal representation of the circle, in spite of being non-exact, is very useful to visualize it (i.e., to draw the circle on some output graphics device). Indeed, drawing line segments is a very effective operation on any output graphical device; on the other hand, as the number of samples increases, the reconstructed polygon converges uniformly to the circle, so the reconstructed object provides a good approximation to the circle both visually and numerically.

Example 6 (Terrain). The uniform representation of a terrain (see Section 1.2.1) contains only height information of the terrain at a finite number of points. This implies a severe loss of information about the terrain. Is it possible to reconstruct the terrain data from its uniform representation? This is a very important question that can be restated as: is it possible to devise an interpolation technique that allows us to interpolate the uniform samples of the height function in order to obtain the terrain height at any point (x, y) of the domain? The Shannon theorem (see (Gomes & Velho, 1997)) provides a very general answer to this question.

Figure 1.11 shows two different reconstructions from the same terrain data. Note that the reconstructed terrain on the left has more geometric detail.

Figure 1.11: Different reconstructions from the same terrain data.

1.5.4 Semantics and Reconstruction

The reconstruction of a graphical object determines its semantics. Therefore, it is of great importance in the various processes of computer graphics. We can listen to a sound when it is reconstructed by a loudspeaker; we can see an image when it is reconstructed on the screen of a graphics display or printed on paper. Therefore, reconstruction techniques are present, at least, whenever we need to "display" information about the graphical object.


For the non-convinced reader, we provide below four reasons to highlight the importance of reconstruction techniques:

1. When we need to obtain an alternative representation of an object, we can reconstruct it and then build a new representation from the reconstructed object.

2. Reconstruction is useful when we need to work in the continuous domain to minimize numerical errors.

3. In the visualization process the output graphics device performs a reconstruction of the graphical object from its representation.

4. In general, the user describes a graphical object directly through some representation, because a user interface allows only the input of a finite set of parameters. The object must be reconstructed from this "user representation".

In this way, reconstruction methods play a fundamental role in visualization. It is desirable to avoid ambiguous representations, which can make the reconstruction problem non-unique. A geometric example of an ambiguous representation is shown in Figure 1.12, where the object in image (a) is represented by a wireframe model. In images (b), (c) and (d), we can see three possible reconstructions of this object.

The reconstruction operation is totally dependent on the representation and, in general, it is difficult to compute. Exact representations are rare. In the following chapters we will investigate how optimization techniques allow us to obtain good reconstructions of graphical objects.


Figure 1.12: Ambiguous representation.


Chapter 2

Optimization: an overview

2.1 What is optimization?

Intuitively, optimization refers to the class of problems that consist in choosing the best among a set of alternatives.

Even in this simple, imprecise statement, one can identify the two fundamental elements of an optimization problem: best, which conveys the choice of a criterion used to select the solution, usually expressed by means of a function that should be minimized or maximized; and alternatives, which refers to the set of candidate solutions from which the choice is made. A simple example will help us clarify these remarks.

Example 7 (Hotel-to-conference problem). Find the best street path to go from the hotel where you are staying to the Convention Center. The alternatives here consist of all the streets (or parts of streets) that, when joined, provide a path from your hotel to the Convention Center. This is clearly a finite set (provided that we avoid paths containing loops). We have different choices for the term best:

• the street path that takes the smallest time;

• the shortest street path;

• the street path that has the best view of the landscape, etc.


From a mathematical viewpoint, an optimization problem may be posed as follows: minimize the function f: S → R, that is, compute the pair (x, f(x)) such that f(x) is the minimum value of the set {f(x); x ∈ S}. We will also use the two simplified forms below:

min_{x ∈ S} f(x)   or   min{f(x); x ∈ S}.

The function f: S → R is called the objective function, and its domain S, the set of possible solutions, is called the solution set. We may restrict attention to minimization problems, since any maximization problem can be converted into a minimization one by just replacing f by −f. An optimality condition is a necessary or sufficient condition for the existence of a solution to an optimization problem.

The nature of the solution set S is quite general, as illustrated by the three examples below:

1. The hotel-to-conference problem from Example 7;

2. Maximize the function f: S^2 ⊂ R^3 → R, defined by f(x, y, z) = z, where S^2 is the unit sphere.

3. Find the shortest path that connects two points p_1 and p_2 of a surface M ⊂ R^n.

In the first problem S is a finite set; in the second problem, S = S^2, a 2-dimensional surface of R^3; in the third problem, S is the set of rectifiable curves of the surface joining the two points (in general, an infinite-dimensional set).

2.1.1 Classification of optimization problems

The classification of optimization problems is very important because it will guide us in devising strategies and techniques to solve the problems from different classes. Optimization problems can be classified according to several criteria, related to the properties of the objective function and also of the solution set S. Thus, possible classifications take into account:

• the nature of the solution set S;

• the description (definition) of the solution set S;

• the properties of the objective function f.

The most important classification is the one based on the nature of the solution set S, which leads us to classify optimization problems into four classes: continuous, discrete, combinatorial and variational.

A po<strong>in</strong>t x of a topological space S is an accumulation po<strong>in</strong>t, if, <strong>and</strong> only if, for<br />

any open ball B x , with x ∈ B x , there exists an element y ∈ S such that y ∈ B x .<br />

A po<strong>in</strong>t x ∈ S which is not an accumulation po<strong>in</strong>t is called an isolated po<strong>in</strong>t of S.<br />

Thus, x is isolated if there exists an open ball B x , such that x ∈ B x <strong>and</strong> no other<br />

po<strong>in</strong>t of S belongs to B x .<br />

A topological space is discrete if it conta<strong>in</strong>s no accumulation po<strong>in</strong>ts. It is cont<strong>in</strong>uous<br />

when all of its po<strong>in</strong>ts are accumulation po<strong>in</strong>ts.<br />

Example 8 (Discrete and continuous sets). Any finite subset of a Euclidean space is obviously discrete. The set R^n itself is continuous. The set Z^n = {(i_1, . . . , i_n); i_j ∈ Z} is discrete. The set {0} ∪ {1/n; n ∈ N} is not discrete, because 0 is an accumulation point (no matter how small we take a ball with center at 0, it will contain some element 1/n for some large n); this set is not continuous either, because, with the exception of 0, all of its points are isolated. On the other hand, the set {1/n; n ∈ N} (without the 0) is discrete.

2.2 Cont<strong>in</strong>uous optimization problems<br />

An optimization problem is called cont<strong>in</strong>uous when the solution set S is a cont<strong>in</strong>uous<br />

subset of R n .<br />

The most common cases occur when S is a differentiable m-dimensional surface<br />

of R n (with or without boundary), or S ⊆ R n is a region of R n :<br />

• Maximize the function f(x, y) = 3x + 4y + 1 on the set S = {(x, y) ∈<br />

R 2 ; x 2 + y 2 = 1}. The solution set S is the unit circle S 1 , which is a curve<br />

of R 2 (i.e. a one-dimensional “surface”, see Figure 2.1(a)).<br />

• Maximize the function f(x, y) = 3x + 4y + 1 on the set S = {(x, y) ∈<br />

R 2 ; x 2 + y 2 ≤ 1}. The solution set is the disk of R 2 , a two-dimensional<br />

surface with boundary (see Figure 2.1(b)).


• Maximize the function f(x, y) = 3x + 4y + 1 on the set S defined by the inequalities x^2 + y^2 ≤ 1 and y − x^2 ≥ 0. The solution set is a region defined by the intersection of two surfaces with boundary, as depicted in Figure 2.1(c).

Figure 2.1: Three examples of solution sets.
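For the first of these problems the solution can be found in closed form: on the unit circle, f attains its maximum 6 at the point (3/5, 4/5). This is easy to confirm numerically; the Python sketch below (our illustration, assuming NumPy) parameterizes the circle and evaluates f on a dense set of samples:

    import numpy as np

    # Maximize f(x, y) = 3x + 4y + 1 on the unit circle x^2 + y^2 = 1.
    theta = np.linspace(0.0, 2.0 * np.pi, 100000)
    x, y = np.cos(theta), np.sin(theta)
    f = 3 * x + 4 * y + 1

    k = np.argmax(f)
    print(x[k], y[k], f[k])  # approximately (0.6, 0.8) and 6.0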

2.3 Discrete optimization problems

An optimization problem is called discrete when the solution set S is a discrete set (i.e., S has no accumulation points). The most frequent case in applications occurs for S ⊆ Z^n = {(i_1, . . . , i_n); i_j ∈ Z}.

• Maximize the function f(x, y) = x + y on the set S = {(x, y) ∈ Z^2; 6x + 7y ≤ 21, x ≥ 0, y ≥ 0}. The solution set S is a finite set; namely, the set of points with integer coordinates in the triangle whose sides are the axes and the line 6x + 7y = 21 (see Figure 2.2(a)).

• Minimize the function f(x, y) = (x − 1)^2 + (y − 6)^2 on the set S = {(x, y) ∈ Z^2; y ≤ x, x ≥ 0, y ≥ 0}. The solution set S is infinite; the problem asks for finding, among all points with integer coordinates in a cone, the one which is closest to the point (1, 6) (see Figure 2.2(b)).


Figure 2.2: Discrete solution sets.

It is important to remark that some types of discrete problems min{f(x); x ∈ S} with solution set S may be solved more easily by embedding S into a continuous domain S′, solving the new, continuous, optimization problem min{f(x); x ∈ S′}, and obtaining the solution of the original problem from the solution in the continuous domain. The example below illustrates this approach.

Example 9. Consider the discrete optimization problem max{f(x) = 3x − 2x^2; x ∈ Z}. In order to solve this problem we consider the similar problem on R, that is, max{f(x) = 3x − 2x^2; x ∈ R}.

The solution of the continuous problem is simple. In fact, since f(x) is a quadratic function with negative second derivative, it has a unique maximum point, given by f′(x) = 0, that is, 3 − 4x = 0. Therefore, the solution is m = 3/4, and the maximum value is 9/8. Now we obtain the solution of the discrete problem from this solution.

Note that the solution obtained for S = R is not a solution for the discrete case (because m ∉ Z). But it is possible to compute the solution of the discrete case from the solution of the continuous problem. For this, we just need to choose the integer number which is closest to m. It is possible to prove that this indeed provides the solution. In fact, since f is an increasing function on the interval (−∞, m], and decreasing on the interval [m, ∞), and its graph is symmetric with respect to the line x = m, the farther x is from m, the smaller the value of f(x). From this we conclude that the integer n closest to m is indeed the point where f attains its maximum on the set Z.

We should remark that, in general, the solution of a discrete problem cannot be obtained from a corresponding continuous problem in such a simple way as shown in the example above. As an example, consider the polynomial function f: R → R, f(x) = x^4 − 14x. It has a minimum at x ≈ 1.52. The closest integer to this minimum point is 2, where f assumes the value −12. Nevertheless, the minimum of f(x) when x is an integer occurs for x = 1, where f attains the value −13.

Thus, although solving the continuous version of a discrete problem is part of several existing strategies for computing a solution of a discrete problem, in general finding the discrete solution requires more specific techniques that go beyond rounding off solutions.

2.4 Comb<strong>in</strong>atorial optimization problems<br />

An optimization problem is said to be comb<strong>in</strong>atorial when its solution set S is<br />

f<strong>in</strong>ite. Usually the elements of S are not explicitly determ<strong>in</strong>ed. Instead, they are<br />

<strong>in</strong>directly specified through comb<strong>in</strong>atorial relations. This allows S to be specified<br />

much more compactly than by simply enumerat<strong>in</strong>g its elements.<br />

In contrast with cont<strong>in</strong>uous optimization, whose study has its roots <strong>in</strong> classical<br />

calculus, the <strong>in</strong>terest <strong>in</strong> solution methods for comb<strong>in</strong>atorial optmization problems<br />

is relatively recent <strong>and</strong> significantly associated with computer technology. Before<br />

computers, comb<strong>in</strong>atorial optmization problems were less <strong>in</strong>terest<strong>in</strong>g, s<strong>in</strong>ce they<br />

admit an obvious method of solution, that consists <strong>in</strong> exam<strong>in</strong><strong>in</strong>g all possible solutions<br />

<strong>in</strong> order to f<strong>in</strong>d the best one. The relative efficiency of the possible methods<br />

of solution was not a very relevant issue: for real problems, <strong>in</strong>volv<strong>in</strong>g sets with<br />

a large number of elements, any solution method, efficient or not, was <strong>in</strong>herently<br />

unfeasible, for requir<strong>in</strong>g a number of operations too large to be done by h<strong>and</strong>.<br />

With computers, the search for efficient solution methods became imperative:<br />

the practical feasibility of solv<strong>in</strong>g a large scale problem by computational methods<br />

depends on the availability of a efficient solution method.


Example 10 (The Traveling Salesman Problem). This need is well illustrated by the Traveling Salesman Problem. Given n towns, one wishes to find the minimum-length route that starts in a given town, goes through each one of the others and ends at the starting town.

The combinatorial structure of the problem is quite simple. Each possible route corresponds to one of the (n − 1)! circular permutations of the n towns. Thus, it suffices to enumerate these permutations, evaluate their lengths, and choose the optimal route.

This, however, becomes impractical even for moderate values of n. For instance, for n = 50 there are approximately 10^60 permutations to examine. Even if 1 billion of them were evaluated per second, examining all of them would require about 10^51 seconds, or 10^43 years! However, there are techniques that allow solving this problem, in practice, even for larger values of n.

2.5 Variational optimization problems

An optimization problem is called a variational problem when its solution set S is an infinite-dimensional subset of a space of functions. Among the most important examples we could mention the path and surface problems, which consist in finding the "best path" ("best surface") satisfying some conditions which define the solution set. Typical examples of variational problems are:

Example 11 (Geodesic problem). Find the path of minimum length joining two points p_1 and p_2 of a given surface.

Example 12 (Minimal surface problem). Find the surface of minimum area for a given boundary curve.

Variational problems are studied in more detail in Chapter 4.

2.6 Other classifications

Other forms of classifying optimization problems are based on characteristics which can be exploited in order to devise strategies for the solution.


2.6.1 Classification based on constraints

In many cases, the solution set S is specified by describing constraints that must be satisfied by its elements. A very common way to define constraints consists in using equalities and inequalities. The classification based on constraints takes into account the nature of the constraints as well as the properties of the functions that describe them. With respect to the nature of the constraint functions we obtain the following classification:

• equality constraints: h_i(x) = 0, i = 1, . . . , m;

• inequality constraints: g_j(x) ≤ 0, j = 1, . . . , k.

In the first case the solution set S consists of the points that satisfy simultaneously the equations h_i(x) = 0. We should remark that in general each equation h_i(x) = 0 defines an (n − 1)-dimensional surface of R^n; therefore the simultaneous set of equations h_i(x) = 0, i = 1, . . . , m, defines the intersection of m such surfaces, which in general is a surface of dimension n − m.

In the second case, each inequality g_j(x) ≤ 0 in general represents a surface of R^n with boundary (the boundary is given by the equality g_j(x) = 0). Thus the solution set is the intersection of a finite number of surfaces with boundary; in general, this intersection is a region of R^n (see Figure 2.1(c)). A particular case of great importance occurs when the functions g_j are linear. Then the region is a solid polyhedron of R^n.

As we will see later on (Chapter 5), equality and inequality constraints lead us to different algorithmic strategies when looking for a solution. The algorithms are also affected by the properties of the functions that define the constraints. Among the relevant special cases that can be exploited are the ones where the constraint functions are linear, quadratic, convex (or concave), or sparse.

2.6.2 Classification based on the objective function

The properties of the objective function are also fundamental in order to devise strategies for the solution of optimization problems. The special cases that lead to particular solution strategies are the ones where the objective function is linear, quadratic, convex (or concave), sparse or separable.


Linear programs

A very important situation occurs when both the objective and the constraint functions are linear and, moreover, the constraints are defined by equalities or inequalities; that is, the constraints are given by linear equations and linear inequalities. As we have remarked before, the solution set is then a solid polyhedron of Euclidean space. Optimization problems of this nature are called linear programs, and their study constitutes a sub-area of optimization. In fact, several practical problems can be posed as linear programs, and there exists a huge number of techniques that solve them by exploiting their characteristics: the linearity of the objective function and the piecewise-linear structure of the solution set (a polyhedron).
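As a small concrete instance, the sketch below (our illustration; it assumes SciPy is installed) solves the linear relaxation of the triangle problem of Section 2.3, maximizing x + y subject to 6x + 7y ≤ 21 and x, y ≥ 0. Since linprog minimizes, we negate the objective:

    from scipy.optimize import linprog

    # maximize x + y  is the same as  minimize -x - y
    res = linprog(c=[-1.0, -1.0],
                  A_ub=[[6.0, 7.0]], b_ub=[21.0],
                  bounds=[(0, None), (0, None)])
    print(res.x, -res.fun)  # optimum at the vertex (3.5, 0), with value 3.5

As expected, the optimum is attained at a vertex of the polyhedron; this is precisely the structural fact that methods such as the simplex algorithm exploit.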

2.7 The problem of posing problems

Most problems in general, and in particular the problems from computer graphics, can be posed using the concept of an operator between two spaces.¹ These problems can be classified in two categories: direct and inverse problems.

Direct problem. Given two spaces O_1 and O_2 of graphical objects, an operator T: O_1 → O_2, and a graphical object x ∈ O_1, compute the object y = T(x) ∈ O_2. That is, we are given the operator T and a point x in its domain, and we must compute the image y = T(x).

Because T is an operator, it is not multi-valued; therefore the direct problem always has a unique solution.

Inverse problems. There are two types of inverse problems:

Inverse problem of the first kind. Given two spaces O_1 and O_2 of graphical objects, an operator T: O_1 → O_2, and an element y ∈ O_2, determine an object x ∈ O_1 such that T(x) = y. That is, we are given an operator T and a point y, belonging to the arrival set of the operator, and we must compute a point x in its domain whose image is y.

¹ In some books the use of the term operator implies that it is linear. Here we use the term to mean a continuous function.

Inverse problem of the second kind. Given a finite sequence of elements x_1, x_2, . . . , x_n ∈ O_1 and a finite sequence y_1, y_2, . . . , y_n ∈ O_2, determine an operator T: O_1 → O_2 such that T(x_i) = y_i, i = 1, 2, . . . , n. That is, we are given a finite sequence (x_i) of points in the domain and its image (y_i) = (T(x_i)), and we must compute the operator T.

If the operator T is invertible with inverse T^{-1}, the inverse problem of the first kind admits a unique solution x = T^{-1}(y). However, invertibility is not necessary. In fact, a weaker condition occurs when T has a left inverse T^+ (that is, T^+ T = Identity), and the solution is given by T^+(y). In this case we cannot guarantee uniqueness of the solution, because we do not have uniqueness of the left inverse.

The <strong>in</strong>verse problem of second type has two <strong>in</strong>terest<strong>in</strong>g <strong>in</strong>terpretations:<br />

1. We are given an operator ¯T : {x 1 , . . . , x n } → {y 1 , . . . , y n } <strong>and</strong> we must<br />

compute an extension T of ¯T to the entire space O 1 . Clearly, <strong>in</strong> general this<br />

solution, if it exists is not unique.<br />

2. The solution operator T def<strong>in</strong>es an <strong>in</strong>terpolation from the samples<br />

(x i , T (x i )), i = 1, . . . , n.<br />

From the above <strong>in</strong>terpretations it is easy to conclude that, <strong>in</strong> general, the solution<br />

for any type of <strong>in</strong>verse problems is not unique.<br />
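A familiar instance of the inverse problem of the second kind is the interpolation of scattered samples, where the non-uniqueness is obvious: infinitely many operators reproduce the same finite data. The sketch below (our illustration, assuming NumPy) fits two different operators T to the same five samples, a degree-4 polynomial and a piecewise-linear interpolant:

    import numpy as np

    # Five samples (x_i, y_i) that the operator T must reproduce: T(x_i) = y_i.
    xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    ys = np.array([1.0, 2.0, 0.0, 2.0, 1.0])

    # Operator 1: the unique interpolating polynomial of degree 4.
    coef = np.polyfit(xs, ys, 4)

    # Operator 2: piecewise-linear interpolation through the same samples.
    def plin(x):
        return np.interp(x, xs, ys)

    x = 1.5  # both operators agree on the samples but typically differ elsewhere
    print(np.polyval(coef, x), plin(x))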

2.7.1 Well-posed problems

Hadamard² established the concept of a well-posed problem as a problem in which the following conditions are satisfied:

1. There is a solution;

2. The solution is unique;

3. The solution depends continuously on the initial conditions.

When at least one of the above conditions is not satisfied, we say that the problem is ill-posed.

² Jacques Hadamard (1865-1963), French mathematician.

We have seen previously that, in general, inverse problems of the first and second kind are ill-posed in the sense of Hadamard because of the non-uniqueness of the solution.

The concept of an ill-posed problem needs to be examined carefully. The continuity condition is important because it guarantees that small variations of the initial conditions will cause only small perturbations of the solution. Recall that, in practical problems, small variations in the initial conditions are quite common (and sometimes inevitable) due to measurement imprecision or numerical errors. The non-uniqueness of solutions, on the other hand, need not be an obstacle. An example will make this point clear.

Example 13. Consider the operator F: R^2 → R defined by F(x, y) = x^2 + y^2 − 1 and the inverse problem of the first kind F(x, y) = 0. That is, we must solve the quadratic equation x^2 + y^2 − 1 = 0. It is clear that this equation admits an infinite number of solutions; therefore the problem is ill-posed in the sense of Hadamard. Geometrically, these solutions constitute the unit circle S^1 of the plane. The non-uniqueness of the solutions is very important in the representation of the circle. In fact, in order to obtain a sampled representation of the circle we need to determine a finite subset of these solutions. Figure 2.3 shows the use of seven solutions of the equation, which allows us to obtain an approximate reconstruction of the circle by a heptagon.

Figure 2.3: Polygonal representation of a circle.


The use of the term ill-posed to designate the class of problems that do not satisfy one of the three Hadamard conditions may induce the reader to conclude that ill-posed problems cannot be solved, but this is far from the truth. Interpreting ill-posed problems as a category of intractable problems, the reader would be discouraged from studying computer graphics, an area in which there is a large number of ill-posed problems, as we will see in this chapter.

In fact, ill-posed problems constitute a fertile ground for the use of optimization methods. When a problem has an infinite number of solutions and we wish to have uniqueness, optimization methods make it possible to reformulate the problem so that a unique solution can be obtained. The general principle is to define an objective function such that an "optimal solution" can be chosen among the set of possible solutions. In this way, when we do not have a unique solution to a problem we can have the "best" solution (which is unique).

It is interesting to note that, sometimes, well-posed problems may fail to have a solution due to the presence of noise or numerical errors. The example below shows one such case.

Example 14. Consider the solution of the linear system

x + 3y = a
2x + 6y = 2a,

where a ∈ R. In other words, we want to solve the inverse problem TX = A, where T is the linear transformation given by the matrix

( 1 3 )
( 2 6 ),

X = (x, y) and A = (a, 2a). It is easy to see that the image of R^2 under T is the line r given by the equation 2x − y = 0. Because the point A belongs to the line r, the problem has a solution (in fact, an infinite number of solutions). However, a small perturbation in the value of a can move the point A off the line r, and the problem fails to have a solution.

To avoid this situation, a more appropriate way to pose the problem employs a formulation based on optimization: determine the element P ∈ R^2 such that the distance from T(P) to the point A is minimal. This form of the problem always has a solution. Moreover, this solution will be close to the point A. Geometrically, the solution is the point P such that its image T(P) is the closest point of the line r to the point A (see Figure 2.4).

Figure 2.4: Well-posed formulation of a problem.
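This optimization reformulation is precisely a linear least-squares problem: minimize ||TP − A|| over P. The sketch below (our illustration, assuming NumPy) shows the effect: the perturbed system has no exact solution, yet the least-squares formulation still returns a point P whose image is the projection of A onto the line r:

    import numpy as np

    T = np.array([[1.0, 3.0],
                  [2.0, 6.0]])       # singular: its image is the line 2x - y = 0
    a = 1.0
    A = np.array([a, 2 * a + 0.01])  # a small perturbation moves A off the line

    # Least-squares formulation: minimize ||T P - A|| over P.
    P, *_ = np.linalg.lstsq(T, A, rcond=None)
    print(P, T @ P)  # T @ P is the point of the line r closest to A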

2.8 How to solve it?

It is the purpose of this book to use optimization techniques to solve computer graphics problems. Later on we will present several examples of computer graphics problems that can be posed, in a natural way, as optimization problems. Nevertheless, we should keep in mind that posing a problem as an optimization problem does not mean that it is automatically solved, at least in the sense of obtaining an exact solution. In fact, optimization problems are, in general, very difficult to solve. Some of the difficulties are discussed below.


2.8.1 Global versus local solutions

In continuous optimization problems, most of the optimality conditions are related to local optimality. The computation of globally optimal solutions is, in general, very expensive from the computational point of view. Thus, in practice, we often have to settle for the best locally optimal solution found by the algorithm.

2.8.2 NP-complete problems

There are combinatorial problems – for example, the so-called NP-complete problems – for which there are no known efficient methods (meaning methods running in time polynomial in the size of the instance to be solved) that provide exact solutions in all cases. The Traveling Salesman Problem, mentioned before, is in that class. These problems are related in such a way that if an algorithm with polynomially-bounded running time is found for one of them, then polynomial-time algorithms will be available for all of them. This makes it unlikely that such an algorithm will ever be found. When solving such problems, we often have to use heuristic solution processes that provide "good" (but not necessarily optimal) solutions.

2.9 Comments and references

The bibliography on optimization is huge, and it is not our intent in this book to give a complete review of the area. In this section we provide some suggestions of textbooks that cover, in much more detail, the topics we have discussed in this chapter.

(Luenberger, 1984) provides a broad overview of continuous optimization, covering both linear and non-linear problems. For discrete and combinatorial optimization we suggest (Papadimitriou & Steiglitz, 1982). (Weinstock, 1974) gives a nice introduction to variational optimization, with emphasis on applications.

There are many books in the literature dedicated to linear programming and its applications to combinatorial problems (and in particular to network flows); (Chvatal, 1983) is a good example. (Fang & Puthenpura, 1993) is a more recent book, which emphasizes the recent interior-point algorithms for linear programming. On the other hand, (Gill et al., 1981) is mostly concerned with non-linear optimization, emphasizing the practical aspect of choosing the most appropriate algorithm for a given optimization problem.




Chapter 3

Optimization and computer graphics

In this chapter we will show the importance of optimization methods in computer graphics. We will pose several problems in the area whose solutions use optimization techniques. The problems posed here will be solved in the following chapters with the appropriate optimization technique: continuous, discrete, combinatorial or variational.

3.1 A unified view of computer graphics problems

Computer graphics problems are, in general, posed by considering two spaces O_1 and O_2 of graphical objects, and relations between them.¹ In several practical situations, the spaces O_1 and O_2 are topological spaces and the relation is an operator (i.e., a continuous function) from O_1 to O_2. An even better-behaved situation occurs when O_1 and O_2 are vector spaces and the operator is linear.

We will consider the intermediate case, when we have two topological spaces of graphical objects O_1 and O_2 and an operator T: O_1 → O_2. From this we have the following generic problems:

• For a given graphical object x ∈ O_1, compute T(x) (direct problem);

• Given y ∈ O_2, compute the graphical object T^{-1}(y) (inverse problem of the first kind);

• Given graphical objects x ∈ O_1 and y ∈ O_2, compute an operator T such that T(x) = y (inverse problem of the second kind).

¹ A relation is a subset of the cartesian product O_1 × O_2.

In this chapter we will study several particular cases of the above problems.

3.1.1 The classification problem

Classification problems, also called discrimination problems, allow the recognition of graphical objects based on the identification of the class to which they belong. In this section we give a generic description of the classification problem and demonstrate how it leads to optimization methods in a natural manner.

A relation ≡ defined on a set O of graphical objects is an equivalence relation if it satisfies the following conditions:

1. O_α ≡ O_α (reflexivity);

2. O_α ≡ O_β ⇒ O_β ≡ O_α (symmetry);

3. O_α ≡ O_β and O_β ≡ O_θ ⇒ O_α ≡ O_θ (transitivity).

The equivalence class [x] of an element x ∈ O is the set [x] = {y ∈ O; y ≡ x}. From the properties of the equivalence relation it follows that two equivalence classes either coincide or are disjoint. Moreover, the union of all equivalence classes is the set O. Thus, the equivalence classes constitute a partition of the space O of graphical objects. Therefore, whenever we have an equivalence relation on a set O of graphical objects, we obtain a classification technique, which consists in obtaining the corresponding partition of the set O.

How can we obtain a classification of a set O of graphical objects? A powerful and widely used method consists in using a discriminating function, which is a real function F: O → U ⊂ R. The equivalence relation ≡ on O is defined by

x ≡ y if, and only if, F(x) = F(y).

It is easy to prove that ≡ is indeed an equivalence relation. For each element u ∈ U, the inverse image F^{-1}(u) = {x ∈ O; F(x) = u} defines an equivalence class of the relation.


When we have a classification of a set O of graphical objects, a natural problem consists in choosing an optimal representative u of each equivalence class F^{-1}(u). For this, we define a distance function d: O × O → R on the space O.² Then we choose the best representative u of the class [u] as the element that minimizes the function

∑_{o ∈ [u]} d(o, u).

An interesting case occurs when the set O of graphical objects is finite, O = {o_1, . . . , o_m}. A classification amounts to dividing O into a partition with n sets [u_1], . . . , [u_n], n < m (in general, n much smaller than m). This problem can be naturally posed as an optimization problem: what is the optimal partition of the space O?

² A distance function differs from a metric because it does not necessarily satisfy the triangle inequality.

In some classification problems it is possible to define a probability distribution p on the set O of graphical objects. In this case, a more suitable measure for an optimal partition would be

E = ∑_{i=1}^{n} ∑_{o ∈ [u_i]} p(o) d(o, u_i).

The function E depends on the partition into classes and on the choice of a representative for each class. The minimization of the function E over all possible partitions of the set O is a problem of combinatorial optimization. This problem is known in the literature as cluster analysis.
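A standard heuristic for this minimization, in the particular case where the objects are points of R^n, d is the squared Euclidean distance and p is uniform, is Lloyd's k-means iteration: alternately reassign each object to its nearest representative and recompute each representative as the mean of its class. A minimal sketch (our illustration, assuming NumPy; it finds a local, not global, minimizer of E, and assumes no class becomes empty):

    import numpy as np

    def kmeans(points, k, iters=50, seed=0):
        # Lloyd's algorithm for d(o, u) = ||o - u||^2 and uniform p.
        rng = np.random.default_rng(seed)
        reps = points[rng.choice(len(points), k, replace=False)]
        for _ in range(iters):
            # Assign every point to the class of its nearest representative.
            d2 = ((points[:, None] - reps[None]) ** 2).sum(axis=-1)
            labels = np.argmin(d2, axis=1)
            # Recompute each representative as the mean of its class.
            reps = np.array([points[labels == i].mean(axis=0) for i in range(k)])
        return labels, reps

    rng = np.random.default_rng(1)
    pts = np.vstack([rng.normal(c, 0.1, (20, 2)) for c in (0.0, 1.0)])
    labels, reps = kmeans(pts, 2)
    print(reps)  # close to the two cluster centers near (0, 0) and (1, 1)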

Example 15 (Database queries). In the area of multimedia databases (image databases, for example), classification methods play a fundamental role in the problems of indexing and querying objects in the database.

3.2 Geometric Modeling

A large part of the problems in modeling are related to the description, representation and reconstruction of geometric models. As we have already mentioned, there are essentially three methods to describe a geometric shape:

1. implicit description;

2. parametric description;

3. piecewise description (implicit or parametric).

The description is usually given in functional form, either deterministic or probabilistic. A model is called procedural when its describing function is an algorithm over some virtual machine.

In general, the geometry of a model is specified by the user through a finite number of parameters; based on this specification the model is reconstructed. Some common cases are:

• reconstruction of a curve or surface from a set of points; in this case the reconstruction employs some interpolation method for scattered data;

• reconstruction of a surface based on a finite set of curves.

It is clear that the two problems above lack uniqueness of solution. Additional conditions are required for uniqueness. These conditions can be obtained by algebraic methods or by using optimization. In the first case, we may impose that the curve or surface belong to some specific class, for example, one defined by a polynomial of degree 3. In the second case, we may look for a curve or surface that minimizes some given objective function.

3.2.1 Model reconstruction

This is a very important problem in geometric modeling. In general, a model is specified by a finite number of parameters that represent its geometry, topology and attributes. An important task consists in reconstructing the model from this finite set of parameters, which may arise in different ways:

- the user specifies a finite number of parameters using some input device;

- we obtain samples from terrain data;

- we obtain samples from a CT or MRI scan.

As particular cases of this problem we have the problems of curve modeling and surface modeling. The methods that create models using optimization are known in the literature by the name of variational modeling.

3.3 Visualization

The visualization process consists in the generation of images from geometric models. Formally, we define a rendering operator which associates to each point in the ambient space a corresponding point in the image, with its respective color. The rendering operator R: O_1 → O_2 maps graphical objects from the scene space O_1 to graphical objects in the space O_2 of the visualization device. Usually, O_1 is a space of three-dimensional objects and O_2 is a space of two-dimensional regions of the screen.

It is possible to formalize the complete rendering pipeline in terms of operators on graphical objects. John Hart has used this methodology to analyze the hardware rendering pipeline of OpenGL-accelerated graphics cards (Hart et al., 2002).

However, the solution of this direct problem is not simple, because the rendering operator depends on many factors: light sources, the camera transformation, the geometry and material properties of the objects of the scene, and ultimately the propagation of radiant energy through the scene until it reaches the image plane. Depending on the type of illumination model adopted, the rendering computation can be a direct or an inverse problem.

Many optimization problems arise from the inverse computation of the rendering operator. For example, we may wish to determine where to position light sources in a scene such that the shadows they cast onto the objects have a certain effect.

3.4 Computer Vision

In this area we look for physical, geometric or topological information about the objects of a scene depicted in an image. In a certain way, the image is a representation of the scene, and the vision problem consists in the reconstruction of the three-dimensional scene from its two-dimensional representation. In the visualization process that generates an image there is a great loss of information, implied by the projection that changes the dimensionality of the representation. The image, therefore, is an extremely ambiguous representation. As described above, computer vision is clearly an inverse problem.

Even the reconstruction of the scene by our sophisticated visual system may present ambiguities. Two classic examples of such semantic ambiguity are shown in Figure 3.1. In (a) we see an image that can be interpreted either as the face of an elderly woman or of a young lady (the nose of the old woman is the chin of the lady). In (b) we can interpret the figure either as the silhouettes of two faces on a white background or as a white glass on a black background. Note that the ambiguity of the second image is related to the problem of distinguishing between foreground and background in the reconstruction.

Figure 3.1: Ambiguous reconstructions: (a) and (b).



The complexity of reconstruction in computer vision gives rise to different versions of the reconstruction problem. Two important cases of the problem are:

• Shape from shading: obtain the geometry of the objects based on information about the radiance of the pixels in the image.

• Shape from texture: obtain information about the geometry of the scene based on a segmentation and analysis of the textures in the image.

3.5 The Virtual Camera

One of the important elements of the visualization process is the virtual camera. The simplest model of a virtual camera is the pinhole camera, which has seven degrees of freedom representing the parameters:

• position (3 degrees of freedom);
• orientation (3 degrees of freedom);
• focal distance (1 degree of freedom).

These parameters determine a projective transformation T which associates to each point p ∈ R³ a point in the image plane (for more details, see (Gomes & Velho, 1998)). In other words, given a point p ∈ R³, the virtual camera transformation generates the point T(p) = p′ in the image.

Two inverse problems are related in a natural way with the virtual camera:

1. Given the point p′ in the image, determine the point p of the scene, assuming that the camera parameters are known.

2. Given points p and p′, determine the camera transformation T such that T(p) = p′.

In the first problem, many points in the scene may correspond to the point p′ in the image (i.e. 3D points that are projected to the same 2D image point). In the second case, it is clear that the knowledge of just one 3D point and its 2D projection is not sufficient to determine the seven camera parameters.



One important problem is the application of (2) above to images produced by real cameras. In other words, given an image acquired by some input device (a video camera or photographic camera), determine the camera parameters (position, orientation, focal length, etc.). This problem is known as camera calibration.

Camera calibration has a large number of applications. Among them we could mention:

• In virtual reality applications, it is often necessary to combine virtual images with real ones. In this case, we need to synchronize the parameters of the virtual camera with those of the real camera in order to obtain a correct alignment of the elements in the image composition.

• TV broadcasts of soccer games often use a 3D reconstruction of a given play to help spectators better understand what has happened. In order to do this, it is necessary to know the camera parameters. More information about this type of application can be obtained on the Juiz Virtual website, http://www.visgraf.impa.br/juizvirtual/.

3.5.1 Camera Specification

An important problem in graphics is camera specification. In the direct specification, the user provides the seven camera parameters that are required to define the transformation T. Then we have the direct problem of computing image points p′ = T(p).

A related inverse problem arises in the study of user interfaces for camera specification. In general, as we have seen, the user specifies the transformation T through the seven camera parameters, leading to an easy-to-solve direct problem. However, this interface is not appropriate when the user wants to obtain specific views of the scene (for example, keeping the focus of attention on a certain object). A suitable interface should allow the user to frame an object directly on the image. Such a technique is called inverse camera specification.

In the inverse specification, the seven degrees of freedom of the camera are specified indirectly by the user. The user wishes to obtain a certain framing of the scene and the specification must produce the desired result.



The idea of the inverse specification is to allow the definition of camera parameters inspired by the “camera man paradigm”: the user (in the role of a film director) describes the desired view directly on the image, and the system adjusts the camera parameters to obtain that view.

The inverse camera specification facilitates the view description by the user at the cost of having to solve an ill-posed mathematical problem. Let's see how it works using a concrete example.

Example 16. Framing a point in the image. Consider the situation illustrated in Figure 3.2. Point P in space is fixed and the observer is located at point O; point P is projected to point A on the screen. The user requests a new view, subject to the constraint that point A be at the center of the image. Therefore, we need to obtain camera parameters such that point P will now be projected to point B, which is located at the center of the screen.

Figure 3.2 shows a new position O′ and a new orientation of the camera that solve the problem. Observe that this camera setting is not the unique solution to the problem.

Figure 3.2: Inverse specification to frame a point in the image (Gomes & Velho, 1998).

From the mathematical point of view we have a transformation T : R⁷ → R², where y = T(x) is the projection from the parameter space of the camera to the euclidean plane. The solution to our problem is given by the inverse T⁻¹(B). However, the transformation T is not linear and, in general, it does not have an inverse (why?). Therefore, the solution has to be computed using optimization techniques. That is, we need to look for the “best” solution in some sense.
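As a minimal sketch of this kind of computation, the toy pinhole model below searches over only two of the seven parameters (the orientation angles), keeping position and focal distance fixed, and asks a nonlinear least-squares solver to bring the projection of a point to the image center. The model and all numbers are assumptions made for illustration.

    import numpy as np
    from scipy.optimize import least_squares

    p = np.array([2.0, 1.0, 5.0])    # scene point, in world coordinates
    c = np.array([0.0, 0.0, 0.0])    # camera position, held fixed here
    f = 1.0                          # focal distance, held fixed here

    def rot(yaw, pitch):
        cy, sy = np.cos(yaw), np.sin(yaw)
        cp, sp = np.cos(pitch), np.sin(pitch)
        Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
        Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
        return Rx @ Ry

    def residual(angles):
        q = rot(*angles) @ (p - c)      # point in camera coordinates
        # perspective projection; residual is the offset from the center
        return f * np.array([q[0] / q[2], q[1] / q[2]])

    sol = least_squares(residual, x0=np.zeros(2))
    print(sol.x, residual(sol.x))

With all seven parameters free, the system is underdetermined and a solver returns one of infinitely many cameras, which is exactly the ill-posedness discussed above.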

3.6 Image Processing

Image processing studies different operations with images. These operations are characterized by operators on spaces of images.

Among the many problems that need to be solved in image processing, we can highlight the classification problem. Two important cases of classification problems are image segmentation and image quantization. In the first case, classification is posed on the geometric support of the image, while in the second case classification is done in the range of the image function, i.e., the color space.

3.6.1 Warping and morphing

An important operation is the warping of the domain U of an image f : U ⊂ R² → R. More specifically, given an image f : U → R, a warping of U is a diffeomorphism g : U → V ⊂ R². This diffeomorphism defines a new image h : V → R, given by h = f ◦ g⁻¹. An example of such an operation is illustrated in Figure 3.3.
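In discrete terms, h is usually computed by inverse mapping: each pixel v of the new image receives the value of f at g⁻¹(v). The sketch below applies this idea with a rotation about the center of a synthetic image playing the role of g; the image and the choice of g are hypothetical.

    import numpy as np
    from scipy.ndimage import map_coordinates

    # f: a synthetic grayscale image on U = [0,1]^2, sampled on a grid
    n = 128
    ys, xs = np.mgrid[0:n, 0:n] / (n - 1.0)
    f = ((xs - 0.5) ** 2 + (ys - 0.5) ** 2 < 0.1).astype(float)

    # g: rotation by theta about the image center (a diffeomorphism);
    # to evaluate h = f o g^{-1} we sample f at g^{-1}(v) for each pixel v
    theta = np.pi / 6
    c, s = np.cos(theta), np.sin(theta)
    xc, yc = xs - 0.5, ys - 0.5
    xi = c * xc + s * yc + 0.5      # inverse rotation (by -theta)
    yi = -s * xc + c * yc + 0.5
    h = map_coordinates(f, [yi * (n - 1), xi * (n - 1)], order=1)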

Image warping has many applications. Among them we could mention image morphing, registration and correction.

Image morphing. It deforms an image into another in a continuous way (see Figure 3.4).

Registration of medical images. The comparison of different medical images is very common in diagnosis. It is also common to combine images of the same part of the body obtained from different sources. In all these cases, the images need to be perfectly aligned.



Figure 3.3: Image deformation.

Figure 3.4: Image metamorphosis.

Correcting image distortions. Many times, the process of capturing or generating an image introduces distortions. In such cases, it is necessary to correct the distortions with a warping. We employ a standard pattern to measure the distortions and, based on those measures, we correct other images produced by the process. Figure 3.5 shows an example of a standard pattern and its deformation due to some image capture process.

Figure 3.5: Correcting distortions.

One of the main problems in the area of warping and morphing is the specification of the warping operator. There are many techniques for this purpose. A simple and very popular method is punctual specification, illustrated in Figure 3.6: in order to deform the bird into the dog, we select some points on the outline curve of the bird and their corresponding points on the outline of the dog. From the knowledge of the deformation acting on this set of points it is possible to reconstruct the desired deformation over the whole space.

With punctual specification, the problem of finding a deformation reduces to the problem of determining a warping g : R² → R² such that

$$g(p_i) = q_i, \quad i = 1, \dots, n,$$

where $p_i$ and $q_i$ are points of R². It is clear that this problem does not admit a unique solution, in general.

Suppose that g(x, y) = (g₁(x, y), g₂(x, y)), where g₁, g₂ : U → R are the components of g. In this case, the equations g(pᵢ) = qᵢ can be written in the form

$$g_1(p^i_x, p^i_y) = q^i_x; \qquad g_2(p^i_x, p^i_y) = q^i_y,$$

with i = 1, ..., n. That is, the problem amounts to determining two surfaces

$$(x, y, g_1(x, y)) \quad \text{and} \quad (x, y, g_2(x, y)),$$

based on the knowledge of a finite number of points on each surface. Note that we encounter this same problem in the area of modeling.



Figure 3.6: Punctual specification.

The way the deformation problem was posed using punctual specification is too rigid: if we restrict the class of warpings that are allowed, the problem may not have a solution. In addition, the above formulation makes the problem very sensitive to noise. A more convenient way to pose the problem consists in requiring only that g(pᵢ) be close to qᵢ. In this new formulation, the solution is less sensitive to noise and it is possible to restrict the class of deformations and still obtain a solution. The new problem is therefore an optimization problem.
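For instance, restricting g to affine maps turns the relaxed problem into two small linear least-squares problems, one per component of g, exactly as in the surface-fitting view above. The sketch below, with made-up point correspondences, is one minimal way to compute such a warp.

    import numpy as np

    # hypothetical correspondences: g(p_i) should be close to q_i
    p = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])
    q = np.array([[0.1, 0.0], [1.2, 0.1], [1.1, 1.2], [-0.1, 0.9], [0.55, 0.6]])

    # restrict g to affine maps g(x) = A x + b and solve in the
    # least-squares sense, one linear system per coordinate of g
    M = np.hstack([p, np.ones((len(p), 1))])     # rows: (p_x, p_y, 1)
    coeffs, *_ = np.linalg.lstsq(M, q, rcond=None)
    A, b = coeffs[:2].T, coeffs[2]
    print("residual:", np.linalg.norm(M @ coeffs - q))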

Particularly interesting cases of the formulation above consist in considering space deformations induced by rigid motions, or by linear or projective transformations. A survey discussing several approaches to solving the problem using rigid motions can be found in (Goodrich et al., 1999). A comprehensive treatment of warping and morphing of graphical objects can be found in (Gomes et al., 1998).

3.7 Image Analysis

A classical problem in the area of image analysis is edge detection. Intuitively, edges are the boundaries of objects in the image. This is shown in Figure 3.7, where the white pixels in image (b) indicate the edges of image (a).

Figure 3.7: Edges of an image.

The first researcher to call attention to the importance of edge detection in image analysis was D. Marr.³ He posed an important problem in this area, which became known as Marr's conjecture: an image is completely characterized by its edges. In order to pose this problem more precisely it is necessary to define the concept of edge, but we will not do this here. We will resort, instead, to a high-level discussion of the problem.

³David Marr (1945-1980), English scientist and MIT researcher, was one of the pioneers in the area of computer vision.

Edge detection methods can be divided into two categories:

• frequency-based methods;
• geometric methods.

Frequency-based methods characterize edges as regions of high frequency in the image (i.e., regions where the image function exhibits large variations). The theory of operators over images is used to determine those regions. The reconstruction of an image from its edges is therefore related to the invertibility of these operators. The wavelet transform plays a significant role here.

Geometric methods try to describe edges using planar curves over the image support. This approach relies largely on optimization methods. A classic example is given by snakes, which we briefly describe next.

Snakes. These curves reduce the edge detection problem to a variational problem of curves in the domain of the image. We use the image function to define an external energy in the space of curves. This external energy is combined with an internal energy of stretching and bending, in order to create an energy functional. The functional is such that minimizer curves are aligned with the edges of regions in the image. The computation of snakes is usually done using a relaxation process starting with an initial curve near the region. Figure 3.8 illustrates edge detection using snakes: (a) shows the initial curve; and (b) shows the final curve fully aligned with the edges of a region.
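The sketch below is a bare-bones relaxation of this kind, with a synthetic image and hand-picked weights (all of them assumptions made for illustration): the internal forces come from finite differences along the discretized curve, and the external force pushes the points toward high values of the squared gradient magnitude.

    import numpy as np
    from scipy.ndimage import gaussian_filter, map_coordinates

    # synthetic image: a blurred disk whose boundary is the edge to find
    n = 200
    yy, xx = np.mgrid[0:n, 0:n]
    img = gaussian_filter(((xx - 100) ** 2 + (yy - 100) ** 2 < 60 ** 2) * 1.0, 3)

    # external energy: -|grad img|^2, so edges are energy minima; the
    # corresponding force field is the gradient of |grad img|^2
    gy, gx = np.gradient(img)
    edge = gx ** 2 + gy ** 2
    fy, fx = np.gradient(edge)

    # initial curve: a circle around the region, sampled at m points
    m = 80
    t = np.linspace(0, 2 * np.pi, m, endpoint=False)
    x = 100 + 90 * np.cos(t)
    y = 100 + 90 * np.sin(t)

    alpha, beta, gamma, tau = 0.4, 0.2, 2.0, 0.5  # stretch, bend, ext, step
    for _ in range(500):
        # internal forces via periodic finite differences along the curve
        d2x = np.roll(x, -1) - 2 * x + np.roll(x, 1)
        d2y = np.roll(y, -1) - 2 * y + np.roll(y, 1)
        d4x = np.roll(d2x, -1) - 2 * d2x + np.roll(d2x, 1)
        d4y = np.roll(d2y, -1) - 2 * d2y + np.roll(d2y, 1)
        # external force, interpolated at the current curve points
        ex = map_coordinates(fx, [y, x], order=1)
        ey = map_coordinates(fy, [y, x], order=1)
        x = x + tau * (alpha * d2x - beta * d4x + gamma * ex)
        y = y + tau * (alpha * d2y - beta * d4y + gamma * ey)

The internal terms shrink and smooth the curve until it reaches the region where the external force is strong, where the two effects balance along the edge.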

3.8 Animation and Video

The area of animation is concerned with the problem of time-varying graphical objects. There are three important problems in animation:

• Motion synthesis;
• Motion analysis;
• Motion processing.

Figure 3.8: Edge detection using snakes: (a) initial curve; (b) final edge curve (P. Brigger & Unser, 1998).

In general, motion synthesis is a direct problem and motion analysis leads to various types of inverse problems.

Motion synthesis is closely related to the area of motion control and employs optimization methods extensively (e.g. optimal control).

Motion visualization, or animation, is done through a sequence of images, which constitute essentially a temporal sampling of the motion of the objects in a scene. Such an image sequence is also known as digital video. Note that this opens up a perspective for new problems: the analysis and processing of video.

A particularly interesting problem consists in recovering the motion of 3D objects in the scene based on the analysis of their 2D motion in the video, which is known as motion capture. This is another example of a highly ill-posed problem in the sense of Hadamard, where optimization methods are largely employed to find adequate solutions.



3.9 Character recognition

A classification problem of practical importance is automatic character recognition: given a binary image of a scanned text, identify all the printed glyphs of the text. This problem has applications in OCR (optical character recognition) systems, which transform a scanned image into a text that can be edited in a word processor.

Note that each character represents a region of the plane, which can be interpreted as a planar graphical object. We need to identify the letter of the alphabet associated with each of these regions. If we define the depiction of a character as another planar graphical object, then the problem is to find a discriminating function to measure the similarity of objects of this type (Atallah, 1984). After that, we proceed as in the general classification problem described above.
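As a toy illustration of a discriminating function (not the one used in (Atallah, 1984)), one could measure the overlap of two binary glyph images with the Jaccard index; the glyph masks below are invented.

    import numpy as np

    def jaccard(a, b):
        # similarity of two binary glyph images: |A intersect B| / |A union B|
        a, b = a.astype(bool), b.astype(bool)
        return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()

    # two hypothetical 5x5 glyph masks
    g1 = np.array([[0,1,1,1,0],[0,1,0,0,0],[0,1,1,0,0],[0,1,0,0,0],[0,1,1,1,0]])
    g2 = np.array([[0,1,1,1,0],[0,1,0,0,0],[0,1,1,1,0],[0,1,0,0,0],[0,1,1,1,0]])
    print(jaccard(g1, g2))   # close to 1 for similar glyphs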




Chapter 4

Variational optimization

4.1 Variational problems

A large number of the optimization problems that occur in computer graphics are variational problems. In those problems, the set of possible solutions is a space of functions defined on a continuous domain.

The most classical variational problem is the following:

Problem: Find a function y : [x₁, x₂] → Rⁿ, satisfying the conditions y(x₁) = y₁ and y(x₂) = y₂, that minimizes the integral

$$\int_{x_1}^{x_2} f(x, y, y')\, dx,$$

where f is a known real function.

The optimality conditions for the problem above are expressed by a system of partial differential equations, given below:

Theorem 1 (Euler-Lagrange conditions). Let y : [x₁, x₂] → Rⁿ be a twice differentiable function that solves the problem above. Then y must satisfy the following system of partial differential equations:

$$\frac{\partial f}{\partial y_i} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y_i'}\right), \quad i = 1, \dots, n.$$

In most cases, one cannot solve these equations analytically. For this reason, the solution of these problems is usually based on numerical techniques, which require some kind of discretization. There is no general rule to obtain this discretization, and different techniques are found in the literature on numerical methods for partial differential equations. We use the example below to discuss possible solution strategies.

Example 17. Let us consider the problem of finding a minimum length curve joining points P₁ = (x₁, y₁, g(x₁, y₁)) and P₂ = (x₂, y₂, g(x₂, y₂)) on a surface in R³ having equation z = g(x, y) (see Figure 4.1).

Figure 4.1: Minimum-length path on a surface.

Let u : [0, 1] → R³ be a parametrization of the curve on the surface. Its length can be written as

$$\int_0^1 \|u'(t)\|\, dt.$$

Therefore, the problem to be solved is that of minimizing this integral subject to the condition that u(0) = P₁ and u(1) = P₂. Hence, this is a particular case of the problem above.

With little modification, this mathematical formulation applies to problems that are apparently quite different in nature. For instance, consider the problem of detecting a segment of the boundary (between two given points) of the region corresponding to the brain in the image in Figure 4.2. An image can be seen as a surface z = g(x, y), where z is the gray level at position (x, y).

Figure 4.2: Detecting a boundary segment.

The boundary segment can again be found by minimizing an appropriate functional, of the form

$$\int_0^1 \|u'(t)\|\, w(u(t))\, dt \qquad (4.1)$$

over all curves u(t) joining the two given points. The factor w should be small at the boundary points, in order to make them more attractive when minimizing the functional. Observe that the case of the minimum length curve is a particular instance of this formulation, with w(u) = 1.

For boundary detection, we observe that the magnitude of the gradient tends to be large at points on the boundary. Thus, for the case at hand, choosing

$$w(u) = \frac{1}{1 + |\nabla g(u)|}$$

gives good results.

Once the problem has been formulated as a variational problem, we face the task of finding a solution. There are three possible approaches:

4.1.1 Solve the variational problem directly

This requires solving directly the partial differential equation which expresses the optimality conditions for the problem. Even for our simple example, the Euler-Lagrange conditions result in a system of differential equations that cannot be solved analytically, except in special cases (for instance, when the surface is a plane).

Since analytical solutions are not available, the equations must be solved numerically, which demands the introduction of some discretization scheme.

4.1.2 Reduce to a continuous optimization problem

This strategy consists in projecting the infinite-dimensional solution space onto a space of finite dimension. In general the solution obtained is only an approximation to the solution of the original problem. As an example, we could consider the problem of minimizing the length of parametric cubic curves, which are defined by the equation

$$\alpha(t) = (c_{10} + c_{11}t + c_{12}t^2 + c_{13}t^3,\; c_{20} + c_{21}t + c_{22}t^2 + c_{23}t^3).$$

This reduces our search to the 8-dimensional space of the parameters c_{ij}, on which we can use methods and techniques from continuous optimization. We should remark, however, that even in this continuous problem we must resort to some discretization in order to compute, approximately, the integral that gives the length of the curve.
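A rough sketch of this reduction, applied to the geodesic problem of Example 17, might look as follows. The surface g is made up, the endpoint conditions are built into the parametrization (so only four coefficients remain free), and the length integral is approximated by a discrete sum.

    import numpy as np
    from scipy.optimize import minimize

    def g(x, y):                     # a hypothetical surface z = g(x, y)
        return np.exp(-4 * ((x - 0.5) ** 2 + (y - 0.5) ** 2))

    P1, P2 = np.array([0.0, 0.0]), np.array([1.0, 1.0])
    t = np.linspace(0, 1, 200)

    def length(c):
        c1, c2 = c[:2], c[2:]
        # cubic curve in the plane, with endpoints P1, P2 enforced
        a = (np.outer(1 - t, P1) + np.outer(t, P2)
             + np.outer(t * (1 - t), c1) + np.outer(t ** 2 * (1 - t), c2))
        z = g(a[:, 0], a[:, 1])
        p = np.column_stack([a, z])          # lifted curve on the surface
        return np.sum(np.linalg.norm(np.diff(p, axis=0), axis=1))

    res = minimize(length, x0=np.zeros(4))
    print(res.fun)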

4.1.3 Reduce to a discrete optimization problem

Here, the discretization occurs at the beginning of the process. The domain of the objective function is discretized using a regular grid, and we consider the graph G whose vertices are the grid intersections and whose edges correspond to pairs of neighboring vertices; the length of each edge is the length, on the surface, of the curve generated by the line segment that connects the two vertices (see Figure 4.3).


Figure 4.3: Discretizing a variational problem.

The curve of minimal length is found by solving a minimum path problem in the graph G (see Chapter 6). The quality of the approximation obtained by this method certainly depends on the resolution of the grid (number of vertices). We should remark, however, that the more we increase the grid resolution, the greater the computational cost of the solution.
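A small self-contained sketch of this approach is shown below, with a toy surface, an 8-connected grid, and the chord between lifted grid points as an approximation of the edge length on the surface (all of these choices are assumptions). Dijkstra's algorithm then computes the approximate geodesic distance.

    import heapq
    import numpy as np

    def g(x, y):                                # surface height z = g(x, y)
        return np.exp(-4 * ((x - 0.5) ** 2 + (y - 0.5) ** 2))

    n = 40                                      # grid resolution
    def lift(i, j):                             # grid vertex -> surface point
        x, y = i / (n - 1), j / (n - 1)
        return np.array([x, y, g(x, y)])

    def neighbors(i, j):
        for di, dj in [(-1,0),(1,0),(0,-1),(0,1),(-1,-1),(-1,1),(1,-1),(1,1)]:
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < n:
                yield a, b

    def dijkstra(src, dst):
        dist = {src: 0.0}
        pq = [(0.0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if u == dst:
                return d
            if d > dist.get(u, np.inf):
                continue
            for v in neighbors(*u):
                w = np.linalg.norm(lift(*v) - lift(*u))  # edge length
                if d + w < dist.get(v, np.inf):
                    dist[v] = d + w
                    heapq.heappush(pq, (d + w, v))
        return np.inf

    print(dijkstra((0, 0), (n - 1, n - 1)))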

4.2 Applications in computer graphics

In this section we study the following problems from computer graphics that can be modeled as variational problems:

1. Variational modeling of curves;
2. Variational modeling of surfaces.

Both problems originate from the area of variational modeling, which consists in using variational techniques in the area of geometric modeling.



4.2.1 Variational Modeling of Curves

The basic problem of describing curves is related to the creation of geometric objects in the context of specific applications. A generic method consists in determining a curve that passes through a finite set of points p₁, ..., pₙ and at the same time satisfies some application-dependent criteria. In general, besides obtaining a curve with a given shape, many other conditions that measure the “quality” of the curve can be imposed. Among those conditions we could mention:

• Continuity;
• Tangent line at control points;
• Continuous curvature, etc.

Each condition imposes some constraint on the curve to be constructed. These constraints can be of an analytical, geometric or topological nature. A convenient way to obtain curves that satisfy a given set of conditions is to pose the problem in the context of optimization: define an energy functional such that the curves which minimize this functional automatically satisfy the desired criteria.

D. Bernoulli¹ was the first mathematician to propose a functional to measure the energy associated with the tension of a curve. This energy is called tension energy and is proportional to the square of the curvature k of the curve:

$$E_{\text{tension}}(\alpha) = \int_\alpha k^2\, ds, \qquad (4.2)$$

where s is the arc length of the curve α. Note that the tension energy of a straight line is zero, because its curvature is identically null. On the other hand, if the tension energy of a curve is high, then it “bends” significantly. Curves that minimize the tension energy are called non-linear splines.

The geometric constraints that are imposed can be of a punctual or directional nature. An example of a punctual constraint is to force the curve to pass through a finite set of points. An example of a directional constraint is to force the tangent vector of the curve to have a certain, previously specified direction.

¹Daniel Bernoulli (1700-1782), Swiss mathematician known for his work in hydrodynamics and mechanics.



The geometric constraints are in general modeled by an energy functional associated with external forces that act on the curve. The tension energy discussed earlier is an energy internal to the curve. In this way, the optimization problem is unified as the search for the minima of an energy functional E that has an internal component and an external component:

$$E_{\text{total}}(\alpha) = E_{\text{int}}(\alpha) + E_{\text{ext}}(\alpha).$$

A simplification adopted very often consists in decomposing the internal energy into a bending and a stretching component. In this way, the internal energy is given by the linear combination

$$E_{\text{int}}(\alpha) = \mu\, E_{\text{bending}} + (1 - \mu)\, E_{\text{stretch}},$$

with μ ∈ [0, 1]. The bending energy is given by equation (4.2), and the stretch energy is given by

$$E_{\text{stretch}}(\alpha) = \int_\alpha \|\alpha'(t)\|\, dt.$$

Therefore, the internal energy controls the energy accumulated by the curve when it is bent or stretched. Minimization of this functional results in a curve that is as short and straight as possible. If the curve is constrained to pass through two points, the solution of the minimization problem is certainly a line segment. It is the external energy that produces non-trivial solutions to the problem.

Intuitively, the external energy deforms the curve in many ways, bending and stretching it. In general, the user specifies various components of external energy, which are linearly combined in order to obtain the desired deformation. This energy is defined in the form of attractors or repulsors, which can be of two kinds: positional or directional. We give below two examples to clarify these concepts.

Example 18 (Punctual attractor). An example of a positional attractor is the punctual attractor, whose energy functional is given by

$$E_{\text{punctual}}(\alpha) = d(\alpha, p)^2 = \min_t \|\alpha(t) - p\|^2.$$

Figure 4.4(a) shows an example of a curve that passes through points p₀ and p₁ while being deformed by attraction to point p.



Figure 4.4: Attractors: punctual (a) and directional (b).

Example 19 (Directional attractor). An example of a directional attractor is based on specifying a line r(t) = p + tv. The curve starts at point p₀ and ends at point p₁. The external energy makes the curve approach point p, and the tangent to the curve at points near p lines up with the vector v (see Figure 4.4(b)). This condition can be obtained with the energy functional given by

$$E_{\text{dir}}(\alpha) = \min_t \|\alpha'(t) \wedge v\|^2.$$

According to the classification in Chapter 2, the problem of variational modeling of curves consists in finding a curve satisfying certain constraints (for example, passing through a finite number of points p₁, p₂, ..., pₙ) that minimizes a certain energy functional (the objective function). This functional is defined in such a way that the curves that minimize it have some of the desired characteristics. Among these characteristics we can mention: being smooth, staying close to some pre-specified points, etc.

In order to fix ideas, we consider the case of finding a curve α that passes through p₁, p₂, ..., pₙ and minimizes the energy functional

$$E = E_{\text{int}} + E_{\text{ext}}, \qquad (4.3)$$

where

$$E_{\text{int}}(\alpha) = \mu E_{\text{tension}} + (1 - \mu) E_{\text{stretch}} = \mu \int_\alpha k^2\, ds + (1 - \mu) \int_\alpha \|\alpha'(t)\|\, dt$$



and

$$E_{\text{ext}}(\alpha) = c \sum_{j=1}^{m} \min_t\, d(\alpha(t), q_j)^2.$$

In the above equations, μ and c are constants and {q₁, ..., q_m} is a set of attractors.

This is a variational problem: the solution set has infinite dimension, and its elements are all the smooth curves that pass through the given points. One approach to the problem is to use the Euler-Lagrange equation to formulate it as a partial differential equation problem. Another approach, which we adopt here, is to solve an approximation of the problem, obtained by projecting it onto an appropriate finite-dimensional space.

In (Wesselink, 1996) the space of uniform cubic B-spline curves is used for this purpose. These curves are described by the equation

$$\alpha(t) = \sum_{i=0}^{n} P_i N_i^3(t), \quad t \in [2, n-1], \qquad (4.4)$$

where P₀, P₁, ..., Pₙ are called control points and N₀³, N₁³, ..., Nₙ³ constitute a basis of uniform cubic B-splines (see (Bartels et al., 1987)), defined by

$$N_i^0(t) = \begin{cases} 1, & \text{if } i - 1 \le t < i \\ 0, & \text{otherwise} \end{cases}$$

$$N_i^d(t) = \frac{t - i + 1}{d}\, N_i^{d-1}(t) + \frac{i + d - t}{d}\, N_{i+1}^{d-1}(t).$$

It is possible to prove that, for every t ∈ [2, n − 1], we have $\sum_{i=0}^{n} N_i^3(t) = 1$. Moreover, Nᵢ³(t) ≥ 0 for every i. Therefore, each point α(t) of the curve is a convex combination of control points. The basis functions, whose graphs are depicted in Figure 4.5, determine the weight of each control point in α(t) (Figure 4.6).
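A direct transcription of this recursion (with n = 9 chosen arbitrarily) can be used to check the partition-of-unity property numerically:

    import numpy as np

    def N(i, d, t):
        # uniform B-spline basis via the recursion above (integer knots)
        if d == 0:
            return np.where((i - 1 <= t) & (t < i), 1.0, 0.0)
        return ((t - i + 1) / d * N(i, d - 1, t)
                + (i + d - t) / d * N(i + 1, d - 1, t))

    n = 9
    t = np.linspace(2.0, n - 1.0, 500)
    total = sum(N(i, 3, t) for i in range(0, n + 1))
    print(np.allclose(total, 1.0))    # partition of unity on [2, n-1]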

There are several reasons to justify the choice of the space of cubic B-spline curves:

• Cubic B-splines are frequently used in interactive modeling of curves (mainly because of the properties described below). This enables us to add variational modeling techniques to existing modeling systems.


Figure 4.5: Basis of B-spline curves.

Figure 4.6: Control points and B-spline curves.



• Cubic B-splines are piecewise polynomial curves of the smallest degree that provides an adequate level of continuity for most applications (first-order continuity along the entire curve, and second order except at a finite number of points).

• Cubic B-splines have local control. That is, each basis function has compact support, and as a consequence each control point influences only part of the curve, which is very useful: in interactive modeling it implies that changes in control points affect only part of the curve. It also follows from the compactness of the support that the interpolation constraints are expressed by a sparse system of equations.

The problem of finding a curve given by (4.4) which minimizes the functional given in (4.3) is still difficult from the computational point of view, because the external energy terms involve integrals for which no analytical expression is available. Moreover, the constraints associated with the requirement that the curve contain a set of points, as formulated above, do not specify the values of the parameter t corresponding to those points. If the variables corresponding to these parameter values are introduced in the problem, such constraints become non-linear. An analogous difficulty occurs when computing the attraction energy. Thus, it is necessary to introduce additional simplifications:

• The stretching and bending energies can be approximated by quadratic functions of the control points, using

$$E_{\text{bending}} \approx \int_\alpha |\alpha''(t)|^2\, dt \quad \text{and} \quad E_{\text{stretching}} \approx \int_\alpha |\alpha'(t)|^2\, dt.$$

Both integrals above are of the form $\sum_{i,j} a_{ij} P_i P_j$, where the constants $a_{ij}$ are integrals expressed in terms of the basis functions.

• In the interpolation constraints and in the attraction functionals we specify the corresponding parametric values. As a result, the interpolation constraints become linear as functions of the control points, and the attraction functional becomes a quadratic function of the same points.



Thus, the problem is reduced to minimizing a quadratic function of the control points subject to linear constraints, and it can be solved using the first-order optimality condition.
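Concretely, for an objective of the form ½xᵀQx + cᵀx (with x stacking the control-point coordinates) subject to linear constraints Ax = b, the first-order condition is a single linear system. The sketch below solves a tiny instance with made-up data.

    import numpy as np

    # minimize (1/2) x^T Q x + c^T x  subject to  A x = b;
    # the first-order optimality condition gives the linear system
    #   [Q  A^T] [x     ]   [-c]
    #   [A  0  ] [lambda] = [ b]
    Q = np.array([[2.0, 0.5], [0.5, 1.0]])    # positive definite
    c = np.array([-1.0, -1.0])
    A = np.array([[1.0, 1.0]])                # one linear constraint
    b = np.array([1.0])

    K = np.block([[Q, A.T], [A, np.zeros((1, 1))]])
    rhs = np.concatenate([-c, b])
    sol = np.linalg.solve(K, rhs)
    x, lam = sol[:2], sol[2:]
    print(x, A @ x)                           # x satisfies the constraint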

The modeling process described in (Wesselink, 1996) is the following:

1. An initial curve is modeled in the conventional way, by manipulation of the control points.

2. On this curve the user specifies the interpolation points, and outside of the curve the user specifies the attractor points.

3. The control points are then recomputed so as to minimize the total energy subject to the interpolation constraints.

4. The user may insert new control points or adjust the curve manually, and return to step 2 above.

Figure 4.7 shows, on the right, the result of specifying two attractors for the curve on the left, which was required to pass through the two extreme points and the central point, and to maintain the vertical direction at the other two marked points.

4.3 Comments and references

The bibliography on variational methods is quite extensive and can be mathematically challenging. We refer the reader to (Weinstock, 1974) for an introductory text on variational optimization, with emphasis on applications.

Figure 4.7: Interactive variational modeling.




Chapter 5

Continuous optimization

In this chapter we study continuous optimization methods and techniques. As we know from Chapter 2, continuous optimization problems are posed in the form

$$\min_{x \in S} f(x),$$

where f is a real function and S is a subset of Rⁿ, described by a (possibly empty) set of equations and inequalities. Thus, the general problem that we will study is

$$\min f(x) \quad \text{subject to} \quad h_i(x) = 0,\; i = 1, \dots, m; \qquad g_j(x) \ge 0,\; j = 1, \dots, p.$$

We will dedicate special attention to the case where m = p = 0. In this case, the solution set is the entire space Rⁿ, and the optimization problem is called unconstrained.

Consider a continuous optimization problem min_{x∈Rⁿ} f(x) and a point x₀ ∈ Rⁿ. The methods and techniques studied in this chapter will provide answers to the following questions:

• Is x₀ a solution of the optimization problem?



• If not, how can we compute a better alternative than the one provided by x₀?

As we will see, the answers are extensions to Rⁿ of the well-known classical theory of maxima and minima for real functions of a real variable. The main tool used in the proofs is Taylor's theorem, which provides a polynomial approximation to a function f : Rⁿ → R in a neighborhood of a point a, using the value of f and its derivatives at the point a.

In particular, we will use the second order approximation given by the infinitesimal version of Taylor's theorem (see (Rudin, 1976)).

Theorem 2. Consider a function f : Rⁿ → R and points x₀, h ∈ Rⁿ. Suppose that f is twice differentiable at x₀. Define r : Rⁿ → R by

$$f(x_0 + h) = f(x_0) + \nabla f(x_0) \cdot h + \tfrac{1}{2}\, h^\top \nabla^2 f(x_0)\, h + r(h).$$

Then $\lim_{h \to 0} r(h)/|h|^2 = 0$.
Then lim h→0 r(h)/|h| = 0.<br />

In the above theorem, ∇f <strong>and</strong> ∇ 2 f denote, respectively, the gradient vector<br />

<strong>and</strong> the Hessian matrix of the function f, def<strong>in</strong>ed by<br />

∇f(x) =<br />

(<br />

∂f<br />

∂x 1<br />

(x) · · ·<br />

∂f<br />

∂x n<br />

(x)<br />

)<br />

<strong>and</strong><br />

∇ 2 f(x) =<br />

⎛<br />

⎜<br />

⎝<br />

∂ 2 f<br />

∂ 2 f<br />

∂x 1 ∂x 1<br />

(x) · · ·<br />

∂x 1 ∂x n<br />

(x)<br />

.<br />

. .. .<br />

∂ 2 f<br />

∂<br />

∂x n∂x 1<br />

(x) · · ·<br />

2 f<br />

∂x n∂x n<br />

(x)<br />

⎞<br />

⎟<br />

⎠<br />

5.1 Optimality conditions

The theorems that follow establish necessary and sufficient conditions for a point x₀ ∈ Rⁿ to be a local minimum or maximum of a function f.

We start with the theorem that establishes, for functions of n variables, that the derivative must vanish at local extrema.
derivative must vanish at the local extrema.



Theorem 3 (First order necessary conditions). Let f : Rⁿ → R be a function of class C¹. If x₀ is a local minimum (or maximum) point of f, then ∇f(x₀) = 0.

Proof. Suppose that ∇f(x₀) ≠ 0. Take an arbitrary vector d such that ∇f(x₀) · d < 0 (e.g. we may choose d = −∇f(x₀)). We have, from the first order Taylor formula (or simply from the definition of derivative):

$$f(x_0 + \lambda d) = f(x_0) + \lambda \nabla f(x_0) \cdot d + r(\lambda) = f(x_0) + \lambda \left( \nabla f(x_0) \cdot d + \frac{r(\lambda)}{\lambda} \right),$$

where r(λ) is such that $\lim_{\lambda \to 0} r(\lambda)/\lambda = 0$. For λ > 0 sufficiently small, $\nabla f(x_0) \cdot d + r(\lambda)/\lambda < 0$, which implies that f(x₀ + λd) < f(x₀), and this shows that x₀ is not a local minimum. By taking −d instead of d we show, in the same way, that x₀ is not a local maximum either.

The above proof contains an important idea, which is used by several algorithms: if ∇f(x₀) ≠ 0, then it is possible, starting from x₀, to find direction vectors d such that the value of f decreases along these directions (at least locally). These directions, characterized by the inequality ∇f(x₀) · d < 0, are called descent directions (see Figure 5.1). Note that the condition ∇f(x₀) = 0 expresses the fact that there exist neither descent nor ascent directions at x₀. For this reason, the points where this condition is satisfied are called stationary points of f.
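A quick numerical check of this idea on a toy function (any smooth f would do; this one is invented) confirms that f decreases along d = −∇f(x₀) for small enough steps:

    import numpy as np

    f = lambda x: (x[0] - 1) ** 2 + 3 * x[1] ** 2 + x[0] * x[1]
    def grad(x):
        return np.array([2 * (x[0] - 1) + x[1], 6 * x[1] + x[0]])

    x0 = np.array([0.0, 1.0])
    d = -grad(x0)                    # a descent direction: grad . d < 0
    for lam in [1e-1, 1e-2, 1e-3]:
        print(lam, f(x0 + lam * d) < f(x0))   # True for these step sizes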

Figure 5.1: Descent directions.



We should remark that the condition stated in Theorem 3 is only necessary. The points that satisfy this condition, i.e. the points where the gradient vector vanishes, are called critical points of the function f. As in the one-dimensional case, it is possible to use the second derivative of f at x₀ in order to obtain more information about the critical point x₀.

Theorem 4 (Second order necessary conditions). Let f : Rⁿ → R be a function of class C². If x₀ is a local minimum of f, then ∇f(x₀) = 0 and the Hessian matrix H = ∇²f(x₀) is non-negative (i.e. it satisfies dᵀHd ≥ 0 for every d ∈ Rⁿ).

Proof. Let x₀ be a local minimum of f. From the first order conditions, we have ∇f(x₀) = 0. Suppose, then, that there exists a direction vector d such that dᵀHd < 0. From Theorem 2, we have:

$$f(x_0 + \lambda d) = f(x_0) + \lambda \nabla f(x_0) \cdot d + \tfrac{1}{2} \lambda^2 d^\top H d + r(\lambda) = f(x_0) + \lambda^2 \left( \tfrac{1}{2} d^\top H d + \frac{r(\lambda)}{\lambda^2} \right),$$

where $\lim_{\lambda \to 0} r(\lambda)/\lambda^2 = 0$. Since dᵀHd < 0, it follows that $\tfrac{1}{2} d^\top H d + r(\lambda)/\lambda^2 < 0$ for |λ| sufficiently small. Therefore we have f(x₀ + λd) < f(x₀), for |λ| sufficiently small, and this contradicts the fact that x₀ is a local minimum. Therefore, we must have dᵀHd ≥ 0 for every d.
d.<br />

The condition d ⊤ Hd ≥ 0 for every d is analogous to the non-negativeness of<br />

the second derivative at a local m<strong>in</strong>imum of a real function of one real variable.<br />

We must impose extra conditions on H <strong>in</strong> order to guarantee the x 0 is <strong>in</strong>deed a<br />

local m<strong>in</strong>imum of f. In the one-dimensional case it is sufficient to impose that the<br />

second derivative be positive at x 0 ; <strong>in</strong> a similar way, <strong>in</strong> the case of n variables, we<br />

must dem<strong>and</strong> that he Hessian matrix be positive. This is the content of<br />

Theorem 5 (Second order sufficient conditions). Let f : Rⁿ → R be a function of class C². Suppose that ∇f(x₀) = 0 and that the Hessian matrix H = ∇²f(x₀) is positive (that is, dᵀHd > 0 for every d ≠ 0). Then x₀ is a strict local minimum of f.



Proof. Let x₀ be a point of Rⁿ satisfying the conditions of the theorem. Since dᵀHd > 0 for every d ≠ 0, there exists a > 0 such that dᵀHd ≥ a|d|² for every d (it is enough to take a as the minimum value of dᵀHd on the compact set {d ∈ Rⁿ : |d| = 1}). Then, from the second order Taylor formula, and using ∇f(x₀) = 0,

$$f(x_0 + d) = f(x_0) + \tfrac{1}{2} d^\top H d + r(d) \ge f(x_0) + \tfrac{1}{2} a |d|^2 + r(d) = f(x_0) + |d|^2 \left( \frac{a}{2} + \frac{r(d)}{|d|^2} \right),$$

where r is such that $\lim_{d \to 0} r(d)/|d|^2 = 0$. Hence, for |d| ≠ 0 sufficiently small, we have $a/2 + r(d)/|d|^2 > 0$ and as a consequence f(x₀ + d) > f(x₀). Therefore, f has a strict local minimum at x₀.

Summarizing this discussion, if x₀ is a critical point of f (that is, such that ∇f(x₀) = 0) and H is the Hessian matrix of f at x₀, then we have the following cases:

a) If d⊤Hd > 0 for every d ≠ 0, then x₀ is a relative minimum point of f.

b) If d⊤Hd < 0 for every d ≠ 0, then x₀ is a relative maximum point of f.

c) If there exist d₁ and d₂ such that d₁⊤Hd₁ > 0 and d₂⊤Hd₂ < 0, then x₀ is neither a relative minimum nor a relative maximum of f. In fact, f has a minimum at x₀ along the direction d₁ and a maximum at x₀ along the direction d₂. In this case we say that x₀ is a saddle point of f.

d) If d⊤Hd ≥ 0 for every d (with d₁⊤Hd₁ = 0 for some direction d₁ ≠ 0 and d₂⊤Hd₂ > 0 for some direction d₂), then f may have a local minimum at x₀ (and certainly does not have a local maximum there). On the other hand, if d⊤Hd ≤ 0 for every d (with d₁⊤Hd₁ = 0 for some direction d₁ ≠ 0 and d₂⊤Hd₂ < 0 for some direction d₂), then f may have a local maximum at x₀ (and certainly does not have a local minimum there).



It is possible to illustrate all of the cases described above using quadratic functions of two variables, that is, functions of the form f(x) = c + b⊤x + ½x⊤Hx, where c is a scalar, b is a 2 × 1 vector and H is a symmetric 2 × 2 matrix. It is easy to verify that the gradient of f is ∇f(x) = (b + Hx)⊤ and that its Hessian matrix is ∇²f(x) = H. Therefore, the critical points of f are the solutions of the equation Hx = −b. If H is non-singular then there exists a unique critical point x₀ = −H⁻¹b. In this case, the occurrence of a maximum or a minimum at this critical point depends on the positivity or negativity of the matrix H, that is, on the sign of its eigenvalues. If H is singular, f may or may not have critical points. The examples below illustrate these situations.

a) H has 2 positive eigenvalues (e.g. if f(x, y) = 2x + x² + 2y²). In this case, f has a unique local minimum at the point x₀ = −H⁻¹b = (−1, 0). The level curves of f are illustrated in Figure 5.2. In fact, f has a global minimum at x₀, since one can easily verify that

f(x) = ½(x − x₀)⊤H(x − x₀) + f(x₀).

(Analogously, if H has two negative eigenvalues, then f has a unique local maximum, which will also be a global maximum.)

b) H has a positive and a negative eigenvalue (e.g., if f(x, y) = 2x + x² − 2y²). In this case the unique critical point of f (which is x₀ = −H⁻¹b = (−1, 0)) is a saddle point and f does not have relative extrema. Figure 5.3 shows the level curves of f for this case.

c) H has a positive and a null eigenvalue. If b does not belong to the one-dimensional space generated by the columns of H, then f does not have critical points and, therefore, it does not have relative extrema (Figure 5.4). This is the case for the function f(x, y) = 2x + 2y². However, if b belongs to the space generated by the columns of H, then the set of solutions of the equation Hx₀ = −b is a straight line. Each point x₀ of this line is a critical point of f and, since f(x) = ½(x − x₀)⊤H(x − x₀) + f(x₀), each one of these critical points is a local (and global) minimum point of f.



Figure 5.2: Quadratic function with a unique local minimum.

(This illustrates the fact that the sufficient conditions of second order are only sufficient: H is not positive at these critical points, but they are local minima.) As an example, for f(x, y) = 2x + x², the minima are the points of the line x = −1. Figure 5.5 illustrates this last situation.

5.1.1 Convexity and global minima

The theorems of the previous section provide necessary and sufficient conditions to verify the occurrence of local minima or maxima of a function. In practice, however, we are interested in global maxima and minima. General techniques to obtain these extrema are studied in Chapter 7. However, there exists a class of functions, called convex, for which we can guarantee the occurrence of the same phenomena that we observed in the examples (a) and (c) above, where the local minima that we computed were, in fact, global minima.



Figure 5.3: Quadratic function without local extrema (saddle point).

A function f : Rⁿ → R is convex when

f(λx₁ + (1 − λ)x₂) ≤ λf(x₁) + (1 − λ)f(x₂),

for every x₁, x₂ ∈ Rⁿ and 0 < λ < 1. That is, the value of f at an interior point of a segment with extremes x₁ and x₂ is at most equal to the result obtained by linear interpolation between f(x₁) and f(x₂).

Theorem 6. If f : Rⁿ → R is convex and f has a local minimum at x₀, then f has a global minimum at x₀.

Proof. Suppose that x₀ is not a global minimum point of f. Then, there exists x₁ such that f(x₁) < f(x₀). Thus, for every point λx₀ + (1 − λ)x₁ in the line segment x₀x₁ (with 0 < λ < 1) we have

f(λx₀ + (1 − λ)x₁) ≤ λf(x₀) + (1 − λ)f(x₁) < f(x₀).



Figure 5.4: Quadratic function without local extrema.

Since such points can be taken arbitrarily close to x₀ (by letting λ → 1), f cannot have a local minimum at x₀. Therefore, if x₀ is a local minimum, then it is necessarily also a global minimum point.

It is not always easy to verify the convexity of a function using the definition. The following characterization, for functions of class C², is the one used in practice (see (Luenberger, 1984)).

Theorem 7. A function f : Rⁿ → R, of class C², is convex if, and only if, the Hessian matrix H(x) = ∇²f(x) is non-negative for every x (that is, d⊤H(x)d ≥ 0, for every x and every d).

The examples given in the previous sections, with quadratic functions, are, therefore, particular cases of the results stated above: functions of the form f(x) = b⊤x + ½x⊤Hx are convex when H is non-negative. Therefore, when they possess a local minimum at a given point, they also have a global minimum there.



Figure 5.5: Quadratic function with an infinite number of local extrema.
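As a small numerical companion (our addition, not from the original text), Theorem 7 specializes nicely to quadratics: a quadratic is convex exactly when the eigenvalues of H are non-negative, which numpy can check directly.

```python
import numpy as np

def quadratic_is_convex(H, tol=1e-10):
    """A quadratic f(x) = b'x + (1/2) x'Hx is convex iff H is
    non-negative, i.e. all eigenvalues of H are >= 0."""
    return bool(np.all(np.linalg.eigvalsh(H) >= -tol))

print(quadratic_is_convex(np.array([[2.0, 0.0], [0.0, 4.0]])))   # True
print(quadratic_is_convex(np.array([[2.0, 0.0], [0.0, -4.0]])))  # False
```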

5.2 Least squares

Perhaps the most frequent optimization problem in applied mathematics is that of adjusting a linear model to a data set in such a way that the error is as small as possible. The simplest case is the following: given a list of ordered pairs of observations (xᵢ, yᵢ), i = 1, . . . , n, we must find numbers a and b such that the straight line of equation y = ax + b provides the best adjustment to the data set (see Figure 5.6). This problem, known as the linear regression problem, can be given the following interpretation.


Figure 5.6: Linear regression.

We would like, if possible, to find values of a and b such that yᵢ = axᵢ + b for i = 1, . . . , n. That is, we wish to solve the system of n equations in two variables

Xβ = Y,

where β = (a, b)⊤, X is the n × 2 matrix whose i-th row is (xᵢ, 1), and Y = (y₁, . . . , yₙ)⊤.

Typically, n is greater than 2 and, in general, the above system has no solutions; that is, the vector Y is not in the subspace generated by the columns of X. Thus, we must settle for a vector β for which the error |Y − Xβ| is as small as possible. That is, we need to minimize |Y − Xβ| (or, equivalently, |Y − Xβ|²) for β ∈ R². But

|Y − Xβ|² = (Y − Xβ)⊤(Y − Xβ) = β⊤(X⊤X)β − 2Y⊤Xβ + Y⊤Y.

Therefore, the problem is a particular case of the ones we studied in Section 5.1. Clearly, the matrix X⊤X is non-negative. Moreover, since the space generated by the columns of X⊤X is equal to the space generated by the columns of X⊤, the vector X⊤Y belongs to the subspace generated by the columns of X⊤X. Therefore, f has a minimum at every point β₀ satisfying

(X⊤X)β₀ = X⊤Y



(the so-called normal equations of the least squares problem). If X⊤X is non-singular (this occurs when the columns of X are linearly independent), then the system of normal equations has a unique solution, given by

β₀ = (X⊤X)⁻¹X⊤Y.

If X⊤X is singular, then there exists an infinite number of vectors β₀ satisfying the equation (but only one β₀ of minimum length). In fact, we have the following general result.

Theorem 8. Let A be an arbitrary matrix of order m × n. For each b ∈ Rᵐ, there exists a unique vector x₀ of minimal length such that |Ax₀ − b| is minimum. Moreover, the transformation that associates to each b the corresponding x₀ is linear. That is, there exists an n × m matrix A⁺ such that A⁺b = x₀ for all b.

Proof. We will use geometric arguments in the proof (a more algebraic proof is provided in (Strang, 1988)). In the subspace L = {Ax | x ∈ Rⁿ}, there exists a unique vector y₀ whose distance to b is minimum (y₀ is the orthogonal projection of b onto L); see Figure 5.7. On the other hand, there exists a unique vector x₀ of minimum length in the linear manifold M(y₀) = {x | Ax = y₀} (x₀ is the orthogonal projection of the origin onto M(y₀)). This proves the existence and the uniqueness of x₀. Moreover, x₀ is obtained by the composition of two linear transformations: the one that maps b ∈ Rᵐ to its orthogonal projection y₀ onto L, and the one that maps y₀ to the projection of the origin onto M(y₀).

The matrix A⁺ in Theorem 8 is called the pseudo-inverse of A. When A is n × n and non-singular, we have min |Ax₀ − b| = 0, which occurs exactly at x₀ = A⁻¹b. Thus, the concept of pseudo-inverse generalizes the concept of inverse matrix.

We should remark that the geometric interpretation given above can be related to the normal equations of the least squares problem. In fact, the normal equations can be written in the form A⊤(Ax₀ − b) = 0, which says that x₀ must be chosen in such a way that the vector Ax₀ − b is orthogonal to the vector subspace {Ax | x ∈ Rⁿ}.
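To make the discussion concrete, here is a short Python sketch (our addition) that fits the regression line by solving the normal equations; the data arrays are hypothetical.

```python
import numpy as np

# Hypothetical observations (x_i, y_i); any arrays of equal length work.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.9, 3.1, 5.2, 6.8, 9.1])

X = np.column_stack([x, np.ones_like(x)])   # rows (x_i, 1)
beta = np.linalg.solve(X.T @ X, X.T @ y)    # normal equations (X'X)b = X'Y
a, b = beta
print(f"y = {a:.3f} x + {b:.3f}")
```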



Figure 5.7: Geometric interpretation of the pseudo-inverse.

5.2.1 Solving least squares problems

As we discussed above, it is natural to model the problem of "solving" an overconstrained system Ax = b, with more equations than variables (and therefore, in general, without solutions), as the problem of minimizing the error |Ax − b|. Also, the resulting problem is equivalent to minimizing a quadratic function and therefore it can be solved using the system of normal equations (A⊤A)x₀ = A⊤b.

This assertion, however, hides some difficulties with respect to the computation of x₀. It is very common, in this type of problem, that, even though the matrix A⊤A is non-singular, it is "almost" singular. As a result, the solution of the normal equations using traditional methods (e.g. Gaussian elimination) is unstable from the numerical point of view (that is, small perturbations in b might result in huge changes in x₀). One way around this difficulty is to use specific methods for symmetric positive matrices, such as Cholesky's method (see (Golub & Loan, 1983)). Another alternative, which has the advantage of also contemplating the case where A⊤A is singular, consists in using the technique of singular value decomposition. The following theorem, whose proof may be found in (Golub & Loan, 1983), can be interpreted as an extension of the Spectral Theorem.

Theorem 9 (Singular value decomposition). If A is a matrix of order m × n (with m > n) then there exist orthogonal matrices U and V of order, respectively, m × m and n × n, such that A = UΣV⊤, where Σ is the m × n matrix whose upper n × n block is diag(σ₁, . . . , σₙ), whose remaining entries are zero, and whose diagonal entries satisfy

σ₁ ≥ σ₂ ≥ . . . ≥ σₚ > 0 = σₚ₊₁ = . . . = σₙ.

There are efficient methods to compute such a decomposition (see, for instance, (Press et al., 1988)). This decomposition deals very well with the fact that the columns of the matrix A might be "almost" linearly dependent: this is reflected in the magnitude of the factors σᵢ. Thus, we may fix a threshold T and set to 0 every σᵢ < T.

The reason why this decomposition solves the least squares problem resides in the fundamental fact that orthogonal transformations preserve distance. In fact, minimizing |Ax − b| is equivalent to minimizing |U⊤Ax − U⊤b|. On the other hand, |y| = |V y| for every y. Thus, by substituting x = V y, the problem reduces to that of obtaining y of minimum norm such that |U⊤AV y − U⊤b| = |Σy − c| is minimum (where c = U⊤b). But, because of the particular form of Σ, this problem is trivial: we must have yᵢ = cᵢ/σᵢ for i = 1, . . . , p and yᵢ = 0 for i > p. Once we have y, the desired solution is obtained by taking x = V y.
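The recipe above translates almost line by line into code. The following Python sketch (our addition; the cutoff parameter plays the role of the threshold T, and the test matrix is hypothetical) computes the minimum-norm least-squares solution via numpy's SVD.

```python
import numpy as np

def lstsq_svd(A, b, threshold=1e-12):
    """Minimum-norm least-squares solution of Ax = b via the SVD,
    zeroing singular values below `threshold`."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    c = U.T @ b                    # c = U'b
    y = np.zeros_like(s)
    keep = s > threshold           # discard "almost null" singular values
    y[keep] = c[keep] / s[keep]    # y_i = c_i / sigma_i for retained sigma_i
    return Vt.T @ y                # x = V y

# Nearly dependent columns: the tiny singular value is safely cut off.
A = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-13], [1.0, 1.0]])
print(lstsq_svd(A, np.array([1.0, 2.0, 3.0])))
```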

5.3 Algorithms

In the previous sections we studied unconstrained optimization problems having a quadratic objective function. For those problems, it is possible to apply the optimality conditions to obtain optimal solutions; they are obtained by solving, in an appropriate form, a system of linear equations. Moreover, quadratic problems have the outstanding quality that optimal local solutions are also global solutions. For generic optimization problems, this does not occur: local extrema are not necessarily global. Even the determination of local extrema is complicated, since the optimality conditions generate non-linear equations, which makes it difficult to find the critical points. To find these points, one has to resort to iterative algorithms, almost always inspired by the behavior of quadratic functions: the basic idea is that, close to a local minimum, a function can be approximated, using Taylor's formula, by a quadratic function with the same point of minimum.

5.3.1 Newton's method

One of the best known methods for solving optimization problems is Newton's method. The fundamental idea of this method is to replace the function f, to be minimized or maximized, by a quadratic approximation, based on the first and second derivatives of f. More precisely, let f : Rⁿ → R be a function of class C² and x₀ a point "sufficiently close" to the desired solution. Taylor's formula of second order at x₀ provides a quadratic approximation f̃ to f, given by

f̃(x₀ + h) = f(x₀) + ∇f(x₀)h + ½h⊤∇²f(x₀)h.

The approximation f̃ has a minimum for h = −(∇²f(x₀))⁻¹∇f(x₀)⊤ (provided that the Hessian matrix ∇²f(x₀) is positive).

This step h generates a new approximation x₁ = x₀ + h to the optimal solution, which is the starting point for the next iteration. Thus, Newton's method generates a sequence defined recursively by

xₖ₊₁ = xₖ − (∇²f(xₖ))⁻¹∇f(xₖ)⊤.

It is <strong>in</strong>terest<strong>in</strong>g to observe that the terms of the sequence depend only on the<br />

derivatives of f <strong>and</strong> not on the values of the function f itself. In fact, Newton’s<br />

method tries to f<strong>in</strong>d an stationary po<strong>in</strong>t of f, that is, a po<strong>in</strong>t where the gradient<br />

is null. We have seen that this is a necessary condition, but it is not sufficient, <strong>in</strong><br />

order that f has a m<strong>in</strong>imum at this po<strong>in</strong>t. If the sequence (x k ) converges (which<br />

is not guaranteed) it will converge to a po<strong>in</strong>t where ∇f = 0. But this po<strong>in</strong>t is<br />

not necessarily a m<strong>in</strong>imum of f. What is possible to guarantee is that if the po<strong>in</strong>t<br />

x 0 is choosen “sufficiently close” to a m<strong>in</strong>imum x ∗ of f, then the sequence will<br />

converge to x ∗ , provided that f be sufficiently well-behaved <strong>in</strong> the neighborhood<br />

of x∗. There are several criteria that def<strong>in</strong>e this good behavior. One such criterium<br />

is given by the theorem below, whose proof may be found <strong>in</strong> (Luenberger, 1984).



Theorem 10. Let f : Rⁿ → R be a function of class C². Suppose that f has a local minimum at x∗, and that there exist a constant M and a neighborhood of x∗ such that |∇²f(x)| > M in this neighborhood. Then there exists a neighborhood V of x∗ such that if x₀ ∈ V then the sequence defined by xₖ₊₁ = xₖ − (∇²f(xₖ))⁻¹∇f(xₖ)⊤ converges to x∗. Moreover, the rate of convergence is quadratic, that is, there exists c such that |xₖ₊₁ − x∗| < c|xₖ − x∗|² for every k.

Newton’s method is em<strong>in</strong>ently local <strong>and</strong> cannot be used as a general m<strong>in</strong>imization<br />

method, start<strong>in</strong>g from an arbitrary <strong>in</strong>itial solution. Its use requires that other<br />

methods, such as the ones described below, be used to obta<strong>in</strong> a first approximation<br />

of the desired solution. Once this is done, Newton’s method is an excelent<br />

ref<strong>in</strong>ement method, with an extremely fast convergence to a local extremum.<br />

5.3.2 Unidimensional search algorithms

The fundamental idea of these algorithms is, at each iteration, to reduce the n-dimensional problem of minimizing a function f : Rⁿ → R to a simpler one-dimensional optimization problem. To do that, they choose a promising search direction at the current solution and solve the one-dimensional optimization problem obtained by restricting the domain of f to this direction.

As an example, a more robust form of Newton's method is obtained if xₖ₊₁ is chosen as the point that minimizes f along the line that contains xₖ and has the direction −(∇²f(xₖ))⁻¹∇f(xₖ)⊤ provided by the method.

Another natural choice for the search direction is the direction in which f has the greatest local descent rate, that is, the direction along which the directional derivative of f has the most negative value. It is easy to prove that this direction is given by −∇f. Therefore, from a given point xₖ, the next value is found by taking

xₖ₊₁ = xₖ + αₖd, where d = −∇f(xₖ)⊤ and αₖ = arg min_{α ≥ 0} f(xₖ + αd).

This algorithm is known as the steepest descent method. Depending on the problem, obtaining the exact solution of the one-dimensional problem above might not be practical. In fact, it is possible to prove the convergence of the method to a stationary point of f even if the search for the next solution along d is inexact; it suffices for this that f be "sufficiently reduced" at each iteration (see (Luenberger, 1984)).

The steepest descent method has a great advantage and a great disadvantage. The advantage is that, under reasonable conditions, it has guaranteed convergence to a point where ∇f(x∗) = 0. The disadvantage is that this convergence may be very slow. This can be easily illustrated by observing its behavior for quadratic functions, represented in Figure 5.8 for f(x, y) = x² + 4y². Evidently, the minimum of f is attained at (0, 0). Starting from an arbitrary point, the sequence of solutions determines an infinite oscillation (in each step, the search is done in the direction of the normal to the level curve corresponding to the current solution, and this direction never contains the origin).

Figure 5.8: Gradient descent.

It is possible to show that, for quadratic functions of the form f(x) = b⊤x + ½x⊤Hx, with H positive, the sequence generated by the method converges to the minimum point x∗ in such a way that

|xₖ₊₁ − x∗| / |xₖ − x∗| < (A − a)² / (A + a)²,

where a and A are, respectively, the smallest and the greatest eigenvalues of H. If the ratio A/a is large, then the gain in each iteration is small, and this leads to a slow convergence process.
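For a quadratic objective, the step length of the exact line search has a closed form, which makes the zigzag of Figure 5.8 easy to reproduce. The Python sketch below (our addition) runs steepest descent on f(x, y) = x² + 4y².

```python
import numpy as np

H = np.array([[2.0, 0.0], [0.0, 8.0]])   # Hessian of f(x, y) = x**2 + 4*y**2

def steepest_descent_quadratic(x0, iters=10):
    """Steepest descent with exact line search on f(x) = (1/2) x'Hx.
    For a quadratic, the optimal step along d = -g is
    alpha = (g'g) / (g'Hg); the iterates zigzag as in Figure 5.8."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = H @ x                          # gradient of (1/2) x'Hx
        alpha = (g @ g) / (g @ (H @ g))    # exact minimizer along -g
        x = x - alpha * g
    return x

print(steepest_descent_quadratic([4.0, 1.0]))  # slowly approaches (0, 0)
```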

5.3.3 Conjugate gradient

The conjugate gradient algorithm also employs the idea, present in the steepest descent algorithm, of searching along promising directions, but it tries to choose these directions in such a way that the minimum point is reached after a finite number of steps (at least for quadratic functions).

Again, we use the two-dimensional case to illustrate these ideas. There is one case in which the steepest descent algorithm minimizes a quadratic function in a finite number of steps: when the level curves are circles, the normal to one of these curves at any point passes through the center, and therefore the algorithm converges in a single iteration. When the level curves are ellipses, it is also possible to do the same, provided that the search direction be chosen not as a direction perpendicular to the tangent to the level curve, but as a direction conjugate to it. This can be understood in the following way: the level curves of a quadratic function f(x) = x⊤(G⊤G)x (ellipses, in the general case) can be obtained from the level curves of f(x) = x⊤x (which are circles) by applying the linear transformation x ↦ Gx. Conjugate directions of the ellipse are the images, by G, of orthogonal directions (see Figure 5.9).

In general, if H is a positive matrix, two directions d₁ and d₂ are conjugate with respect to H when d₁⊤Hd₂ = 0. Given n mutually conjugate directions of Rⁿ, the minimum of a quadratic function f : Rⁿ → R is computed by making n one-dimensional searches, each along one of these directions.

For non-quadratic objective functions, the algorithm uses the Hessian matrix at x₀ to generate a set of n mutually conjugate directions. After searching along these directions, a better approximation to the optimal solution is found and a new set of directions is generated.

The rate of convergence of the algorithm, under hypotheses similar to those of Theorem 10, is quadratic, but with a constant greater than that of Newton's method. On the other hand, because it incorporates information about the way the function decreases, the algorithm is more robust, and it is not necessary to provide an initial solution too close to the minimum point.



Figure 5.9: d and d′ are conjugate directions.
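For a quadratic function, the classical linear conjugate gradient iteration builds the mutually conjugate directions on the fly. The sketch below (our addition) is a minimal Python version for minimizing ½x⊤Hx − b⊤x with H positive.

```python
import numpy as np

def conjugate_gradient(H, b, x0=None, tol=1e-12):
    """Minimize (1/2) x'Hx - b'x (H positive definite) by the linear
    conjugate gradient method; successive search directions are mutually
    conjugate with respect to H, so at most n steps are needed."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    r = b - H @ x                 # residual = minus the gradient
    d = r.copy()
    for _ in range(n):
        if np.linalg.norm(r) < tol:
            break
        Hd = H @ d
        alpha = (r @ r) / (d @ Hd)       # exact line search along d
        x = x + alpha * d
        r_new = r - alpha * Hd
        beta = (r_new @ r_new) / (r @ r) # makes the next d conjugate to this one
        d = r_new + beta * d
        r = r_new
    return x

H = np.array([[2.0, 0.0], [0.0, 8.0]])
print(conjugate_gradient(H, np.array([2.0, -8.0])))  # solves Hx = b in 2 steps
```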

5.3.4 Quasi-Newton algorithms

The conjugate gradient algorithm, studied in the previous section, puts together good qualities of both Newton's method (rate of convergence) and steepest descent (robustness). These features are also found in the class of methods known as quasi-Newton methods.

These algorithms produce a sequence of solutions of the form

xₖ₊₁ = xₖ − αₖSₖ∇f(xₖ)⊤,

where Sₖ is a symmetric matrix and αₖ is usually chosen in such a way as to minimize f(xₖ₊₁). Both Newton's method (for which Sₖ = (∇²f(xₖ))⁻¹) and steepest descent (where Sₖ = I) are particular cases of this general form.

An important aspect considered in the construction of these algorithms is the fact that it may be impractical to compute the Hessian matrix of f (for example, in the cases where f is not known analytically). Thus, the methods of this family usually construct approximations to the Hessian matrix based on the values of f and of ∇f at the last elements of the iteration sequence (see (Luenberger, 1984) for details).
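BFGS is the best known member of this family, and scipy exposes it directly. The example below (our addition, with an arbitrary test function) lets the method build its inverse-Hessian approximation Sₖ from gradients alone.

```python
import numpy as np
from scipy.optimize import minimize

# BFGS is a classical quasi-Newton method: it maintains an approximation
# S_k to the inverse Hessian built only from values of f and its gradient.
f = lambda v: (v[0] - 1) ** 2 + 4 * (v[1] + 2) ** 2
grad = lambda v: np.array([2 * (v[0] - 1), 8 * (v[1] + 2)])

result = minimize(f, x0=np.zeros(2), jac=grad, method="BFGS")
print(result.x)   # close to the minimum (1, -2)
```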



5.4 Constrained optimization

In this section we briefly discuss continuous optimization problems with constraints defined by equations or inequalities, that is, problems of the form

min f(x)
subject to hᵢ(x) = 0, i = 1, . . . , m
           gⱼ(x) ≤ 0, j = 1, . . . , p

where every function is smooth (typically of class C²).

5.4.1 Optimality conditions

The central idea of the first order necessary conditions, in the unconstrained case, is that a local minimum occurs at points where every direction is stationary, that is, at points where ∇f(x)·d = 0 for every d, which is equivalent to the condition ∇f(x) = 0. In the proof, we argued that, if f were not stationary along some direction d, we could obtain values smaller than f(x) by moving along d or −d.

When the problem has constraints, there may be non-stationary directions at a local minimum point. Consider, for instance, the case where we wish to minimize f(x) subject to the single constraint h(x) = 0. The feasible solutions for this problem are the points of a surface S of Rⁿ (Figure 5.10). A point x₀ can be a local minimum of f restricted to S without having ∇f(x₀) = 0. In fact, the component of ∇f(x₀) in the direction normal to S is irrelevant for the optimality of x₀ because, locally, x cannot move along this direction. Thus, we may have optimality provided that the projection of ∇f(x₀) onto the tangent plane to S at x₀ is null (that is, when we have ∇f(x₀) = λ∇h(x₀) for some real number λ).

Suppose now that we have a single constraint of the form g(x) ≤ 0. Again, it is possible to have optimality at x₀ with ∇f(x₀) ≠ 0. This can occur only if g(x₀) = 0, that is, if x₀ is a point of the boundary surface of the region defined by g(x) ≤ 0 (we say in this case that the inequality constraint is active at x₀).



Figure 5.10: Equality constraint.

Moreover, ∇f(x₀) must be of the form −λ∇g(x₀), with λ greater than or equal to zero: otherwise, it would be possible to find solutions whose values are smaller than f(x₀) by moving along the direction −∇g(x₀), which points into the feasible region (see Figure 5.11).

These ideas constitute the basis for the proof of the theorem that follows, which provides first order necessary conditions for the general case of m equality constraints and p inequality constraints. We say that x₀ is a regular point for these constraints when the vectors ∇hᵢ(x₀) (i = 1, . . . , m) and ∇gⱼ(x₀) (for the j such that gⱼ(x₀) = 0) are linearly independent.

Theorem 11 (Kuhn-Tucker conditions). Let x₀ be a relative minimum point for the problem

min f(x)
subject to hᵢ(x) = 0, i = 1, . . . , m
           gⱼ(x) ≤ 0, j = 1, . . . , p

and suppose that x₀ is a regular point for these constraints.



Figure 5.11: Inequality constraint.

Then there exist a vector λ ∈ Rᵐ and a vector μ ∈ Rᵖ, with μ ≥ 0, such that

∇f(x₀) + ∑ᵢ₌₁ᵐ λᵢ∇hᵢ(x₀) + ∑ⱼ₌₁ᵖ μⱼ∇gⱼ(x₀) = 0,
∑ⱼ₌₁ᵖ μⱼgⱼ(x₀) = 0.

The theorem above generalizes the classical theory of Lagrange multipliers to the case where inequality constraints are present.

Also, there are theorems that provide second order necessary and sufficient conditions to ensure that a given point x₀ is a local minimum. In the same way as in the unconstrained case, such conditions refer to the non-negativeness (or the positiveness) of a certain Hessian matrix.

In the constrained case, those conditions apply to the Hessian of the Lagrangian function, given by

L(x) = f(x) + ∑ᵢ₌₁ᵐ λᵢhᵢ(x) + ∑ⱼ₌₁ᵖ μⱼgⱼ(x),



at the po<strong>in</strong>t x 0 (where the λ i <strong>and</strong> the µ j are the multipliers provided by the conditions<br />

of first order) <strong>and</strong> the non-negativeness (or positiveness) is imposed only<br />

for directions of the tangent space, at x 0 , to the surface determ<strong>in</strong>ed by the active<br />

constra<strong>in</strong>ts at this po<strong>in</strong>t (see (Luenberger, 1984) for details).<br />

5.4.2 Least squares with linear constraints

We use the least squares problem with linear constraints to exemplify the use of the above theorem. Let us consider the following problem:

min |Ax − b|
subject to Cx = d,

where A is m × n (with m > n) and C is p × n (with p < n). In this problem, there exists a set of equations that must be satisfied exactly and another set of equations that will be satisfied in the best possible way.

Minimizing |Ax − b| is equivalent to minimizing |Ax − b|²; that is, the problem is equivalent to

min x⊤(A⊤A)x − 2b⊤Ax + b⊤b
subject to Cx = d.

In order that x₀ be a relative minimum point, it must satisfy the first order conditions. In this case, where there are only equality constraints, they say that there must exist a vector λ = (λ₁, λ₂, . . . , λₚ)⊤ such that

(A⊤A)x₀ − A⊤b + C⊤λ = 0.

Since x₀ must also satisfy the equality constraints, we obtain the system

[ A⊤A  C⊤ ] [ x ]   [ A⊤b ]
[  C    0 ] [ λ ] = [  d  ].

It is possible to prove that, provided that the rows of C are linearly independent, the above system always has at least one solution x₀, obtained by finding the closest point to b among the points of the linear manifold {Ax | Cx = d}. As happens in the unconstrained case, such solutions are, in fact, global minima of the least squares problem.

In order to effectively solve the above system, it is advisable, again, to use the singular value decomposition of A⊤A, to avoid the numerical instability that originates from the possible "quasi-singularity" of A⊤A.

5.4.3 Algorithms

There is a huge variety of algorithms for continuous optimization problems with constraints. Some of these algorithms are general and capable of dealing with any combination of equality and inequality constraints. Others try to exploit the nature of the constraints – equality or inequality – and properties of the functions that define them (and of the objective function).

In what follows we describe two classes of algorithms for optimization with constraints. As we will see, these algorithms exploit the following ideas:

• reduce the constrained problem to a problem without constraints (or to a sequence of problems of this kind), in such a way as to allow the use of the algorithms previously described;

• adapt an algorithm for unconstrained optimization to take the constraints into consideration.

Penalty and barrier methods

Methods in this class exploit the first idea above, modifying the objective function in such a way as to take the constraints into account.

Penalty methods are more general than barrier methods and work with both equality and inequality constraints. Moreover, they do not require (in contrast to other methods) that we know a feasible initial solution; in fact, penalty methods are a good way to find such a feasible initial solution.

The basic idea of the method consists in introducing in the objective function, for each constraint, a penalty term that discourages the violation of this constraint. For a constraint of the form h(x) = 0, we introduce a penalty of the form ch(x)²;



for a constra<strong>in</strong>t of the form g(x) ≤ 0, a penalty of the form c max{0, g(x) 2 } is<br />

appropriate. In this way, for the constra<strong>in</strong>ed problem<br />

m<strong>in</strong> f(x)<br />

subject to h i (x) = 0, i = 1, . . . , m<br />

g j (x) ≤ 0, j = 1, . . . , p<br />

we obta<strong>in</strong> an unconstra<strong>in</strong>ed problem of the form:<br />

( m<br />

)<br />

∑<br />

p∑<br />

m<strong>in</strong> q c (x) = f(x) + c h(x) 2 + max(0, g(x) 2 )<br />

i=1<br />

j=1<br />

(5.1)<br />

The simple <strong>in</strong>troduction of penalties <strong>in</strong> the objective function does not guarantee<br />

that the result<strong>in</strong>g problem has an optimal solution satisfy<strong>in</strong>g the constra<strong>in</strong>ts.<br />

The strategy of the penalty method is to solve a sequence of problems as the one<br />

above, with c tak<strong>in</strong>g the values of an <strong>in</strong>creas<strong>in</strong>g sequence (c k ) such that c k → ∞.<br />

The follow<strong>in</strong>g theorem guarantees that this strategy works.<br />

Theorem 12. Let (c k ) be an <strong>in</strong>creas<strong>in</strong>g sequence such that c k → ∞. For each k<br />

let x k = arg m<strong>in</strong> q ck (x). Then, every adherence po<strong>in</strong>t of 1 of (x k ) is a solution of<br />

5.1.<br />

Barrier methods are quite similar to penalty methods. The difference is that they operate in the interior of the solution set. For this reason, they are typically used for problems with inequality constraints.

The idea is to add, for each inequality gⱼ(x) ≤ 0, a term to the objective function that prevents the solutions from getting too close to the surface gⱼ(x) = 0. This term, which can be, for instance, of the form −c/gⱼ(x), works as a barrier against this approximation. The resulting problem is given by

min q_c(x) = f(x) − c ∑ⱼ₌₁ᵖ 1/gⱼ(x)
subject to gⱼ(x) ≤ 0, j = 1, . . . , p.

In spite of the fact that the constraints are still present in the formulation of the problem, they may be ignored if the optimization method used performs searches

¹An adherence point of a sequence is the limit of any of its subsequences.



start<strong>in</strong>g from a solution that belongs to the <strong>in</strong>terior of the solution set S: the barrier<br />

<strong>in</strong>troduced <strong>in</strong> the objective function avoids, automatically, that it moves outside<br />

the <strong>in</strong>terior of S.<br />

Aga<strong>in</strong>, it is not enough to solve the above problem; it is necessary to consider<br />

a sequence (c k ) of values of c. Here, however, we must take this sequence <strong>in</strong><br />

such a way that it is decreas<strong>in</strong>g <strong>and</strong> has limit 0 (this is equivalent to remov<strong>in</strong>g,<br />

progressively, the barriers, <strong>in</strong> such a way as to <strong>in</strong>clude all of the set S). As <strong>in</strong> the<br />

case of penalty methods, the sequence of solutions (x k ) generated <strong>in</strong> the process<br />

have the property that its adherence po<strong>in</strong>ts are solutions to the orig<strong>in</strong>al problem.<br />

Penalty methods <strong>and</strong> barrier methods have the virtue of be<strong>in</strong>g very general<br />

methods <strong>and</strong> relatively simple to be implemented, because they use optimization<br />

methods without constra<strong>in</strong>ts. Nevertheless, their convergence is tipically slow.<br />
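A minimal Python sketch of the penalty strategy (our addition; the objective and constraint are hypothetical examples): each unconstrained subproblem is solved with scipy, warm-starting from the previous solution as cₖ grows.

```python
import numpy as np
from scipy.optimize import minimize

# Penalty method sketch: minimize f(x, y) = x**2 + y**2 subject to
# h(x, y) = x + y - 1 = 0, via unconstrained problems with growing c_k.
f = lambda v: v[0] ** 2 + v[1] ** 2
h = lambda v: v[0] + v[1] - 1.0

x = np.zeros(2)
for c in [1.0, 10.0, 100.0, 1000.0]:
    q = lambda v, c=c: f(v) + c * h(v) ** 2   # penalized objective q_c
    x = minimize(q, x).x                      # warm-start from previous solution
print(x)   # approaches the true solution (0.5, 0.5)
```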

Projected gradient methods

Here, we consider only problems with equality constraints:

min f(x)
subject to hᵢ(x) = 0, i = 1, . . . , m    (x ∈ Rⁿ).    (5.2)

In principle, the solution set S is a manifold of dimension n − m and, ideally, the above problem could be reduced to an unconstrained problem of the same dimension.

One of the few cases where this can be done in practice is when the constraints are linear. In fact, if the constraints are of the form Cx = d (where C is an m × n matrix of rank m), S is a linear manifold of dimension n − m, and it can be expressed in the form

S = {Py + q | y ∈ Rⁿ⁻ᵐ}.

We obtain, therefore, the equivalent unconstrained problem of minimizing g(y) = f(Py + q). The derivative of the new objective function may, naturally, be obtained from that of f by using the chain rule.
be obta<strong>in</strong>ed from that of f by us<strong>in</strong>g the cha<strong>in</strong> rule.<br />

The above ideas are useful even when the constraints are not linear. Let x₀ be a feasible solution of 5.2 and consider the plane H, tangent to S at x₀, which can be defined by

H = {Py + x₀ | y ∈ Rⁿ⁻ᵐ},

where the columns of P constitute a basis for the orthogonal complement of the subspace generated by (∇h₁(x₀))⊤, . . . , (∇hₘ(x₀))⊤.

The plane H can be interpreted as an approximation to S. On the other hand, the problem of minimizing f on H can be reduced to an unconstrained problem, as seen above. The steepest descent algorithm, applied to this problem, performs a search along the direction given by the projection, onto H, of the gradient vector at x₀. The new solution x′ thus obtained does not satisfy, in general, the constraints hᵢ(x) = 0, but it is near the manifold S. From x′ a new feasible solution x₁ can be computed (for example, using a one-dimensional search in the direction normal to S), as illustrated in Figure 5.12. The process is then iterated, originating the projected gradient algorithm, which has convergence properties similar to those of steepest descent in the unconstrained case.

Other unconstrained optimization algorithms (for instance, the conjugate gradient method) can be similarly adapted to operate on the tangent space.

Figure 5.12: Projected gradient algorithm.
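When the constraints are linear, the restoration step is unnecessary, which makes for a compact illustration. The Python sketch below (our addition) projects the gradient onto the null space of C at every step; the example data are hypothetical.

```python
import numpy as np

def projected_gradient(grad, C, x0, step=0.1, iters=200):
    """Projected gradient for linear equality constraints Cx = d:
    each step moves along the projection of -grad f onto the null
    space of C, so feasibility is preserved exactly (the nonlinear
    case would need the restoration step of Figure 5.12)."""
    x = np.asarray(x0, dtype=float)
    # Orthogonal projector onto {y : Cy = 0}.
    P = np.eye(C.shape[1]) - C.T @ np.linalg.solve(C @ C.T, C)
    for _ in range(iters):
        x = x - step * (P @ grad(x))
    return x

# Minimize x**2 + 4*y**2 subject to x + y = 1 (feasible start (1, 0)).
grad = lambda v: np.array([2 * v[0], 8 * v[1]])
C = np.array([[1.0, 1.0]])
print(projected_gradient(grad, C, [1.0, 0.0]))   # -> (0.8, 0.2)
```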



5.5 L<strong>in</strong>ear programm<strong>in</strong>g<br />

Now we will study a class of optimization problems – called l<strong>in</strong>ear programs<br />

– which have, simultaneously, characteristics of cont<strong>in</strong>uous <strong>and</strong> comb<strong>in</strong>atorial<br />

optimization. A l<strong>in</strong>ear program is a problem of the form<br />

m<strong>in</strong><br />

subject to<br />

∑ n ∑ j=1 c jx j<br />

n<br />

j=1 a ijx j ≤ b i , i = 1, . . . m<br />

That is, l<strong>in</strong>ear programs ask for m<strong>in</strong>imiz<strong>in</strong>g a l<strong>in</strong>ear function subject to l<strong>in</strong>ear<br />

<strong>in</strong>equalities. In fact the constra<strong>in</strong>ts can be constituted by any comb<strong>in</strong>ation of l<strong>in</strong>ear<br />

equalities <strong>and</strong> <strong>in</strong>equalites: it is always possible to convert such a problem <strong>in</strong>to<br />

another equivalent one of the above form. As an example, an equation of the form<br />

n∑<br />

a j x j = b<br />

j=1<br />

can be replaced by the pair of <strong>in</strong>equalities<br />

n∑<br />

a j x j ≤ b;<br />

j=1<br />

n∑<br />

a j x j ≥ b.<br />

j=1<br />

L<strong>in</strong>ear programs are clearly cont<strong>in</strong>uous optimization programs with <strong>in</strong>equality<br />

constra<strong>in</strong>ts <strong>and</strong> can be solved by iterative algorithms such as those described <strong>in</strong><br />

Section 5.4. In fact, algorithms of this form, specially formulated for l<strong>in</strong>ear programs,<br />

produce some of the most efficient methods to solve them – the so called<br />

<strong>in</strong>terior po<strong>in</strong>t methods.<br />

Nevertheless, the first succesful approach to l<strong>in</strong>ear programs – mak<strong>in</strong>g it possible<br />

to solve problems hav<strong>in</strong>g hundreds of thous<strong>and</strong>s of variables – was comb<strong>in</strong>atorial<br />

<strong>in</strong> nature. This work was done, <strong>in</strong> the 50’s, by Dantzig, who <strong>in</strong>troduced<br />

perhaps the most famous optimization algorithm – the simplex method.



In order to use the simplex method it is more convenient to write a linear program in the canonical form below:

max ∑ⱼ cⱼxⱼ
subject to ∑ⱼ aᵢⱼxⱼ = bᵢ, i = 1, . . . , m
           xⱼ ≥ 0

or, using matrix notation,

max cx
subject to Ax = b
           x ≥ 0

where A is an m × n matrix.

In general, we suppose that the rank of A is equal to m (that is, the rows of A are linearly independent). In this case, {x | Ax = b} is a linear manifold (i.e. the translate of a vector subspace) of dimension n − m.

The fact that linear programs have an important combinatorial structure follows from the fact that the solution set S = {x | Ax = b, x ≥ 0} is a convex polyhedron and that all relevant information for the optimization of a linear function is concentrated on the vertices of this polyhedron.

We say that x is a vertex of S when x is not an interior point of any line segment with extremes in S. That is, x = λx₁ + (1 − λ)x₂, with x₁, x₂ ∈ S and 0 < λ < 1, implies that x = x₁ = x₂.

The importance of vertices in the solution of linear programs is given by the following theorem.

Theorem 13. If a linear program has an optimal solution, then it has a vertex which is an optimal solution (see Figure 5.13).

The geometric notion of vertex has an algebraic equivalent. We say that x is a basic solution of S = {x | Ax = b} when:

1. Ax = b.

2. The columns j of A such that xⱼ > 0 are linearly independent (they may, therefore, be extended to a basis of the space generated by the columns of A).



Figure 5.13: Optimal solutions of linear programs.

Theorem 14. x is a vertex if, and only if, x is a feasible basic solution.

A consequence of the above theorems is that in order to solve a linear program it suffices to examine each subset of m columns of A. We have, therefore, a naive algorithm to solve a linear program, consisting of the following steps:

1. Select m columns of A and verify if they constitute a basis;

2. Compute the corresponding basic solution, attributing the value 0 to the other variables;

3. Verify if the solution obtained is feasible.

This sequence of steps must be repeated for every possible basis and, among the feasible solutions found, the one that produces the greatest value of the objective function is optimal. Note that there are C(n, m) = n!/(m!(n − m)!) possible bases, that is, roughly nᵐ alternatives to be compared. Thus, this algorithm, although finite, is clearly inefficient, and it is not useful for solving real linear programming problems. In what follows we describe the simplex method, which also restricts attention to basic solutions. However, it exploits the adjacency structure of the vertices of the polyhedron, producing a very efficient algorithm for practical problems.



5.5.1 Simplex algorithm for l<strong>in</strong>ear programs<br />

The simplex algorithm generates a sequence of vertices of the solution set S =<br />

{x|Ax = b}, always decreas<strong>in</strong>g the value of f(x) = cx , until if f<strong>in</strong>ds the m<strong>in</strong>imum<br />

value or obta<strong>in</strong>s a proof the the problem is unbounded.<br />

The algorithm consists <strong>in</strong> execut<strong>in</strong>g, repeatedly, the follow<strong>in</strong>g steps:<br />

1. Verify if the current basic solution x is optimal. (For this, we just have to<br />

verify if f(x) ≥ f(x) for all of the vertices adjacent to x. If this occurs, x<br />

it is optimal. If not, we execute the follow<strong>in</strong>g step.)<br />

2. Move along an edge on which the objective function decreases until it hits<br />

a new vertex. (If no vertex is found when mov<strong>in</strong>g over an edge where f is<br />

decreas<strong>in</strong>g, the problem is unbounded.)<br />

As we have already seen, each basic solution is associated to a set of l<strong>in</strong>early <strong>in</strong>dependent<br />

columns of A. The variable correspond<strong>in</strong>g to these columns are called<br />

basic variables for the solution.<br />

The simplex method ma<strong>in</strong>ta<strong>in</strong>s an algebraic representation of the system Ax =<br />

b where the non-basic variables are expressed <strong>in</strong> terms of basic variables. With<br />

this representation it is simple to execute, at each step, the optimality test <strong>and</strong><br />

the passage to an adjacent vertex. Once this is done, it is necessary to update<br />

the system representation, <strong>in</strong> such a way that the non-basic variables are aga<strong>in</strong><br />

expressed <strong>in</strong> terms of the basic variables. This updat<strong>in</strong>g – called pivot<strong>in</strong>g – is<br />

executed us<strong>in</strong>g elementary operations on the equations of the system Ax = b (see<br />

(Chvatal, 1983) for details).<br />

The simplex algorithm needs an <strong>in</strong>itial feasible basic solution, which is not<br />

always obvious. However, it is <strong>in</strong>terest<strong>in</strong>g to remark, such a solution may be<br />

found by solv<strong>in</strong>g another l<strong>in</strong>ear program (us<strong>in</strong>g aga<strong>in</strong> the simplex method) for<br />

which a feasible basic solution is immediately available. For example, to obta<strong>in</strong> a<br />

feasible basic solution of S = {x | Ax = b, x ≥ 0}, we may solve the problem below, in<br />

which we <strong>in</strong>troduce artificial variables x a :<br />

min 1 · x_a   (i.e. ∑_i (x_a)_i)<br />
subject to   A x + I x_a = b<br />
             x, x_a ≥ 0



This second problem has an obvious feasible basic solution, given by x = 0 and x_a = b (we may assume b ≥ 0, multiplying equations by −1 where necessary). On the other hand, it always has an optimal solution, because 0 is a lower bound of the objective function. If the value of this optimal solution is 0, all of the artificial variables x_a have value 0 and, therefore, the non-artificial variables provide a feasible solution for the original problem. Otherwise, the original problem does not have feasible solutions.<br />
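As an illustration, the sketch below sets up this auxiliary (phase-1) problem numerically. It is a minimal example, assuming SciPy's linprog routine and illustrative data A and b; it is not the simplex implementation discussed in the text, only a way to experiment with the construction.

import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, -1.0, 0.0, 1.0]])
b = np.array([4.0, 1.0])
m, n = A.shape

# flip signs so that b >= 0 and x = 0, x_a = b is feasible
sign = np.where(b < 0, -1.0, 1.0)
A, b = A * sign[:, None], b * sign

# auxiliary problem: min 1·x_a  s.t.  A x + I x_a = b,  x, x_a >= 0
c = np.concatenate([np.zeros(n), np.ones(m)])
A_eq = np.hstack([A, np.eye(m)])
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None))

if res.fun > 1e-9:
    print("the original problem is infeasible")
else:
    print("feasible point:", res.x[:n])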

5.5.2 The complexity of the simplex method<br />

In the worst case, the simplex method may dem<strong>and</strong> an exponential number of<br />

iterations to solve a l<strong>in</strong>ear program. In practice, however, the number of iterations<br />

is l<strong>in</strong>ear <strong>in</strong> m. On the other h<strong>and</strong>, the algebraic effort <strong>in</strong> each step has complexity<br />

O(mn), which makes the method practical even for problems with a huge number<br />

of variables (large scale problems).<br />

The search for polynomial-time algorithms for linear programs was, for some decades, one of the most important open problems in optimization. The question was answered in a satisfactory way in the late 70's, when Khachian (Khachian, 1979) proved that the so-called ellipsoid algorithm, initially developed to solve non-linear problems, could be used to solve linear programs in polynomial time. In the subsequent years, other algorithms were discovered that gave the same result. The most famous among them is Karmarkar's algorithm (Karmarkar, 1984). These algorithms, instead of moving along the boundary of the solution set polyhedron S, produce a sequence of points in the interior of S that converge to an optimal solution.<br />

The linear programs of greatest interest for computer graphics are, in general, of dimension 2 or 3. It is interesting to observe that such problems may be solved in linear time (Megiddo, 1984). In fact, there exist linear algorithms to solve linear programs with any number of variables, provided that this number is assumed constant. However, such algorithms are practical only for low dimensions (and even in these cases, in practice, the simplex method presents a comparable performance).



5.6 Applications <strong>in</strong> <strong>Graphics</strong><br />

In this section we will discuss problems <strong>in</strong> computer graphics which can be posed<br />

as cont<strong>in</strong>uous optimization problems. The list of problems we will cover is:<br />

• Camera calibration;<br />

• Registration of an image sequence;<br />

• Color correction of an image sequence;<br />

• Real time walkthrough.<br />

5.6.1 Camera calibration<br />

Problems involving the estimation of parameters are naturally posed as continuous optimization problems, where the objective is to determine the set of parameters<br />

that m<strong>in</strong>imize the adjustment error. One important example <strong>in</strong> computer<br />

graphics is the problem of calibrat<strong>in</strong>g a camera, that is, estimat<strong>in</strong>g the <strong>in</strong>tr<strong>in</strong>sic<br />

(optical characteristics of its optical system) <strong>and</strong> extr<strong>in</strong>sic (position <strong>and</strong> orientation)<br />

parameters.<br />

The simplest camera model is the one that has no optical system (no lens): it corresponds to the rudimentary pinhole camera, where light enters through a small hole in a box, projecting itself onto a plane, as illustrated<br />

<strong>in</strong> Figure 5.14. The produced image is simply a perspective projection of the<br />

three-dimensional scene. This simple camera model is the basic model used <strong>in</strong><br />

computer graphics to def<strong>in</strong>e the virtual camera for produc<strong>in</strong>g synthetic images.<br />

For this reason, applications that <strong>in</strong>volve capture <strong>and</strong> synthesis of images commonly<br />

use this model.<br />

The distance between the optical center of the camera <strong>and</strong> the plane of projection<br />

is called the focal distance, <strong>and</strong> it will be denoted by f. This camera model<br />

has no intrinsic parameter other than the focal distance, and its perspective<br />

projection is characterized by this parameter <strong>and</strong> a set of extr<strong>in</strong>sic parameters. In<br />

order to identify these parameters it is convenient to consider three reference systems<br />

(Figure 5.15): a system Oxyz of world coord<strong>in</strong>ates, used to reference the<br />

position of po<strong>in</strong>ts <strong>in</strong> space; a system C ′ x ′ y ′ z ′ of camera coord<strong>in</strong>ates, with orig<strong>in</strong>



Figure 5.14: The p<strong>in</strong>hole camera.<br />

at the optical center and axes aligned with the camera principal directions (that<br />

is, the directions of the image boundary <strong>and</strong> the normal to the image plane); <strong>and</strong><br />

a system of two-dimensional coord<strong>in</strong>ates, called image coord<strong>in</strong>ate system Cuv,<br />

obta<strong>in</strong>ed by project<strong>in</strong>g the camera reference system on the plane of the virtual<br />

screen.<br />

The camera extrinsic parameters may, then, be expressed using the rotation<br />

matrix R <strong>and</strong> the translation t which express the coord<strong>in</strong>ate transformation<br />

between the world coord<strong>in</strong>ate system <strong>and</strong> the camera coord<strong>in</strong>ate system. It is important<br />

to observe that, although the matrix R has 9 elements, it has only 3 degrees<br />

of freedom, because it is an orthogonal matrix.<br />

Thus, for a given camera with parameters f, t and R, the function that provides<br />

the image m = (u, v) of a po<strong>in</strong>t M = (x, y, z) of space can be computed <strong>in</strong> two<br />

steps.<br />

• In the camera reference system, the coordinates of M are given by

⎛x′⎞     ⎛x⎞
⎜y′⎟ = R ⎜y⎟ + t
⎝z′⎠     ⎝z⎠

• In the camera reference system, the image of a point (x′, y′, z′) is given by

u = f x′ / z′ ,    v = f y′ / z′



Figure 5.15: Camera coordinate systems.<br />

Composing the two transformations, we obtain the following equation:

⎛u⎞   ⎛ r_1     t_x  ⎞ ⎛x⎞
⎜v⎟ = ⎜ r_2     t_y  ⎟ ⎜y⎟ ,
⎝w⎠   ⎝ f r_3   f t_z⎠ ⎜z⎟
                       ⎝1⎠

which provides, <strong>in</strong> homogeneous coord<strong>in</strong>ates, the image (u, v, w) of a po<strong>in</strong>t<br />

(x, y, z, 1) of the space by a camera with parameters f, t <strong>and</strong> R. In the above<br />

equation, r_1, r_2 and r_3 are the vectors that constitute the rows of the rotation matrix R. These equations are used, for example, to compute the projection and generate a synthetic image of a three-dimensional scene.<br />

Suppose now that we are given the image of a scene <strong>and</strong> that we need to add to<br />

this image some additional synthetic elements. For this, it is necessary to generate<br />

a synthetic camera with the same parameter values of the camera that captured<br />

the scene. If these parameters are not known, they must be estimated, based on<br />

<strong>in</strong>formation from the image.<br />

Suppose that there exist n po<strong>in</strong>ts <strong>in</strong> the image whose spatial coord<strong>in</strong>ates m i =<br />

(x i , y i , z i ) (i = 1, . . . , n) are known. By associat<strong>in</strong>g these coord<strong>in</strong>ates to the



correspond<strong>in</strong>g coord<strong>in</strong>ates (u i , v i ) on the image, we obta<strong>in</strong> 2n equations whose<br />

variables are the camera parameters. Therefore, the computation of the parameters<br />

consists <strong>in</strong> solv<strong>in</strong>g this system of equations.<br />

However, in general it is not possible (nor desirable) to solve this system exactly.<br />

Typically, we use a large number of po<strong>in</strong>ts <strong>in</strong> order to calibrate the camera<br />

because us<strong>in</strong>g a small number of samples leads, <strong>in</strong> general, to considerable estimation<br />

errors. Thus, the system of equations generated by associat<strong>in</strong>g po<strong>in</strong>ts<br />

<strong>in</strong> space to correspond<strong>in</strong>g po<strong>in</strong>ts <strong>in</strong> the image has more equations than variables<br />

<strong>and</strong>, <strong>in</strong> general, does not have a solution. That is, for any set of parameters, there<br />

will be discrepancies between the observed image (u i , v i ) of a po<strong>in</strong>t (x i , y i , z i )<br />

<strong>and</strong> the image (ũ i , ṽ i ) obta<strong>in</strong>ed with these parameters. The set of parameters to<br />

be adopted must, then, be chosen in such a way that these discrepancies are as<br />

small as possible. It is very common to use the quadratic error as a measure of the discrepancy:

∑_{i=1}^n (u_i − ũ_i)² + (v_i − ṽ_i)² .

Thus, the camera calibration problem can be posed as

min_{t, f, R}  ∑_{i=1}^n (u_i − ũ_i)² + (v_i − ṽ_i)² ,

where R must be an orthogonal matrix, that is, it must satisfy the equations given<br />

by RR⊤ = I. For the problem to be treated as unconstrained, it is<br />

necessary to express R <strong>in</strong> terms of only three parameters (for example, us<strong>in</strong>g<br />

Euler angles or unit quaternions (Watt, 2000)).<br />
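To make the formulation concrete, the following sketch solves the calibration problem by nonlinear least squares, parameterizing R by a rotation vector (three parameters). The synthetic data and the use of SciPy's least_squares and Rotation are our own illustrative choices, not the specific method discussed below.

import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(p, M):
    # p packs the parameters: focal distance f, rotation vector, translation t
    f, rvec, t = p[0], p[1:4], p[4:7]
    Mc = Rotation.from_rotvec(rvec).apply(M) + t   # camera coordinates
    return f * Mc[:, :2] / Mc[:, 2:3]              # (u~_i, v~_i)

def residuals(p, M, uv):
    return (project(p, M) - uv).ravel()

# synthetic scene: points in front of the camera and noisy observations
rng = np.random.default_rng(0)
M = rng.uniform(-1, 1, size=(20, 3)) + np.array([0.0, 0.0, 5.0])
p_true = np.array([1.5, 0.1, -0.2, 0.05, 0.0, 0.0, 0.5])
uv = project(p_true, M) + 0.001 * rng.standard_normal((20, 2))

fit = least_squares(residuals, np.array([1.0, 0, 0, 0, 0, 0, 1.0]),
                    args=(M, uv))
print(fit.x)   # estimated (f, rotation vector, t)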

Although general optimization methods can be used to solve the above problem,<br />

specific iterative methods have been devised for this purpose. The most<br />

popular is Tsai's algorithm (Tsai, 1987), which also takes into consideration the presence of radial distortions caused by the optical system.<br />

In some applications, it might not be necessary to recover the camera parameters.<br />

It is enough to find a matrix Q such that (u v w)⊤ = Q (x y z 1)⊤. This occurs, for example, when we only need to associate other points of the space to their projections, without building a synthetic camera. The problem turns out to



be the one of finding

Q = ⎛ q_1   q_14 ⎞
    ⎜ q_2   q_24 ⎟
    ⎝ q_3   q_34 ⎠

in such a way as to minimize ∑_{i=1}^n (u_i − ũ_i)² + (v_i − ṽ_i)², where

ũ_i = (q_1·m_i + q_14) / (q_3·m_i + q_34)   and   ṽ_i = (q_2·m_i + q_24) / (q_3·m_i + q_34)

This problem is still non-linear, but an acceptable solution can be obtained by solving the least squares problem obtained by multiplying each term of the above sum by the square of the common denominator of ũ_i and ṽ_i. In this way, we reduce it to the problem of minimizing

∑_{i=1}^n (q_1·m_i + q_14 − u_i (q_3·m_i + q_34))² + (q_2·m_i + q_24 − v_i (q_3·m_i + q_34))² .

It is still necessary to add a normalization constraint, because the above expression is trivially minimized when Q = 0. We may adopt, for instance, |q_3| = 1, which leads us to an eigenvector problem, as the one discussed in (Faugeras, 1993), or q_3·m̄ = 1, where m̄ = (1/n) ∑_{i=1}^n m_i (which can be solved using the techniques from Section 5.2).<br />

The matrix Q obta<strong>in</strong>ed by the above process does not correspond, <strong>in</strong> general,<br />

to a camera. Nevertheless, it can be used as a start<strong>in</strong>g po<strong>in</strong>t to obta<strong>in</strong> a camera if<br />

this is necessary for the application (see (Carvalho et al., 1987)).<br />
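A minimal sketch of this linearized estimation is given below, using the normalization q_3·m̄ = 1; the function names and data layout are our own, and the constrained least squares problem is solved directly through its optimality (KKT) system.

import numpy as np

def estimate_Q(M, uv):
    # M: n x 3 world points m_i; uv: n x 2 observed projections (u_i, v_i)
    n = M.shape[0]
    Mh = np.hstack([M, np.ones((n, 1))])        # homogeneous points (m_i, 1)
    D = np.zeros((2 * n, 12))
    D[0::2, 0:4] = Mh                           # q_1·m_i + q_14
    D[0::2, 8:12] = -uv[:, 0:1] * Mh            # -u_i (q_3·m_i + q_34)
    D[1::2, 4:8] = Mh                           # q_2·m_i + q_24
    D[1::2, 8:12] = -uv[:, 1:2] * Mh            # -v_i (q_3·m_i + q_34)

    c = np.zeros(12)
    c[8:11] = M.mean(axis=0)                    # constraint q_3·m̄ = 1
    # KKT system for  min |Dq|^2  subject to  c·q = 1
    K = np.zeros((13, 13))
    K[:12, :12] = 2 * D.T @ D
    K[:12, 12] = c
    K[12, :12] = c
    rhs = np.zeros(13)
    rhs[12] = 1.0
    q = np.linalg.solve(K, rhs)[:12]
    return q.reshape(3, 4)                      # the matrix Q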

5.6.2 Registration <strong>and</strong> color correction for a sequence of images<br />

An area of computer graphics that has received great attention recently is that of image-based rendering. The basic idea of this area is to reconstruct a three-dimensional scene model from a finite number of images of the scene.<br />

Several of these applications make this visualization from a s<strong>in</strong>gle panoramic<br />

image of the scene, that is, from an image taken using an angle of 360° around a viewpoint, as shown in Figure 5.16. An example of an application that uses visualization<br />

based on panoramic images is the Visorama, which uses a special device



to simulate an observer that looks at the scene through binoculars (see (Matos et al., 1997)).<br />

Figure 5.16: A panoramic image.<br />

Although there are special photographic cameras capable of capturing panoramic images, it is possible to construct such images by matching a set of conventional photographs taken from the same viewpoint and covering an angle of 360° of the scene. In order to enable the correct registration in the matching process, consecutive photographs must have an overlapping area, as illustrated in Figure 5.17.<br />

The construction of the panoramic image requires the solution of two problems,<br />

<strong>and</strong> both of them can be tackled us<strong>in</strong>g optimization techniques.<br />

The first of these problems is geometric registration. This problem has common<br />

elements with the camera calibration problem studied <strong>in</strong> Section 5.6.1. In<br />

order to construct the panoramic image, it is necessary to recover the camera transformations<br />

associated to each photograph. This can be done, for <strong>in</strong>stance, by<br />

identifying common points in the overlapping areas of the photos (as indicated in Figure 5.17), allowing us to relate each photo with its neighbors. By taking into<br />

consideration that the position of the camera is the same for both photographs,<br />

points in them are related by a transformation of the form

⎛x′⎞   ⎛1  0  0    ⎞   ⎛1  0  0  ⎞ ⎛x⎞
⎜y′⎟ = ⎜0  1  0    ⎟ R ⎜0  1  0  ⎟ ⎜y⎟          (5.3)
⎝w′⎠   ⎝0  0  1/f_2⎠   ⎝0  0  f_1⎠ ⎝w⎠

where f_1 and f_2 are the focal distances of the cameras and R is the rotation matrix that expresses the orientation of the second camera with respect to the first one.



Figure 5.17: Image sequence with overlapping.<br />

These values must be computed <strong>in</strong> such a way as to m<strong>in</strong>imize the error <strong>in</strong> the<br />

above transformation.<br />

Thus, it is possible, starting from the first photo, to recover, successively, the focal distance and the orientation (relative to the first camera) of the cameras used in the other photos of the sequence. However, better results are obtained by performing, afterwards, a joint calibration of all cameras (this process is called bundle adjustment). Once we have obtained the cameras, the panoramic image can be constructed by applying the appropriate transformation to each photo.<br />
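For concreteness, the sketch below estimates f_1, f_2 and R for a pair of photos from matched points, by minimizing the reprojection error of transformation (5.3); the synthetic data and the parameterization of R by a rotation vector are illustrative choices of ours.

import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def transform(p, pts):
    # apply equation (5.3) to points of the first image, in homogeneous form
    f1, f2, rvec = p[0], p[1], p[2:5]
    H = (np.diag([1.0, 1.0, 1.0 / f2])
         @ Rotation.from_rotvec(rvec).as_matrix()
         @ np.diag([1.0, 1.0, f1]))
    q = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return q[:, :2] / q[:, 2:3]

def residuals(p, pts1, pts2):
    return (transform(p, pts1) - pts2).ravel()

rng = np.random.default_rng(1)
pts1 = rng.uniform(-0.4, 0.4, size=(15, 2))         # matched points, image 1
p_true = np.array([1.2, 1.0, 0.0, 0.3, 0.0])        # f1, f2, rotation vector
pts2 = transform(p_true, pts1)                      # matched points, image 2

fit = least_squares(residuals, np.array([1.0, 1.0, 0, 0, 0]),
                    args=(pts1, pts2))
print(fit.x)   # recovered (f1, f2, rotation vector)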

There also exist methods capable of executing the registration process automatically, without the need to manually identify corresponding points in the superposition regions. Such methods operate directly on the images, and try to find the camera parameters that maximize the similarity among them (see (Szeliski & Shum, 1997) and (McMillan & Bishop, 1995)).<br />

The second problem related with "stitching" the photographs to produce the<br />

panoramic image is the color compatibility between the photos. Even though



the photos are obta<strong>in</strong>ed by the same camera, from the same po<strong>in</strong>t <strong>and</strong> almost<br />

at the same time, several factors contribute to generate color differences among<br />

the corresponding objects of the image sequence. In Figure 5.17, for instance, we can observe that the overlapping region of image A is darker than the corresponding region in image B.<br />

The solution to this problem is described <strong>in</strong> (Pham & Pr<strong>in</strong>gle, 1995) <strong>and</strong> consists<br />

<strong>in</strong> f<strong>in</strong>d<strong>in</strong>g a transformation that corrects the colors <strong>in</strong> image A to make them<br />

compatible with those in image B. To compute such a transformation, we use the colors present in the superposition area of A and B. We find an association between the levels (r, g, b) in the two images of the form

r_B = p_r(r_A, g_A, b_A)
g_B = p_g(r_A, g_A, b_A)
b_B = p_b(r_A, g_A, b_A)

where p_r, p_g and p_b are polynomials.<br />

As an example, we can use second degree polynomials, of the form

p(r_A, g_A, b_A) = c_000 + c_100 r_A + c_010 g_A + c_001 b_A + c_200 r_A² + c_020 g_A² + c_002 b_A²
                 + c_110 r_A g_A + c_101 r_A b_A + c_011 g_A b_A

(although (Pham & Pringle, 1995) reports better results using third degree polynomials). The coefficients of each of the polynomials p_r, p_g and p_b are then<br />

computed <strong>in</strong> such a way as to m<strong>in</strong>imize the quadratic error result<strong>in</strong>g from the<br />

adjustment. That is, we solve a problem of the form m<strong>in</strong> |Xβ − Y |, where<br />

• β is a 10 × 3 matrix conta<strong>in</strong><strong>in</strong>g the parameters to be determ<strong>in</strong>ed (a column<br />

for each polynomial);<br />

• Y is an n × 3 matrix where each row contains a color (r_B, g_B, b_B) of the<br />

image B;<br />

• X is an n × 10 matrix where each row is of the form

( 1  r_A  g_A  b_A  r_A²  g_A²  b_A²  r_A g_A  r_A b_A  g_A b_A )

and corresponds to a color of A.<br />

This problem can be solved us<strong>in</strong>g the methods discussed <strong>in</strong> Section 5.2.<br />
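The least squares fit itself is short; the sketch below builds the design matrix X from the overlap colors of A and solves min |Xβ − Y| with NumPy (the function names and the assumption that colors lie in [0, 1] are ours).

import numpy as np

def design_matrix(rgb):
    r, g, b = rgb.T
    return np.stack([np.ones_like(r), r, g, b,
                     r * r, g * g, b * b,
                     r * g, r * b, g * b], axis=1)   # n x 10

def fit_color_correction(rgbA, rgbB):
    # rgbA, rgbB: n x 3 colors of corresponding pixels in the overlap
    beta, *_ = np.linalg.lstsq(design_matrix(rgbA), rgbB, rcond=None)
    return beta                                      # 10 x 3 coefficients

def apply_correction(beta, rgbA):
    # corrected colors; in practice they are clipped to the valid range
    return design_matrix(rgbA) @ beta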

5.6.3 Visibility for real time walkthrough<br />

Real time walkthrough of virtual environments demands the use of procedures that guarantee that the amount of data sent to the visualization pipeline be limited, in order to maintain an adequate visualization rate (or frame rate).<br />

We describe here a technique suitable for walkthroughs in building environments. Although the modeling of large scale building environments demands a<br />

huge number of objects, typically only a small number can be visible simultaneously,<br />

because most of them are hidden by obstacles (such as the walls of the<br />

build<strong>in</strong>g). The strategies described below can be used to accelerate the visualization<br />

of such structures.<br />

The first consists <strong>in</strong> organiz<strong>in</strong>g the polygons that compose the scene <strong>in</strong> a hierarchical<br />

structure called binary space partition tree, or BSP tree. The use of this structure allows the polygons located behind the observer to be eliminated in the visualization process. Moreover, this structure provides a depth ordering for the other polygons, which allows us to avoid the use of the z-buffer or other strategies for removing hidden faces (see (Fuchs et al., 1980)). Figure 5.18 illustrates this<br />

idea. The plane is <strong>in</strong>itially divided <strong>in</strong>to two halfplanes by the l<strong>in</strong>e r 1 ; each of these<br />

two halfplanes is, <strong>in</strong> turn, divided <strong>in</strong>to two regions, respectively, by l<strong>in</strong>es r 2 <strong>and</strong><br />

r 3 ; and so on. At the end of the process, we have a tree of subdivisions where each<br />

leaf corresponds to one of the 16 regions (or cells) delimited <strong>in</strong> the figure.<br />

The use of the BSP tree structure allows us to reduce the number of objects to<br />

visualize. If the observer is <strong>in</strong> cell 1, look<strong>in</strong>g <strong>in</strong> the <strong>in</strong>dicated direction, the use of<br />

the BSP tree structure guarantees that only the marked walls need to be sent to the<br />

visualization pipel<strong>in</strong>e.<br />

In the case of huge environments, this is not enough. As an example, if the<br />

observer is <strong>in</strong> cell 5, look<strong>in</strong>g <strong>in</strong> the <strong>in</strong>dicated direction, only the walls to the left<br />

would be excluded <strong>in</strong> the visualization us<strong>in</strong>g the BSP tree analysis, although other<br />

walls are also <strong>in</strong>visible from any observer position on this cell.<br />

To cope with this situation, (Teller & Sequ<strong>in</strong>, 1991) proposes the adoption of<br />

an additional structure capable of stor<strong>in</strong>g visibility <strong>in</strong>formation, relative to the



Figure 5.18: Visualization of building environments.<br />

different cells determ<strong>in</strong>ed by the BSP tree. This additional <strong>in</strong>formation is stored<br />

<strong>in</strong> a cell-cell visibility graph. In this graph, there exists an edge connect<strong>in</strong>g two<br />

cells when one of them is visible from the other. That is, when there exist po<strong>in</strong>ts<br />

<strong>in</strong>terior to these two cells such that the segment that connect them does not cross<br />

any wall. From this graph, the polygons correspond<strong>in</strong>g to one cell are rendered<br />

only if this cell is visible from the cell where the observer is located.<br />

The problem of decid<strong>in</strong>g if two cells are visible from each other can be formulated<br />

as a l<strong>in</strong>ear programm<strong>in</strong>g problem. Let us analyze, as an example, the<br />

situation of Figure 5.18. Cell 5 is connected through the door P Q to cell 2, which<br />

is connected to cell 3 through the door RS, which, in turn, is connected to cell 4 through the door TU. Cells 4 and 5 are visible from each other when there



exists a l<strong>in</strong>e, of equation y = ax + b, separat<strong>in</strong>g po<strong>in</strong>ts P <strong>and</strong> Q, po<strong>in</strong>ts R <strong>and</strong> S<br />

<strong>and</strong> po<strong>in</strong>ts T <strong>and</strong> U. That is, there must exist real numbers a <strong>and</strong> b such that the<br />

follow<strong>in</strong>g <strong>in</strong>equalities are satisfied:<br />

y_P ≥ a x_P + b
y_Q ≤ a x_Q + b
y_S ≥ a x_S + b
y_R ≤ a x_R + b
y_T ≥ a x_T + b
y_U ≤ a x_U + b<br />

The problem of verify<strong>in</strong>g if a set of l<strong>in</strong>ear <strong>in</strong>equalities has a solution can be solved<br />

us<strong>in</strong>g a l<strong>in</strong>ear program, as studied <strong>in</strong> Section 5.5.<br />
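As an illustration, the sketch below performs this feasibility test with SciPy's linprog, using a zero objective function over the variables (a, b); the door coordinates at the end are made up for the example.

import numpy as np
from scipy.optimize import linprog

def cells_visible(below, above):
    # 'below' holds the points that must satisfy y <= a x + b,
    # 'above' those with y >= a x + b (one endpoint of each door on each side)
    rows, rhs = [], []
    for (x, y) in below:            # y - a x - b <= 0
        rows.append([-x, -1.0]); rhs.append(-y)
    for (x, y) in above:            # a x + b - y <= 0
        rows.append([x, 1.0]); rhs.append(y)
    res = linprog(c=[0.0, 0.0], A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(None, None), (None, None)])
    return res.status == 0          # status 0 means a feasible (a, b) exists

# endpoints of the doors PQ, RS, TU (illustrative coordinates)
print(cells_visible(below=[(0.0, 1.0), (2.0, 2.0), (4.0, 3.0)],
                    above=[(0.0, 2.0), (2.0, 3.0), (4.0, 4.0)]))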

To construct the visibility graph it is not necessary to test all of the pairs of<br />

cells. We can use a propagation strategy. That is, first we identify visible cells<br />

which are adjacent; after this, the pairs of cells that are visible through a s<strong>in</strong>gle<br />

cell; <strong>and</strong> so forth.<br />

For huge scenes, the use of this strategy allows us to reduce significantly the<br />

visualization time: (Teller & Sequ<strong>in</strong>, 1991) reports reductions of up to 94% <strong>in</strong> the<br />

number of polygons to be visualized.




Chapter 6<br />

Comb<strong>in</strong>atorial optimization<br />

In this chapter we study optimization problems <strong>in</strong> the discrete doma<strong>in</strong> where the<br />

elements are def<strong>in</strong>ed by comb<strong>in</strong>atorial relations. As we will see, besides the<br />

problems which are <strong>in</strong>tr<strong>in</strong>sically discrete, several cont<strong>in</strong>uous problems can be discretized<br />

<strong>and</strong> solved <strong>in</strong> an efficient way us<strong>in</strong>g the methods studied here.<br />

6.1 Introduction<br />

As described <strong>in</strong> the overview of optimization problems, Chapter 2, <strong>in</strong> comb<strong>in</strong>atorial<br />

optimization the solution set S is f<strong>in</strong>ite. Therefore we have two possibilities<br />

to specify S. The first is a direct specification, us<strong>in</strong>g a simple enumeration of<br />

its elements. The second consists of an <strong>in</strong>direct specification us<strong>in</strong>g relations that<br />

exploit the problem structure.<br />

The first option, of enumerat<strong>in</strong>g all of the elements of S, besides be<strong>in</strong>g clearly<br />

un<strong>in</strong>terest<strong>in</strong>g, may be impractical when the solution set has many elements.<br />

Therefore, we are interested in optimization problems where the set S is finite and can be characterized by the use of combinatorial relations based on structural properties of the solution set S. This allows us to obtain a compact description as well as to devise efficient strategies to compute the solution.<br />




6.1.1 Solution methods <strong>in</strong> comb<strong>in</strong>atorial optimization<br />

There are two classes of methods to solve a comb<strong>in</strong>atorial optimization problem:<br />

specific methods <strong>and</strong> general methods. We are <strong>in</strong>terested here <strong>in</strong> general methods<br />

because of their wide range of applicability. Among the general methods, we can<br />

establish the follow<strong>in</strong>g classification:<br />

• Exact methods;<br />

– Dynamic programm<strong>in</strong>g;<br />

– Integer programm<strong>in</strong>g;<br />

• Approximate methods;<br />

– Heuristic algorithms;<br />

– Probabilistic algorithms.<br />

In the exact methods, <strong>in</strong> general, it is possible to exploit the comb<strong>in</strong>atorial<br />

structure to obta<strong>in</strong> exact solutions us<strong>in</strong>g one of the strategies listed above. Such<br />

methods often result <strong>in</strong> algorithms of polynomial complexity which guarantee the<br />

computation of the global m<strong>in</strong>imum for the problem.<br />

Nevertheless, there are certa<strong>in</strong> types of problem for which no efficient algorithm<br />

to compute an exact solution is known. In pr<strong>in</strong>ciple, the only option to f<strong>in</strong>d<br />

the global m<strong>in</strong>imum would be the exhaustive <strong>in</strong>vestigation of all of the possibilities,<br />

which, as mentioned above, is not feasible <strong>in</strong> practice. When faced with<br />

this obstacle, we can use approximate methods that provide a good solution, but<br />

cannot guarantee that the solution is optimal. In many situations, this result is<br />

acceptable.<br />

In this chapter we study exact methods for combinatorial optimization; in particular, we will cover the methods of dynamic and integer programming. In Chapter<br />

7 we will study generic approximate methods to solve both discrete <strong>and</strong> cont<strong>in</strong>uous<br />

problems.<br />

We will present now a concrete example of a comb<strong>in</strong>atorial problem.<br />

Example 20. Consider the diagram of Figure 6.1 that represents a network with<br />

possible paths connect<strong>in</strong>g the po<strong>in</strong>t A to the po<strong>in</strong>t B. In this network, as <strong>in</strong>dicated



Figure 6.1: Network of connections between A <strong>and</strong> B<br />

<strong>in</strong> the figure, each arc is associated to the cost of travers<strong>in</strong>g the path from the<br />

<strong>in</strong>itial po<strong>in</strong>t to the f<strong>in</strong>al po<strong>in</strong>t of the arc.<br />

The comb<strong>in</strong>atorial optimization problem, <strong>in</strong> this example, consists <strong>in</strong> f<strong>in</strong>d<strong>in</strong>g<br />

the m<strong>in</strong>imal cost path start<strong>in</strong>g from A <strong>and</strong> arriv<strong>in</strong>g at B.<br />

Two relevant questions to consider are:<br />

• What is the size of the problem description?<br />

• What is the size of the solution set S?<br />

We will answer the first question us<strong>in</strong>g the fact that a network with (n + 1) ×<br />

(n + 1) po<strong>in</strong>ts has 2n(n + 1) arcs. S<strong>in</strong>ce the network structure is fixed for this<br />

class of problems, we only need to specify the costs associated to each arc. That<br />

is, the size of the description has order n 2 , which is denoted by O(n 2 ).<br />

To answer the second question we observe that <strong>in</strong> order to go from A to B it<br />

is necessary to traverse, from left to right, 2n arcs. In the trajectory, at each node<br />

we have two options: move upwards or downwards. Moreover, from the network<br />

structure, the valid paths must have the same number of upwards <strong>and</strong> downwards



motion (which is n). That is, the paths have 2n arcs, with n arcs moving upwards and n moving downwards, which results in (2n choose n) possible solutions. Clearly, the size of the solution set S is much bigger than n².<br />

The previous analysis shows that the size of the comb<strong>in</strong>atorial description of<br />

the problem is much smaller than the size of the solution set.<br />

6.1.2 Computational complexity<br />

The computational complexity of an algorithm can be def<strong>in</strong>ed as the number of<br />

operations necessary to solve a problem whose description has size n. This number<br />

is provided as a function of n.<br />

Consider two functions f : N → R <strong>and</strong> g : N → R. We say that f is O(g),<br />

when there exist numbers K > 0 <strong>and</strong> N ∈ N such that f(n) ≤ Kg(n), for every<br />

n ≥ N.<br />

The function f = 3n 3 + 2n 2 + 27, for example, is O(n 3 ).<br />

An important class is that of the so-called polynomial algorithms, which have complexity O(n^p), for fixed p.<br />

We are interested in solving the optimization problem efficiently. That is: we<br />

need to f<strong>in</strong>d algorithms for which the number of operations is comparable to the<br />

comb<strong>in</strong>atorial size of the problem. More specifically, we must try to f<strong>in</strong>d algorithms<br />

of polynomial complexity with respect to the size of the <strong>in</strong>put data.<br />

6.2 Description of comb<strong>in</strong>atorial problems<br />

One of the most important features of combinatorial optimization methods is the formal description of the problem data. As we have mentioned previously, the adequate<br />

structur<strong>in</strong>g of the data will allow us to obta<strong>in</strong> an efficient solution to the<br />

problem.<br />

Graphs are the appropriate mathematical objects to describe the relations<br />

among the data of a comb<strong>in</strong>atorial problem. The network structure used <strong>in</strong> Example<br />

20 is a particular case of a graph. The comb<strong>in</strong>atorial optimization methods<br />

for the different classes of problems will use graphs of different types <strong>and</strong> exploit<br />

their properties <strong>in</strong> the search for an efficient solution to the problem. For this<br />

reason, we provide, in the next section, a brief review of the main concepts of graph



theory.<br />

6.2.1 Graphs<br />

A graph is a pair G = (V, A), where V = {v_1, . . . , v_n} is the set of vertices and A is a set of non-ordered pairs {v_i, v_j}, with v_i, v_j ∈ V, called the edges of the graph. When two vertices are connected by an edge, they are called adjacent vertices.<br />

It is possible to represent a graph geometrically by represent<strong>in</strong>g each vertex<br />

v_j ∈ V as a point in some Euclidean space and each edge {v_i, v_j} by a continuous path in that space connecting the vertices v_i and v_j.<br />

Figure 6.2 shows a graph with 4 edges and 4 vertices.<br />

Figure 6.2: Graph.<br />

A graph is said to be oriented when the pairs of vertices are ordered pairs. In<br />

this case the edges are called arcs of the graph.<br />

Figure 6.3 depicts an oriented graph with 5 arcs <strong>and</strong> 4 vertices.<br />

Now we will describe some structures used to represent a graph.<br />

6.2.2 Adjacency matrix<br />

A graph can be represented by a matrix which describes the adjacencies between<br />

the vertices of the graph. That is, for each vertex v j it provides the other vertices



Figure 6.3: Oriented graph.<br />

which are connected to v j by an edge. Note that we have to establish an order<strong>in</strong>g<br />

for the graph vertices, <strong>in</strong> order to associate them with the <strong>in</strong>dices of the matrix.<br />

Therefore we have a matrix M_{|V|×|V|} = (a_ij), where

a_ij = 1, if ij ∈ A
a_ij = 0, if ij ∉ A

Note that all of the elements on the diagonal of the adjacency matrix are equal<br />

to zero. Moreover, the adjacency matrix of a non-oriented graph is always symmetric,<br />

<strong>and</strong> the adjacency matrix of an oriented graph is not symmetric, <strong>in</strong> general.<br />

Below, we show the adjacency matrix of the graph <strong>in</strong> Figure 6.2 (left), <strong>and</strong> the<br />

adjacency matrix of the graph <strong>in</strong> Figure 6.3 (right).<br />

⎛0 1 1 0⎞    ⎛0 1 1 0⎞
⎜1 0 1 0⎟    ⎜0 0 1 0⎟
⎜1 1 0 1⎟ ;  ⎜1 0 0 1⎟ .
⎝0 0 1 0⎠    ⎝0 0 0 0⎠

6.2.3 Incidence matrix

Another useful graph representation is the so called <strong>in</strong>cidence matrix. This matrix<br />

provides, for each vertex v_j, the edges {v_j, v_k} ∈ A, that is, the edges which<br />

are <strong>in</strong>cident to vertex v j (the edges that connect v j to some other vertex).



Therefore we have a matrix K_{|V|×|A|} = (a_ij), where

a_ij = 1, if the edge j is incident to the vertex i
a_ij = 0, otherwise

Below we show the <strong>in</strong>cidence matrix of the graph <strong>in</strong> Figure 6.2<br />

⎛1 0 1 0⎞
⎜1 1 0 0⎟
⎜0 1 1 1⎟
⎝0 0 0 1⎠

For an oriented graph, the <strong>in</strong>cidence matrix must <strong>in</strong>dicate also the arc direction.<br />

In this case, the incidence matrix is K_{|V|×|A|} = (a_ij), where

a_ij =  1, if edge j starts at vertex i
a_ij = −1, if edge j ends at vertex i
a_ij =  0, if edge j is not incident to vertex i

Below, we show the <strong>in</strong>cidence matrix of the graph <strong>in</strong> Figure 6.3<br />

⎛ 1  0 −1  1  0⎞
⎜−1  1  0  0  0⎟
⎜ 0 −1  1 −1  1⎟
⎝ 0  0  0  0 −1⎠

6.3 Dynamic programm<strong>in</strong>g<br />

The method of dynamic programming, also called recursive or sequential optimization, applies to combinatorial problems which can be decomposed into a<br />

sequence of stages. The basic operation <strong>in</strong> each of these stages consists <strong>in</strong> tak<strong>in</strong>g<br />

a partial decision (local decision) which will lead us to the m<strong>in</strong>imum solution for<br />

the problem.<br />

Dynamic programm<strong>in</strong>g is based on the optimality pr<strong>in</strong>ciple, which is stated<br />

below.



Proposition 1 (Optimality pr<strong>in</strong>ciple). The best sequence of decisions has the<br />

property that, whatever the last decision taken, the sequence of decisions that led us to the next-to-last stage must also be optimal.<br />

The optimality principle allows us to elaborate a sequential algorithm whose solution strategy, at each stage, depends only on a local decision. That is, supposing that we have a partial optimal sequence of decisions, the current stage depends only on a localized decision, that is, a decision based only on part of the available information.<br />

In order to describe the method in more detail, we will introduce the appropriate<br />

formalism <strong>and</strong> methodology.<br />

By assuming that the problem can be decomposed into a sequence of stages, the<br />

method will use:<br />

• stage variables; <strong>and</strong><br />

• state variables.<br />

We should remark that each stage can be associated to the time, or the evolution<br />

of the algorithm towards the solution.<br />

Therefore, we may suppose that the problem has N stages <strong>and</strong> each stage has<br />

a set X n of states that describe the partial result obta<strong>in</strong>ed.<br />

Then, we define the function f_n(x) that provides the optimal value attainable at the state x of stage n.

Using the optimality principle, we obtain the fundamental equation of dynamic programming:

f_{n+1}(x) = min_{u ∈ X_n} { f_n(u) + c(u, x) }

where c(u, x) is the transition cost from state u in stage n to state x in stage n + 1.<br />

This equation provides the basic mechanism of recursion of the method, and<br />

at the same time <strong>in</strong>dicates the computation to be used <strong>in</strong> the algorithm sequence.<br />

More precisely, at each stage n, varying from 1 to N, the algorithm computes the function f_n(x) for every state x ∈ X_n and registers the transitions u → x associated to them. At the end of this process, we have computed the cost of the optimal path. Thus we can traverse the reverse trajectory, using the inverses of the transitions u → x, and obtain the shortest path. This last operation is called backtracking.
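The scheme above fits in a few lines of code. The sketch below is a minimal tabular implementation for a staged network, with the transition costs stored per stage (the data structure is our own choice); it returns the value table f and recovers the optimal path by backtracking.

def dp_shortest(trans, start=0):
    # trans[n] maps (u, x) -> cost c(u, x) of going from state u in stage n
    # to state x in stage n + 1
    f, prev = {(0, start): 0.0}, {}
    for n, edges in enumerate(trans):
        for (u, x), c in edges.items():
            if (n, u) in f and f[(n, u)] + c < f.get((n + 1, x), float("inf")):
                f[(n + 1, x)] = f[(n, u)] + c
                prev[(n + 1, x)] = u            # register the transition u -> x
    return f, prev

def backtrack(prev, n, x):
    # follow the stored transitions backwards to recover the optimal path
    path = [x]
    while (n, x) in prev:
        x = prev[(n, x)]
        n -= 1
        path.append(x)
    return path[::-1]

# tiny two-stage example with illustrative costs
trans = [{(0, 0): 2.0, (0, 1): 1.0}, {(0, 0): 3.0, (1, 0): 1.5}]
f, prev = dp_shortest(trans)
print(f[(2, 0)], backtrack(prev, 2, 0))   # 2.5 [0, 1, 0]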



The algorithm needs a number of operations proportional to the number N of<br />

stages multiplied by the number of transitions. By suppos<strong>in</strong>g that we have M<br />

possible transitions at each stage, the complexity of the algorithm is O(NM).<br />

In order to illustrate the algorithm, we will give a concrete example below.<br />

Example 21. Consider the diagram of Figure 6.4, that shows a network of connections<br />

between A <strong>and</strong> B with the associated costs at each arc. The optimization<br />

problem consists <strong>in</strong> comput<strong>in</strong>g the m<strong>in</strong>imal cost path between A <strong>and</strong> B.<br />

Figure 6.4: Network of connections between A <strong>and</strong> B<br />

From the comb<strong>in</strong>atorial structure of the network, it is possible to decompose<br />

the problem <strong>in</strong>to a sequence of stages that correspond to the partial paths to traverse<br />

the network from left to right. Note that these paths start at A <strong>and</strong> end <strong>in</strong> the<br />

<strong>in</strong>termediate vertices aligned with the vertical l<strong>in</strong>es. In Figure 6.4 these l<strong>in</strong>es are<br />

represented by dashed straight l<strong>in</strong>es.



At each stage of the algorithm, we compute the cost of all optimal partial paths<br />

up to the current stage, along with the po<strong>in</strong>ters <strong>in</strong>dicat<strong>in</strong>g the optimal transition.<br />

In Figure 6.5 we add the value of the m<strong>in</strong>imum cost f n (x) associated to each<br />

vertex x of the network, as well as a po<strong>in</strong>ter <strong>in</strong>dicat<strong>in</strong>g the vertex u <strong>in</strong> the previous<br />

stage associated with the m<strong>in</strong>imum path to that vertex. Moreover, we draw a th<strong>in</strong><br />

curve around the shortest path.<br />

Figure 6.5: State variable of the problem.<br />

6.3.1 Construct<strong>in</strong>g the state of the problem<br />

According to the results from the previous section, the method of dynamic programming for combinatorial optimization problems is structured based on two types of<br />

<strong>in</strong>formation: stage variables <strong>and</strong> state variables.



The solution of the problem develops sequentially (by stages), us<strong>in</strong>g the recursion<br />

orig<strong>in</strong>ated from the optimality pr<strong>in</strong>ciple. At each stage, we have to keep<br />

<strong>in</strong>formation about previous stages (the state of the problem), <strong>in</strong> order to advance<br />

<strong>and</strong> compute the solution <strong>in</strong> the next stage.<br />

An important consequence of the optimality principle is that we need only local information in our computations. The nature and the scope of the state information depend on the problem to be solved. In fact, this is one of the most<br />

important aspects of model<strong>in</strong>g <strong>in</strong> dynamic programm<strong>in</strong>g.<br />

In order to better illustrate the formulation <strong>and</strong> the use of the state variables,<br />

we will give another example.<br />

Example 22. Consider Example 21, in which we needed to compute the minimum cost path in a network connecting the points A and B, with costs associated to each arc. We will now modify the problem, including one more component in the cost of a path: each change in direction will imply a penalty (an additional cost of c units).<br />

In Example 21, the decision rule is given by:

f(n, y) = min { f(n − 1, y − 1) + u(n − 1, y − 1) ,
                f(n − 1, y + 1) + d(n − 1, y + 1) }

where u(n, y) <strong>and</strong> d(n, y) are the costs of the arcs connect<strong>in</strong>g, respectively, y − 1<br />

to y (<strong>in</strong> the direction “up”) <strong>and</strong> y +1 to y (<strong>in</strong> the direction “down”). The only state<br />

<strong>in</strong>formation stored at each node (n, y) of the network is the optimal cost f(n, y)<br />

of the partial path from A to (n, y).<br />

In order to take the changes of direction of the path into account in the cost, we need more state information. Now, the state is given by f(n, y, a), which is the value of<br />

the best path to arrive at the node (n, y) from the direction a. In order to arrive at<br />

node (n, y), it is possible to come either from (n−1, y −1), or from (n−1, y +1);<br />

we will denote these directions, respectively, by ↗ <strong>and</strong> ↘ . Thus, at each node<br />

we have to store the value of f(n, y, a) for these two directions.<br />

After <strong>in</strong>clud<strong>in</strong>g <strong>in</strong> the cost computation the change of direction, we obta<strong>in</strong>:<br />

f(n, y, ↗) = min { f(n − 1, y − 1, ↘) + c + u(n − 1, y − 1) ,
                    f(n − 1, y − 1, ↗) + u(n − 1, y − 1) }

and

f(n, y, ↘) = min { f(n − 1, y + 1, ↘) + d(n − 1, y + 1) ,
                    f(n − 1, y + 1, ↗) + c + d(n − 1, y + 1) }

We should remark that, s<strong>in</strong>ce <strong>in</strong> the computation of stage n we are us<strong>in</strong>g the<br />

value of f <strong>in</strong> stage n − 1, we consider the change of direction between the stages<br />

n − 2, n − 1 <strong>and</strong> n − 1, n.<br />

6.3.2 Remarks about general problems<br />

The adequate model<strong>in</strong>g of a comb<strong>in</strong>atorial optimization problem relative to the<br />

method of dynamic programm<strong>in</strong>g requires that we are able to express its solution<br />

by a sequence of stages where, <strong>in</strong> each stage, we use only the localized state<br />

<strong>in</strong>formation.<br />

In the examples we have given, we defined the recursion from the beginning to the end; that is, we computed the state of stage n based on stage n − 1. We can also think about solving the problem in the reverse direction, from the end to the beginning. We leave this as an exercise to the reader.<br />

Another remark is that the solution of the problem corresponds essentially to<br />

fill<strong>in</strong>g a table where the columns correspond to the stages n = 1, . . . , N + 1,<br />

<strong>and</strong> the rows to the states x = 0, . . . , M + 1. One of the consequences of this<br />

fact is that it is possible to implement the method us<strong>in</strong>g an electronic spreadsheet<br />

program.<br />

6.3.3 Problems of resource allocation<br />

One class of practical problems that can be solved using dynamic programming is the class of resource allocation problems.

The knapsack problem

The knapsack problem is the simplest version of the general problem of resource<br />

allocation. Its precise formulation is given by:<br />

maximize<br />

subject to<br />

c 1 x 1 + . . . + c n x n<br />

w 1 x 1 + . . . + w n x n ≤ L<br />

There are two versions for the problem:



• zero-one version : x i ∈ {0, 1}<br />

• l<strong>in</strong>ear version: x i ≥ 0, x i ∈ Z<br />

The recursion equation is given by:

s(k, y) = max { s(k − 1, y − w_k) + c_k ,
                s(k − 1, y) }

where s(k, y) is the knapsack of maximum value with size smaller than or equal<br />

to y, us<strong>in</strong>g only the first k objects. In the first case, we put the element k <strong>in</strong> the<br />

knapsack. In the second case, we keep the previous knapsack, that is, without the<br />

element k.<br />

The objective of the problem consists in finding s(n, L), given the initial conditions:

s(0, y) = 0, for all y;     s(k, y) = −∞, if y < 0.

The complexity of the algorithm is O(nL). At first sight, it would appear that this algorithm runs in polynomial time. However, storing the integer L requires only log L bits; thus, the encoding size of a knapsack problem is O(n log L). Hence, the running time of the algorithm above is not polynomial in the size of the input (it is called a pseudo-polynomial algorithm).<br />
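A direct implementation of the zero-one recursion is sketched below (weights are assumed to be positive integers; the infeasible states s(k, y) with y < 0 are handled by the capacity test instead of −∞ entries).

def knapsack(values, weights, L):
    n = len(values)
    # s[k][y]: maximum value using only the first k objects within capacity y
    s = [[0] * (L + 1) for _ in range(n + 1)]    # s(0, y) = 0 for all y
    for k in range(1, n + 1):
        for y in range(L + 1):
            best = s[k - 1][y]                   # leave object k out
            if y >= weights[k - 1]:              # take object k, if it fits
                best = max(best, s[k - 1][y - weights[k - 1]] + values[k - 1])
            s[k][y] = best
    return s[n][L]

print(knapsack([10, 7, 12], [3, 2, 4], 5))   # -> 17 (objects 1 and 2)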

6.4 Shortest paths <strong>in</strong> graphs<br />

In order to generalize the method of dynamic programm<strong>in</strong>g we will study shortest<br />

path problems <strong>in</strong> general graphs (oriented or not).<br />

6.4.1 Shortest paths <strong>in</strong> acyclic oriented graphs<br />

Acyclic oriented graphs (DAGs) constitute one of the richest structures to describe<br />

comb<strong>in</strong>atorial problems.<br />

Theorem 15 (Order<strong>in</strong>g of DAGs). The nodes <strong>in</strong> an acyclic oriented graph can be<br />

numbered from 1 to n, <strong>in</strong> such a way that every arc is of the form ij with i < j.



The proof is constructive <strong>and</strong> we leave it as an exercise to the reader.<br />

Given an acyclic oriented graph G = (V, A), we want to find a shortest path from s to t, where s, t ∈ V. We may suppose, without loss of generality, that<br />

s = 1 <strong>and</strong> t = |V |.<br />

Algorithm 1

1. Order the nodes in such a way that every arc ij has i < j, or verify that the graph is not acyclic and leave the algorithm.

2. Let c_i be the length of the shortest path from 1 to i. Then:

c_1 = 0
for i = 2, . . . , n do
    c_i = min_{j : ji ∈ A} { c_j + a_ji }
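In code, Algorithm 1 reduces to a single loop once the nodes are in topological order; the sketch below assumes this ordering and stores, for each node, the list of its incoming arcs with their costs.

def dag_shortest(arcs, n):
    # arcs[i]: list of pairs (j, a_ji) of arcs entering node i, with j < i
    INF = float("inf")
    c = [0.0] + [INF] * (n - 1)                  # c_1 = 0 (node 0 is the source)
    for i in range(1, n):                        # nodes in topological order
        c[i] = min((c[j] + a for j, a in arcs.get(i, [])), default=INF)
    return c

# incoming arcs of a small DAG with 4 nodes (illustrative costs)
arcs = {1: [(0, 2.0)], 2: [(0, 4.0), (1, 1.0)], 3: [(1, 5.0), (2, 1.0)]}
print(dag_shortest(arcs, 4))   # [0.0, 2.0, 3.0, 4.0]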



In the presence of negative cycles the shortest path problem is NP-complete (it may even be unbounded if some path from the source to the destination reaches such a cycle).<br />

The optimality principle for the shortest path in graphs without negative cycles can be formulated in a concise way using the Bellman equation:

c_j = min_{i≠j} { c_i + a_ij }      (6.1)

where c_j is the length of the shortest path from 1 to j.<br />

Remark. A shortest path from s to t has at most |V | − 1 arcs. That is, the path<br />

has at most |V | − 2 <strong>in</strong>termediate nodes.<br />

We describe <strong>in</strong> what follows a shortest path algorithm for directed graphs that<br />

satisfy the hypothesis of not hav<strong>in</strong>g cycles with total negative cost.<br />

Def<strong>in</strong>ition of the problem<br />

Let c k (j) denote the length of the shortest path from 1 to j with, at most, k arcs,<br />

<strong>and</strong> a ij be the cost of transition from i to j. If there exists an arc ij this cost is<br />

given by the problem, otherwise the cost is ∞.<br />

Algorithm 2

1. Initialization:
   c_1(1) = 0
   c_1(j) = a_1j for j ≠ 1

2. Recursion:
   c_k(j) = min_{i≠j} { c_{k−1}(i) + a_ij }

We want to obtain c_{|V|−1}(j).<br />

The complexity of the problem is O(|V | 3 ), or O(|V ||A|).<br />
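The sketch below implements this recursion (essentially the Bellman-Ford algorithm) on a cost matrix, keeping at each round the labels c_k of the text; the small graph at the end is illustrative.

def bellman_ford(a):
    # a[i][j]: cost of arc ij, inf when the arc does not exist; source is node 0
    INF = float("inf")
    n = len(a)
    c = [0.0] + [a[0][j] for j in range(1, n)]   # c_1 of the text
    for _ in range(n - 2):                       # paths with up to |V| - 1 arcs
        c = [min(c[j], min((c[i] + a[i][j] for i in range(n) if i != j),
                           default=INF))
             for j in range(n)]
    return c

INF = float("inf")
a = [[INF, 1.0, 4.0],
     [INF, INF, 2.0],
     [INF, INF, INF]]
print(bellman_ford(a))   # [0.0, 1.0, 3.0]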

Theorem 16. A graph has no negative cycles if, <strong>and</strong> only if, c |V | (j) = c |V |−1 (j)<br />

for every j.



Proof. Suppose that there are no negative cycles. A path with |V| arcs has repeated nodes and, therefore, has length greater than or equal to that of a subpath with less than |V| arcs; thus, c_{|V|}(j) = c_{|V|−1}(j) for every j. Conversely, if c_{|V|}(j) = c_{|V|−1}(j) for every j, the labels c_{|V|−1} satisfy the Bellman equation, and a negative cycle would contradict this, since traversing it would still decrease some label.<br />

Algorithm 2 also computes, with slightly greater complexity, the shortest paths from i to j for every pair ij:

c_k(i, j) = min_l { a_il + c_{k−1}(l, j) }

The complexity is O(|V|³ log |V|), or O(|V||A| log |V|).<br />

6.4.3 Dijkstra algorithm<br />

This algorithm assumes that all costs are non-negative.<br />

Recalling the Bellman equations:

c(1) = 0 ,     c(j) = min_{i≠j} { c(i) + a_ij }

where c is a function that assigns a value to each node of the graph.<br />

It is not difficult to prove that c satisfies Bellman equations if, <strong>and</strong> only if, c(j)<br />

is the length of the shortest path from 1 to j.<br />

Dijkstra's algorithm computes c(i) in an order v_1, v_2, . . . , v_|V|, such that c(v_1) ≤ c(v_2) ≤ . . . ≤ c(v_|V|).<br />

Theorem 17. At the end of each iteration, c(j) is the length of the shortest path<br />

from 1 to j us<strong>in</strong>g only <strong>in</strong>termediate nodes <strong>in</strong> p. Thus, when all vertices have been<br />

<strong>in</strong>cluded <strong>in</strong> p, c(j) is the length of the shortest path from 1 to j.<br />

The proof is by <strong>in</strong>duction.<br />

The complexity of the problem is O(|V | 2 ) or O(|A| log |V |).<br />

6.5 Integer programming

In this section we study integer programming problems, which appear when we restrict continuous linear optimization problems to the set of integer numbers.



Algorithm 3 Dijkstra

1. Initialization:
c(1) = 0 and c(j) = a_{1j} for j ≠ 1
p = {1}, p̄ = {2, . . . , |V|}

2. Recursion:
while p̄ ≠ ∅ do
    Choose k such that c(k) = min_{j∈p̄} c(j)
    p = p ∪ {k}
    p̄ = p̄ − {k}
    for all j ∈ p̄ do
        c(j) = min{c(j), c(k) + a_{kj}}
    end for
end while
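A minimal Python sketch of Algorithm 3 above, using a binary heap to extract the minimum of c over p̄, which yields the O(|A| log |V|) behavior mentioned above; the adjacency-list layout is our own, and all costs are assumed non-negative.

import heapq, math

def dijkstra(n, adj, source=0):
    """Shortest path lengths from `source`; adj[i] lists (j, cost) with cost >= 0."""
    c = [math.inf] * n
    c[source] = 0.0
    heap = [(0.0, source)]          # candidates for min_{j in p-bar} c(j)
    done = [False] * n              # membership in p, the permanently labeled set
    while heap:
        ck, k = heapq.heappop(heap)
        if done[k]:
            continue                # stale heap entry, already permanent
        done[k] = True              # p = p + {k}
        for j, a_kj in adj.get(k, []):
            if not done[j] and ck + a_kj < c[j]:
                c[j] = ck + a_kj    # c(j) = min{c(j), c(k) + a_kj}
                heapq.heappush(heap, (c[j], j))
    return c

adj = {0: [(1, 1), (2, 4)], 1: [(2, 2), (3, 6)], 2: [(3, 1)]}
print(dijkstra(4, adj))             # [0.0, 1.0, 3.0, 4.0]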

6.5.1 Linear and Integer programming

In Chapter 5 on continuous optimization we defined a linear programming problem as follows:

min Σ_{j=1}^{n} c_j x_j

subject to

Σ_{j=1}^{n} a_{ij} x_j ≤ b_i, for i = 1, . . . , m.

This is a problem with n variables and m constraints, in which both the objective function and the constraints are linear functions.

We remark that other inequality or equality constraints can be added to the above equations. Moreover, the variables can be unrestricted, or they may have constraints such as x_j ≤ 0, x_j ≥ 0 or x_j ∈ [l_j, u_j], which naturally become constraints of the problem.

An integer program is a linear program with the additional constraint that x_j ∈ Z. In what follows we will show that an integer program can be naturally formulated as a combinatorial optimization problem and vice-versa.



Example 23 (Shortest path). Consider the graph described by its incidence matrix

( 1   1   0   0   0   0)
(−1   0  −1   0   0   0)
( 0  −1   1  −1   0   1)
( 0   0   0   0  −1  −1)

where the values 1 and −1 at entry ij of the matrix indicate, respectively, that the arc j arrives at or leaves the node i.

We want to find the shortest path between the initial node s and the final node t. In the formulation of the problem as an integer program we have:

• Variables:

x_j = 1, if the arc j is part of the path; 0, otherwise.

• Objective function:

min Σ_j c_j x_j

• Constraints:

Σ_j a_{ij} x_j = 1, if i = s; −1, if i = t; 0, if i ≠ s, t.
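As an illustration, this formulation can be handed directly to a generic linear programming solver. The sketch below uses scipy.optimize.linprog on a small hypothetical graph (the arc list and the sign convention are our own, not those of the incidence matrix above); as discussed in Section 6.5.3, the LP relaxation of such flow problems already has an integral optimum.

import numpy as np
from scipy.optimize import linprog

# Hypothetical graph: 4 nodes, arcs given as (tail, head, cost); s = 0, t = 3.
arcs = [(0, 1, 1.0), (0, 2, 4.0), (1, 2, 2.0), (1, 3, 6.0), (2, 3, 1.0)]
n, m = 4, len(arcs)

# Node-arc incidence matrix: +1 where the arc leaves the node, -1 where it
# arrives (the text uses the opposite orientation; only consistency matters).
A = np.zeros((n, m))
for j, (u, v, _) in enumerate(arcs):
    A[u, j], A[v, j] = 1.0, -1.0

b = np.zeros(n)
b[0], b[3] = 1.0, -1.0                     # send one unit from s to t
c = [cost for _, _, cost in arcs]

res = linprog(c, A_eq=A, b_eq=b, bounds=[(0, 1)] * m)
print(res.fun, res.x)                      # expected: 4.0, the path 0 -> 1 -> 2 -> 3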

6.5.2 Minimal cost flow problem

This is a more general case of the shortest path problem. For each node we have d_i, which is the available amount (d_i positive) or the demanded amount (d_i negative) associated to the node i. For each arc we have µ_j, which is the capacity of the arc. The problem consists in:

min Σ c_j x_j

subject to Σ a_{ij} x_j = d_i

with 0 ≤ x_j ≤ µ_j.
subject to ∑ a ij x j = d i



6.5.3 Linear programs with integer solutions

Theorem 18. A flow problem of minimal cost with integer demands and capacities has an optimal integer solution:

min Σ_j c_j x_j

subject to Σ_j a_{ij} x_j = d_i, i = 1, . . . , |V|

with 0 ≤ x_j ≤ µ_j,

where a_{ij} are the elements of the incidence matrix A, and x_j corresponds to the amount sent along the arc j.

Proof. We will prove that non-integer solutions are not basic. Let x be a non-integer solution of the problem. Since the demands d_i are integer and x satisfies the conservation equations, every node incident to an arc with fractional flow is incident to at least two such arcs; hence there exists a cycle comprised of arcs where x_j is fractional. Let x(ε) be a new solution given by

x_j(ε) = x_j,      if j ∉ cycle;
x_j(ε) = x_j + ε,  if j ∈ cycle with equal orientation;
x_j(ε) = x_j − ε,  if j ∈ cycle with opposite orientation.

Then x(ε) satisfies the conservation equations, and 0 ≤ x_j(ε) ≤ µ_j holds for |ε| sufficiently small. Since x = ½ x(ε) + ½ x(−ε), x is not a basic solution.

Particular cases

• Shortest path: one node with d = 1, one node with d = −1, and µ_j = 1 for all j.

• Bipartite matching (or assignment problem).

Remark. The linear program has an integer solution in this case, but not in the case of non-bipartite matching.



6.5.4 Branch-and-Bound methods

Branch-and-bound methods are intelligent enumeration methods for combinatorial optimization problems. These methods have two ingredients: branching (subdividing the problem) and bounding (establishing thresholds). The strategy consists in estimating bounds for the possible solutions and subdividing the problem accordingly.

Characteristics

• Enumerate the alternatives, creating branches based on estimates (bounds) of the corresponding solution of an "easier" problem.

• Stop the enumeration in a branch when the estimate (upper bound) is inferior to the best known solution.

• Use some heuristic in order to choose the node to be explored: for example, the one that has the greatest upper bound.

• Complexity: in the worst case it is equal to the number of alternatives; in practice, it is much smaller.

These methods will be studied in more detail in Chapter 7.



6.6 Applications in computer graphics

Most applications of optimization methods in image synthesis are related to the problem of resource allocation. In general, the visualization system has limited processing or memory resources for exhibiting images. Therefore, it is necessary to use computational techniques to adapt the generated images to the available resources.

In this chapter we will discuss applications of combinatorial optimization techniques in different areas of computer graphics. More precisely, we will consider the following problems:

• Image quantization;
• Interactive visualization;
• Surface reconstruction;
• Minimal curves on maps.

6.6.1 Image quantization

The problem of image quantization is a good example of a problem that exploits the idea of limited resources in order to model it as a combinatorial optimization problem. The problem of quantization can be stated as follows:

Find a set of k colors that represents the gamut of an image f(u, v) with the smallest distortion.

The problem is clearly an optimization problem. Moreover, since a digital image has a discrete representation, the problem can be posed as a combinatorial problem. In order to better formulate the problem, we need some preliminary definitions.

An image is a graphical object O = (U, f), with f : U ⊂ R^2 → C, where C is a color space. The attribute function f is called the image function. The set U is called the image support, and the image set f(U) is called the image gamut.

In graphical applications, f is a digital image, which has a discrete representation. In this way, in general, both the domain and the range of the attribute function f are discretized and, for this reason, f is called a discrete-discrete image. The image support is usually a rectangle U = [a, b] × [c, d] sampled according to a uniform grid F_∆ = {(u_i, v_j) ∈ R^2}, where u_i = a + i∆u and v_j = c + j∆v, with ∆u = (b − a)/m, ∆v = (d − c)/n and i = 0, . . . , m, j = 0, . . . , n. The points (u_i, v_j) are called sample points.

The color space C of the image is usually a trichromatic color space (C ≡ R^3) or a monochromatic one (C ≡ R). Moreover, the values of f at the sample points are usually represented by an integer number with l bits (another option would be the representation by a floating point number). That is, the image gamut is a finite set with 2^l elements.

Figure 6.6 shows a discrete-discrete monochromatic image with its domain U and range set C.

Figure 6.6: Monochromatic discrete image.

Quantization

A quantization of an image is the process of discretization or reduction of its color gamut. We will now define the problem of quantization in a more precise way.

A quantization is a surjective transformation q : R^n → M_k, where M_k = {c_1, c_2, . . . , c_k} is a finite subset of R^n. The set M_k is called the color map of the quantization transformation. When k = 2^l, we say that q is a quantization with l bits. Given a discrete representation of an image, it is usual to have a quantization between finite subsets of colors, of the type q : R_j → M_k. If j = 2^n and k = 2^m, we say that we have a quantization from n to m bits.

Consider a quantization map q : R^n → M_k. The elements c_i of M_k are called quantization levels. To each quantization level c_i ∈ M_k there corresponds a subset of colors C_i ⊂ R^n, consisting of the colors mapped to the color c_i by the transformation q, that is,

C_i = q^{−1}(c_i) = {c ∈ C ; q(c) = c_i}.

The family of sets C_i constitutes a partition of the color space. Each of the partition sets C_i is called a quantization cell. Note that the quantization map q assumes a constant value, equal to c_i, on each cell C_i.

When the color space is monochromatic (C ≡ R), we say that the quantization is one-dimensional. We will concentrate here on this simpler case, although most of the concepts we will introduce extend naturally to the case of color images. Figure 6.7 shows the quantization levels and the graph of a one-dimensional quantization function.

Figure 6.7: Quantization levels and graph of a one-dimensional quantization map.

Note that the quantization function q is completely determined by the quantization cells C_i and the quantization levels c_i. In fact, using geometric arguments, we can obtain c_i from C_i and vice-versa. Thus, some quantization methods compute first the levels c_i and then the cells C_i, while others use the inverse strategy.



In order to estimate the effectiveness of the quantization transformation, we have to define the notion of quantization error. Let c be a color to be quantized and q the quantization map; then c = q(c) + e_q. The quantization error e_q = c − q(c) is measured by the distance d(c, q(c)) between the original color c and the quantized color q(c), using an appropriate metric. The RMS (root mean square) metric, given by the square of the Euclidean norm, is widely used because of its simplicity.

The quantization error associated to a cell C_k is given by

E(k) = Σ_{c∈C_k} h(c) d(c, c_k),     (6.2)

where h(c) is the number of occurrences of the color c in the image f. This number is given by the frequency histogram of the image, which provides, for each color, the number of pixels mapped to that color. (Note that the normalized histogram is an approximation to the probability distribution of the colors of the image f.)

Given a quantization cell C_k, it is possible to prove that the optimal quantization level c_k corresponding to C_k is the centroid of the set of colors {c ; c ∈ C_k}, weighted by the distribution of the colors in C_k.

The total quantization error for an image f is the sum of the quantization error over all of the samples of the image:

E(f, q) = Σ_i Σ_j d(f(u_i, v_j), q(f(u_i, v_j)))     (6.3)

We remark that the above equation can be expressed, using the quantization cells and the quantization levels, as E(f, q) = Σ_k E(C_k).

The quantization problem consists in computing c_i and C_i in such a way that the error E(f, q) is minimum. This is equivalent to obtaining the optimal quantizer q̂ over all of the possible K-partitions Q_K of the color space:

q̂ = arg min_{q∈Q_K} E(f, q)     (6.4)

Quantization using dynamic programming

We have seen that the problem of color quantization can be stated as an optimization problem over a discrete set. Nevertheless, the size of the solution set of equation 6.4 is of order |Q_K| = \binom{N−1}{K−1} = O(N^{K−1}), where K is the number of quantization levels and N is the number of distinct colors of the image f.

Clearly, we need an efficient optimization method to solve the problem. We will show how to use dynamic programming for the case of monochromatic quantization (C ≡ R).

Let q_k^n be the optimal quantizer from n to k levels. The first t cells of q_k^n, 1 < t < k with k < n, constitute a partition of the interval [0, x_t] of gray levels. It is possible to prove that these cells constitute an optimal quantizer with t levels for the subset of the colors c ∈ [0, x_t]. This property is equivalent to the principle of optimality and allows us to obtain a solution of the problem using dynamic programming.

Let L(n, k) = x_{k−1} be the superior limit of the (k − 1)-th cell of the optimal quantizer q̂_k^n. Then, by the Principle of Optimality,

L(n, k) = arg min_{k−1 ≤ t < n} {E_{k−1}(x_t) + E(x_t, x_n)}     (6.5)

where E_{k−1}(x_t) is the error of the optimal quantizer of [0, x_t] with k − 1 levels, and E(x_t, x_n) is the error (6.2) of the single cell (x_t, x_n]. This recursion is computed by Algorithm 4, which stores the optimal cut points L(n, k) and accumulates the errors in the array E.



Algorithm 4 Quantize(f, N, K)

E[i] = E(0, x_i), for 1 ≤ i ≤ N
L[k, k] = k − 1, for 1 ≤ k ≤ K
for k = 2, . . . , K do
    for n = k + 1, . . . , N − K + k do
        x = n − 1
        e = E[n − 1]
        for t = n − 2, . . . , k − 1 do
            if E[t] + E(x_t, x_n) < e then
                x = t
                e = E[t] + E(x_t, x_n)
            end if
        end for
        L[n, k] = x
        E[n] = e
    end for
end for
return q_K^N = (L[x_k, k])_{k=1,...,K}
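A compact Python sketch of the same dynamic programming, written from the recurrence (6.5) rather than transliterating Algorithm 4; cell_cost implements the single-cell error (6.2) with the centroid as quantization level. The function and variable names are our own, and prefix sums would make cell_cost O(1) per cell.

import math

def optimal_quantizer(hist, K):
    """Optimal 1-D quantization of gray levels 0..N-1 into K cells.

    hist: hist[g] = number of pixels with gray level g.
    Returns (total squared error, list of K cells as (first, last) levels).
    """
    N = len(hist)

    def cell_cost(a, b):
        # Squared-error cost of one cell covering levels a..b, quantized
        # to the weighted centroid of the cell (the optimal level).
        w = sum(hist[a:b + 1])
        if w == 0:
            return 0.0
        mean = sum(g * hist[g] for g in range(a, b + 1)) / w
        return sum(hist[g] * (g - mean) ** 2 for g in range(a, b + 1))

    # E[k][n]: minimal error quantizing levels 0..n-1 with k cells.
    E = [[math.inf] * (N + 1) for _ in range(K + 1)]
    cut = [[0] * (N + 1) for _ in range(K + 1)]
    E[0][0] = 0.0
    for k in range(1, K + 1):
        for n in range(k, N + 1):
            for t in range(k - 1, n):        # last cell covers levels t..n-1
                e = E[k - 1][t] + cell_cost(t, n - 1)
                if e < E[k][n]:
                    E[k][n], cut[k][n] = e, t
    # Recover the cells by following the stored cut points.
    cells, n = [], N
    for k in range(K, 0, -1):
        t = cut[k][n]
        cells.append((t, n - 1))
        n = t
    return E[K][N], cells[::-1]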



Figure 6.8: Quantization from 256 (a) to 4 (b) gray levels.

Each stage of the recursion (6.5) extends an optimal partition of an initial interval of the color space. This is the basis for the sequencing of the stages of the dynamic programming.

This algorithm can be extended to color quantization if we establish an ordering of the quantization cells. One way to obtain this is to consider the projection of the colors of the image along a direction, and to create quantization cells with planes orthogonal to this direction. This method works well because, in general, the colors of an image are concentrated along a principal direction.

Both versions, monochromatic and polychromatic, of this algorithm were introduced in (Wu, 1992).

The general problem of multidimensional quantization is NP-complete. There exist algorithms based on several heuristics, such as cluster analysis, that, although they do not guarantee an optimal solution, produce good results. The interested reader should take a look at (Velho et al., 1997).

6.6.2 Interactive visualization

We will now cover a problem in the area of visualization which falls in the category of the use of limited resources in graphics processing. The main objective of interactive visualization is to exhibit data, which may vary dynamically, in such a way as to allow its exploration and manipulation by the user. For this, images of these data must be generated at interactive rates, that is, fast enough to give the user feedback through image updating.

This type of demand exists in every interactive graphical application, such as geometric modeling systems and Computer Aided Design (CAD) systems. A stronger constraint is visualization in real time, implemented in flight simulators and other similar applications. In this case, besides the time to draw the image being shorter (typically 1/60 of a second), no variation in the exhibition rate is allowed.

Based on the above description, the interactive visualization problem can be stated as follows: we should produce the best visual representation of the data within the limits of the allocated time interval. This informal definition makes it clear that this is an optimization problem (produce the best image) with a constraint (allocated time).

We will concentrate our efforts on the visualization of 3-dimensional scenes, although most of the concepts can be applied to the visualization of other types of data, such as 2-dimensional scenes, sequences of images (animation), or even volumetric data.

A 3-dimensional scene is composed of a set of graphical objects positioned in relation to a common referential, called the global coordinate system (or scene coordinate system). A graphical object O = (U, f) consists of its geometric support U ⊂ R^3 and its attributes f : U → R^p. The geometry of a graphical object defines its shape. The attributes of a graphical object define its properties, such as color and reflectance, which, in general, are related to its visual appearance.

Maybe the most common example is a 2-dimensional polygonal object embedded in the 3-dimensional space. The shape is given by a surface described by a mesh of triangles, and its attributes are defined on the vertices of this mesh.

A graphical object of variable resolution incorporates in the representation the possibility of extracting approximate versions of its geometry. These are simplified representations, also called levels of detail, which can be obtained so as to satisfy two different types of criteria: approximation error (local or global) and size of the representation. In Figure 6.9, we show a polygonal graphical object using three levels of detail, with 32, 428, and 2267 polygons.

Figure 6.9: Spock represented with 3 levels of detail.

A graphical object can be composed of several sub-objects; moreover, these objects can be structured in a hierarchical form. In the first case, we have a composed graphical object and, in the second case, we have a hierarchy of graphical objects.
graphical objects.<br />

In general, it is possible to substitute a group of graphical objects by a unique graphical object that represents the group. This operation is called clustering. The resulting graphical object is called an impostor.

The hierarchical representation is constituted by collections of graphical objects clustered according to logical levels. At each level change, a group of subordinated objects (sons) may have its representation substituted by a unique, simpler graphical object (father). This type of hierarchy is called a level of detail hierarchy.

Some graphical objects have a natural hierarchy. This is the case of objects with articulated structures. For example, the human body can be represented by the following hierarchy:
the follow<strong>in</strong>g hierarchy:<br />

Body
    Head
    Torso
    Right Arm
        Upper Arm
        Lower Arm
    Left Arm
        Upper Arm
        Lower Arm
    Right Leg
        Thigh
        Lower Leg
    Left Leg
        Thigh
        Lower Leg
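To make the representation concrete, a level of detail hierarchy can be stored as a nested structure in which each internal node optionally carries an impostor for its whole subtree; a minimal sketch in Python (the field names are our own):

# Each node groups its children and may carry a simplified stand-in
# (the "impostor") that can replace the whole subtree during drawing.
body = {
    "name": "Body", "impostor": "body_lod0",
    "children": [
        {"name": "Head", "children": []},
        {"name": "Torso", "children": []},
        {"name": "Right Arm", "impostor": "arm_lod0",
         "children": [{"name": "Upper Arm", "children": []},
                      {"name": "Lower Arm", "children": []}]},
        # ... Left Arm, Right Leg and Left Leg are analogous
    ],
}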

For graphical objects with hierarchical structure, the creation of a hierarchy of levels of detail is easy. In Figure 6.10, we show the representations corresponding to three levels of clustering of the hierarchy of the human body.

We should remark that the clustering operation can be applied to arbitrary sets of graphical objects in a three-dimensional scene, even if these objects do not have a natural hierarchical structure. In this case, the clustering is done based on the proximity relations between the objects. These groups can themselves be grouped to constitute a hierarchy of the scene objects.
constitute a hierarchy of the scene objects.


Figure 6.10: Human body with three levels of clustering.

Optimization of levels of detail

Interactive visualization implies the maintenance of a fixed rate in the generation of images of the scene. That is, the time to depict the objects of the scene on the screen cannot be greater than a pre-defined threshold. We have seen that the multiresolution representation of graphical objects allows us to control their geometric complexity and, consequently, the time to draw them.

Interactive visualization methods need to determine the level of detail of the objects in the scene in such a way that the drawing time is below the threshold. There are two types of strategy for this purpose: reactive and predictive methods. Reactive methods are based on the time spent drawing previous images: if this time is greater than the threshold, the level of detail is reduced; otherwise, it is increased. Predictive methods are based on the complexity of the scene to be visualized; thus, these methods need to estimate the drawing time corresponding to each level of detail of the objects. Although predictive methods are more difficult to implement, they are the only ones that can guarantee a fixed rate of image generation. For this reason, we are going to develop a predictive method using optimization.

Before describing the method, we have to introduce some definitions. A visual realization of the graphical object O, determined by the triple (O, L, R), corresponds to a unique instance of O at the level of detail L which has been drawn using the algorithm R.

We define two heuristics: cost(O, L, R) and benefit(O, L, R). The cost function estimates the time necessary to generate a realization of the object. The benefit function estimates the contribution of the object realization to the perceptual effectiveness of the scene. We also define the set V of objects of the scene which are visualized in the current frame.

With these definitions, we can pose the interactive visualization problem as:

Maximize Σ_{(O_i,L_i,R_i)∈C} benefit(O_i, L_i, R_i)     (6.7)

subject to Σ_{(O_i,L_i,R_i)∈C} cost(O_i, L_i, R_i) ≤ T     (6.8)

where T is the time allowed to draw the image.

We should note that this formulation, besides capturing the essential aspects of the problem, makes it applicable in different contexts related to image synthesis, depending on the type of geometric representation and on the class of algorithms used.

Applying the above method requires an efficient algorithm to compute the functions cost and benefit. We will assume that this is possible in the practical cases. We now describe some heuristics to estimate these functions in the case of multiresolution polygonal meshes using the z-buffer algorithm.

The cost function depends on two factors:

• primitive processing: coordinate transformations, clipping, illumination;

• pixel processing: rasterization, z-buffer, texture mapping, Gouraud interpolation.

These factors correspond to the two stages of the visualization process (see Figure 6.11). Using this model, we estimate the drawing time by

cost(O, L, R) = max { c_1 R_poly(O, L) + c_2 R_vert(O, L), c_3 R_pix(O) }


Figure 6.11: Visualization process in two stages.

where R_poly, R_vert and R_pix are, respectively, the processing times of polygons, vertices and pixels of the algorithm R, and c_1, c_2, c_3 are constants that depend on the hardware used (FLOPS, etc.).

The benefit function depends mainly on the area occupied by the object and on the fidelity of its visualization. Moreover, this function depends on several perceptual factors, among them:

• semantics: defines the relative importance of the objects;
• focus: given by the region of interest;
• speed: depends on the motion (motion blur);
• hysteresis: the level of detail must vary smoothly.

Based on these elements, we estimate the effectiveness of the image as

benefit(O, L, R) = area(O) × fidelity(O, R) × importance(O) × focus(O) × motion(O) × hysteresis(O, L, R)

The reader should observe that the preponderant factors in the above equation are the size of the object (i.e., area(O)) and the fidelity of its realization, which is determined by the reconstruction used in the visualization algorithm.

The interactive visualization method by optimization uses the cost and benefit functions defined above in order to choose the realization set C that satisfies equations 6.7 and 6.8 for each frame. The reader should observe that this constrained optimization problem is a version of the knapsack problem in which the items are partitioned into subsets of candidates, and only one item of each subset can be inserted in the knapsack. In this case, the items in the knapsack constitute the set C, and the realizations (O_i, L_i, R_i) are the items of each subset S_i inserted in the knapsack.

Unfortunately, this version of the knapsack problem is NP-hard. On the other hand, a reasonable solution is obtained using a greedy algorithm. This algorithm, at each step, chooses the realization of greatest value v, given by the ratio

v = benefit(O_i, L_i, R_i) / cost(O_i, L_i, R_i)

The items are selected in decreasing order of value until the size of the knapsack, given by equation 6.8, is filled. We remark that, since only one realization of each object O_i can be part of the set C, only its realization (O_i, L_i, R_i) with greatest value v will be added to the knapsack.
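A sketch of this greedy selection in Python, assuming cost and benefit are available as callables with positive cost and that each object offers a list of candidate realizations (L, R); the data layout and names are our own.

def select_realizations(objects, cost, benefit, T):
    """Greedy choice of one realization per object under the time budget T.

    objects : list of (O, candidates), where candidates is a list of (L, R).
    cost, benefit : callables following equations (6.7)-(6.8).
    """
    # Keep, for each object, its candidate of greatest value = benefit / cost.
    ranked = []
    for O, candidates in objects:
        best = max(candidates, key=lambda LR: benefit(O, *LR) / cost(O, *LR))
        ranked.append((benefit(O, *best) / cost(O, *best), O, best))
    ranked.sort(key=lambda item: item[0], reverse=True)

    chosen, spent = [], 0.0
    for value, O, (L, R) in ranked:
        t = cost(O, L, R)
        if spent + t <= T:            # fill the knapsack in decreasing value
            chosen.append((O, L, R))
            spent += t
    return chosen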

Comments and references

Interactive visualization by optimization was developed by Funkhouser and Séquin (Funkhouser & Séquin, 1993) for the case of multiple levels of detail without clustering. The problem was also studied by Mason and Blake (Mason & Blake, 1997), who extended the method to the case of a hierarchical representation of the scene, with clustering. Note that this problem is closely related to the computation of meshes of variable resolution adapted to the visualization conditions.

6.6.3 Geometric modeling

Most applications of optimization methods in geometric modeling are connected with the problem of constructing optimal geometric shapes. In general, a modeling system integrates several tools in order to create and analyze objects. In the applications, the shape of the objects must comply with certain functional or aesthetic requirements. For example, a curve must pass through some predetermined points and, at the same time, it must be smooth. For this reason, it is normally very difficult for the user to accomplish such tasks manually, without appropriate tools. Optimization methods enable a natural formulation of the problem and its automatic solution.



In this section we will study two examples of the use of combinatorial optimization methods in geometric modeling. The first example is the computation of curves of minimum length on maps; the second example is the reconstruction of surfaces from a finite set of transversal sections.
surfaces from a f<strong>in</strong>ite set of transversal sections.<br />

Shortest curves on maps

In some modeling applications it is necessary to compute curves of minimum length on a surface. This problem is specially relevant in GIS (Geographic Information Systems) applications. In this type of application we need to find shortest paths on maps for different purposes such as, for instance, planning where to build a highway. Combinatorial optimization methods allow us to compute minimal curves in general graphs. We will show how to reduce the problem of finding optimal paths on maps to the problem of shortest paths in neighborhood graphs.

In GIS we treat data defined on the earth's surface. A map is a geo-referenced region which can be modeled as a two-dimensional graphical object O = (U, f), where the geometric support U is a rectangle U = [a, b] × [c, d], and the attribute function f describes geographic information on U.
function f describes geographic <strong>in</strong>formation <strong>in</strong> U.<br />

Depend<strong>in</strong>g on the nature of the attributes, we have two types of geographic<br />

objects:<br />

• entity maps; <strong>and</strong><br />

• property maps.<br />

In entity maps or thematic maps, f def<strong>in</strong>es a partition of the region U <strong>in</strong>to subregions<br />

S i , with i = 1, . . . , N, each one of them associated to a geographic entity<br />

E i . For example, a geopolitic map, with the division of a country <strong>in</strong>to its states, is<br />

<strong>in</strong> this category.<br />

In property maps, also called field maps, f def<strong>in</strong>es cont<strong>in</strong>uous attributes that<br />

vary on the region U. For example, an elevation map, describ<strong>in</strong>g the topography<br />

of a region, is <strong>in</strong> this category.



We remark that for a given region U we might have several distinct attribute functions f_U associated to it. This is equivalent to saying that the range set of the attribute function is multidimensional.

The fundamental operations in a GIS system correspond to the analysis of these geographic objects. In this context, the relevant problems are related with:

1. the data correlation between different attribute functions; and
2. the inference of new attribute functions from basic data.

The problem of shortest curves on maps is in this last class of operations.

Until now, we have discussed a continuous mathematical model for geographic objects. The computational implementation requires a discrete representation of this model. The discretization of a geographic object is, in general, based on a matrix or a vector representation. The vector representation is more appropriate to describe entity maps, while the matrix representation is the choice to describe property maps.

Example 24 (Satellite images). Remote sensing satellites capture the irradiation from regions on the earth's surface. These data are discretized by the sensors and structured in such a way that they are ready for a matrix representation. Therefore we have a discrete image on a regular grid.

From matrix data, such as satellite images, we are able to extract other information. For example, the terrain elevation can be computed from the correlation of points in two images. This operation is known as aerophotogrammetry, and is related to the problem of stereographic vision, which we will discuss later on.

Map operations can be defined by functions of n variables (combination functions), or by functions of one variable (transformation functions). For example, we can define a transformation operation to compute the inclination of a terrain from its topographic map.

The shortest path problem consists in finding a curve of minimal length in U that joins the point p_0 ∈ U to the point p_1 ∈ U. The metric used is defined by an attribute function of U. This problem, in its discrete form, can be solved using combinatorial optimization methods.



Shortest paths in neighborhood graphs

In this section we pose the discrete problem of optimal paths on maps. The input data consist of an image represented by a matrix (raster image) C, and the initial point p_0 and final point p_1 of the path to be computed, both points belonging to C. The image C is given by an M × N matrix, in which the value c(i, j) ∈ R_+ of entry (i, j) provides the cost to traverse the pixel (we can interpret this value as the diameter of the pixel in the metric defined by C). The problem consists in computing the path of minimum cost joining the points p_0 = (i_0, j_0) and p_1 = (i_1, j_1).

The first step to solve the problem is to construct an adequate graph. We have two parameters to consider: the density of the graph and the neighborhood of each graph node. We define the neighborhood graph G = (V, A), where V is the set of graph vertices and A is the set of arcs that join two neighboring vertices. The set V is a discretization of the input image C; for convenience, we use the original discretization of C. Thus, V is the set of the M × N pixels (i, j) ∈ C, where i = 0, . . . , N − 1 and j = 0, . . . , M − 1.

The set A is determined by a topology on the grid c_{ij}, from which we define the discrete neighborhood of a pixel. Two natural options are the 4-connected and 8-connected neighborhoods (see Figure 6.12).

Figure 6.12: Neighborhoods of a pixel.

We will use the topology defined by the 4-connected neighborhood. Thus, for each pixel (i, j) we construct four arcs

a_r = ((i, j), (i + 1, j))
a_l = ((i, j), (i − 1, j))
a_t = ((i, j), (i, j + 1))
a_b = ((i, j), (i, j − 1))

that join (i, j) to its neighbors to the right (i + 1, j), to the left (i − 1, j), above (i, j + 1) and below (i, j − 1). To each arc a = ((i, j), (k, l)) we associate the cost

c_a((i, j), (k, l)) = (c(i, j) + c(k, l)) / 2

With the problem structured in this form, we can solve it using Dijkstra algorithm, because the costs are all positive. We remark that it is not necessary to construct the graph explicitly to obtain the path.
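A sketch of this construction in Python: the graph is kept implicit, as noted above, and Dijkstra algorithm runs directly on the pixel grid with the arc cost (c(i, j) + c(k, l))/2 between 4-connected neighbors; the function name is our own.

import heapq, math

def min_cost_path(c, p0, p1):
    """Cost of the cheapest 4-connected path between pixels p0 and p1.

    c: M x N list of lists of positive traversal costs, one per pixel.
    """
    M, N = len(c), len(c[0])
    dist = {p0: 0.0}
    heap = [(0.0, p0)]
    while heap:
        d, (i, j) = heapq.heappop(heap)
        if (i, j) == p1:
            return d
        if d > dist[(i, j)]:
            continue                              # stale heap entry
        for k, l in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= k < M and 0 <= l < N:
                nd = d + (c[i][j] + c[k][l]) / 2.0   # arc cost of the text
                if nd < dist.get((k, l), math.inf):
                    dist[(k, l)] = nd
                    heapq.heappush(heap, (nd, (k, l)))
    return math.inf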

Comments and references

The Voxel Coding algorithm (Zhou et al., 1998) solves a simplified version of the problem for n-dimensional volumetric data with constant cost functions.

6.6.4 Surface reconstruction

The reconstruction of three-dimensional surfaces from plane curves is a powerful technique in geometric modeling. This technique has special relevance in several computer graphics application areas. In particular, it is used in the area of medical imaging to create anatomical models from volumetric data. This is the case of computerized tomography (CT) and magnetic resonance (MRI). Moreover, the reconstruction of surfaces from transversal sections is closely related to the lofting technique, widely used in CAD/CAM, which allows us to construct solid shapes by the extrusion of contours.

The surface reconstruction problem from transversal sections consists in producing the best surface that contains a set of pre-defined curves. This problem is clearly an optimization one and, when the curves have a discrete description, it can be solved using combinatorial optimization methods.
description, it can be solved us<strong>in</strong>g comb<strong>in</strong>atorial optimization methods.



Some definitions

The data for the problem of surface reconstruction by transversal sections consist of a set of transversal sections {S_i | i = 1, . . . , K} defined on planes orthogonal to a given direction z, corresponding to a monotone sequence (z_i), i = 1, . . . , K (i.e., z_i < z_{i+1} for all i).

Each transversal section S_i consists of one or more contours, given by closed polygonal curves P = (p_0, p_1, . . . , p_n) (i.e., p_0 = p_n), that represent the intersection of the surface to be reconstructed with the support plane z_i of the section S_i. In order to simplify the problem, we assume that each section has only one contour, that is, each contour curve is connected.

The discrete reconstruction problem consists in finding the mesh of triangles that contains the given polygonal curves and at the same time satisfies certain optimality criteria. We should remark that it is possible to decompose the above problem into a sequence of simpler subproblems: the construction of the meshes determined by pairs of contours in adjacent transversal sections S_i and S_{i+1}.

A natural interpretation in this context is to see the problem as that of finding the topology of the triangle mesh that corresponds to the polygonal surface with "best" geometry (we remark that the optimality criteria depend on the precise definition of a "good" surface). In order to construct the surface topology and geometry defined by two consecutive curves, we have to construct the polygons between these two contours. Essentially, the problem is reduced to that of finding vertex correspondences between the contours in order to construct the triangles.
<strong>in</strong> the contours <strong>in</strong> order to construct the triangles.<br />

Optimization methods for contour reconstruction

The use of combinatorial optimization methods in the solution of the problem requires that the topological relations be formulated using a graph. We define the contours P ∈ S_i and Q ∈ S_{i+1} by the vertex lists P = (p_0, p_1, . . . , p_{m−1}) and Q = (q_0, q_1, . . . , q_{n−1}), with p_k, q_l ∈ R^3. Since the contours represent closed polygonal curves, p_0 follows p_{m−1} and q_0 follows q_{n−1}; that is, the indices of P and Q must be interpreted modulo m and n, respectively.

Because of the nature of the problem, the valid triangles of the mesh are elementary triangles, for which one of the edges belongs to one of the contours (P or Q) and the other two edges join this edge to a vertex of the other contour. That is, an elementary triangle is of the form (q_j, q_{j+1}, p_i) or (p_{i+1}, p_i, q_j) (see Figure 6.13).

Figure 6.13: Elementary triangle.

In an elementary triangle we distinguish two types of edges: we call segments the edges that belong to the contours P and Q; we call spans the edges that join one contour to the other.

A polygonal mesh that reconstructs the region of the surface between the contours P and Q has the following properties:

M1 - each segment is an edge of exactly one triangle of the mesh;

M2 - if a span exists in the mesh, it is the common edge of exactly two adjacent triangles.

The surface defined by such a mesh is called an acceptable surface.

There exists a great number of meshes that satisfy requirements M1 and M2 above. In order to better understand the valid meshes, we define the directed graph G = (V, A) in which the nodes v ∈ V correspond to the set of all possible spans between the vertices of the contours P and Q, and the arcs a ∈ A correspond to the set of elementary triangles of the mesh. More precisely:

V = {v = (i, j) : v ↦ (p_i, q_j)}
A = {a = (v_{i,j}, v_{k,l}) : a ↦ (p_{i+1}, p_i, q_j) or a ↦ (q_j, q_{j+1}, p_i)}



We remark that an arc a = (v_{i,j}, v_{k,l}) represents a triangle delimited by the spans (p_i, q_j) and (p_k, q_l); the third edge, given by (p_i, p_k) or (q_j, q_l), belongs to the contour P or Q. Thus, elementary triangles satisfy one of the constraints: k = i, l = j + 1, or k = i + 1, l = j. Since the polygonal curves are closed, G is a toroidal graph (see Figure 6.14).

Figure 6.14: Toroidal graph.
Figure 6.14: Toroidal graph.<br />

A valid mesh corresponds to a cycle S in the graph G. The properties M1 and M2 of an acceptable surface can be interpreted in terms of this graph in the following form:

G1 - for each index i = 0, . . . , m − 1 there exists exactly one arc (v_{i,j}, v_{i+1,j}) ∈ S;

G2 - if the vertex v_{i,j} ∈ S exists, then exactly one arc a_I ∈ S ends at v_{i,j}, and exactly one arc a_O ∈ S departs from v_{i,j}.

From the above remarks, we conclude that the number of valid meshes increases exponentially with the number of vertices of the graph (mn). Our goal is to find the valid mesh that represents the best surface whose boundary is defined by the contours P and Q. Thus, we have formulated the reconstruction problem by contours as a combinatorial optimization problem:

"Find the valid cycle of minimal cost in the graph G."

For this, we have to choose an appropriate metric which will provide quantitative information about the quality of the mesh. A local metric associates to each arc (v_{i,j}, v_{k,l}) ∈ A a cost c : A → R_+. The function c(v_{i,j}, v_{k,l}) estimates the contribution of the corresponding triangle to the quality of the mesh.

There are several possible local metrics. Among them, the most used are:

• Volume - maximizing the volume is a good metric for convex objects; nevertheless, it does not perform well for concave objects. Moreover, this metric is very difficult to compute.

• Edge length - minimizing the length of the spans favors the construction of ruled surfaces. The computation of this metric is simple.

• Area - minimizing the total area of the surface is maybe the most general metric. It works very well for concave and convex surfaces and, moreover, it is easy to compute.

We remark that we can define variants of the above metrics; for example, normalizing the pairs of contours so that both are mapped into a unit bounding box.

Back to our optimization problem, we need to compute the cycle S of minimum cost according to the chosen metric that satisfies properties G1 and G2. The solution to the problem can be considerably simplified by using an appropriate transformation of the graph G that reflects the properties of the valid cycles. We define the graph G′ obtained by gluing two copies of G. More precisely, G′ = (V′, A′), where

V′ = {v_{i,j} : i = 0, . . . , 2m; j = 0, . . . , n}

and A′ is defined analogously to A. The cost c′ of an arc in G′ is given by

c′(v_{i,j}, v_{k,l}) = c(v_{i mod m, j mod n}, v_{k mod m, l mod n})

Note that, contrary to G, the transformed graph G′ is acyclic. With this new formulation of the problem, we conclude that the set of valid cycles of G is given by the paths γ_k ∈ G′ that have v_{k,0} and v_{m+k,n}, respectively, as initial and final vertices. To recover the cycle S_k ∈ G equivalent to γ_k ∈ G′ we just need to take the coordinates v_{i,j} as v_{i mod m, j mod n}.

In this way, the solution to the problem is given by the path γ_k of smallest cost, for k ∈ {0, . . . , m − 1}:

γ_min = min_{k=0,...,m−1} OptimalPath(k, m + k)

where the routine OptimalPath computes the shortest path joining the vertices v_{k,0} and v_{m+k,n} in G′. Note that, since all of the costs are positive, it is also possible in this case to use Dijkstra algorithm to implement the routine OptimalPath.
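A Python sketch of the reconstruction between two contours, using the span-length metric; it computes the shortest path in the unrolled acyclic grid by dynamic programming and, for brevity, fixes k = 0 instead of minimizing over all m starting spans. The names and the triangle encoding are our own.

import math

def span(P, Q, i, j):
    # Length of the span joining p_i and q_j (indices taken modulo m, n).
    return math.dist(P[i % len(P)], Q[j % len(Q)])

def reconstruct(P, Q):
    """Triangulate between closed contours P and Q (lists of 3-D points).

    Returns (total span length, triangles); each triangle is a tuple of
    labeled vertex indices ('p', i) / ('q', j).
    """
    m, n = len(P), len(Q)
    # cost[i][j]: cheapest mesh whose last span is (p_i, q_j), starting at (p_0, q_0).
    cost = [[math.inf] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    back = {}
    for i in range(m + 1):
        for j in range(n + 1):
            if i < m:   # advance on P: triangle (p_{i+1}, p_i, q_j)
                c = cost[i][j] + span(P, Q, i + 1, j)
                if c < cost[i + 1][j]:
                    cost[i + 1][j], back[(i + 1, j)] = c, (i, j)
            if j < n:   # advance on Q: triangle (q_j, q_{j+1}, p_i)
                c = cost[i][j] + span(P, Q, i, j + 1)
                if c < cost[i][j + 1]:
                    cost[i][j + 1], back[(i, j + 1)] = c, (i, j)
    # Walk back from (m, n) collecting the elementary triangles.
    tris, node = [], (m, n)
    while node != (0, 0):
        i, j = node
        pi, pj = back[node]
        if pi < i:
            tris.append((('p', i % m), ('p', pi % m), ('q', j % n)))
        else:
            tris.append((('q', j % n), ('q', pj % n), ('p', i % m)))
        node = (pi, pj)
    return cost[m][n], tris[::-1]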

Final comments

A more efficient version of the combinatorial optimization algorithm described above uses the divide-and-conquer strategy. It exploits the fact that, if two optimal paths cross, they must have a part in common (Fuchs et al., 1977).

We remark that the problem of surface reconstruction from cuts is more general than discussed in this section. In the general case, there may exist several contours in each section (that is, the contour set is not connected), and the complete problem is divided into the following sub-problems (Meyers et al., 1991):

• correspondence - find the correspondences between the contours in each pair of adjacent sections;

• bifurcation - when a unique contour in one section corresponds to more than one contour in the other section, this contour must be subdivided;

• polygonization - find the polygonal mesh between two contours on adjacent sections (this is the problem discussed here).

Another interesting remark is that the polygonal mesh produced by the reconstruction algorithm can be coded in a compact form using a triangle strip.




Chapter 7

Global Optimization

In the preceding chapters, we have seen general purpose methods for continuous and discrete problems. The methods for continuous problems were, in general, only able to find local minima. In the case of discrete problems, the solution space is frequently too large to find the global minimum in a reasonable time, so we need to use heuristics. In this chapter, we describe some approaches to finding the global minimum, using heuristics for discrete problems and guaranteed methods for continuous problems.

7.1 Two Extremes

As we have seen in Chapter 2, many optimization problems can be formulated in the following way: among a finite set of candidates, find one that best fits a given criterion. Since the set of candidates is finite, such problems are apparently easy to solve using exhaustive enumeration: we simply enumerate all candidates in succession, evaluate the optimality criterion on each candidate, and select the best candidate in the end. In practice, this solution does not work because the set of candidates is too large, even though it is finite. This happens for all the main combinatorial problems, such as the famous Traveling Salesman Problem.

For problems whose candidate space is huge, an alternative approach is to use stochastic methods: we choose a candidate at random and evaluate the optimality criterion on it. We repeat this process a large number of times and select the best candidate found. In other words, stochastic methods perform a "random walk" in the space of candidates.

These two approaches — exhaustive enumeration and random walk — are two extreme examples of methods that explore the space of candidates. Exhaustive enumeration is systematic and explores the candidate space completely, which can be very slow for huge spaces. Random walk is not at all systematic and explores the candidate space in no particular order and without any method. In both cases, the only information "learned" by the method is which is the best candidate at a given moment. Neither method learns or exploits the structure of the space of candidates.

7.2 Local Search

[Figure 7.1: Local maxima × global maxima in hill climbing (Beasley et al., 1993).]

A more clever solution is to use local search, also known as "hill climbing": starting at an initial candidate, we always "move" in the most promising direction, that is, the direction in which the objective function grows (Figure 7.1). When the objective function has a single global maximum, this solution can be very efficient, but in general this method tends to get stuck in local maxima, because at these points the objective function does not grow in any direction (that is, there is no "hill" to climb). In Figure 7.1, starting the "climb" at the black dot we can reach the grey dot, which is a local maximum, but we cannot reach the higher peaks on the right and on the left.

7.3 Stochastic Relaxation

From the previous discussion, it is clear that efficient solutions to the global optimization problem must use methods that perform a systematic — but not exhaustive — exploration of the space of candidates and which are still flexible enough to try several different candidates, to avoid getting stuck in local maxima.

The technique of stochastic relaxation, also known as simulated annealing, can be seen as a clever combination of local search and random walk: we start at a random point in the space of candidates. At each step, we move in a random direction. If this movement takes us to a higher point, then we accept this point as the new current point. If this movement takes us to a lower point, then we accept this point with probability p(t), where t is the simulation time. The value of the function p starts near 1, but decreases to 0 as t grows, typically in an exponential way. Thus, our exploration of the space of candidates starts as a random walk and ends as local search.

The physical analogy that motivates this method is the process of cooling of a solid object: when the temperature is high, the solid is soft and easily shaped; as the temperature decreases, the solid becomes more rigid and it is more difficult to change its shape.

Simulated annealing is able to escape from local minima, but because only one candidate is considered at a time, this method also learns very little about the structure of the space of candidates.

7.4 Genetic Algorithms

Another class of flexible systematic methods for global optimization is the class of genetic algorithms. Algorithms in this class are based on ideas from biology, which we now review briefly.

7.4.1 Genetics and evolution

In the optimization problems that we shall consider, the space of candidates can be parametrized by a small number of parameters, as described in Section 2.1. Using the terminology of biology, each candidate represents an individual of a population. Each individual is determined by its genetic content, or genotype, which is stored in chromosomes.¹ Each parameter is called a gene. Each gene controls one aspect of the individual and can take only a finite number of different states (or values), called alleles. The set of gene values completely determines the individual; we say that the individual is the phenotype or expression of its genotype.

Individuals are immersed in an environment, which is under continuous change. According to Darwin², individuals in Nature compete among themselves for food, shelter, partners, etc.; only the individuals that are able to face environmental changes manage to survive, reproduce themselves, and preserve their characteristics. In other words, the evolution of the individuals is driven by the selection of the fittest. Genetic algorithms use the objective function (or a monotone variant of it) as a fitness criterion for individuals.

Two genetic mechanisms are essential so that a population can evolve and adapt itself to changes in the environment: mutations and sexual reproduction. Mutations are random variations of genotypes. Some mutations are favorable to survival; other mutations are unfavorable, and the individuals that carry them are eliminated by selection. Thus, the existence of mutations allows populations to adapt themselves to environmental changes. However, if mutations were the sole adaptation mechanism, evolution would be extremely slow, because Nature is conservative: mutations are not very frequent and they are not always favorable. Moreover, an individual would never undergo only favorable mutations. Sexual reproduction, which randomly mixes the genetic material of one or more individuals, allows a much faster evolution, because it allows individuals to inherit favorable characteristics from more than one source.

¹ In Nature, individuals possess many chromosomes; for instance, human beings possess 46 chromosomes, totaling 50–100 thousand genes.

² Charles Darwin (1809–1882), English naturalist who founded the modern theory of evolution by natural selection, described in his classic work The Origin of Species.

7.4.2 Genetic algorithms for optimization

Inspired by the theory of evolution described in the previous section, genetic algorithms explore the candidate space of an optimization problem in a way that is both systematic and stochastic, as described below:

1. Begin with an initial population whose genotypes are random or computed using some heuristic that is specific to the given problem. Each gene represents a parameter of the candidate space.

2. At each step of the evolution, some individuals suffer mutations and some individuals reproduce.

3. All individuals are evaluated according to a fitness criterion that is based on the objective function. Only the fittest individuals survive to the next generation.

4. Evolution stops when the fitness of the best individual has converged to some value (that we hope is close to the global optimum) or after a fixed number of steps.

There are many variations of this basic scheme; for instance, steps 2 and 3 can be interchanged: only the fittest individuals suffer mutations or reproduce.

There are many details to be settled and many parameters to be adjusted: the representation of genotypes; the size and nature of the initial population; the frequency and nature of mutations and reproductions; the fitness criterion for choosing the best individuals; the criteria for convergence and divergence of the evolution. The art in the use of genetic algorithms lies exactly in handling these details according to each problem to be solved.

In the original work of Holland (1975), who introduced the theoretical basis for genetic algorithms, genotypes contained only one chromosome, which was represented by a bit string, one bit for each gene. In this case, the basic genetic operations — mutation and reproduction — are easy to define and understand. A mutation in a bit string is simply a change in a single bit: from 0 to 1 or from 1 to 0. Sexual reproduction mimics the crossover mechanism that occurs in nature: the chromosomes of the parents are cut at a crossing point and recombined in a crossed fashion, yielding two offspring, as shown in Figure 7.2. These are the simplest rules; more complicated rules can (and should) be formulated for specific situations.

[Figure 7.2: Sexual reproduction by chromosomal crossover (Beasley et al., 1993). The parents 10100|01110 and 00110|10010, cut at the crossover point shown, yield the children 10100|10010 and 00110|01110.]

More recently, Koza (1992) proposed the use of programs instead of bit strings as genotypes in the exploration of spaces that are not naturally described by a finite number of parameters, but in which the candidates still have a finite description. Sims (1991) was the first to use this kind of genetic algorithm in Computer Graphics, for image synthesis and modeling.

7.4.3 The traveling salesman problem

As a concrete example of the use of genetic algorithms in discrete optimization problems, consider the classic traveling salesman problem: we wish to visit n cities in a region using a closed circuit of least length. If we number the cities from 1 to n, a tour for the traveling salesman can be represented by a permutation of the numbers from 1 to n.

Michalewicz (1994) describes the following genetic model for this problem. Individuals represent the possible tours. The fitness of an individual is given by the length of the tour that the individual represents, which can be computed by adding the distances between consecutive cities in the tour. The shorter the tour, the fitter the individual.

Each <strong>in</strong>dividual has a s<strong>in</strong>gle cromossome with n genes; each gene can take <strong>in</strong>teger<br />

values from 1 to n. (Note that genes are not bit str<strong>in</strong>gs <strong>in</strong> this formulation.)<br />

Not all possible cromossomes represent valid <strong>in</strong>dividuals, because only permutations<br />

are valid. Thus, the c<strong>and</strong>idate space has n! elements. (For n = 100, we have<br />

100! ≈ 10 158 elements!) Mutation occurs by chang<strong>in</strong>g the value of one or more<br />

genes, or the order of the cities <strong>in</strong> a cromossome. Crossed reproduction comb<strong>in</strong>es<br />

segments of the two tours. If these operations do not take the nature of the problem<br />

<strong>in</strong>to account, it is possible (<strong>and</strong> even probable) to generate cromossomes that<br />

do not represent permutations. These cromossomes must be considere defective<br />

<strong>and</strong> their <strong>in</strong>dividuals do not survive. In this case, the c<strong>and</strong>idate space actually has<br />

n n elements, only n! of which are considered valid.<br />
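As a small illustration, here is a Python sketch of the two problem-specific ingredients, assuming that city positions are given as Euclidean coordinates in a hypothetical dictionary coords keyed by city number:

    import math

    def tour_length(tour, coords):
        # Length of the closed circuit: sum of the distances between
        # consecutive cities, wrapping around from the last to the first.
        return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
                   for i in range(len(tour)))

    def is_valid(chromosome, n):
        # A chromosome is a valid individual only if it is a permutation
        # of the city numbers 1, ..., n.
        return sorted(chromosome) == list(range(1, n + 1))

Since the shorter the tour, the fitter the individual, an implementation can use, for instance, -tour_length(tour, coords) as the fitness.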

Michalewicz (1994) reports that, for n = 100, starting with an initial random population, after 20000 generations we typically obtain a solution for the traveling salesman problem whose length is less than 10% above the optimal value.

7.4.4 Optimization of a univariate function

We shall now see an example of continuous optimization (also from Michalewicz, 1994). We want to maximize the function $f(x) = x \sin(10\pi x) + 1$ in the interval [−1, 2]. As shown in Figure 7.3, this function has many local maxima, but only one global maximum, which occurs at $x \approx 1.85$.

Michalewicz (1994) describes the following genetic model for this problem. Individuals represent real numbers and their fitness is measured directly by the function f: the larger the value of f, the fitter the individual.

Each individual has a single chromosome, with n genes (each gene is a single bit). The individual corresponding to one such chromosome is a point in the uniform mesh of $2^n$ points in the interval [−1, 2].


[Figure 7.3: Graph of f(x) = x sin(10πx) + 1 on [−1, 2].]

More precisely, if the genotype is

$$x = (x_n \ldots x_1),$$

where each $x_i$ is a bit, then the phenotype is the following real number:

$$x = -1 + \frac{3}{2^n - 1} \sum_{i=1}^{n} x_i\, 2^{i-1}.$$

In other words, the integers from 0 to $2^n - 1$, which are exactly those representable with n bits, are linearly mapped to the interval [−1, 2]. This mapping transforms the continuous problem into a discrete problem, whose solution approximates the solution of the continuous problem. This approximation can be as good as desired: we just need to use a sufficiently fine discretization, that is, a sufficiently large number of bits in the chromosomes.
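A minimal Python sketch of this decoding, storing bit $x_1$ first in the list; applied to the fittest genotype reported below, it reproduces the value 1.850773.

    def phenotype(bits):
        # bits[i - 1] holds x_i; k is the integer in [0, 2^n - 1]
        # whose binary digits are x_n ... x_1.
        n = len(bits)
        k = sum(b << i for i, b in enumerate(bits))
        return -1 + 3 * k / (2**n - 1)

    g = "1111001101000100000101"                    # printed as (x_22 ... x_1)
    print(phenotype([int(c) for c in reversed(g)])) # approx. 1.850773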

Mutations occur by randomly changing one or more bits, with probability given by a user-defined mutation rate. To mutate a bit in a genotype x, we randomly select a position k between 1 and n and set $x_k \leftarrow 1 - x_k$. Two individuals mate by mixing their bits: we randomly select a position k between 0 and n; the chromosomes of the parents are split after position k and recombined, yielding two offspring. More precisely, if the parents are $a = (a_n \ldots a_1)$ and $b = (b_n \ldots b_1)$, then the offspring are $(a_n \ldots a_k\, b_{k-1} \ldots b_1)$ and $(b_n \ldots b_k\, a_{k-1} \ldots a_1)$ (see Figure 7.2).
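In code, both operators are short. The Python sketch below stores a chromosome as a list of bits; the printed order $(x_n \ldots x_1)$ versus the list order is just a convention.

    import random

    def mutate(x):
        # Flip one randomly chosen bit: x_k <- 1 - x_k.
        y = list(x)
        k = random.randrange(len(y))
        y[k] = 1 - y[k]
        return y

    def crossover(a, b):
        # Split both chromosomes after a random position and recombine
        # them in a crossed fashion, yielding two offspring (Figure 7.2).
        k = random.randrange(len(a) + 1)
        return a[:k] + b[k:], b[:k] + a[k:]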

Michalewicz (1994) reports a solution of this problem that locates the global maximum with a precision of six decimal places. For this, he used n = 22, an initial random population of 50 individuals, mutation rate equal to 0.01, and crossing rate equal to 0.25. After 150 generations, the fittest individual had genotype 1111001101000100000101. This individual corresponds to the approximate solution $\hat{x} = 2587568/1398101 \approx 1.850773$, whose fitness is $f(\hat{x}) \approx 2.850227$. The global maximum of f is $f^* \approx 2.850273$, attained at $x^* \approx 1.850547$. Thus, the result has an absolute error less than $5 \times 10^{-5}$ and a relative error less than $2 \times 10^{-5}$ — a good solution.

7.5 Guaranteed Global Optimization

In this section, we revisit the unconstrained continuous minimization problem seen in Chapter 4: given a box³ $\Omega \subseteq \mathbb{R}^d$ and an objective function $f : \Omega \to \mathbb{R}$, find its global minimum $f^* = \min\{\, f(x) : x \in \Omega \,\}$ and the set of all points in Ω where this minimum value is attained, $\Omega^*(f) = \{\, x^* \in \Omega : f(x^*) = f^* \,\}$.

In Chapter 4, we saw methods that find local minima. In this section, we are interested in finding the global minimum. In general, it is not possible to find this minimum exactly, and so we shall consider an approximate, numerical version of the problem: instead of finding $f^*$ and all minimum points $\Omega^*(f)$ exactly, we seek only to identify some real interval M that is guaranteed to contain the global minimum $f^*$, and some subset $\hat\Omega$ of Ω that is guaranteed to contain $\Omega^*(f)$. The goal then becomes to make M and $\hat\Omega$ as small as possible, for a given computation budget (that is, time or memory).

Methods that sample the objective function f at a finite set of points in Ω cannot reliably find the global minimum $f^*$, because f may oscillate arbitrarily between these sample points.

Consider the following example (Moore, 1991): choose a small subinterval $[x_1, x_2] \subseteq \Omega$, and define

$$g(x) = f(x) + \begin{cases} 0, & \text{for } x \notin [x_1, x_2] \\ (x - x_1)(x - x_2)\,10^{20}, & \text{for } x \in [x_1, x_2]. \end{cases}$$

The function g coincides with f, except on the interval $[x_1, x_2]$, in which it attains a very small minimum. If this interval is sufficiently small (for instance, if $x_1$ and $x_2$ are consecutive floating-point numbers), then it is not possible to notice the difference between f and g by point sampling.

³ A box is the cartesian product of d real intervals: $[a_1, b_1] \times \cdots \times [a_d, b_d]$.

What is needed for the reliable solution of global optimization problems is to replace point sampling by reliable estimates of the whole set of values taken by f on a subregion X of Ω. (This would allow us to notice the difference between f and g in the previous example.) More precisely, if for each $X \subseteq \Omega$ we know how to compute an interval $F(X)$ that is guaranteed to contain $f(X) = \{f(x) : x \in X\}$, even if it contains it properly, then it is possible to eliminate parts of Ω that cannot contain global minimizers. The parts that cannot be eliminated are subdivided, so that after reaching a user-defined tolerance, whatever is left⁴ of Ω must contain all global minimizers and so serves as an approximate solution $\hat\Omega$. The theme of this section is precisely how to compute such interval estimates $F(X)$ and how to use those estimates to eliminate parts of Ω.

7.5.1 Branch-and-bound methods

A branch-and-bound algorithm for unconstrained global minimization alternates between two main steps: branching, which is a recursive subdivision of the domain Ω; and bounding, which is the computation of lower and upper bounds for the values taken by f in a subregion of Ω. By keeping track of the current best upper bound $\hat{f}$ for the global minimum $f^*$, one can discard all subregions whose lower bound for f is greater than $\hat{f}$, because they cannot contain a global minimizer of f. Subregions that cannot be thus discarded are split, and the pieces are put into a list L to be further processed. Thus, at all times, the set $\hat\Omega = \cup L$, given by the union of all subregions in L, contains all possible global minimizers and so is a valid solution to the global minimization problem, as defined at the beginning of this section.

⁴ "When you have eliminated the impossible, whatever remains, however improbable, must be the truth." — Sherlock Holmes in The Sign of Four, Arthur Conan Doyle (1889).


More precisely, the branch-and-bound solution of the global minimization problem starts with L = {Ω} and $\hat{f} = \infty$, and repeats the steps below while L is not empty (a sketch of this loop appears after the list):

1. select one subregion X from L;

2. if X is small enough, then accept X as part of $\hat\Omega$ and return to step 1;

3. compute an interval estimate F(X) for f(X);

4. if $\inf F(X) > \hat{f}$, then discard X and return to step 1;

5. update $\hat{f} \leftarrow \min(\hat{f}, \sup F(X))$;

6. subdivide X into $X_1$ and $X_2$;

7. add $X_1$ and $X_2$ to L.
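The following Python sketch implements this loop for boxes represented as lists of (lo, hi) pairs. The inclusion function F, which must return a guaranteed enclosure (lo, hi) of the values of f over a box, is assumed to be supplied by the user (for instance, built with the interval arithmetic of Section 7.6). The list L is used here as a stack, so the domain is explored in depth.

    import math

    def branch_and_bound(F, omega, tol=1e-3):
        # Returns an upper bound f_hat on the global minimum and the list
        # of boxes whose union is a valid approximate solution Omega-hat.
        work, accepted = [omega], []
        f_hat = math.inf
        while work:
            X = work.pop()                       # step 1 (stack: depth-first)
            width, k = max((b - a, i) for i, (a, b) in enumerate(X))
            if width <= tol:                     # step 2: small enough
                accepted.append(X)
                continue
            lo, hi = F(X)                        # step 3: interval estimate
            if lo > f_hat:                       # step 4: cannot contain
                continue                         #         a global minimizer
            f_hat = min(f_hat, hi)               # step 5: update upper bound
            a, b = X[k]                          # steps 6-7: bisect the
            mid = (a + b) / 2                    #            widest direction
            work.append(X[:k] + [(a, mid)] + X[k+1:])
            work.append(X[:k] + [(mid, b)] + X[k+1:])
        return f_hat, accepted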

Figure 7.4 shows the decomposition of Ω = [−10, 10] × [−10, 10] obtained by a branch-and-bound algorithm to minimize Matyas's function

$$f(x_1, x_2) = 0.26\,(x_1^2 + x_2^2) - 0.48\,x_1 x_2,$$

whose global minimum $f^* = 0$ is attained at $\Omega^* = \{(0, 0)\}$. The white regions have been discarded and the approximate solution $\hat\Omega$ appears in grey.

The basic branch-and-bound algorithm outlined above admits endless variations, depending on how the selection, branching, and bounding steps are implemented. The main criteria for selecting from L the next box X to be examined at each step are:

• X is the newest box in L. In this case, the list L is being used as a stack and Ω is explored in depth.

• X is the oldest box in L. In this case, the list L is being used as a queue and Ω is explored in breadth.

• X is the box in L that has the smallest lower bound for f. In this case, the list L is being used as a priority queue and Ω is explored by giving priority to the regions that seem more promising at each step.


[Figure 7.4: Domain decomposition performed by branch-and-bound.]

The simplest branching method is to bisect the current box orthogonally to its widest direction. Cyclic bisection (Moore, 1979) is another natural choice for branching. For reviews of branching methods and strategies for selecting subregions from L, including recent results, see (Berner, 1996; Csendes & Ratz, 1997). In Section 7.6, we shall describe how to implement the bounding step, that is, how to compute interval estimates F(X) for f(X).

7.5.2 Accelerating interval branch-and-bound methods

Due to overestimation in the interval estimates F(X), pure branch-and-bound algorithms tend to spend a great deal of time trying to eliminate regions close to local minimizers. Therefore, in practice, it is essential to use additional tests to accelerate convergence.

The simplest additional test is the midpoint test: at each step, the objective function f is evaluated at the center of the current subregion X and this value is used to update the current best upper bound $\hat{f}$ for the global minimum $f^*$. This test tends to compensate for possible overestimation in F(X) by using an actual value in f(X).


The following conditions are sufficient for the convergence of interval branch-and-bound algorithms: the objective function f is continuous; the branching step reduces the diameter of boxes to zero; and the diameter of F(X) tends to zero as the diameter of X goes to zero. Better results are possible when f is sufficiently differentiable; we can apply the necessary first- and second-order tests discussed in Chapter 4.

The monotonicity test computes interval estimates for the components of the gradient $\nabla f$ in a subregion X. If any of these estimates does not contain zero, then f is monotone in the corresponding direction and X cannot contain a local minimizer. The convexity test computes interval estimates for the components of the Hessian $\nabla^2 f$ in X. If these estimates show that the Hessian is positive definite over X, then again X cannot contain a local minimizer.

These high-order tests are more expensive, but can discard large subregions of Ω. Moreover, they can be combined with the methods discussed in Chapter 4 to locate subregions X that contain a single local minimizer, which is then computed with a local method (not an interval method), thus avoiding excessive subdivision of X.

7.5.3 Constrained optimization

The branch-and-bound method described above for unconstrained problems can be adapted for solving constrained problems:

$$\begin{aligned} \min\ & f(x) \\ \text{subject to}\ & g_i(x) = 0, \quad i = 1, \ldots, m \\ & h_j(x) \ge 0, \quad j = 1, \ldots, p. \end{aligned}$$

In addition to the interval estimate F for f, we also need interval estimates $G_i$ and $H_j$ for the constraints $g_i$ and $h_j$, respectively. At each step, we test whether the subregion X under examination is feasible and discard it if it is not. More precisely, if $G_i(X)$ does not contain zero, then the constraint $g_i(x) = 0$ is not satisfied in X; if $H_j(X)$ contains only negative numbers, then the constraint $h_j(x) \ge 0$ is not satisfied in X. Subregions that are too small and have not been eliminated are considered feasible. Details of this method can be seen in (Baker Kearfott, 1996; Hansen, 1988; Ratschek & Rokne, 1988).


An important special case is to determine whether a constrained problem is feasible, that is, whether it is possible to satisfy all given constraints in Ω. Many times, the goal is precisely to find the feasible region:

$$\tilde\Omega = \{\, x \in \Omega : g_i(x) = 0,\ h_j(x) \ge 0 \,\}.$$

In these cases, the objective function is not relevant and can be taken to be constant. The goal then becomes simply to solve the system of equations and inequalities given by the constraints. In this case, the elimination of the parts that are not feasible, as described above, is sufficient to yield an approximation $\hat\Omega$ for $\tilde\Omega$.

7.6 Range analysis

Range analysis is the study of the behavior of real functions based on interval estimates for their sets of values. More precisely, given a function $f : \Omega \to \mathbb{R}$, range analysis methods provide an inclusion function for f, that is, a function F defined on the subsets of Ω such that F(X) is an interval containing f(X), for all $X \subseteq \Omega$. Thus, the inclusion function F provides robust estimates for the maximum and minimum values of f in X. The estimates provided by F must be robust, but are not required to be tight. In other words, F(X) has to contain f(X), but may be strictly larger. Robust interval estimates hence play an essential role in the correctness of interval branch-and-bound algorithms.

Any method that uses an inclusion function to solve a global optimization problem actually provides a computational proof that the global minimum is in the interval M and that all points of Ω where this minimum is attained are in $\hat\Omega$, something that no method based on point sampling can do.

The efficiency of interval methods depends on the quality of the inclusion functions used, that is, on how close the estimate F(X) is to the exact range f(X). Here, as elsewhere, one usually trades quality for speed: techniques that provide tighter bounds are usually more expensive. However, tighter estimates often allow the algorithm to discard many regions that contain no minimizer at an earlier stage, before they get subdivided; this means that fewer range estimates need to be performed, and the algorithm runs faster as a whole.

There are special range analysis techniques for some classes of functions. If f is a Lipschitz function with Lipschitz constant L, then by definition we have

$$|f(x) - f(y)| \le L\,|x - y|,$$

for all $x, y \in X$. So, a simple and fast estimate for f(X) is given by

$$F(X) = [f(x_0) - L\delta,\; f(x_0) + L\delta],$$

where $x_0$ is an arbitrary point of X, and $\delta = \max\{\, |x - x_0| : x \in X \,\}$. For instance, we can take $x_0$ as the midpoint of X and δ as half the diameter of X. Note that if we know how to estimate the derivative of f, then the Mean Value Theorem implies that f is a Lipschitz function.
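As a sketch in Python, for an interval X = [a, b]:

    def lipschitz_range(f, L, a, b):
        # Enclosure of f([a, b]): evaluate f at the midpoint and pad by
        # L times half the diameter of the interval.
        x0 = (a + b) / 2
        delta = (b - a) / 2
        return f(x0) - L * delta, f(x0) + L * delta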

If f is a polynomial function of degree n on the interval [0, 1], then we can represent f in the Bernstein–Bézier basis of degree n:

$$f(x) = \sum_{i=0}^{n} b_i\, B_i^n(x), \quad \text{where} \quad B_i^n(x) = \binom{n}{i} x^i (1 - x)^{n-i}.$$

From this representation, we can take $F(X) = [\min b_i, \max b_i]$, because $B_i^n(x) \ge 0$ and $\sum_{i=0}^{n} B_i^n(x) = 1$. A similar result holds for multivariate functions on domains of the form $\Omega = [a_1, b_1] \times \cdots \times [a_n, b_n]$.
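For a univariate polynomial given by its power-basis coefficients, the Bernstein coefficients can be obtained with the standard basis-conversion formula $b_i = \sum_{j=0}^{i} \binom{i}{j} a_j / \binom{n}{j}$. A Python sketch:

    from math import comb

    def bernstein_range(a):
        # Enclosure of a_0 + a_1 x + ... + a_n x^n over [0, 1]:
        # convert to the Bernstein-Bezier basis and take min and max.
        n = len(a) - 1
        b = [sum(comb(i, j) / comb(n, j) * a[j] for j in range(i + 1))
             for i in range(n + 1)]
        return min(b), max(b)

    print(bernstein_range([0, 10, -1]))   # (0.0, 9.0): exact for 10x - x^2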

We shall now describe interval arithmetic, a classical range analysis method of wide applicability.

7.6.1 Interval arithmetic

One of the main problems in the theory and practice of numerical algorithms is the control of the errors caused by representing the real numbers as the discrete set of fixed-precision floating-point numbers in use in digital computers. These rounding errors are not the same as the approximation errors in each algorithm, which occur because the desired solutions cannot be found using a finite number of elementary operations. Rounding errors are present even in elementary arithmetic operations, such as addition and multiplication.

A simple and common approach for estimating the error in a floating-point computation is to repeat it using more precision and compare the results. If the results agree in many decimal places, then the computation is assumed to be correct, at least in the common part. However, this common practice can be seriously misleading, as the following simple example by Rump (1988) shows.

Consider the evaluation of

$$f = 333.75\,y^6 + x^2 (11 x^2 y^2 - y^6 - 121 y^4 - 2) + 5.5\,y^8 + x/(2y),$$

for x = 77617 and y = 33096. Note that x, y, and all coefficients in f are exactly representable in floating point. Thus, the only errors that can occur in the evaluation of f are rounding errors. Rump (1988) reports that computing f in FORTRAN on an IBM S/370 mainframe yields:

    f = 1.172603 . . .            using single precision;
    f = 1.1726039400531 . . .     using double precision;
    f = 1.172603940053178 . . .   using extended precision.

Since these three values agree in the first seven places, common practice would accept the computation as correct. However, the true value is f = −0.8273960599 . . . ; not even the sign is right in the computed results! (The problem does not occur only in FORTRAN nor only on the IBM S/370. Repeat this computation using your favorite programming language or calculator.)
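The experiment is easy to repeat. In the Python sketch below, naive evaluation with IEEE-754 doubles also produces a wildly wrong value (typically one of magnitude around $10^{21}$, although the exact output depends on the platform and on the evaluation order), while exact rational arithmetic confirms the true value.

    from fractions import Fraction

    x, y = 77617, 33096

    # Naive floating-point evaluation: severely damaged by rounding.
    f_float = 333.75*y**6 + x**2*(11*x**2*y**2 - y**6 - 121*y**4 - 2) \
              + 5.5*y**8 + x/(2*y)
    print(f_float)

    # Exact rational arithmetic (333.75 = 1335/4 and 5.5 = 11/2).
    xq, yq = Fraction(x), Fraction(y)
    f_exact = Fraction(1335, 4)*yq**6 \
              + xq**2*(11*xq**2*yq**2 - yq**6 - 121*yq**4 - 2) \
              + Fraction(11, 2)*yq**8 + xq/(2*yq)
    print(float(f_exact))                 # -0.8273960599...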

Interval arithmetic was invented by Moore (1966) with the explicit goal of improving the reliability of numerical computation. Even if it cannot make rounding errors disappear, it can at least be used to track the occurrence of damaging rounding errors and provide a measure of their impact on the final result. For instance, in Rump's example, interval arithmetic would probably return a very large interval, one that still contained the correct answer but was too large to give its exact location precisely. Nevertheless, a very large interval would certainly alert the user that the computation was severely damaged by rounding errors.


For our purposes, the most important feature of interval arithmetic is that it is the natural tool for range analysis (Moore, 1966; Moore, 1979; Ratschek & Rokne, 1984), allowing us to implement reliable global optimization methods (Baker Kearfott, 1996; Hansen, 1988; Ratschek & Rokne, 1988).

In interval arithmetic, a real number x is represented by a pair of floating-point numbers [a, b], corresponding to an interval that is guaranteed to contain x, i.e., such that a ≤ x ≤ b. In this way, interval arithmetic provides not only an estimate for the value of x (the midpoint (a + b)/2), but also a bound on how good this estimate is (the width of the interval, b − a).

The real power of interval arithmetic is that we can operate with intervals as if they were numbers, and obtain reliable estimates for the results of numerical computations, even when implemented in floating-point arithmetic.

To implement a numerical algorithm with interval arithmetic, it is enough to use interval versions of the elementary operations and function composition. For instance, we have the following interval formulas for the elementary arithmetic operations:

    [a, b] + [c, d] = [a + c, b + d]
    [a, b] − [c, d] = [a − d, b − c]
    [a, b] × [c, d] = [min(ac, ad, bc, bd), max(ac, ad, bc, bd)]
    [a, b] / [c, d] = [a, b] × [1/d, 1/c],  if 0 ∉ [c, d]
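These formulas translate directly into code. The Python sketch below implements them with ordinary floating-point endpoints; a production implementation would also round lower endpoints down and upper endpoints up after each operation, which is omitted here for brevity.

    class Interval:
        # Closed interval [a, b].
        def __init__(self, a, b=None):
            self.a = a
            self.b = a if b is None else b

        @staticmethod
        def _coerce(x):
            return x if isinstance(x, Interval) else Interval(x, x)

        def __add__(self, o):
            o = Interval._coerce(o)
            return Interval(self.a + o.a, self.b + o.b)
        __radd__ = __add__

        def __sub__(self, o):
            o = Interval._coerce(o)
            return Interval(self.a - o.b, self.b - o.a)

        def __rsub__(self, o):
            return Interval._coerce(o) - self

        def __mul__(self, o):
            o = Interval._coerce(o)
            p = (self.a*o.a, self.a*o.b, self.b*o.a, self.b*o.b)
            return Interval(min(p), max(p))
        __rmul__ = __mul__

        def __truediv__(self, o):
            o = Interval._coerce(o)
            if o.a <= 0 <= o.b:
                raise ZeroDivisionError("0 in divisor interval")
            return self * Interval(1/o.b, 1/o.a)

        def __repr__(self):
            return f"[{self.a}, {self.b}]"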

With a little extra work, we can write formulas for all elementary functions, such as square root, sine, cosine, logarithm, exponential, etc. Therefore, if we know how to write an algorithm for computing a function f of n variables $x_1, \ldots, x_n$, then we know how to write an algorithm for computing an interval estimate F(X) for f over $X = [a_1, b_1] \times \cdots \times [a_n, b_n]$: simply compose the interval versions of the elementary operations and functions in the same way they are composed to evaluate f. This is especially elegant to implement in programming languages that support operator overloading, such as C++, but interval arithmetic can be easily implemented in any programming language. There are several implementations available on the internet (Kreinovich, n.d.).


The dependency problem

A limitation of interval arithmetic is that its range estimates tend to be too conservative: the interval F(X) computed as an estimate for the range of values of an expression f over a region X can be much wider than the exact range f(X). This over-conservatism of interval arithmetic is particularly severe in long computation chains — such as the ones in Computer Graphics — and one frequently gets interval estimates that are too large, sometimes to the point of uselessness. This happens even in computations that are not severely affected by rounding errors.

The main reason for this over-conservatism is the implicit assumption that the operands in primitive interval operations are mutually independent. If there are dependencies among the operands, then those formulas are not the best possible, because not all combinations of values in the operand intervals will be attained; the resulting interval will then be much wider than the exact range of the result quantity. This problem is called the dependency problem in interval arithmetic.

An extreme example of this problem is the expression x − x. Although its value is always zero, this expression is computed blindly in interval arithmetic as

    x = [a, b]
    x − x = [a − b, b − a]

Thus, the width of the interval x − x, which should be zero, is twice the width of x.

A less extreme and more typical example is x(10 − x) for x in [4, 6]:

    x = [4, 6]
    10 − x = [10, 10] − [4, 6] = [4, 6]
    x(10 − x) = [4, 6] × [4, 6] = [16, 36].

Note that the result [16, 36] is 20 times larger than [24, 25], which is the exact range of x(10 − x) over [4, 6]. This discrepancy is due to the inverse dependency that exists between x and 10 − x, a dependency that is not taken into account by the interval multiplication formula. For instance, the upper bound 36 comes from the product of the two upper bounds, which in this case are both equal to 6. However, these two upper bounds are never attained simultaneously: when x = 6, we have 10 − x = 4, and vice versa.
10 − x = 4, <strong>and</strong> vice-versa.


7.6.2 Affine arithmetic

Affine arithmetic (AA) is a model for numeric computation designed to handle the dependency problem (Comba & Stolfi, 1993). Like standard interval arithmetic (IA), AA can provide guaranteed bounds for the computed results, taking into account input, truncation, and rounding errors. Unlike IA, however, AA automatically keeps track of correlations between computed and input quantities, and is therefore frequently able to compute much better estimates than interval arithmetic, especially in long computation chains.

In AA, each quantity x is represented by an affine form

$$\hat{x} = x_0 + x_1 \varepsilon_1 + x_2 \varepsilon_2 + \cdots + x_n \varepsilon_n,$$

which is a polynomial of degree 1 with real coefficients $x_i$ in the symbolic variables $\varepsilon_i$ (called noise symbols), whose values are unknown but assumed to lie in the interval [−1, +1]. Each noise symbol $\varepsilon_i$ stands for an independent component of the total uncertainty of the quantity x; the corresponding coefficient $x_i$ gives the magnitude of that component.

The main benefit of using affine forms instead of intervals is that two or more affine forms can share common noise symbols, meaning that the quantities they represent are at least partially dependent on each other. Since this dependency is represented in AA, albeit implicitly, AA can provide better interval estimates for complicated expressions, as the simple example at the end of this section shows.

To obtain interval estimates with AA, we first convert all input intervals to affine forms. Then, we operate on these affine forms to compute the desired function, replacing each primitive operation with its AA version. Finally, we convert the resulting affine form back into an interval. We shall now see these three steps in some detail.

Given an interval [a, b] representing some quantity x, an equivalent affine form is $\hat{x} = x_0 + x_k \varepsilon_k$, where $x_0 = (b + a)/2$ and $x_k = (b - a)/2$. Since input intervals usually represent independent variables, they are assumed to be unrelated, and a new noise symbol $\varepsilon_k$ must be used for each input interval. Conversely, the value of a quantity represented by an affine form $\hat{x} = x_0 + x_1 \varepsilon_1 + \cdots + x_n \varepsilon_n$ is guaranteed to be in the interval $[\hat{x}] = [x_0 - \xi,\, x_0 + \xi]$, where $\xi = \|\hat{x}\| := \sum_{i=1}^{n} |x_i|$. Note that the range $[\hat{x}]$ is the smallest interval that contains all possible values of $\hat{x}$, because by definition each noise symbol $\varepsilon_i$ ranges independently over the interval [−1, +1].

To evaluate an expression with AA, we simply use the AA version of each elementary operation z = f(x, y, . . .), as done in interval arithmetic. Affine operations are the easiest, because they can be computed exactly. Given two affine forms

$$\hat{x} = x_0 + x_1 \varepsilon_1 + \cdots + x_n \varepsilon_n, \qquad \hat{y} = y_0 + y_1 \varepsilon_1 + \cdots + y_n \varepsilon_n,$$

and a scalar $\alpha \in \mathbb{R}$, we have

$$\hat{x} \pm \hat{y} = (x_0 \pm y_0) + (x_1 \pm y_1)\varepsilon_1 + \cdots + (x_n \pm y_n)\varepsilon_n$$
$$\alpha \hat{x} = (\alpha x_0) + (\alpha x_1)\varepsilon_1 + \cdots + (\alpha x_n)\varepsilon_n$$
$$\hat{x} \pm \alpha = (x_0 \pm \alpha) + x_1 \varepsilon_1 + \cdots + x_n \varepsilon_n.$$

Note that $\hat{x} - \hat{x}$ is identically zero in AA. As we saw earlier, the absence of such trivial cancellations is a major ingredient of the dependency problem in interval arithmetic.

For non-affine operations, we pick a good affine approximation for f and append an extra term $z_k \varepsilon_k$ to represent the error in this approximation:

$$\hat{z} = z_0 + z_1 \varepsilon_1 + \cdots + z_n \varepsilon_n + z_k \varepsilon_k.$$

Again, $\varepsilon_k$ is a new noise symbol and $z_k$ is an upper bound for the approximation error. In this way, we can write AA formulas for all elementary operations and functions. (For details, see (Stolfi & de Figueiredo, 1997).) For instance, multiplication is given by:

$$z_0 = x_0 y_0, \qquad z_i = x_0 y_i + y_0 x_i \ \ (i = 1, \ldots, n), \qquad z_k = \|\hat{x}\|\,\|\hat{y}\|.$$

To see how AA handles the dependency problem, consider again the example given earlier: evaluate z = x(10 − x) for x in the interval [4, 6]:

$$\hat{x} = 5 + 1\varepsilon_1$$
$$10 - \hat{x} = 5 - 1\varepsilon_1$$
$$\hat{z} = \hat{x}(10 - \hat{x}) = 25 + 0\varepsilon_1 + 1\varepsilon_2$$
$$[\hat{z}] = [25 - 1,\, 25 + 1] = [24, 26].$$
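The Python sketch below implements just enough of AA (the affine operations and the multiplication formula above) to reproduce this example. Coefficients are stored in a dictionary keyed by noise-symbol index, and rounding control is again omitted for brevity.

    class AffineForm:
        # x0 + x1*e1 + ... + xn*en, with each noise symbol ei in [-1, +1].
        _fresh = 0                            # counter for new noise symbols

        def __init__(self, x0, terms=None):
            self.x0 = x0
            self.terms = dict(terms or {})

        @classmethod
        def _new_symbol(cls):
            cls._fresh += 1
            return cls._fresh

        @classmethod
        def from_interval(cls, a, b):
            # Each input interval gets its own fresh noise symbol.
            return cls((a + b) / 2, {cls._new_symbol(): (b - a) / 2})

        def rad(self):
            # ||x-hat||: sum of the absolute values of the coefficients.
            return sum(abs(c) for c in self.terms.values())

        def to_interval(self):
            return (self.x0 - self.rad(), self.x0 + self.rad())

        def __add__(self, o):
            if not isinstance(o, AffineForm):     # x-hat + scalar: exact
                return AffineForm(self.x0 + o, self.terms)
            t = dict(self.terms)
            for k, c in o.terms.items():
                t[k] = t.get(k, 0.0) + c
            return AffineForm(self.x0 + o.x0, t)

        def __neg__(self):
            return AffineForm(-self.x0, {k: -c for k, c in self.terms.items()})

        def __sub__(self, o):
            return self + (-o)

        def __rsub__(self, o):                    # scalar - x-hat
            return (-self) + o

        def __mul__(self, o):
            # z_0 = x_0 y_0 and z_i = x_0 y_i + y_0 x_i, plus a new noise
            # term z_k = ||x-hat|| * ||y-hat|| bounding the error.
            t = {k: self.x0 * o.terms.get(k, 0.0) + o.x0 * self.terms.get(k, 0.0)
                 for k in set(self.terms) | set(o.terms)}
            t[self._new_symbol()] = self.rad() * o.rad()
            return AffineForm(self.x0 * o.x0, t)

    x = AffineForm.from_interval(4, 6)       # x-hat = 5 + 1*e1
    print((x - x).to_interval())             # (0.0, 0.0): exact cancellation
    print((x * (10 - x)).to_interval())      # (24.0, 26.0), as computed above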


Note that the correlation between x and 10 − x was canceled in the product. Note also that the interval associated with $\hat{z}$ is much closer to the exact range [24, 25], and much better than the estimate computed with interval arithmetic, [16, 36]. Expanding the expression to $10x - x^2$ makes interval arithmetic give a much worse result, [4, 44], whereas it allows AA to compute the exact range, [24, 25].

Although AA is harder to implement and more expensive to execute than interval arithmetic, the better estimates provided by AA often make algorithms faster as a whole.

7.7 Examples in Computer Graphics

In this section, we give examples of the use of genetic algorithms and branch-and-bound global optimization in Computer Graphics problems.

7.7.1 Designing a stable table

We start by describing a genetic formulation for the design of a stable table with a square top and four feet (Bentley, 1999). The feet are attached to the top at positions along the diagonals of the table top and they can have different lengths (Figure 7.5). Clearly, in the optimum configuration — the stable one — the feet are attached to the corners of the table and have the same length.

[Figure 7.5: Genetic formulation for the design of a stable table (Bentley, 1999).]


178 CHAPTER 7. GLOBAL OPTIMIZATION<br />

Here is a genetic model for this problem (Bentley, 1999): Individuals represent the various possible configurations. Each individual has a single chromosome with eight genes: four values to measure the distance of each foot to the center of the table and four values to measure the length of each foot. Each of these eight values is represented in 8 bits, giving a chromosome with 64 bits. Thus, the candidate space has $2^{64} \approx 2 \times 10^{19}$ elements. The objective function evaluates the stability of the table from the position and length of the feet. Starting with a random initial population, the optimum solution is reached.
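A sketch of this genetic model is given below. The book does not give the exact objective function, so the stability score used here (reward feet far from the center, penalize unequal lengths) and all GA parameters are assumptions for illustration.

```python
import random

# Sketch of the chromosome encoding: 8 genes of 8 bits each (64 bits total).
# The stability score below is an illustrative choice, not the book's objective.

def random_chromosome():
    return [random.getrandbits(8) for _ in range(8)]  # 4 distances + 4 lengths

def fitness(chrom):
    dists, lengths = chrom[:4], chrom[4:]
    # Penalize legs of different lengths (the table wobbles) and reward
    # feet attached far from the center (a wide, stable base).
    wobble = max(lengths) - min(lengths)
    base = sum(dists) / 4
    return base - 10 * wobble

def evolve(pop_size=100, generations=200):
    pop = [random_chromosome() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, 8)
            child = a[:cut] + b[cut:]                 # one-point crossover
            i = random.randrange(8)
            child[i] ^= 1 << random.randrange(8)      # flip one bit (mutation)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

print(evolve())   # legs of (nearly) equal length, feet near the corners
```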

7.7.2 Curve fitting

We now consider the classic curve fitting problem: a set of points on the plane is given and we wish to find the curve that best fits them. If the search is limited to a fixed class of functions (for instance, polynomials of degree 3), then the fitting problem can be formulated as a least-squares problem and solved using continuous optimization on the parameters of the fitting model, as seen in Chapter 4. The following genetic formulation allows a fitting that is not restricted to a small class of functions (Sokoryansky & Shafran, 1997).

In this genetic model, individuals are curves whose fitness is measured by how well the curve fits the given data points. The genotype is an algebraic expression whose tree is represented linearly in LISP. For instance, the expression $t^2 - t + 1$ can be represented by the list (+ (- (* t t) t) 1). This representation may be inconvenient for a human reader, but is very suitable for the symbolic manipulations that are needed during the evolution. For instance, crossed reproduction takes place by simply mixing branches of the trees (see Figure 7.6).

Figure 7.6: Crossed reproduction of algebraic expressions.
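The following sketch illustrates crossed reproduction on expression trees. It is a toy, not the implementation of (Sokoryansky & Shafran, 1997): expressions are nested Python tuples mirroring the LISP lists, and crossover swaps one randomly chosen branch of each parent.

```python
import random

# Expressions as nested tuples, e.g. t^2 - t + 1 is
# ('+', ('-', ('*', 't', 't'), 't'), 1), mirroring the LISP list in the text.

def subtrees(expr, path=()):
    # Enumerate (path, subtree) pairs; a path is a sequence of child indices.
    yield path, expr
    if isinstance(expr, tuple):
        for i, child in enumerate(expr[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(expr, path, new):
    if not path:
        return new
    i = path[0]
    return expr[:i] + (replace(expr[i], path[1:], new),) + expr[i + 1:]

def crossover(a, b):
    # Crossed reproduction: swap one random branch of each parent.
    pa, sa = random.choice(list(subtrees(a)))
    pb, sb = random.choice(list(subtrees(b)))
    return replace(a, pa, sb), replace(b, pb, sa)

a = ('+', ('-', ('*', 't', 't'), 't'), 1)
b = ('*', 't', ('/', 't', 1.06113))
print(crossover(a, b))   # two children mixing branches of a and b
```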

Figure 7.7 shows an example of curve fitting by expression evolution. The set of data points $\{x_i, y_i\}$ consists of five points sampled from the curve $y = x^2 - x + 1$. The initial population had 100 random individuals. The objective function used to measure the fitness of a curve $y = g(x)$ was

$$\sum_{i=1}^{n} \frac{|g(x_i) - y_i|^3}{y_i}.$$

Figure 7.7 shows the best individuals in generations 1, 10, 37 and 64; their genotypes are given below:




generation   genotype of best individual
1            (+ 6.15925 (cos (- 0.889367 (% (- t (- (hsin -0.90455) -1.51327)) t))))
10           (* t (/ t 1.06113))
37           (* t (/ (/ t 1.06113) 1.06113))
64           (* (* (cos (sin -0.632015)) t) t)

Note that, even though the final fitting is good, its expression,

$$y = x^2 \cos(\sin(-0.632015)) \approx 0.830511\, x^2,$$

does not correspond to the original expression $y = x^2 - x + 1$. It is unreasonable to expect that it is possible to evolve to the original expression, because the solution space contains much more than just polynomial functions (for instance, the best individual in generation 10 is a rational function). It is even surprising that in this example the algorithm has found a polynomial of the correct degree. In any case, in practice we do not know the original expression.



Figure 7.7: Curve fitting by evolution.



7.7.3 Collision detection

Consider a physically based animation, in which objects move subject to the action of internal and external forces, such as gravity. To achieve a realistic animation, it is necessary to know how to detect when two objects collide, because at that moment new forces appear instantaneously.

The problem of detecting the collision of two objects given parametrically by $S_1, S_2 : \mathbb{R}^2 \to \mathbb{R}^3$ can be formulated as a feasibility problem

$$S_1(u_1, v_1) - S_2(u_2, v_2) = 0,$$

for $(u_1, v_1, u_2, v_2) \in \mathbb{R}^4$. (Note that this problem has three equations, one for each coordinate in $\mathbb{R}^3$.) If the objects are given implicitly by $S_1, S_2 : \mathbb{R}^3 \to \mathbb{R}$, then the feasibility problem is

$$S_1(x, y, z) = 0, \quad S_2(x, y, z) = 0,$$

for $(x, y, z) \in \mathbb{R}^3$.

These problems can be solved reliably using an interval branch-and-bound method. Note that it is not necessary to compute the whole feasible set in this case: it is enough to establish whether it is empty or not. When there is a collision, the search for the feasible set $\tilde{\Omega}$ can stop as soon as a subregion is accepted as a solution. This frequently happens quite early in the exploration of the domain $\Omega$.
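A sketch of this idea for the implicit formulation is shown below, with hand-written inclusion functions for two spheres; the function names, the tolerance, and the bisection strategy are illustrative assumptions, not the book's algorithm.

```python
# Sketch of interval branch-and-bound for the implicit feasibility problem
# S1 = S2 = 0. The inclusion functions are hand-written for two spheres.

def sphere_inclusion(cx, cy, cz, r):
    # Returns F(X): an interval bound for |p - c|^2 - r^2 over the box X.
    def F(box):
        lo = hi = 0.0
        for (a, b), c in zip(box, (cx, cy, cz)):
            d_lo, d_hi = a - c, b - c
            sq = (d_lo * d_lo, d_hi * d_hi)
            hi += max(sq)
            lo += 0.0 if d_lo <= 0.0 <= d_hi else min(sq)
        return lo - r * r, hi - r * r
    return F

def collide(F1, F2, box, eps=1e-3):
    lo1, hi1 = F1(box)
    lo2, hi2 = F2(box)
    if lo1 > 0 or hi1 < 0 or lo2 > 0 or hi2 < 0:
        return False                     # 0 not in F(X): discard the box
    widths = [b - a for a, b in box]
    if max(widths) < eps:
        return True                      # small surviving box: accept as solution
    i = widths.index(max(widths))        # bisect along the widest direction
    a, b = box[i]
    m = (a + b) / 2
    left = box[:i] + [(a, m)] + box[i + 1:]
    right = box[:i] + [(m, b)] + box[i + 1:]
    return collide(F1, F2, left, eps) or collide(F1, F2, right, eps)

F1 = sphere_inclusion(0, 0, 0, 1)           # unit sphere at the origin
F2 = sphere_inclusion(1.5, 0, 0, 1)         # overlapping sphere
print(collide(F1, F2, [(-3.0, 3.0)] * 3))   # True: the search stops early
```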

7.7.4 Surface intersection

When we compute the feasible set for detecting collisions, we are actually solving a surface intersection problem. Figure 7.8 shows two parametric surfaces that intersect each other: a mechanical part and a cylinder. Figure 7.8 also shows the object obtained by removing the interior of the cylinder from the mechanical part. This removal is made by identifying the intersection curve in the parametric domains of the surfaces. Figure 7.9 shows the projection of the feasible set on each domain; these projections provide an approximation to the intersection curve that is used for trimming, as shown in Figure 7.8.



Figure 7.8: Intersection of parametric surfaces (Snyder, 1991).

Figure 7.9: Trimming curves (Snyder, 1991).



Figure 7.10: Intersection of parametric surfaces.

Figure 7.10 shows two other intersecting surfaces. Figure 7.11 shows the domain decompositions computed with interval arithmetic and with affine arithmetic. The two surfaces are Bézier surfaces — their expressions contain many correlations, which are exploited by AA when computing the intersection, yielding a better approximation for the same tolerance.



Figure 7.11: Intersection curves computed with interval arithmetic and affine arithmetic.




7.7.5 Implicit curves and surfaces

Another problem that is naturally described as a feasibility problem is approximating implicit objects: given $f : \Omega \subseteq \mathbb{R}^d \to \mathbb{R}$, where $d = 2$ or $d = 3$, compute an approximation to the curve or surface $S = f^{-1}(0) = \{\, x \in \Omega : f(x) = 0 \,\}$. As described earlier, this problem can be solved by interval branch-and-bound methods, if we are given an inclusion function $F$ for $f$: the domain $\Omega$ is subdivided recursively, and we discard those subregions $X$ of $\Omega$ for which $0 \notin F(X)$. What is left is a collection of small cells that contain $S$. This cellular model, which has dimension $d$, can be used to extract an approximation of $S$ of the right dimension, $d - 1$, by linear interpolation of the values of $f$ at the vertices, as shown in Figure 7.12.

Figure 7.12: Approximating an implicit curve (Snyder, 1991).

Figure 7.13 shows that the use of affine arithmetic can provide better approximations than interval arithmetic, within the same tolerance. The curve shown in Figure 7.13 is the quartic given by

$$x^2 + y^2 + xy - (xy)^2/2 - 1/4 = 0.$$

Note how AA was able to exploit the correlations in the terms of this expression to compute a better approximation. Interval arithmetic was unable to compute an approximation with the right number of connected components.
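The sketch below approximates this quartic with the branch-and-bound scheme just described. The tiny interval helper and the subdivision depth are illustrative assumptions; note that the naive extension even evaluates $x \cdot x$ as a product, exhibiting the dependency problem that AA avoids.

```python
# Sketch of the branch-and-bound approximation of the quartic above,
# using a minimal interval helper instead of a full interval library.

def imul(a, b):
    p = (a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1])
    return min(p), max(p)

def F(X, Y):
    # Naive interval extension of f(x,y) = x^2 + y^2 + xy - (xy)^2/2 - 1/4.
    # Note the dependency problem: x*x over [-2,2] gives [-4,4], not [0,4].
    x2, y2, xy = imul(X, X), imul(Y, Y), imul(X, Y)
    xy2 = imul(xy, xy)
    lo = x2[0] + y2[0] + xy[0] - xy2[1] / 2 - 0.25
    hi = x2[1] + y2[1] + xy[1] - xy2[0] / 2 - 0.25
    return lo, hi

def cells(X, Y, depth):
    lo, hi = F(X, Y)
    if lo > 0 or hi < 0:
        return []                    # 0 is not in F(X): discard this cell
    if depth == 0:
        return [(X, Y)]              # small cell that may contain the curve
    xm, ym = (X[0] + X[1]) / 2, (Y[0] + Y[1]) / 2
    out = []
    for Xs in ((X[0], xm), (xm, X[1])):
        for Ys in ((Y[0], ym), (ym, Y[1])):
            out += cells(Xs, Ys, depth - 1)
    return out

kept = cells((-2.0, 2.0), (-2.0, 2.0), depth=6)
print(len(kept), "cells retained")   # the retained cells cover the quartic
```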



Figure 7.13: Approximating an implicit curve with interval arithmetic and affine arithmetic.

7.7.6 Adaptive approximation of parametric surfaces

The simplest method for approximating a parametric surface by a polygonal mesh is to use a uniform decomposition of its domain. However, it is generally necessary to use a very fine mesh in the domain to capture all details of the surface. This method is inefficient and generates very large models, because in general surfaces have details on several scales and it is not necessary to use a fine mesh in the whole domain.

If the goal is to compute a polygonal approximation such that the distance between any two vertices is at most a user-selected tolerance $\varepsilon$, then we can formulate the problem of approximating a parametric surface $S : \mathbb{R}^2 \to \mathbb{R}^3$ as a feasibility problem in which the constraints are of the form

$$|S(u, v) - S(u', v')| < \varepsilon.$$

Figure 7.14 shows a uniform decomposition and a decomposition adapted to the criterion above, along with the corresponding polygonal decompositions. Note how the points in the adapted approximation are concentrated near the details of the surface.
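Below is a simplified sketch of the adaptive subdivision. It checks the vertex-distance criterion only at sampled points of each parameter rectangle, whereas a reliable implementation would bound $|S(u,v) - S(u',v')|$ with interval or affine arithmetic; the example surface is also an arbitrary choice.

```python
import math
from itertools import product

# Simplified sketch of adaptive domain subdivision for a parametric surface.
# The distance constraint is checked only at sampled points, for brevity.

def S(u, v):
    # Example surface: a wavy height field (an illustrative choice).
    return (u, v, math.sin(3 * u) * math.cos(3 * v))

def diam(rect, samples=3):
    (u0, u1), (v0, v1) = rect
    us = [u0 + (u1 - u0) * i / (samples - 1) for i in range(samples)]
    vs = [v0 + (v1 - v0) * i / (samples - 1) for i in range(samples)]
    pts = [S(u, v) for u, v in product(us, vs)]
    return max(math.dist(p, q) for p in pts for q in pts)

def subdivide(rect, eps, out):
    if diam(rect) < eps:
        out.append(rect)                 # accepted: one quad for this rectangle
        return
    (u0, u1), (v0, v1) = rect
    um, vm = (u0 + u1) / 2, (v0 + v1) / 2
    for ur in ((u0, um), (um, u1)):
        for vr in ((v0, vm), (vm, v1)):
            subdivide((ur, vr), eps, out)

quads = []
subdivide(((0.0, 2.0), (0.0, 2.0)), eps=0.4, out=quads)
print(len(quads), "adapted quads")       # more quads where the surface bends
```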



Figure 7.14: Approximating parametric surfaces (Snyder, 1991).




Chapter 8

Probability and Optimization

In this chapter we present the basic principles of information theory. We will discuss the concept of entropy in thermodynamics and the optimization principles that are based on entropy.

8.1 The Second Law of Thermodynamics

The science of thermodynamics began in the first part of the nineteenth century with the study of gases, motivated by the desire to improve the efficiency of steam engines. Nonetheless, the principles of thermodynamics are valid for physical systems of any nature.

Thermodynamics is an experimental science, based on a small number of principles, which are generalizations made from experiments. These laws are concerned only with the macroscopic (large-scale) properties of matter. Thermodynamics plays a fundamental role in physics and remains valid even after the advances of physics in the twentieth century.

In the first part of the nineteenth century, researchers observed that various processes occur in a spontaneous way. For example, if two bodies with different temperatures are put in contact with each other, their temperatures will change until both bodies have the same temperature. The converse behavior (i.e., two bodies with the same temperature put in contact, and one body gets hotter and the other cooler) never occurs. The second law of thermodynamics determines in which direction a process occurs. It can be stated in a short manner as follows: The entropy in the universe always increases.

The second law of thermodynamics says that in any process that occurs (in time) within an isolated system (i.e., one that does not exchange mass or energy with the exterior), the entropy does not decrease. This is the only physical law that prescribes a preferred direction for time. For that reason, it is of fundamental importance in physics, as we can verify from the words of the physicists Arthur Eddington and Albert Einstein.

"The law that entropy always increases – the second law of thermodynamics – holds, I think, the supreme position among the laws of Nature. If someone points out to you that your pet theory of the universe is in disagreement with Maxwell's equations – then so much the worse for Maxwell's equations. If it is found to be contradicted by observation – well, these experimentalists do bungle things sometimes. But if your theory is found to be against the second law of thermodynamics, I can give you no hope; there is nothing for it but to collapse in deepest humiliation." Arthur Eddington (Eddington, 1948).

"[A law] is more impressive the greater the simplicity of its premises, the more different are the kinds of things it relates, and the more extended its range of applicability. Therefore, the deep impression which classical thermodynamics made on me. It is the only physical theory of universal content which I am convinced, within the framework of applicability of its basic concepts, will never be overthrown." Albert Einstein (Klein, 1967).

In thermodynamics, the variation of entropy is defined as the heat variation over the temperature:

$$dS = \frac{dQ}{T}$$

From statistical thermodynamics, we have Boltzmann's formula that defines the entropy of a physical system:

$$S = k \log W$$

where $S$ denotes the entropy, $k = 1.381 \times 10^{-16}$ erg/K is the Boltzmann constant, and $W$ is the number of accessible microstates (the possible values that determine the state of each particle in the system). Quantum mechanics says that $W$ is a finite integer.



Boltzmann himself noticed that the entropy measures the lack of organization in a physical system. The identification of entropy with information was clarified later by Szilard's explanation of the Maxwell's demon paradox (described in the next paragraph), and also by Shannon's information theory.

Maxwell proposed the following thought experiment: Consider a chamber divided into two parts A and B, separated by a partition (which does not transfer heat) with a small movable door. In each part, we have samples of a gas at the same temperature. The gas temperature is determined by a statistical "mean" of the velocities of the gas particles. However, in a gas sample there are slow and fast particles (respectively, with lower and higher velocities than the statistical mean). A demon controls the door between A and B, letting pass from A to B only slow particles, and letting pass from B to A only fast particles. With this, the temperature of A will increase and the temperature of B will decrease. This is something that does not agree with our intuition, and it also contradicts the second law of thermodynamics.

In 1929, Szilard (the Hungarian physicist who participated in the first controlled nuclear chain reaction, in 1942, at the University of Chicago) proposed the following explanation for the Maxwell's demon paradox. The demon must be very well informed about the velocity of all gas particles. Szilard identified information with entropy, and showed that the available information increases the entropy of the system chamber + demon. The idea of his argument is that, from quantum mechanics, the energy is quantized. By associating a packet of energy with a bit of information about a particle, he calculated the total entropy of the system chamber + demon, and showed that the entropy increases. This argument demonstrates that any system which manipulates information (for example, a brain or a computer) can be considered a thermal machine, and the entropy always increases in the system. According to (Resnikoff, 1989), a computer (in the 1980s) would dissipate energy on the order of $10^{10}$ times this fundamental physical limit.

8.2 Information and Measures

Before presenting information theory in the original form described by Shannon in (Shannon, 1948), we will comment on the gain of information from a measurement.

We wish to measure the value of a variable $x$. Each measurement results in an interval $[a, b]$. Suppose we have a previous measurement of $x$ corresponding to the interval $[0, 1]$, and that from a new measurement we obtain that $x$ belongs to an interval $\Delta x$ (contained in $[0, 1]$). What is the gain of information from this new measurement?

We can answer this question by considering the space (in bits, or symbols) required to represent the gain of this new measurement. To make things simple, let us suppose that $0 < x < 1$ is a number in decimal format, for example $x = 0.23765\ldots$, and that with each new measurement we obtain new digits of the decimal representation of $x$. For example, in the first measurement we obtain 23, and in the second measurement we obtain 765. The space necessary to represent the information gain of the first and second measurements is respectively 2 and 3 digits. Thus, we conclude that for an information gain of 2 digits it is necessary to subdivide the measurement interval by 100; for an information gain of 3 digits it is necessary to divide it by 1000, and so on. With a little more rigor in the arguments above, and the inclusion of some "desirable" hypotheses for the information gain of a measurement, it is possible to demonstrate that this information gain is

$$I = -\log\left(\frac{\Delta x_2}{\Delta x_1}\right)$$

where $\Delta x_2$ is the interval of the new measurement and $\Delta x_1$ is the interval of the previous measurement. In a binary system, the information is expressed by the number of bits and the base of the logarithm should be 2. For information in decimal digits, the logarithm is in base 10.

Suppose we observe a system with a finite number of states $S_1, S_2, \ldots, S_n$. Each state $S_i$ has an occurrence probability $p_i$, with

$$p_1 + p_2 + \ldots + p_n = 1$$

The gain of information from observing the occurrence of state $S_i$ is $-\log\left(\frac{\Delta x_2}{\Delta x_1}\right) = -\log\left(\frac{p_i}{1}\right) = -\log p_i$. Therefore, the expected information gain is

$$S = -\sum_{i=1}^{n} p_i \log p_i$$
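A small numeric sketch of these two formulas (the measurement gain and the expected gain) follows; the example intervals and probabilities are arbitrary.

```python
import math

# Sketch: information gain of refining a measurement interval, and the
# expected gain (entropy) of observing a system with states of probability p_i.

def gain(dx_new, dx_old, base=10):
    return -math.log(dx_new / dx_old, base)

print(gain(0.01, 1.0))       # 2 digits: interval shrunk by a factor of 100
print(gain(0.00001, 0.01))   # 3 more digits: shrunk by a further factor of 1000

def entropy(p, base=2):
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)

print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits, used again in 8.5
```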



8.3 Information Theory

Information theory began with Shannon in 1948. Shannon's starting point for the formula of entropy was axiomatic. He imposed that the entropy (or information) should have certain properties, and demonstrated that there is only one possible formula for the gain of information under these assumptions. Below we show a small excerpt from his famous paper (Shannon, 1948), where he describes the desirable properties for the entropy.

"Suppose we have a set of possible events whose probabilities of occurrence are $p_1, p_2, \ldots, p_n$ ($p_1 + p_2 + \ldots + p_n = 1$). These probabilities are known but that is all we know concerning which event will occur. Can we find a measure of how much choice is involved in the selection of the event or of how uncertain we are of the outcome?

If there is such a measure, say $H(p_1, p_2, \ldots, p_n)$, it is reasonable to require of it the following properties:

1. $H$ should be continuous in the $p_i$.

2. If all the $p_i$ are equal, $p_i = \frac{1}{n}$, then $H$ should be a monotonic increasing function of $n$. With equally likely events there is more choice, or uncertainty, when there are more possible events.

3. If a choice is broken down into two successive choices, the original $H$ should be the weighted sum of the individual values of $H$. The meaning of this is illustrated in Fig. 8.1. On the left we have three possibilities $p_1 = \frac{1}{2}$, $p_2 = \frac{1}{3}$, $p_3 = \frac{1}{6}$. On the right we first choose between two possibilities each with probability $\frac{1}{2}$, and if the second occurs make another choice with probabilities $\frac{2}{3}$, $\frac{1}{3}$. The final results have the same probabilities as before. We require, in this special case, that

$$H\left(\tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{6}\right) = H\left(\tfrac{1}{2}, \tfrac{1}{2}\right) + \tfrac{1}{2} H\left(\tfrac{2}{3}, \tfrac{1}{3}\right)$$

The coefficient $\frac{1}{2}$ is because this second choice only occurs half the time."

Figure 8.1: Decomposition of a choice of three possibilities.

Property 3 indicates that if, for example, we have $NL$ symbols with the same probability and we group these $NL$ symbols into $N$ groups of $L$ symbols, and if the information for an output symbol is given in two steps (first indicating the group, and then the specific element of this group), then the total information is the sum of the information about the group plus the information about the symbol inside the group, that is, $I(NL) = I(N) + I(L)$.

Shannon demonstrated that the only $H$ that satisfies these three properties is of the form

$$H = -k \sum_i p_i \log p_i$$

where $k$ is a positive constant, which without loss of generality we can take to be $k = 1$.

From now on, we call this quantity the entropy, whose formula is:

$$S = -\sum_{i=1}^{n} p_i \log p_i$$
i=1<br />

In Figure 8.2 we have the graph of the entropy in bits for a random variable with two outcomes, that is, $S = -p \log_2 p - (1 - p) \log_2 (1 - p)$.



Figure 8.2: Graph of $S = -p \log_2 p - (1 - p) \log_2 (1 - p)$.

8.4 Properties of Entropy

Before we present the main properties of entropy, we need to establish the following notation. Let $p = (p_1, p_2, \ldots, p_n)$ be a probability distribution with $p_i \geq 0$ and $\sum_{i=1}^{n} p_i = 1$. We denote the entropy as:

$$S_n(p) = -\sum_{i=1}^{n} p_i \log p_i$$

Since $p_i \log p_i$ is not defined for $p_i = 0$, we take $0 \log 0 = \lim_{p \to 0} p \log p = 0$.

1. For the $n$ degenerate distributions $\Delta_1 = (1, 0, \ldots, 0)$, $\Delta_2 = (0, 1, \ldots, 0)$, $\ldots$, $\Delta_n = (0, 0, \ldots, 1)$, we have $S_n(\Delta_i) = 0$, which is desirable, because in this case there is no uncertainty in the output.

2. For any probability distribution that is not degenerate we have $S_n(p) > 0$, which is desirable, because there is uncertainty in the output.

3. $S_n(p)$ does not change if we permute $p_1, p_2, \ldots, p_n$. This property is desirable because the order of the $p_i$'s in the probability distribution cannot change the entropy.



4. The entropy does not change with the inclusion of an impossible event, that is:

$$S_{n+1}(p_1, p_2, \ldots, p_n, 0) = S_n(p_1, p_2, \ldots, p_n)$$

5. $S_n(p)$ is a concave function with maximum at $p_1 = p_2 = \ldots = p_n = \frac{1}{n}$. In this case, the value of the entropy is

$$S_n(p) = -\sum_{i=1}^{n} p_i \log p_i = -n \left(\frac{1}{n} \log \frac{1}{n}\right) = \log n$$

8.5 Symbol Coding

An interesting application of information theory is the problem of optimal signal coding. Let $\alpha = \{A, B, \ldots, Z\}$ be an output alphabet, and suppose we know the probabilities $p_A, p_B, \ldots, p_Z$ of the symbols. What is the minimum number of bits that need to be used to encode the set of symbols output by some information source?

In order to achieve the minimum number of bits it is necessary to use a variable-length code for each symbol, with fewer bits assigned to the most frequent symbols. Furthermore, we would like to avoid separators between the symbols (which would be an extra symbol).
(which would be an extra symbol).<br />

A concrete example is given next. We will illustrate the symbol cod<strong>in</strong>g problem<br />

with a source that only emits the symbols A, B, C, D with the respective<br />

probabilities 1, 1, 1, 1 . In the table below we have two possible encod<strong>in</strong>gs:<br />

2 4 8 8<br />

Symbol Probability Encond<strong>in</strong>g α Encod<strong>in</strong>g β<br />

A 1/2 1 00<br />

B 1/4 01 01<br />

C 1/8 001 10<br />

D 1/8 000 11<br />

Table 1 - Codes α <strong>and</strong> β of the symbols A, B, C, D.<br />

A sufficient condition for a code to be uniquely decodable is that the code of a symbol should not be a prefix of the code of any other symbol. This is the case in the example above. A code that satisfies this condition can be structured in the form of a tree. In Figure 8.3 we have the code trees of α and β.

Figure 8.3: Code trees of (a) α and (b) β.

Let $l(x)$ be the size of the code for symbol $x$ and $p(x)$ its probability of occurrence. The average length of a code can be calculated with the formula:

$$L = \sum_x l(x)\, p(x)$$

Shannon demonstrated in his fundamental theorem for noiseless channels (Shannon, 1948) that there is no code whose average length $L$ is less than the entropy $S(X)$. Furthermore, given $\varepsilon > 0$ there is a code such that $L < S(X) + \varepsilon$.

The average lengths of the codes α and β of the example above, and the associated entropy, are respectively:

$$L_\alpha = \tfrac{1}{2} \cdot 1 + \tfrac{1}{4} \cdot 2 + \tfrac{1}{8} \cdot 3 + \tfrac{1}{8} \cdot 3 = \tfrac{7}{4}$$

$$L_\beta = \tfrac{1}{2} \cdot 2 + \tfrac{1}{4} \cdot 2 + \tfrac{1}{8} \cdot 2 + \tfrac{1}{8} \cdot 2 = 2$$

$$H = -\tfrac{1}{2} \log \tfrac{1}{2} - \tfrac{1}{4} \log \tfrac{1}{4} - 2 \cdot \tfrac{1}{8} \log \tfrac{1}{8} = \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{3}{4} = \tfrac{7}{4}$$

Note that the code α is optimal, because $L_\alpha = H$.
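The computation can be checked with a few lines of Python; this sketch just evaluates the formulas above for the codes of Table 1.

```python
import math

# Sketch: average code length L = sum l(x) p(x) versus the entropy bound.
probs = {'A': 1/2, 'B': 1/4, 'C': 1/8, 'D': 1/8}
alpha = {'A': '1', 'B': '01', 'C': '001', 'D': '000'}
beta  = {'A': '00', 'B': '01', 'C': '10', 'D': '11'}

def avg_length(code):
    return sum(len(code[s]) * p for s, p in probs.items())

H = -sum(p * math.log2(p) for p in probs.values())
print(avg_length(alpha), avg_length(beta), H)   # 1.75  2.0  1.75: alpha is optimal
```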

8.6 Decision Trees

In this section we will present the concept of decision trees. Let's see an example. Imagine a game with two participants P and R. Player R chooses in secrecy an object O, and player P has to discover which is the object by asking questions to R, who can only answer YES or NO. Simplifying the problem, suppose there are 4 possible objects for R to choose from: A, B, C, D, and that we know the probabilities that R chooses these objects:

$$p_A = \tfrac{1}{2}, \quad p_B = \tfrac{1}{4}, \quad p_C = \tfrac{1}{8}, \quad p_D = \tfrac{1}{8}$$

In Figure 8.4 we have two schemes of questions. The tree of Figure 8.4(a) indicates that initially we ask the question: "Is the chosen object A?". If the answer is YES (represented by "1" in the figure), we know with certainty that the object is A. If the answer is NO ("0"), we ask the question: "Is the object B?", and so on.

Figure 8.4: Trees for schemes (a) and (b).

Notice that this problem of decision trees is equivalent to the problem of symbol coding, as the similarity of Figures 8.3 and 8.4 indicates.

The code α in Table 1 of Section 8.5 is the code of the questions in the tree of Figure 8.4(a). For example, code "1" of symbol A in the table indicates the certainty of A if we have a positive answer to the first question. Code "01" of symbol B in the table indicates the certainty of symbol B if the first answer is negative and the second is positive. The code β in Table 1 is the code of the questions in the tree of Figure 8.4(b).

The results of Section 8.5 apply here and, therefore, the average number of questions cannot be less than the entropy. The scheme of questions in Figure 8.4(a) is optimal, because we have an average of $\frac{7}{4}$ questions, which is the value of the entropy. In the scheme of questions of Figure 8.4(b) we have an average of 2 questions to discover the chosen object.



8.7 Conditional Entropy

Before we formulate the concept of conditional entropy, we will introduce the notation for random variables. We can consider the output of an information source as the realization of a random variable, denoted by $X$. The random variable $X$ is defined by the alphabet of possible values $X$ can assume, $\{x_1, x_2, \ldots, x_n\}$, and by its probability distribution $p = (p_1, p_2, \ldots, p_n)$, where $p_i = P\{X = x_i\}$.

The expectation of $X$ is $E[X] = \sum_{i=1}^{n} x_i p_i$. We denote by $S(X)$ the entropy governed by the distribution $p$. With this notation we can consider a new random variable $Z(X) = -\log P\{X\}$; thus

$$E[Z] = E[-\log P\{X\}] = -\sum_{i=1}^{n} p_i \log p_i = S(X)$$

From this formula we see that the entropy is a mean, and the value $-\log p_x$ is the quantity of information supplied by the output $x$.

The entropy corresponding to the output sequence $x, y$ is:

$$S(X, Y) = -\sum_{x,y} P\{X = x, Y = y\} \log P\{X = x, Y = y\}$$

If the source is independent of time and memoryless, we have that:

$$S(X_1, X_2, \ldots, X_n) = n S(X)$$

It can be proved that the entropy is maximal if the random variables are independent. In general, we are interested in the entropy per output unit, defined as:

$$S = \lim_{k \to \infty} \frac{1}{k}\, S(X_1, X_2, \ldots, X_k)$$

The definition of conditional entropy is:

$$S(X|Y) = -\sum_{x,y} P\{X = x, Y = y\} \log P\{X = x \,|\, Y = y\}$$

We have that $S(X|Y) = \sum_y P\{Y = y\}\, S(X \,|\, Y = y)$, that is, the conditional entropy is an average of entropies.



An important relation of conditional entropy is that:

$$S(X, Y) = S(X) + S(Y|X)$$

This equation can be interpreted in the following way: the uncertainty of two random variables is equal to the uncertainty of the first variable plus the mean uncertainty of the second variable when the first variable is known.

The last property of conditional entropy is that:

$$S(X_1 \,|\, X_2, \ldots, X_k, X_{k+1}) \leq S(X_1 \,|\, X_2, \ldots, X_k)$$

That is, the uncertainty decreases as more information is known.

8.8 Mutual Information

The mutual information between two random variables $X$ and $Y$ is defined as

$$I(X; Y) = S(X) - S(X|Y)$$

We have that:

$$I(X; Y) = S(X) - S(X|Y) = \sum_{x,y} P(x, y) \log \frac{P(x|y)}{P(x)} = \sum_{x,y} P(x, y) \log \frac{P(x, y)}{P(x) P(y)} = S(X) + S(Y) - S(X, Y) = I(Y; X)$$

The above equation tells us that the mutual information is symmetric and also that:

$$0 \leq I(X; Y) \leq \min\{S(X), S(Y)\}$$

We have that $I(X; Y) = 0$ if and only if $X$ and $Y$ are independent, and $I(X; Y) = \min\{S(X), S(Y)\}$ if and only if $X$ determines $Y$ or $Y$ determines $X$.

Finally, we note that:

$$I(X; Y) = \sum_{x,y} P(x, y) \log \frac{P(x, y)}{P(x) P(y)} = E\left[\log \frac{P(x, y)}{P(x) P(y)}\right]$$

We can interpret $\log \frac{P(x, y)}{P(x) P(y)}$ as the mutual information between the symbols $x$ and $y$.
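The sketch below computes these quantities from a small joint distribution, verifying that the two expressions for $I(X;Y)$ agree; the joint probabilities are an arbitrary example.

```python
import math

# Sketch: entropies and mutual information computed directly from a joint
# distribution P{X = x, Y = y} (an arbitrary example), with logs in base 2.

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p       # marginal P{X = x}
    py[y] = py.get(y, 0.0) + p       # marginal P{Y = y}

def H(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

S_X, S_Y, S_XY = H(px), H(py), H(joint)
S_X_given_Y = S_XY - S_Y                       # S(X|Y) = S(X,Y) - S(Y)
I = S_X - S_X_given_Y                          # I(X;Y) = S(X) - S(X|Y)
I_direct = sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items())
print(I, I_direct)                             # the two expressions agree
```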



8.9 Divergence

Very often it is necessary to measure the distance between two probability distributions $p$ and $q$. The most common measure was developed by Kullback and Leibler in 1951, and is known as the Kullback-Leibler distance (or divergence, or measure), relative entropy, discriminant, or simply divergence.

The divergence is defined as:

$$D(p; q) = \sum_{i=1}^{n} p_i \log \frac{p_i}{q_i}$$

where we assume that if $q_i = 0$ then $p_i = 0$, and that $0 \log \frac{0}{0} = 0$.

The divergence is not symmetric, that is, $D(p; q) \neq D(q; p)$ in general. When needed, we can use the symmetric divergence, defined by

$$J(p; q) = D(p; q) + D(q; p)$$

Like the entropy, the divergence has several interesting properties, such as:

1. $D(p; q) = 0$ only if $p = q$. If $p \neq q$ then $D(p; q) > 0$.

2. $D(p; q)$ is continuous and convex in $p$ and $q$.

3. $D(p; q)$ is permutationally symmetric, that is, $D(p; q)$ does not change if the pairs $(p_1, q_1), \ldots, (p_n, q_n)$ are permuted.

4. If $U = (\frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n})$ is the uniform distribution, then:

$$D(p; U) = \sum_{i=1}^{n} p_i \log \frac{p_i}{1/n} = \log n + \sum_{i=1}^{n} p_i \log p_i = \log n - S_n(p)$$

5. $I(X; Y) = \sum_{x,y} P(x, y) \log \frac{P(x, y)}{P(x) P(y)} = D(p; q)$, where $p = P(x, y)$ and $q = P(x) P(y)$.
= D (p; q), where p = P (x, y) <strong>and</strong>



8.10 Continuous Random Variables

In this section, we give definitions of entropy, conditional entropy, divergence and mutual information for continuous random variables. They are similar to the discrete case, with the basic difference of replacing the symbol $\sum$ by $\int$.

Let $X$ be a continuous random variable with probability density $f_X(x)$. The entropy of $X$ is called the differential entropy, and is defined by

$$S(X) = -\int f_X(x) \log f_X(x)\, dx = -E[\log f_X(x)]$$

The differential entropy has some of the same properties as the ordinary entropy (for discrete variables); however, it can be negative, and it is not an absolute measure of the randomness of $X$.

The mutual information of continuous variables has the same definition as in the discrete case:

$$I(X; Y) = S(X) - S(X|Y)$$

where the conditional entropy is defined by

$$S(X|Y) = -\int\!\!\int f_{X,Y}(x, y) \log f_X(x|y)\, dx\, dy$$

The relation below is valid:

$$I(X; Y) = \int\!\!\int f_{X,Y}(x, y) \log \frac{f_X(x|y)}{f_X(x)}\, dx\, dy = \int\!\!\int f_{X,Y}(x, y) \log \frac{f_{X,Y}(x, y)}{f_X(x) f_Y(y)}\, dx\, dy$$

The divergence is defined by:

$$D(f_X; g_X) = \int f_X(x) \log \frac{f_X(x)}{g_X(x)}\, dx$$

The divergence for continuous variables has some unique properties:

• $D(f_X; g_X) = 0$ only if $f_X = g_X$. If $f_X \neq g_X$ then $D(f_X; g_X) > 0$.

• The divergence is invariant with respect to the following changes in the components of the vector $x$: permutations of the order of the components, scaling in amplitude, and monotonic non-linear transformations.

Also, we have $I(X; Y) = D(f_{X,Y}; f_X f_Y)$.



8.11 Principle of Maximum Entropy - MaxEnt

In Nature, due to the second law of thermodynamics, the entropy of a system always increases (or rather, it never decreases). Moreover, in general, a physical system tends to the distribution of maximum entropy that respects the physical restrictions imposed on the system.

The Principle of Maximum Entropy is based on this idea, that is, to find the distribution of maximum entropy, respecting the imposed restrictions.

Let $X$ be a random variable that can assume the values $x_1, x_2, \ldots, x_n$, but with the corresponding probabilities $p_1, p_2, \ldots, p_n$ unknown. However, we know some restrictions on the probability distribution, such as the mean, the variance, and some moments of $X$. How should we choose the probabilities $p_1, p_2, \ldots, p_n$?

This problem, in general, has infinitely many solutions. We have to look for a function to be optimized such that we find a unique solution. The principle of maximum entropy, initially proposed by Jaynes in 1957, offers a solution to this problem. The principle of maximum entropy, which we will call MaxEnt, states that from the probability distributions that satisfy the restrictions, we have to choose the one with maximum entropy.

MaxEnt is a very general principle and has applications in various fields. The first applications were in statistical mechanics and thermodynamics. There are applications of MaxEnt in many areas, such as urban planning, economics, queueing theory, spectral analysis, models for population growth, and language. In graphics and vision, it can be applied to pattern recognition, image processing and many other problems. An important application in statistical thermodynamics is to find the probability distribution which describes the (micro)states of a system, based on measurements (average values of functions that are restrictions of the problem) made at the macroscopic scale.

An intuitive explanation for MaxEnt is that if we choose a distribution with less entropy than the maximum, this reduction in entropy comes from the use of information that is not directly provided, but is somehow used in the problem inadvertently. For example, if we do not have any restrictions, the distribution chosen by MaxEnt is the uniform distribution. If for some reason we do not choose a uniform distribution, that is, if for some $i$ and $j$ we have $p_i > p_j$, then some information not provided was used to cause this asymmetry in the probability distribution.



MaxEnt can be formulated mathematically as follows: Let $X$ be a random variable with alphabet $\{x_1, x_2, \ldots, x_n\}$ and with probability distribution $p = (p_1, p_2, \ldots, p_n)$. We want to maximize the measure of Shannon, subject to several linear restrictions:

$$\max\; H(X) = -\sum_{i=1}^{n} p_i \log p_i$$

$$\text{subject to} \quad p_i \geq 0 \;\;\forall i, \qquad \sum_{i=1}^{n} p_i = 1, \qquad \sum_{i=1}^{n} p_i g_{ri} = a_r, \;\; r = 1, 2, \ldots, m.$$

Because the domain is convex and $H(X)$ is concave, we have a problem of convex optimization, which guarantees the existence of a unique solution and efficient computational methods to solve the problem.

The above problem has a particular structure which guarantees that in the solution we have $p_i > 0$ for all $i$ (because the $p_i$ are exponentials, as we will see next). In this way, we do not have to worry about the inequality restrictions.
way, we do not have to worry about <strong>in</strong>equality restrictions.<br />

Apply<strong>in</strong>g the KKT conditions (or Lagrange Multipliers) we have that the solution<br />

is:<br />

p i = exp(−λ 0 − λ 1 g 1i − λ 2 g 2i − λ m g mi ), i = 1, 2, ..., n.<br />

where λ 0 , λ 1 , λ 2 , ..., λ m are the Lagrange multipliers which can be calculated with<br />

the equations:<br />

⎧ n∑<br />

∑<br />

⎪⎨ exp(−λ 0 − m λ j g ji ) = 1<br />

i=1<br />

j=1<br />

n∑<br />

∑<br />

⎪⎩ g ri exp(−λ 0 − m λ j g ji ) = a r , r = 1, 2, ..., m.<br />

i=1<br />

j=1<br />
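For a single constraint, this nonlinear system is easy to solve numerically. The sketch below uses scipy.optimize.fsolve on Jaynes' classic example of a die with prescribed mean 4.5; the starting point and the choice of solver are illustrative.

```python
import numpy as np
from scipy.optimize import fsolve

# Sketch: solving the system above for one constraint sum_i p_i g_i = a.
# Example: Jaynes' die whose prescribed mean is 4.5.

g = np.arange(1.0, 7.0)     # g_{1i}: the six faces of a die
a = 4.5                     # prescribed mean

def equations(lam):
    lam0, lam1 = lam
    p = np.exp(-lam0 - lam1 * g)
    return [p.sum() - 1.0, (g * p).sum() - a]

lam0, lam1 = fsolve(equations, x0=[0.0, 0.0])
p = np.exp(-lam0 - lam1 * g)
print(p)                    # maximum-entropy probabilities, biased to high faces
print((g * p).sum())        # 4.5, the imposed constraint
```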

The formulation of MaxEnt for continuous random variables is analogous to the discrete case. Consider a continuous random variable $x$ with probability density $f(x)$. If we know only the expected values $E[g_1(x)] = a_1$, $E[g_2(x)] = a_2$, $\ldots$, $E[g_m(x)] = a_m$ of $x$, there are, in general, infinitely many distributions satisfying these moments. According to MaxEnt, we have to choose the distribution with maximum entropy, which is:

$$f(x) = \exp(-\lambda_0 - \lambda_1 g_1(x) - \lambda_2 g_2(x) - \cdots - \lambda_m g_m(x))$$

where $\lambda_0, \lambda_1, \lambda_2, \ldots, \lambda_m$ are the Lagrange multipliers, which can be determined by the following equations:

$$\int f(x)\, dx = 1, \qquad \int f(x)\, g_r(x)\, dx = a_r, \quad r = 1, 2, \ldots, m.$$

For more details, see (Kapur & Kesavan, 1992).

8.12 Distributions obtained with MaxEnt

Next we will give the domain of a random variable and the known moments. With the application of MaxEnt, using the formulas from the previous section, either in the continuous or the discrete case, we obtain a probability distribution with maximum entropy (DPME).

A large part of the distributions in statistics can be characterized from certain prescribed moments and the application of MaxEnt (or MinxEnt, discussed in Section 8.13).

1. If the domain of $x$ is $[a, b]$ and there is no restriction (except for the natural restrictions of probability distributions), the DPME is the uniform distribution. Thus, the uniform distribution is characterized by the absence of restrictions on the entropy.

2. If the domain of $x$ is $[a, b]$ and we have an arithmetic mean $m = E[x]$, the DPME is the truncated exponential distribution given by

$$f(x) = c\, e^{-kx}$$

where $c$ and $k$ can be calculated from

$$c \int_a^b e^{-kx}\, dx = 1 \quad \text{and} \quad c \int_a^b x\, e^{-kx}\, dx = m$$

3. If the domain of $x$ is $[0, \infty)$ and we have the mean $m = E[x]$, the DPME is the distribution given by

$$f(x) = \frac{1}{m}\, e^{-x/m}$$

and the maximum entropy is:

$$S_{\mathrm{MAX}} = 1 + \ln m$$

4. If the domain of $x$ is $[0, \infty)$ and we have the arithmetic mean $m = E[x]$ and the geometric mean $E[\ln x]$, then the DPME is the gamma distribution given by

$$f(x) = \frac{a^\gamma}{\Gamma(\gamma)}\, e^{-ax} x^{\gamma - 1}$$

5. If the domain of $x$ is $(-\infty, \infty)$ and we have the mean $m = E[x]$ and the variance $E[(x - m)^2] = \sigma^2$, then the DPME is the normal distribution

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x - m}{\sigma}\right)^2\right)$$

whose entropy is:

$$S_{\mathrm{MAX}} = \ln \sigma\sqrt{2\pi e}$$

6. If the domain of $x$ is $(-\infty, \infty)$ and we have the moment $E[|x|] = \sigma$, the DPME is the Laplace distribution:

$$f(x) = \frac{1}{2\sigma} \exp\left(-\frac{|x|}{\sigma}\right)$$

7. An important application in statistical thermodynamics is to find the probability distribution that describes the (micro)states of a system based on measurements made at the macroscopic scale.

Let $p_1, p_2, \ldots, p_n$ be the probabilities of the particles of the system having energies $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$, respectively. The only information we have about the system is its average energy $\hat{\epsilon}$, that is:

$$p_1 \epsilon_1 + p_2 \epsilon_2 + \ldots + p_n \epsilon_n = \hat{\epsilon}$$

together with the natural restrictions of probabilities:

$$p_i \geq 0 \;\;\forall i \quad \text{and} \quad \sum_{i=1}^{n} p_i = 1$$



According to MaxEnt, we obtain the Maxwell-Boltzmann distribution of statistical mechanics:

$$p_i = e^{-\mu \epsilon_i} \Big/ \sum_{j=1}^{n} e^{-\mu \epsilon_j}, \quad i = 1, 2, \ldots, n.$$

where $\mu = \frac{1}{kT}$ ($T$ is the temperature and $k$ is the Boltzmann constant).

Other distributions of statistical mechanics, such as the Bose-Einstein or the Fermi-Dirac distributions, are also obtained from MaxEnt, changing only the (macroscopic) restrictions that we have about the system.
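As a sketch, the Maxwell-Boltzmann weights can be recovered numerically from the average energy alone; the energy levels and target mean below are made-up values, and $\mu$ is found by bisection using the fact that the mean energy decreases monotonically in $\mu$.

```python
import math

# Sketch: recovering the Maxwell-Boltzmann weights from the average energy
# alone, solving for mu by bisection (the energy values are made up).

energies = [0.0, 1.0, 2.0, 3.0]
target = 1.2                         # prescribed average energy

def mean_energy(mu):
    w = [math.exp(-mu * e) for e in energies]
    Z = sum(w)                       # the partition function
    return sum(e * wi for e, wi in zip(energies, w)) / Z

lo, hi = -50.0, 50.0                 # mean_energy is decreasing in mu
for _ in range(100):
    mid = (lo + hi) / 2
    if mean_energy(mid) > target:
        lo = mid
    else:
        hi = mid

mu = (lo + hi) / 2
Z = sum(math.exp(-mu * e) for e in energies)
print([math.exp(-mu * e) / Z for e in energies])   # the p_i, summing to 1
```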

8.13 Principles of Optimization by Entropy

MaxEnt is the most important optimization principle derived from entropy. However, there are many other optimization principles based on entropy, depending on the problem at hand. In (Kapur & Kesavan, 1992) we have a general description of these principles. We will briefly present the second most important principle of optimization based on entropy: the principle of minimum cross-entropy, which we will call MinxEnt.

MinxEnt is used when we do not know the probability distribution $p$, but we have (as in the case of MaxEnt) restrictions on $p$, and also an a priori probability distribution $q$, and we want to choose $p$ satisfying the restrictions and as close as possible to $q$.

MinxEnt can be concisely stated as: From all the distributions satisfying the given restrictions, choose the one that is closest to the a priori distribution (prior). The usual distance measure for probability distributions is the divergence. With this measure, MinxEnt says that we have to minimize the divergence $D(p; q)$ subject to the restrictions of the problem.

Because the divergence $D(p; q)$ is convex and the domain (in general) is convex, we have (as in the case of MaxEnt) a problem of convex optimization, which guarantees the existence of a unique solution and computational efficiency.

An interesting fact is that if the prior distribution $q$ is not given, the most natural choice is to pick the probability distribution $p$ as close as possible to the uniform distribution $U = (\frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n})$. As we have already seen in the section about divergence, we have that:

$$D(p; U) = \sum_{i=1}^{n} p_i \log \frac{p_i}{1/n} = \log n + \sum_{i=1}^{n} p_i \log p_i = \log n - H(p)$$

Therefore, minimizing the divergence with respect to the uniform distribution is equivalent to maximizing the entropy. Thus, MaxEnt is a particular case of MinxEnt in which we minimize the distance to the uniform distribution.

We can unite MaxEnt and MinxEnt in a general principle: "From all probability distributions satisfying the given restrictions, choose the one closest to a given prior distribution. If this prior distribution is not given, choose the one closest to the uniform distribution."



Bibliography

Atallah, M. 1984. Checking similarity of planar figures. International J. Comput. Inform. Sci., 13, 279–290.

Baker Kearfott, R. 1996. Rigorous Global Search: Continuous Problems. Dordrecht, Netherlands: Kluwer.

Bartels, R., Beatty, J. C., & Barsky, B. 1987. An Introduction to Splines for Use in Computer Graphics and Geometric Modeling. Morgan Kaufmann.

Beasley, David, Bull, David R., & Martin, Ralph R. 1993. An Overview of Genetic Algorithms: Part 1, Fundamentals. University Computing, 15(2), 58–69.

Bentley, Peter. 1999. Aspects of Evolutionary Design by Computers. Pages 99–118 of: Advances in Soft Computing - Engineering Design and Manufacturing. Springer-Verlag, London.

Berner, S. 1996. New results on verified global optimization. Computing, 57(4), 323–343.

Carvalho, Paulo C. P., Szenberg, Flavio, & Gattass, Marcelo. 1987. Image-based modeling using a two-step camera calibration method. IEEE Transactions on Robotics and Automation, 3(4), 388–395.

Chvatal, Vasek. 1983. Linear Programming. W. H. Freeman.

Comba, J. L. D., & Stolfi, J. 1993. Affine arithmetic and its applications to computer graphics. Pages 9–18 of: Anais do VI SIBGRAPI.

Csendes, T., & Ratz, D. 1997. Subdivision direction selection in interval methods for global optimization. SIAM J. Numerical Analysis, 34(3), 922–938.

Eddington, Sir Arthur Stanley. 1948. The Nature of the Physical World. Macmillan.

Fang, Shu-Cherng, & Puthenpura, Sarat. 1993. Linear Optimization and Extensions: Theory and Algorithms. Prentice Hall.



Faugeras, Olivier. 1993. Three-Dimensional Computer Vision: a geometric viewpoint. MIT Press.

Fuchs, H., Kedem, Z., & Uselton, S. 1977. Optimal surface reconstruction from planar contours. Communications of the ACM, 20(10), 693–702.

Fuchs, H., Kedem, Z. M., & Naylor, B. 1980. On visible surface generation by a priori tree structures. Computer Graphics (SIGGRAPH '80 Proceedings), 14(4), 124–133.

Funkhouser, Thomas A., & Séquin, Carlo H. 1993 (Aug.). Adaptive Display Algorithm for Interactive Frame Rates During Visualization of Complex Virtual Environments. Pages 247–254 of: Kajiya, James T. (ed), Computer Graphics (SIGGRAPH '93 Proceedings), vol. 27.

Gill, Philip E., Murray, Walter, & Wright, Margaret H. 1981. Practical Optimization. Academic Press.

Golub, Gene H., & Loan, Charles F. Van. 1983. Matrix Computations. Johns Hopkins.

Gomes, J., Darsa, L., Costa, B., & Velho, L. 1996. Graphical Objects. The Visual Computer, 12, 269–282.

Gomes, Jonas, & Velho, Luiz. 1995. Abstraction Paradigms for Computer Graphics. The Visual Computer, 11, 227–239.

Gomes, Jonas, & Velho, Luiz. 1997. Image Processing for Computer Graphics. Springer-Verlag.

Gomes, Jonas, & Velho, Luiz. 1998. Computação Gráfica, Volume 1. Série de Computação e Matemática.

Gomes, Jonas, Darsa, Lucia, Costa, Bruno, & Velho, Luiz. 1998. Warping and Morphing of Graphical Objects. San Francisco: Morgan Kaufmann.

Goodrich, M. T., Mitchell, J. S. B., & Orletsky, M. W. 1999. Approximate Geometric Pattern Matching Under Rigid Motions. PAMI, 21(4), 371–379.



Hansen, E. 1988. Global Optimization using Interval Analysis. Monographs and Textbooks in Pure and Applied Mathematics, no. 165. New York: M. Dekker.

Hart, J., Olano, M., Heidrich, W., & McCool, M. 2002. Real Time Shading. AK Peters.

Higham, Nicholas J. 1996. Accuracy and Stability of Numerical Algorithms. SIAM Books, Philadelphia.

Holland, John H. 1975. Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor, Mich. An introductory analysis with applications to biology, control, and artificial intelligence.

Kapur, J. N., & Kesavan, H. K. 1992. Entropy Optimization Principles with Applications. Academic Press.

Karmarkar, Narendra. 1984. A new polynomial time algorithm for linear programming. Combinatorica, 4, 373–395.

Khachian, L. G. 1979. A Polynomial Algorithm for Linear Programming. Doklady Akad. Nauk USSR, 244(5), 1093–96.

Klein, M. J. 1967. Thermodynamics in Einstein's Universe. Science, 509. Albert Einstein, quoted in the article.

Koza, J. R. 1992. Genetic Programming: On the Programming of Computers by means of Natural Selection. MIT Press.

Kreinovich, V. Interval Software.

Luenberger, David G. 1984. Linear and Nonlinear Programming. Addison-Wesley.

Mason, A. E. W., & Blake, E. H. 1997. Automatic Hierarchical Level of Detail Optimization in Computer Animation. Computer Graphics Forum, 16(3), 191–200.



Matos, A., Gomes, J., Parente, A., Siffert, H., & Velho, L. 1997. The Visorama system: a functional overview of a new virtual reality environment. In: Computer Graphics International '97 Proceedings.

McMillan, L., & Bishop, G. 1995. Plenoptic Modeling: An Image-Based Rendering System. SIGGRAPH '95 Proceedings, 39–46.

Megiddo, N. 1984. Linear programming in linear time when the dimension is fixed. J. ACM, 31, 114–127.

Meyers, David, Skinner, Shelly, & Sloan, Kenneth. 1991 (June). Surfaces From Contours: The Correspondence and Branching Problems. Pages 246–254 of: Proceedings of Graphics Interface '91.

Michalewicz, Zbigniew. 1994. Genetic algorithms + data structures = evolution programs. Second edn. Springer-Verlag, Berlin.

Moore, R. E. 1966. Interval Analysis. Prentice-Hall.

Moore, R. E. 1979. Methods and Applications of Interval Analysis. Philadelphia: SIAM.

Moore, R. E. 1991. Global optimization to prescribed accuracy. Computers & Mathematics with Applications, 21(6-7), 25–39.

Brigger, P., Engel, R., & Unser, M. 1998. B-spline snakes and a JAVA interface: An intuitive tool for general contour delineation. Proc. IEEE Int. Conf. on Image Processing, Chicago, IL, USA, October, 4–7.

Papadimitriou, Christos H., & Steiglitz, Kenneth. 1982. Combinatorial optimization: algorithms and complexity. Prentice-Hall.

Pham, Binh, & Pringle, Glen. 1995. Color Correction for an Image Sequence. IEEE Computer Graphics and Applications, 38–42.

Press, W. H., Flannery, B. P., Teukolsky, S. A., & Vetterling, W. T. 1988. Numerical Recipes in C. Cambridge University Press.



Ratschek, H., & Rokne, J. 1984. Computer Methods for the Range of Functions. Ellis Horwood Ltd.

Ratschek, H., & Rokne, J. 1988. New Computer Methods for Global Optimization. Ellis Horwood Ltd.

Resnikoff, Howard L. 1989. The Illusion of Reality. Springer-Verlag.

Rudin, Walter. 1976. Principles of Mathematical Analysis. McGraw-Hill.

Rump, S. M. 1988. Algorithms for verified inclusions: theory and practice. Pages 109–126 of: Moore, R. E. (ed), Reliability in computing: The role of interval methods in scientific computing. Perspectives in Computing, vol. 19. Boston, MA: Academic Press Inc.

Shannon, Claude E. 1948. A Mathematical Theory of Communication. The Bell System Technical Journal, 27, 379–423. http://cm.belllabs.com/cm/ms/what/shannonday/paper.html.

Sims, Karl. 1991. Artificial evolution for computer graphics. Computer Graphics (SIGGRAPH '91 Proceedings), 25(July), 319–328.

Snyder, J. M. 1991. Generative Modeling: An Approach to High Level Shape Design for Computer Graphics and CAD. Ph.D. thesis, California Institute of Technology.

Sokoryansky, Mike, & Shafran, Aric. 1997 (Fall). Evolutionary Development of Computer Graphics.

Stolfi, J., & de Figueiredo, L. H. 1997. Self-Validated Numerical Methods and Applications. 21º Colóquio Brasileiro de Matemática, IMPA.

Strang, Gilbert. 1988. Linear Algebra and its Applications. Saunders.

Szeliski, R., & Shum, H. 1997. Creating Full View Panoramic Image Mosaics and Environments. SIGGRAPH '97 Proceedings.



Teller, Seth J., & Séquin, Carlo H. 1991. Visibility preprocessing for interactive walkthroughs. Computer Graphics (SIGGRAPH '91 Proceedings), 25(4), 61–69.

Tsai, Roger Y. 1987. A versatile camera calibration technique for high accuracy 3D machine vision metrology using off-the-shelf cameras and lenses. IEEE Transactions on Robotics and Automation, 3(4).

Velho, Luiz, Gomes, Jonas, & Sobreiro, Marcus R. 1997. Color image quantization by pairwise clustering. Pages 203–210 of: SIBGRAPI X, Brazilian Symposium of Computer Graphics and Image Processing. Campos do Jordao, SP: IEEE Computer Society, Los Alamitos, CA.

Velho, Luiz, Gomes, Jonas, & de Figueiredo, Luiz Henrique. 2002. Implicit objects in computer graphics. Springer-Verlag.

Watt, Allan. 2000. 3D Computer Graphics. Addison-Wesley.

Weinstock, Robert. 1974. Calculus of variations with applications to physics and engineering. Dover.

Wesselink, Jan Willem. 1996. Variational Modeling of Curves and Surfaces. Ph.D. thesis, Technische Universiteit Eindhoven.

Wu, Xiaolin. 1992. Color Quantization by Dynamic Programming and Principal Analysis. ACM Transactions on Graphics, 11(2), 348–372.

Yamamoto, J. K. 1998. A Review of Numerical Methods for the Interpolation of Geological Data. Anais da Academia Brasileira de Ciências, 70(1), 91–116.

Zhou, Y., Kaufman, A., & Toga, Arthur W. 1998. 3D Skeleton and Centerline Generation Based on An Approximate Minimum Distance Field. The Visual Computer, 14(7), 303–314.


Index

abstraction paradigm, 5
accumulation point, 21
algorithm, 71
  barrier, 71, 72
  conjugate gradient, 65
  Dijkstra, 119
  heuristic, 106
  interior point methods, 75
  interior points, 75
  Karmarkar, 79
  least squares, 70
  penalty, 71
  probabilistic, 106
  projected gradient, 74
  quasi-Newton, 66
  simplex method, 75
  steepest descent, 64
  Tsai, 83
  unidimensional searching, 63
animation, 4, 47
  see also motion, 4
Bellman equation, 118, 119
binary space partition tree, 88
branch-and-bound, see optimization
BSP tree, 88
camera
  calibration, 39, 80
  coordinates, 80
  extrinsic parameters, 80
  focal distance, 80
  intrinsic parameters, 80
  inverse specification, 40
  pinhole, 39, 80
  specification, 40
  Tsai calibration, 83
  virtual, 39
  world coordinates, 80
character recognition, 48
Cholesky, 61
circle, 11
  reconstruction, 15
  representation, 15
cluster analysis, 35
color correction, 84
color map, 126
combinatorial optimization, 125
complexity, 108
computer graphics, 1
  definition of, 1, 10
computer vision, 2, 37
conjugate directions, 65
conjugate gradient, see algorithm
CT, 142




curve
  B-splines, 99
  energy of a, 96
  tension energy, 96
  variational modeling of, 98
curves
  minimal, 138
David Marr, 45
descent direction, 51
digital video, 48
Dijkstra algorithm, see algorithm
dynamic programming, 111
  backtrack, 112
  knapsack problem, 116
  optimality principle, 111
  resource allocation, 116
equivalence relation, 34
four-universe paradigm, 6
function
  attribute, 10
  characteristic, 11
  concave, 26
  convex, 26, 56
  critical point, 52
  discriminating, 34
  gradient of a, 50
  height, 7
  Hessian matrix of a, 50
  image, 9
  Lagrangian, 69
  linear, 26
  objective, 20, 26
  quadratic, 26
  saddle point, 53
  separable, 26
  sparse, 26
  stationary point of a, 51
geometric modeling, 2, see modeling, 138
GIS, 3, 139
global
  maximum, 56
  minimum, 56
graph, 109
  adjacency matrix of a, 109
  adjacent vertices, 109
  arc of a, 109
  edge, 109
  neighborhood, 140
  oriented, 109
  shortest paths in, 117
  vertex, 109
graphical object, 4, 10
  attribute function of a, 10
  color of a, 11
  dimension of a, 10
  geometric support of a, 10
  implicit description, 13
  parametric description, 13
  piecewise description, 13
  reconstruction of a, 15
  subsets as, 11
grid, 7
Hadamard, Jacques, 28
IEEE, 6
image, 11, 125


  analysis, 45
  coordinate system, 81
  correcting distortions of, 43
  function, 9
  gamut, 125
  morphing, 42
  panoramic, 84
  processing, 2, 41
  quantization, 125
  registration, 42, 84
  representation, 9
  satellite, 140
  warping, 42
image-based modeling, 3
image-based rendering, 84
implementation universe, 5
integer program, 120, 121
isolated point, 21
Karmarkar algorithm, see algorithm
Kuhn-Tucker condition, 68
Lagrange multipliers, 69
length representation, 6
level of details, 135
linear program, 27
linear programming, 74
linear regression, 58
Marr conjecture, 45
mathematical model, see modeling
mathematical universe, 5
mathematics
  applied, 1
  computational, 5, 7
  pure, 1
matrix
  pseudo-inverse, 60
minimal curves on maps, 138
model, see modeling
modeling
  abstraction paradigms for, 5
  geometric, 2, 35
  image, 9
  mathematical, 5
  procedural, 36
  terrain, 7
  variational, 37
  variational curve, 98
motion
  analysis, 4
  modeling, 4
  processing, 4
  specification, 4
  synthesis, 4
motion capture, 48
MRI, 142
Newton's method, 62
NP-complete, 32
OCR, 48
operator
  rendering, 37
  representation, 14
optical character recognition, 48
optimality condition, 20
optimality conditions, 50
optimality principle, 111
optimization, 19
  and level of details, 135



  branch-and-bound, 123
  classification, 20, 25
  combinatorial, 21, 24, 105
  continuous, 21, 49
  discrete, 21, 22
  dynamic programming, 106, 111
  integer program, 106, 120, 121
  linear program, 27
  objective function, 20, 26
  optimality condition, 20
  optimality principle, 111
  restrictions, 25
  solution set, 20
  unconstrained problem, 49
  variational, 21, 25, 91
path
  minimal, 117
  minimum, 140
physical universe, 5
point-membership classification, 11
polytope, 27
problem
  direct, 27
  ill-posed, 29
  inverse, 27
  linear regression, 58
  knapsack, 116
  NP-complete, 32
  traveling salesman, 32
  well-posed, 28
pseudo-inverse, see matrix
quantization, 126
  by dynamic programming, 128
reconstruction, 13, 14
  ambiguous, 14
  circle, 15
  exact, 15
  terrain, 16
relation, 33
rendering operator, see operator
representation, 14
  ambiguous, 14
  by uniform sampling, 8
  circle, 15
  exact, 15
  floating point, 7
  image, 9
  length, 6
  operator, 14
  terrain, 7
  universe, 5
representation operator
  see representation, 14
saddle point, 53
set
  solution, 20
shortest path, 117
simplex method
  complexity, 79
singular value decomposition, 61
snakes, 45
solution
  global, 31
  local, 31
solution set, see optimization
surface reconstruction, 142
terrain



  reconstruction, 16
  representation, 7
theorem
  Shannon, 16
  spectral, 61
topological space
  continuous, 21
  discrete, 21
universe
  implementation, 5
  mathematical, 5
  physical, 5
  representation, 5
variational modeling, see modeling
Visorama, 84
visualization, 2, 37
